
Tiered Storage Assumptions

💡 Facts

When using Tiered Storage, we assume:

  • 💡 a fairly low local retention time: 60 minutes.

We don't see a good reason to set it to anything higher. Under our other assumptions, most traffic reads from the tail of the log (the last few minutes at most, usually seconds). It should probably be set even lower.

Note that in practice this can translate to a higher local retention time, due to the segment size and partition count assumptions.
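As a rough sketch of why the effective figure ends up higher: a segment can only be deleted after it is closed, so the oldest record in a segment can outlive the configured local retention by roughly the time it took to fill that segment. The helper names and the 1000-second fill time below are illustrative assumptions, not values read from Kafka.

```python
# Back-of-the-envelope sketch; all names here are hypothetical helpers.
MIB = 1024 ** 2
TIB = 1024 ** 4

def worst_case_local_age_s(local_retention_s: float, segment_fill_s: float) -> float:
    """Approximate age of the oldest record still on local disk.

    Assumes a segment becomes deletable only once it is closed, and is then
    kept until its newest record is older than the local retention time.
    """
    return local_retention_s + segment_fill_s

def local_bytes(throughput_bytes_per_s: float, age_s: float) -> float:
    """Cluster-wide bytes kept on local disks, before replication."""
    return throughput_bytes_per_s * age_s

# Assumed inputs: 60 min local retention, ~1000 s to fill a segment, 1024 MiB/s.
age = worst_case_local_age_s(local_retention_s=60 * 60, segment_fill_s=1000)
size = local_bytes(1024 * MIB, age)
print(f"effective local retention: {age / 60:.1f} min")
print(f"local data before replication: {size / TIB:.2f} TiB")
```

Under these assumed inputs, the 60-minute setting behaves more like roughly 77 minutes of locally held data.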

The low local disk usage gives us the luxury of deploying SSDs cost-effectively. In this setup, it never makes sense to run HDDs unless you really want to squeeze out the very last cent of savings at the expense of performance. Even then, the price difference is relatively insignificant: I calculated it at around 1-2% in GCP, and less so in AWS. The only case where it makes a significant difference is Azure, because network costs do not dominate there (read: do not exist).

Which brings me to:

  • 💡 we use SSDs when Tiered Storage is enabled

This gives us much better performance and a lot of extra (and better guaranteed) IOPS, so, at least as far as the hardware is concerned, we don't have to worry about performance.

  • 💡 no compacted topics are assumed in our calculations

Kafka's open source Tiered Storage implementation doesn't yet support compacted topics and there don't seem to be any public plans for it as of December 2024.

If you need to add such topics, you will have to provision extra local storage for them. Depending on how much storage that is, you may have to switch back to HDDs.

  • 💡 we assume a default segment size.

Because Kafka can delete a segment only once it's closed, this can result in larger-than-expected disk usage. With the default settings, a segment closes only once it reaches 1 GiB of data, which, assuming 1000 partitions and 1024 MiB/s spread evenly across them, would take around 17 minutes (1024 MiB at about 1.02 MiB/s per partition is roughly 1000 seconds). See the Partition section for more detail.
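The 17-minute figure can be reproduced with a small calculation. This is a minimal sketch that assumes traffic is spread evenly across partitions and that segments roll on size alone; the function name is hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

def segment_fill_seconds(segment_bytes: float,
                         cluster_throughput_bytes_per_s: float,
                         partitions: int) -> float:
    """Seconds for one partition to fill a segment of the given size.

    Assumes the cluster throughput is split evenly across all partitions.
    """
    per_partition = cluster_throughput_bytes_per_s / partitions
    return segment_bytes / per_partition

# The document's numbers: 1 GiB segments, 1024 MiB/s, 1000 partitions.
secs = segment_fill_seconds(1 * GIB, 1024 * MIB, 1000)
print(f"{secs:.0f} s (~{secs / 60:.1f} min)")
```

Doubling the partition count at the same throughput doubles the fill time, which is why the partition count matters for local disk usage.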

They were the only ones kind enough to contribute such a plugin to the community! It supports all three major cloud providers, is battle-hardened in their (and others') production environments and, to our knowledge, no alternative exists.

PS: You could tweak its configs to cut the PUT request costs that the cloud provider charges you by 10x.

