IOPS
All read and write IOs are assumed to be 128 KiB in size.
This is extremely conservative, because:
- Kafka is designed to read and write linearly from disk, which allows the OS to use large IO sizes as it batches up data in the page cache before writing to disk.
- AWS EBS has its own feature where HDD IO operations get coalesced (batched) aggressively into 1 MiB IOs. There are more HDD performance assumptions here - see Notes below.
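To make the effect of the IO size assumption concrete, here is a minimal sketch of how a throughput figure turns into an IOPS requirement. The function name `required_iops` and the 240 MB/s example are illustrative only; this is not the calculator's actual code.

```python
import math

KIB = 1024
MIB = 1024 * KIB


def required_iops(throughput_bytes_per_sec: float, io_size_bytes: int = 128 * KIB) -> int:
    """IOPS needed to sustain a throughput if every IO is io_size_bytes large."""
    if io_size_bytes <= 0:
        raise ValueError("io_size_bytes must be positive")
    return math.ceil(throughput_bytes_per_sec / io_size_bytes)


# Illustrative workload: 240 MB/s of disk writes.
write_throughput = 240 * 1000 * 1000

conservative = required_iops(write_throughput)     # assumes 128 KiB per IO
coalesced = required_iops(write_throughput, MIB)   # assumes 1 MiB per IO

print(conservative, coalesced)
```

The smaller the assumed IO size, the more IOPS the same throughput appears to need, which is why the 128 KiB assumption errs on the safe side.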
When initially implementing Tiered Storage in the calculator, I didn't consider IOPS limits at all. Ignoring them used to work because without TS you'd need to deploy large disks to handle the hardcoded 7-day retention storage anyway, so IOPS were always sufficient.
With Tiered Storage the local disks can be much smaller, and that completely broke the naive implementation that only tried to cover the storage capacity.
Without accounting for IOPS, the calculator would give me outrageous deployments like a 432 GB st1 EBS volume supporting 240 MB/s of disk writes.
Of course, the right solution completely changed the deployment and raised the price.
It's precisely the Tiered Storage cases with very low local retention where you need to ensure you have ample IOPS: you can get away with small disks capacity-wise, and small disks don't come with many IOPS in the cloud. This isn't a concern today because, as mentioned, we just deploy SSDs when using Tiered Storage and they always have ample IOPS. Combine that with the conservative 128 KiB IO size and you can rest assured the IOPS situation is well handled.
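The interaction between capacity and IOPS can be sketched as follows. This assumes a hypothetical disk type whose IOPS scale linearly with its size. The `iops_per_gib` parameter and all numbers below are made up for illustration and are not real cloud limits or the calculator's logic.

```python
import math


def disk_size_gib(storage_needed_gib: float, iops_needed: float, iops_per_gib: float) -> int:
    """Smallest disk size that satisfies both the capacity and the IOPS requirement.

    Assumes a hypothetical disk type where IOPS grow linearly with size.
    """
    if iops_per_gib <= 0:
        raise ValueError("iops_per_gib must be positive")
    size_for_capacity = storage_needed_gib
    size_for_iops = iops_needed / iops_per_gib
    return math.ceil(max(size_for_capacity, size_for_iops))


# Low local retention (Tiered Storage): IOPS dictate the disk size.
tiered = disk_size_gib(storage_needed_gib=432, iops_needed=1832, iops_per_gib=1.0)

# Long local retention: capacity dictates the size and IOPS come along for free.
non_tiered = disk_size_gib(storage_needed_gib=20_000, iops_needed=1832, iops_per_gib=1.0)

print(tiered, non_tiered)
```

With these made-up numbers, the low-retention disk has to be several times larger than its data needs, while the long-retention disk already carries far more IOPS than required.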
Why Do We Need IOPS?
If you run out of IOPS, the cloud disks will start throttling you and your write latency will grow significantly. This can leave the partitions' follower replicas unable to write to disk in time, so they fall out of sync. Kafka doesn't yet have a way to prioritize replication traffic over produce traffic (even though it ought to).
Your cluster will then have under-replicated partitions and some producers may be unable to write due to the min in-sync replicas setting.
Not to mention your latency will shoot through the roof.
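The failure mode above can be sketched as a toy model. It assumes producers use `acks=all`, which is when the broker enforces `min.insync.replicas`. The helper names are hypothetical and this is not Kafka code.

```python
def is_under_replicated(in_sync_replicas: int, replication_factor: int) -> bool:
    """A partition is under-replicated when some replicas have left the in-sync set."""
    return in_sync_replicas < replication_factor


def accepts_acks_all_write(in_sync_replicas: int, min_insync_replicas: int) -> bool:
    """With acks=all, a write is rejected once the ISR shrinks below min.insync.replicas."""
    return in_sync_replicas >= min_insync_replicas


# Illustrative settings: replication factor 3, min.insync.replicas 2.
RF, MIN_ISR = 3, 2

# One throttled follower falls behind: under-replicated, but writes still succeed.
one_behind = (is_under_replicated(2, RF), accepts_acks_all_write(2, MIN_ISR))

# Both followers fall behind: producers using acks=all can no longer write.
two_behind = (is_under_replicated(1, RF), accepts_acks_all_write(1, MIN_ISR))

print(one_behind, two_behind)
```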
When Does Kafka Use a lot of IOPS?
My understanding is that Kafka uses a lot of IOPS when there are many partitions that are being written to concurrently.
To that point, the calculator already assumes a fixed number: just 1000 partitions.
Future Work
The calculator doesn't yet deploy with an IOPS buffer.
It indirectly has a buffer because it undercounts IO sizes (set to the small 128 KiB as mentioned), but ideally it would aim for a target utilization, e.g. 50% of provisioned IOPS.
Today that is covered indirectly by the way it works: Tiered Storage deployments provision SSDs with far more IOPS than needed, and non-tiered deployments provision large HDDs that have ample IOPS as well.
In any case, the calculator doesn't allow you to go below the IOPS requirements. It just doesn't provision extra. Since the assumed IO size is 2-4x smaller than what it ought to be, that should work out well.
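A minimal sketch of what an explicit IOPS buffer could look like, assuming a hypothetical `target_utilization` knob. This is not implemented in the calculator and the numbers are illustrative.

```python
import math


def provisioned_iops(required_iops: float, target_utilization: float = 0.5) -> int:
    """IOPS to provision so that required_iops uses only target_utilization of them."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return math.ceil(required_iops / target_utilization)


# Illustrative requirement of 1832 IOPS.
with_buffer = provisioned_iops(1832)            # aim for 50% utilization
without_buffer = provisioned_iops(1832, 1.0)    # current behaviour: no extra headroom

print(with_buffer, without_buffer)
```

A 50% target simply doubles the provisioned IOPS, so the deployment is sized on the headroom itself rather than relying on the undercounted IO size to provide it.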