Produce/Consume per vCPU
- 1 vCPU allows a broker to do 5 MiB/s Produce and 15 MiB/s Consume (at the same time)

AKalculator assumes that one vCPU supports 5 MiB/s of producer client throughput and 15 MiB/s of consumer client throughput simultaneously.
A vCPU could probably support more, especially if only one type of throughput is high (e.g. 1 MiB/s produce & 40 MiB/s consume), but it can also support less, e.g. if the clients are not optimized at all.
We find this to be a conservative estimate, a good enough assumption, and in line with industry standards.
There is a high chance this over-estimates the consume requirement: once the fanout ratio (consume throughput divided by produce throughput) exceeds 3, the consume side becomes the binding constraint, so we may end up over-provisioning based on vCPU count. We should probably change this.
There are definitely cases where it feels like AKalculator over-deploys some instances because of this. But we figure the conservative approach is better than showing unrealistic results, and instance cost is not a significant percentage of the total cost unless the deployment is hyper-optimized (e.g. Tiered Storage on Azure, where network is free).
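To make the arithmetic concrete, here is a minimal Python sketch of how the per-vCPU assumption translates into a vCPU count. This is not AKalculator's actual code; the constants and function name are illustrative.

```python
import math

# The assumed per-vCPU rates from the text above (illustrative constants).
PRODUCE_MIBS_PER_VCPU = 5    # MiB/s of produce one vCPU is assumed to sustain
CONSUME_MIBS_PER_VCPU = 15   # MiB/s of consume one vCPU is assumed to sustain

def vcpus_needed(produce_mibs: float, consume_mibs: float) -> int:
    # Each vCPU is assumed to handle 5 MiB/s produce AND 15 MiB/s consume
    # at the same time, so whichever dimension needs more vCPUs is binding.
    return math.ceil(max(produce_mibs / PRODUCE_MIBS_PER_VCPU,
                         consume_mibs / CONSUME_MIBS_PER_VCPU))

# At a fanout ratio of exactly 3, the two constraints meet:
print(vcpus_needed(100, 300))  # max(20, 20) -> 20 vCPUs
# Above a fanout of 3, the consume side dominates (the over-provisioning risk):
print(vcpus_needed(100, 450))  # max(20, 30) -> 30 vCPUs
```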
What affects vCPU usage
CPU attribution is impossible to get right. It's a multi-dimensional problem that depends on:
- client batching & behavior (how often they refresh metadata, reconnect, etc.)
- how many clients there are
- cluster size: the more brokers, the more inter-broker connections and replication overhead (you'd be surprised by the impact)
- partition count (more overhead in replication, but also on the controllers)
- other unexpected things like consumer offset commits, SSL handshakes (depending on the Java version), etc.
- the underlying CPU (see the paragraph below)
1 vCPU != 1 vCPU
The main gotcha behind the vCPU abstraction is that it's incredibly inaccurate in representing performance. Among the reasons:
- TurboBoost: a feature where the CPU dynamically increases its clock speed (perf per core) when it's under its power budget (i.e. the other cores aren't being used)
- CPU architecture: ARM vs. x86
- CPU microarchitecture / generation (Broadwell, Skylake, Alder Lake, etc.)
- different cloud providers
  - each cloud provider uses different hypervisor software with different configuration, which means a different outcome/overhead
  - certain hypervisor tasks (machine management, logging, perf data collection) can take resources away from you
- temperature
  - if CPUs get too hot, they throttle their performance to avoid overheating (e.g. the ACs failing in the data center)
- other reasons we're missing here
See https://browser.geekbench.com/search?utf8=%E2%9C%93&q=Amazon for some example vCPU comparisons.
Conclusion
In general, it's safe to say that this is a very vague assumption that's impossible to get right theoretically.
The best AKalculator can do is give you a decent starting point; you then inevitably need to test in your own setup.
Nevertheless, once you have real-world results, there are always ways to improve the estimate.
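As a hypothetical illustration of that feedback loop (the function and numbers below are made up, not a documented AKalculator feature), measured results from a load test could replace the default 5/15 MiB/s per-vCPU constants:

```python
# The generic defaults from the sections above:
DEFAULT_PRODUCE_MIBS_PER_VCPU = 5
DEFAULT_CONSUME_MIBS_PER_VCPU = 15

def calibrated_rates(measured_produce_mibs: float,
                     measured_consume_mibs: float,
                     vcpus_used: float) -> tuple[float, float]:
    # Derive per-vCPU rates from a load test run on your own hardware,
    # clients, and cluster topology, instead of the generic defaults.
    return (measured_produce_mibs / vcpus_used,
            measured_consume_mibs / vcpus_used)

# e.g. a test that sustained 80 MiB/s produce and 240 MiB/s consume while
# the broker averaged ~10 vCPUs of usage (placeholder numbers):
produce_rate, consume_rate = calibrated_rates(80, 240, 10)
print(produce_rate / DEFAULT_PRODUCE_MIBS_PER_VCPU)  # 1.6x the produce default
print(consume_rate / DEFAULT_CONSUME_MIBS_PER_VCPU)  # 1.6x the consume default
```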
Feel free to contact us for help with both testing and improvement.