
Produce/Consume per vCPU 🏎️

💡 Fact
  • 1 vCPU allows a broker to do 5 MiB/s Produce and 15 MiB/s Consume (at the same time)

AKalculator assumes that 1 vCPU simultaneously supports 5 MiB/s of producer client throughput and 15 MiB/s of consumer client throughput.

It could probably support more, especially when only one type of throughput is high (e.g. 1 MiB/s produce & 40 MiB/s consume), but it can also support less - e.g. if the clients are not optimized at all.

We find this to be a conservative, good-enough estimate that is in line with industry standards.
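The assumption above can be sketched in a few lines. Note this is only an illustration of the stated 5/15 MiB/s rule, not AKalculator's actual code - since one vCPU handles both directions at the same time, the requirement is driven by whichever side needs more cores:

```python
import math

# Assumed per-vCPU capacity, as stated above (illustrative constants).
PRODUCE_MIBS_PER_VCPU = 5   # MiB/s of produce traffic one vCPU sustains
CONSUME_MIBS_PER_VCPU = 15  # MiB/s of consume traffic one vCPU sustains, concurrently

def vcpus_needed(produce_mibs: float, consume_mibs: float) -> int:
    """Because one vCPU supports 5 MiB/s produce AND 15 MiB/s consume at
    the same time, we only need enough vCPUs for the dominating side."""
    return math.ceil(max(produce_mibs / PRODUCE_MIBS_PER_VCPU,
                         consume_mibs / CONSUME_MIBS_PER_VCPU))

print(vcpus_needed(100, 300))  # 100/5 = 20, 300/15 = 20 -> 20 vCPUs
```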

Note

There is a high chance this over-estimates the consume requirement, so we may end up over-provisioning based on CPU count when the fanout ratio is above 3. We should probably change this.

There are definitely cases where AKalculator appears to over-deploy instances because of this. But we figure the conservative approach is better than showing unrealistic results, and instance cost is not a significant percentage of total cost unless the deployment is hyper-optimized (e.g. Tiered Storage on Azure, where the network is free).
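To see why a fanout ratio above 3 makes the consume side dominate: under the 5/15 assumption, consume needs c/15 vCPUs versus p/5 for produce, and c/15 > p/5 exactly when c > 3p. A quick illustrative check (again, not AKalculator's actual code):

```python
def dominating_side(produce_mibs: float, consume_mibs: float) -> str:
    """Return which traffic direction drives the vCPU requirement,
    under the assumed 5 MiB/s produce / 15 MiB/s consume per vCPU."""
    produce_vcpus = produce_mibs / 5
    consume_vcpus = consume_mibs / 15
    return "consume" if consume_vcpus > produce_vcpus else "produce"

print(dominating_side(100, 200))  # fanout 2 -> produce
print(dominating_side(100, 400))  # fanout 4 -> consume
```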

What affects vCPU usage

CPU attribution is impossible to get right. It's a multi-dimensional problem that depends on:

  • client batching & behavior (how often do they refresh metadata, reconnect, etc.)

  • how many clients there are

  • cluster size - the more brokers, the more inter-broker connections and replication overhead (you'd be surprised by the impact)

  • partition count (more replication overhead, but also more controller work)

  • other unexpected things like consumer offset commits, SSL handshakes depending on Java version, etc.

  • the underlying CPU (see the section 👇).

1 vCPU != 1 vCPU

The main gotcha behind the vCPU abstraction is that it's incredibly inaccurate in representing performance.

  1. instance size - sometimes eight 10-vCPU machines perform better than one 80-vCPU machine
     • TurboBoost, a feature where the CPU dynamically increases its clock speed (perf per core) when it's under its power budget (i.e. the other cores aren't used)
  2. CPU architecture - ARM vs x86
  3. CPU microarchitecture / generation (Broadwell, Skylake, Alder Lake, etc.)
  4. different cloud providers
  5. each cloud provider uses different hypervisor software with different configuration → different outcome/overhead
     • certain hypervisor tasks - machine management, logging, perf data collection - can take resources from you
  6. temperature
     • if CPUs get too hot, they throttle their perf to avoid overheating (e.g. ACs failing in the data center)
  7. other reasons we're missing here

See https://browser.geekbench.com/search?utf8=%E2%9C%93&q=Amazon for some example vCPU comparisons.

Conclusion

In general, it's safe to say that this is a very rough assumption that's impossible to get exactly right theoretically.

The best AKalculator can do is give you a decent starting point; you then inevitably need to test in your own setup.

Nevertheless, once you have real-world results, there are always ways to improve it.

Feel free to contact us for help in both testing and improvement.


🤗 Comments Welcome!

Leave feedback, report bugs, or just complain at: