Cockroach Labs just released their 2022 Cloud Report—a detailed benchmark comparison of what you can expect from the top public cloud providers worldwide: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). The full report is nearly 70 pages, with tons of technical benchmarking data, so let’s cover the broad strokes and big takeaways.
The 2022 Cloud Report focused on online transaction processing (OLTP) performance. OLTP is all about processing database transactions (an update, insertion, deletion, or query against a database) as quickly and reliably as possible. These transactions also need to be atomic: each one either succeeds completely or fails completely, so the database is never left in a partially updated state that could corrupt user data.
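To make atomicity concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than CockroachDB. The accounts table and the transfer function are hypothetical. Using the connection as a context manager commits the transaction if the block succeeds and rolls it back if it raises, so a transfer that fails halfway leaves both balances untouched.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint makes any overdraft fail with an IntegrityError.
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as a single all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failed debit demonstrates that the credit is undone.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Current balance for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling transfer(conn, "bob", "alice", 500) on a fresh database returns False and leaves both balances exactly as they were, even though the credit to alice had already executed inside the transaction.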
As a user or consumer, you’re most likely to interact with OLTP when you do online banking, buy something online, or book an airline reservation, to name a few examples. These transactions matter to customers, so organizations are always looking to improve the performance and reliability of the underlying processes and infrastructure.
The 2022 Cloud Report uses Cockroach Labs’ own product, CockroachDB (an open-source, cloud-native, distributed SQL database), as the foundation for running benchmarks across the three cloud providers and a range of node sizes. The steps to reproduce all their benchmarks are open source, which means you can try them out on your own instances if you’d like to see how your infrastructure stacks up.
Those behind the 2022 Cloud Report say they were driven by one core question: Do database users get more performance out of many smaller nodes or fewer large ones?
The benchmarks showed that regardless of instance type, CPU architecture, or cloud provider, smaller instances delivered higher per-vCPU performance. The report measures this as new-order transactions per minute (TPM) divided by the number of vCPUs in the instance. GCP’s t2d-standard-8 instance led the pack, with AWS’ m6i.2xlarge coming in second. With the exception of GCP’s 32-vCPU nodes on AMD Milan and Intel Ice Lake, the top performers on TPM per vCPU were all small 8-vCPU nodes.
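The per-vCPU metric is simple division, but it explains how a larger node can win on raw throughput and still lose on efficiency. The sketch below ranks benchmark runs by TPM per vCPU; the instance labels and TPM values are hypothetical placeholders, not results from the 2022 Cloud Report.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    instance: str          # hypothetical instance label
    vcpus: int             # number of vCPUs in the instance
    new_order_tpm: float   # new-order transactions per minute

    @property
    def tpm_per_vcpu(self) -> float:
        """Per-vCPU throughput: new-order TPM divided by vCPU count."""
        return self.new_order_tpm / self.vcpus


def rank_by_tpm_per_vcpu(runs: list) -> list:
    """Sort runs from highest to lowest TPM per vCPU."""
    return sorted(runs, key=lambda r: r.tpm_per_vcpu, reverse=True)


# Placeholder numbers for illustration only; NOT figures from the report.
runs = [
    BenchmarkRun("large-32", 32, 40_000.0),
    BenchmarkRun("small-8", 8, 12_000.0),
]

for run in rank_by_tpm_per_vcpu(runs):
    print(f"{run.instance}: {run.tpm_per_vcpu:.0f} TPM per vCPU")
```

With these placeholder values, the 32-vCPU node processes more transactions in total (40,000 vs. 12,000 TPM) but ranks second at 1,250 TPM per vCPU, behind the 8-vCPU node's 1,500.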
For those looking to improve their OLTP performance, it may be worth investigating whether spreading the load across more, smaller nodes improves transaction times and reduces your costs. Even in infrastructure, bigger isn’t always better.
But per-vCPU performance wasn’t the only finding; the team behind the report uncovered some other interesting details along the way.
The vCPU:RAM ratio matters a lot. All the best-performing instances, both small and large, had at least 4GB of RAM per vCPU. For example, GCP’s t2d-standard-8 instance has 8 vCPUs and 32GB of RAM, while AWS’ m6i.8xlarge has 32 vCPUs and 128GB of RAM. Both performed well, beating comparable instances with less RAM per vCPU.
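If you want to screen candidate instances against that 4GB-per-vCPU observation, the check is a one-line ratio. This sketch uses the two instance specs named above; the third entry is a hypothetical instance with only 2GB per vCPU, added purely for contrast.

```python
# vCPU count and RAM (GB) for the two instances named in the report summary,
# plus a hypothetical instance added for contrast.
INSTANCES = {
    "GCP t2d-standard-8": (8, 32),
    "AWS m6i.8xlarge": (32, 128),
    "hypothetical-8vcpu-16gb": (8, 16),
}

# Minimum ratio observed among the report's best performers.
MIN_GB_PER_VCPU = 4.0


def gb_per_vcpu(vcpus: int, ram_gb: float) -> float:
    """RAM per vCPU, in GB."""
    return ram_gb / vcpus


def meets_ratio(vcpus: int, ram_gb: float, minimum: float = MIN_GB_PER_VCPU) -> bool:
    """True if the instance has at least `minimum` GB of RAM per vCPU."""
    return gb_per_vcpu(vcpus, ram_gb) >= minimum


for name, (vcpus, ram_gb) in INSTANCES.items():
    ratio = gb_per_vcpu(vcpus, ram_gb)
    print(f"{name}: {ratio:.1f} GB/vCPU, meets 4GB rule: {meets_ratio(vcpus, ram_gb)}")
```

Both real instances land at exactly 4.0GB per vCPU, while the hypothetical one comes in at 2.0 and fails the check. A passing ratio is only a screening heuristic drawn from the report's top performers, not a guarantee of performance.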
High-performance storage is often not worth the cost. Cockroach Labs found that the total cost of running a particular workload is driven mostly by storage, not by the instance itself. That’s especially true if you opt for the high-performance storage tiers that all three cloud providers offer: at that point, storage can account for nearly 70% of your overall cost, which may not translate into better overall OLTP performance. In fact, the benchmarks found no instance + storage combination where paying for top-tier storage was more cost-effective than general-purpose block storage. Cockroach Labs recommends paying more only if your workload demands higher-than-average IOPS or extremely low latency.
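To see why a modest throughput gain can fail to justify premium storage, here is a back-of-the-envelope price-performance sketch. All costs and TPM figures are hypothetical placeholders, chosen only so the high-performance option lands at a 70% storage share; they are not results from the report.

```python
def storage_share(instance_cost: float, storage_cost: float) -> float:
    """Fraction of total (instance + storage) cost spent on storage."""
    return storage_cost / (instance_cost + storage_cost)


def tpm_per_dollar(tpm: float, instance_cost: float, storage_cost: float) -> float:
    """Price-performance: throughput per dollar of total cost."""
    return tpm / (instance_cost + storage_cost)


# Hypothetical monthly costs and throughput; NOT figures from the 2022 Cloud Report.
INSTANCE_COST = 300.0
OPTIONS = {
    "general-purpose": {"tpm": 10_000.0, "storage_cost": 200.0},
    "high-performance": {"tpm": 11_000.0, "storage_cost": 700.0},
}

for name, opt in OPTIONS.items():
    share = storage_share(INSTANCE_COST, opt["storage_cost"])
    value = tpm_per_dollar(opt["tpm"], INSTANCE_COST, opt["storage_cost"])
    print(f"{name}: storage is {share:.0%} of cost, {value:.1f} TPM per dollar")
```

Under these assumptions, the premium tier buys 10% more throughput but doubles the total bill, dropping price-performance from 20.0 to 11.0 TPM per dollar. Measuring TPM per dollar of total cost, rather than per dollar of instance cost alone, is what surfaces the storage bill.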
There’s a difference between speed you pay for and speed that’s promised for free. Across all three cloud providers, the benchmarks showed that when you pay extra for a higher performance tier, whether that’s additional network bandwidth or IOPS throughput, you get what you’re promised. But be wary when providers bill performance promises as value-adds: Cockroach Labs found obvious cases of throttling, which means you only get that free performance some of the time.
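One way to check a free performance promise on your own instances is to sample throughput repeatedly and count how often it actually reaches the advertised figure. In this minimal sketch, the sample values, the 10 Gbps advertised rate, and the 95% tolerance are all hypothetical assumptions; in practice you would collect the samples with your own benchmarking tool.

```python
def fraction_meeting(samples: list, promised: float, tolerance: float = 0.95) -> float:
    """Share of samples reaching at least `tolerance` * `promised`.

    `tolerance` is an assumed allowance for measurement noise.
    """
    if not samples:
        raise ValueError("need at least one sample")
    threshold = promised * tolerance
    return sum(1 for s in samples if s >= threshold) / len(samples)


# Hypothetical network throughput samples (Gbps) against a hypothetical
# advertised "up to 10 Gbps" rate.
samples = [9.8, 9.7, 2.1, 2.0, 9.9, 2.2, 2.0, 2.1]
share = fraction_meeting(samples, promised=10.0)
print(f"{share:.1%} of samples met the advertised rate")
```

A result well below 100%, with samples clustered at two distinct levels as in this made-up data, is the kind of pattern that suggests throttling rather than random noise.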
There’s no clear winner in the public cloud performance race. According to the report, AWS, Azure, and GCP each placed at least one instance in the top 5 for per-vCPU TPM, and all three offer both low-price and high-performance options in similar price ranges. If you’re looking to improve performance for OLTP applications, you probably won’t find much improvement in speed or cost just by migrating from one cloud to another. Instead, you’ll likely see more immediate gains by load-balancing your OLTP applications across multiple smaller nodes.
The response to any benchmarking report shouldn’t be to immediately start migrating services or launching new instances in a premature attempt to optimize your infrastructure. Instead, take a more holistic approach that examines both your infrastructure and your code. Treat these results not as gospel but as avenues you might want to investigate on your own.