Essay No. 060  ·  Hyperscaler Silicon & Cloud Economics

Tags: AWS · Graviton · Nitro · Trainium · Custom Silicon · Hyperscalers · Arm · AI Infrastructure · Cloud Computing · Chiplets · Google Axion · Microsoft Cobalt

AWS Turned the CPU Into Cloud Infrastructure.

Original analysis · Not investment advice

In 2021, Graviton3 looked like an Arm server CPU story. In 2026, the bigger lesson is clearer: AWS was not just building cheaper CPUs. It was vertically integrating the whole cloud computer.

Pugalenthi Magendran
Published May 27, 2026
14 min read
Thesis

AWS did not commoditize the CPU by building the world's fastest core. It commoditized general-purpose cloud compute by controlling the chip, package, server, Nitro card, storage, networking, software stack and workload placement.

The easy way to misunderstand Graviton3 was to compare it only against Intel Xeon and AMD EPYC.

That misses the point. Graviton3 was not designed to win the server CPU beauty contest. It was designed to win the cloud economics contest. AWS does not sell chips. AWS rents compute by the hour, by the second, by the request, by the container, by the database, by the endpoint, and by the workload. The economics are different. AWS cares about useful compute per dollar, per watt, per rack, and per fleet.

The real story was not Arm versus x86. It was merchant silicon versus vertically integrated cloud infrastructure.

AWS does not sell chips. AWS rents compute.


Section 01 What the 2021 Graviton3 article got right

The 2021 SemiAnalysis piece on Graviton3 is the historical anchor for this essay[1]. It started with the point that AWS's hardware journey began with the acquisition of Annapurna Labs in 2015, and that Nitro was the first big in-housing effort. Nitro offloaded hypervisor, networking, storage, and security functions away from the host CPU. That freed more CPU cores to be rented directly to customers instead of being consumed by AWS management overhead.

The piece then walked through the system that was being built around Nitro. AWS announced a custom SSD controller to reduce performance variability and bring SSD controller margins in-house. Graviton3 was described as a seven-die chiplet design with about 55B transistors, 25% higher performance per core versus Graviton2, and 50% more DDR bandwidth, with DDR5 and PCIe Gen5 blocks around the central compute die. AWS designed the chip, package, and motherboard to fit three Graviton3 sockets per server, with one Nitro card managing three sockets. The C7g system landed with 64 cores, 2.6 GHz, 300 GB/s memory bandwidth, and roughly 50B transistors, and the focus was explicitly TCO at the server and rack level, not just peak CPU performance[1].

Source notes — 2021 SemiAnalysis Graviton3 claims (historical)
  • AWS acquired Annapurna Labs in 2015.
  • Nitro offloaded VPC networking, EBS, Nitro SSDs, AQUA, and system control off the host CPU.
  • Custom SSD controllers framed as a way to in-house storage variability and controller margin.
  • Graviton3 described as a seven-die chiplet design, ~55B transistors, +25% per core vs Graviton2, +50% DDR bandwidth.
  • DDR5 and PCIe Gen5 controllers as separate tiles around the central compute die.
  • BGA packaging instead of socketed CPU packaging.
  • Chip, package, and motherboard designed together for three Graviton3 sockets per server with one Nitro card.
  • C7g system: 64 cores, 2.6 GHz, 300 GB/s memory bandwidth, ~50B transistors.
  • Optimization target framed as TCO at server and rack level, not peak per-socket performance.

The 2021 article was right that Graviton3 was a system-level move, not just an Arm CPU launch. The next four years extended that move into the rest of the cloud computer.


Section 02 Nitro was the real foundation

Nitro is the control-plane offload layer. It moves work away from the host CPU: networking, storage, EBS, VPC data plane, virtualization, security, encryption, and system control. AWS's own Nitro page describes Nitro Cards offloading and accelerating IO for VPC, EBS, and instance storage, a Nitro Security Chip that offloads virtualization and security functions, a locked-down security model that prohibits administrative access including by Amazon employees, and a Nitro Hypervisor that supports near bare-metal performance[2].

The strategic effect is subtle but huge. Before AWS could make a cheaper CPU, it first removed work from the CPU. Every function that runs on Nitro instead of the host CPU no longer consumes Xeon, EPYC, or Graviton cycles, and AWS can price and scale it independently of the merchant CPU roadmap.

Do not just make the CPU better. Remove everything from the CPU that should not be on the CPU.
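A back-of-the-envelope sketch can make the offload effect concrete. The 64-core host matches the C7g figure quoted in Section 01, but the eight cores of management overhead, and the idea that offload drives host overhead all the way to zero, are illustrative assumptions rather than AWS figures. The `Host` class is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical model of one host CPU and its management overhead."""
    physical_cores: int   # cores on the host CPU
    overhead_cores: int   # cores consumed by hypervisor, networking, storage work

    @property
    def rentable_cores(self) -> int:
        # Cores left over to rent to customers.
        return self.physical_cores - self.overhead_cores


# ASSUMPTION: 8 overhead cores without offload, 0 with offload (idealized).
without_offload = Host(physical_cores=64, overhead_cores=8)
with_offload = Host(physical_cores=64, overhead_cores=0)

extra_cores = with_offload.rentable_cores - without_offload.rentable_cores
relative_gain = extra_cores / without_offload.rentable_cores

print(f"Rentable cores without offload: {without_offload.rentable_cores}")
print(f"Rentable cores with offload:    {with_offload.rentable_cores}")
print(f"Extra rentable capacity:        {relative_gain:.1%}")
```

Under these assumed numbers, the same silicon yields about one seventh more rentable capacity. The real ratio depends on how much host work a given fleet actually carried before offload.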


Section 03 Custom SSD controllers complete the pattern

The 2021 article did not only discuss Graviton. It also discussed AWS's custom SSD controller[1]. That matters because storage performance in cloud environments is not only about NAND. It is about consistency, fleet control, garbage collection, wear leveling, trim, latency, and software-managed behavior over the long tail of customer workloads.

Merchant SSD controllers are built for a broad market. AWS can design controllers for AWS's exact fleet behavior. That reduces variability, lets AWS standardize storage behavior across datacenters, and brings controller margin in-house. The pattern is the same as with Nitro and Graviton: in-house the parts of the cloud computer where AWS's fleet-level data and software stack let it make better, more consistent choices than a merchant vendor can.

AWS was not just in-housing chips. It was in-housing sources of variance.


Section 04 Graviton3 was designed for cloud economics, not CPU vanity

Intel and AMD sell socketed CPUs through an ecosystem of server OEMs, distributors, and customers with different requirements. AWS builds for itself. That changes every design choice. Graviton3 used BGA packaging instead of socketed CPU packaging, which reduced cost, complexity, failure points, and motherboard space. AWS could do that because it does not need field-replaceable CPUs sold through server OEMs. It could design the chip, the package, the motherboard, and the Nitro card as one system, and it could pack three Graviton3 CPUs into an air-cooled server unit[1].

AWS could make choices that merchant CPU vendors could not make. Those choices add up at fleet scale, where small per-server optimizations turn into large differences in cost per customer workload.
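The density figures quoted from the 2021 piece can be combined into per-server numbers. This is a minimal sketch that assumes the 64-core and 300 GB/s figures are per chip; the comparison against one Nitro card per chip is a hypothetical baseline, not a documented AWS design.

```python
# Figures from the 2021 SemiAnalysis piece as quoted in this essay.
CORES_PER_CHIP = 64
MEM_BW_GBPS_PER_CHIP = 300   # GB/s, ASSUMED to be a per-chip figure
CHIPS_PER_SERVER = 3
NITRO_CARDS_PER_SERVER = 1

cores_per_server = CORES_PER_CHIP * CHIPS_PER_SERVER
cores_per_nitro_card = cores_per_server // NITRO_CARDS_PER_SERVER
mem_bw_per_core = MEM_BW_GBPS_PER_CHIP / CORES_PER_CHIP
mem_bw_per_server = MEM_BW_GBPS_PER_CHIP * CHIPS_PER_SERVER

# HYPOTHETICAL baseline: one Nitro card per chip instead of one per server.
nitro_cards_saved = CHIPS_PER_SERVER - NITRO_CARDS_PER_SERVER

print(f"Cores per server:           {cores_per_server}")
print(f"Cores per Nitro card:       {cores_per_nitro_card}")
print(f"Memory bandwidth per core:  {mem_bw_per_core:.2f} GB/s")
print(f"Memory bandwidth per server: {mem_bw_per_server} GB/s")
print(f"Nitro cards saved per server vs one-per-chip: {nitro_cards_saved}")
```

The point of the sketch is the amortization: one Nitro card spread over three chips is a per-server saving that only a vendor designing chip, package, and motherboard together can take.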


Section 05 Chiplets and advanced packaging were the quiet weapon

Graviton3 separated compute, memory controllers, and PCIe IO into separate dies, in a seven-die chiplet design with a central 64-core compute die and DDR5 and PCIe 5.0 controllers as separate tiles[1]. That is more than a yield trick. It let AWS move faster on IO and memory standards while controlling power, and it let the company refresh different parts of the chip on different cadences.

For a hyperscaler, this means the package can be optimized for fleet needs rather than generic server market needs. The package became part of the product.


Section 06 The real metric is useful compute per fleet dollar

Cloud economics are measured differently from the way benchmark charts measure CPUs. The questions a hyperscaler asks itself are not "what is the SPECrate score" but "how much customer-rentable compute can I produce, per dollar, per watt, per rack, per fleet, while keeping the software experience clean."

Cloud economics metric map — what hyperscalers actually optimize
  • Cost per vCPU-hour: What it costs to provide one rentable core for one hour.
  • Watts per useful workload: Energy per unit of customer-visible compute.
  • Cores rented per server: Effective core yield to customers after Nitro and OS overhead.
  • Server & rack density: How many CPUs fit in one chassis, one rack, one row.
  • Memory bandwidth per workload: Right-sized bandwidth for cloud workloads, not just peaks.
  • Network cost per CPU: Cost of attached networking per rentable core.
  • Nitro offload: Work removed from the host CPU and run on dedicated cards.
  • Storage consistency: Variance in latency and throughput across the fleet.
  • Fleet utilization: How well the workload mix fills installed capacity.
  • Operational simplicity: How much human cost the platform absorbs vs creates.

A CPU that is not the fastest per socket can still be the best cloud CPU if it improves fleet economics.
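A rough model shows how a CPU that is slower per vCPU can still win on fleet economics. Everything here is a sketch under stated assumptions: the `ServerConfig` class is hypothetical, and every dollar, watt, vCPU, utilization, and performance figure is invented for illustration, not taken from AWS, Intel, or AMD.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass(frozen=True)
class ServerConfig:
    """Hypothetical server description; all values are illustrative."""
    name: str
    capex_usd: float        # purchase cost of the server
    lifetime_years: float   # depreciation period
    power_watts: float      # average wall power
    usd_per_kwh: float      # electricity price, cooling overhead folded in
    rentable_vcpus: int     # vCPUs exposed to customers after overhead
    utilization: float      # fraction of vCPU-hours actually sold
    perf_per_vcpu: float    # relative performance per vCPU (baseline = 1.0)

    def cost_per_hour(self) -> float:
        # Straight-line depreciation plus energy cost for one hour.
        capex = self.capex_usd / (self.lifetime_years * HOURS_PER_YEAR)
        energy = self.power_watts / 1000 * self.usd_per_kwh
        return capex + energy

    def cost_per_sold_vcpu_hour(self) -> float:
        return self.cost_per_hour() / (self.rentable_vcpus * self.utilization)

    def cost_per_unit_of_work(self) -> float:
        # Normalize by per-vCPU performance so slower cores are penalized.
        return self.cost_per_sold_vcpu_hour() / self.perf_per_vcpu


# ASSUMPTION: made-up numbers chosen only to illustrate the trade-off.
merchant = ServerConfig("merchant x86", 20_000, 5, 800, 0.10, 128, 0.7, 1.0)
custom = ServerConfig("custom Arm", 15_000, 5, 600, 0.10, 192, 0.7, 0.9)

for cfg in (merchant, custom):
    print(f"{cfg.name}: ${cfg.cost_per_unit_of_work():.5f} per unit of work")
```

In this invented scenario the custom part is 10% slower per vCPU but cheaper per unit of work, because lower capex, lower power, and more rentable vCPUs per server outweigh the per-core deficit. Different inputs can flip the result, which is exactly why the metric map matters more than a single benchmark score.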


Section 07 Graviton5 proves this was not a niche experiment

AWS introduced Graviton5 as its most powerful and advanced CPU for broad cloud workloads, with Graviton5-based M9g instances delivering up to 25% higher performance than the previous generation, 192 cores per chip, and a 5x larger cache. AWS said more than half of new CPU capacity added to AWS has been Graviton-powered for three years in a row, and that 98% of the top 1,000 EC2 customers have benefited from Graviton price-performance advantages[3]. Amazon's FY2025 results added that Graviton is used by over 90% of the top 1,000 AWS customers and can be up to 40% more price-performant than leading x86 processors[4].

Graviton stopped being an Arm experiment. It became AWS capacity strategy.


Section 08 The rest of the industry copied the direction

Google introduced Axion, its first custom Arm-based CPU for the datacenter, with a clear message that general-purpose compute remains important even as accelerators grow[6]. Google's C4A instance documentation positions C4A on Axion alongside Titanium offloads for networking, security, and storage, reinforcing the offload-plus-custom-CPU pattern[7]. Microsoft followed with Cobalt 100, a fully custom Arm-based Azure CPU, with Cobalt 100 VMs framed as offering up to 50% better price-performance than previous Azure Arm-based VMs, and the broader value framed as optimization across silicon, servers, and services[8].

AWS was early, but the pattern became hyperscaler-wide. Three of the largest cloud platforms now run custom Arm CPUs alongside dedicated offload hardware, with x86 as one of several choices rather than the default.


Section 09 From Graviton to Trainium, the same playbook expanded into AI

Graviton is the general-purpose compute version of the strategy. Trainium is the AI accelerator version. Nitro is the control-plane and infrastructure version. Neuron is the software layer. AWS wants to control unit economics across CPU, AI training, AI inference, networking, storage, and fleet software.

Amazon's FY2025 results put numbers on the AI side of the pattern. AWS revenue reached $128.7B in 2025, with AWS operating income of $45.6B. Trainium and Graviton combined had an annual revenue run rate above $10B and were growing at triple-digit percentages year over year. Trainium2 was fully subscribed with 1.4M chips landed. Project Rainier used more than 500,000 Trainium2 chips. Amazon expected roughly $200B in capital expenditures across Amazon in 2026, reflecting AI, chips, and infrastructure demand[4].
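Two ratios follow directly from the figures above. This is a small arithmetic sketch: the Project Rainier share assumes those chips are counted inside the 1.4M landed Trainium2 total, and because the source says "more than 500,000", the share is a lower bound.

```python
# Figures as quoted from Amazon's FY2025 results in this essay.
aws_revenue_busd = 128.7
aws_operating_income_busd = 45.6
trainium2_landed = 1_400_000
rainier_trainium2_min = 500_000   # "more than 500,000", so a floor

operating_margin = aws_operating_income_busd / aws_revenue_busd

# ASSUMPTION: Rainier chips are a subset of the landed Trainium2 total.
rainier_share_floor = rainier_trainium2_min / trainium2_landed

print(f"AWS operating margin:            {operating_margin:.1%}")
print(f"Rainier share of landed chips: >= {rainier_share_floor:.1%}")
```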

Graviton taught AWS how to turn silicon into cloud economics. Trainium applies that lesson to AI.

AWS custom silicon map — one playbook, many layers
  • Nitro: Offload, security, networking, storage; the control-plane layer that frees the host CPU.
  • Graviton: General-purpose CPU economics, BGA-packed, chiplet-based, designed for fleet TCO.
  • Trainium: AI training and inference; HBM-rich, UltraServer-scale, NeuronLink-stitched compute.
  • Inferentia: AI inference for high-volume customer-facing workloads at lower unit cost.
  • Neuron: Compiler and software stack that hides hardware choice from model developers.
  • Custom SSD controllers: Storage consistency and cost control; in-housing the long-tail variance.

Section 10 Trainium3 shows where this goes next

Trainium3 is AWS's first 3 nm AI chip. Trn3 UltraServers scale up to 144 Trainium3 chips and deliver up to 362 FP8 or MXFP8 PFLOPs. Each Trainium3 chip has 144 GB HBM3e and 4.9 TB/s memory bandwidth, with Trn3 UltraServers delivering up to 20.7 TB HBM3e and 706 TB/s aggregate memory bandwidth. The system stitches together with NeuronSwitch and NeuronLink, positioned for agentic, reasoning, and video-generation workloads[5].
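The UltraServer totals are consistent with the per-chip figures, which a few lines of arithmetic can confirm. This sketch assumes decimal units (1 TB = 1000 GB) and derives a per-chip FP8 figure that is an average implied by the quoted totals, not a number AWS states in the text above.

```python
# Figures as quoted from the AWS Trn3 page in this essay.
CHIPS_PER_ULTRASERVER = 144
HBM_GB_PER_CHIP = 144
MEM_BW_TBPS_PER_CHIP = 4.9
ULTRASERVER_PFLOPS_FP8 = 362

# ASSUMPTION: decimal units, 1 TB = 1000 GB.
total_hbm_tb = CHIPS_PER_ULTRASERVER * HBM_GB_PER_CHIP / 1000
total_mem_bw_tbps = CHIPS_PER_ULTRASERVER * MEM_BW_TBPS_PER_CHIP

# Derived average, not a quoted spec.
implied_pflops_per_chip = ULTRASERVER_PFLOPS_FP8 / CHIPS_PER_ULTRASERVER

print(f"Total HBM3e:             {total_hbm_tb:.1f} TB")
print(f"Aggregate bandwidth:     {total_mem_bw_tbps:.0f} TB/s")
print(f"Implied PFLOPs per chip: {implied_pflops_per_chip:.2f}")
```

The products round to the quoted 20.7 TB and 706 TB/s, so the headline numbers are simple multiples of the per-chip specifications rather than separately measured system figures.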

This is the same vertical-integration pattern as Graviton, but at AI scale. The chip is not enough. The system includes HBM, interconnect, compiler, runtime, model support, Bedrock, SageMaker, EKS, networking, and cluster scheduling. In AI, the accelerator is only one layer of the cloud computer.

Cloud computer stack — what hyperscalers actually own

  • Customer workload: Web apps, databases, training jobs, inference endpoints, agentic pipelines.
  • EC2 instance: Sized and scheduled across the fleet; abstracted from the underlying silicon.
  • Hypervisor: Nitro Hypervisor; near bare-metal isolation between tenants on shared hardware.
  • Nitro cards: VPC, EBS, instance storage, security; offloads work from the host CPU.
  • Graviton (or x86): General-purpose compute; sized for fleet TCO, not benchmark wins.
  • Custom SSD controller: Predictable storage variance across the fleet; in-housed long-tail behavior.
  • Networking: ENA, EFA, custom switching; bandwidth and latency tuned to cloud topology.
  • Storage: EBS, S3, instance storage; consistent semantics across regions.
  • Fleet scheduler: Workload placement, autoscaling, spot, and capacity reservations.
  • Datacenter power & cooling: Real-world constraints that ultimately bound everything above.


Section 11 Why merchant silicon vendors should care

Intel and AMD still matter. They will continue to lead many workloads. x86 compatibility remains important, and many enterprise applications still run best or easiest on x86. But hyperscaler custom CPUs reduce the default assumption that cloud CPUs must be merchant silicon. The risk to merchant vendors is not that every CPU disappears. It is that hyperscalers internalize high-volume workloads, merchant CPUs become one option among many, hyperscaler custom CPUs become a price-performance lever inside the cloud, and the cloud vendor controls which CPU customers actually see by default.

Merchant CPU: sold to many customers

  • Sold through OEMs and channel partners.
  • Socketed for field replacement.
  • Optimized for peak performance per SKU.
  • Built for broad compatibility across workloads.
  • Roadmap responds to the whole server market.
  • Margin goes to the chip vendor.

Hyperscaler CPU: built for one cloud

  • Built and deployed by the cloud itself.
  • BGA packaging possible; no field replacement.
  • Optimized for fleet TCO at server and rack level.
  • Integrated with Nitro, storage, and networking.
  • Integrated with cloud services like Bedrock and EKS.
  • Margin and price-performance levers stay inside the cloud.

The threat is not that every CPU disappears. The threat is that the hyperscaler decides which CPU matters.


Section 12 What people got wrong in 2021

The weak interpretation of the 2021 article was that Graviton3 was Amazon's Arm CPU. The better interpretation is that Graviton3 was a fleet economics instrument. The CPU was one part of a much longer list: Arm cores, chiplets, advanced packaging, BGA, DDR5, PCIe 5.0, Nitro, custom SSD control, server density, rack-level TCO, AWS software services, and workload placement.

The chip was the visible part. The system was the moat.

2021 thesis: Graviton3 attacks server TCO

Graviton3 used chiplets, advanced packaging, DDR5, PCIe 5.0, BGA, and Nitro to attack server- and rack-level TCO; the 2021 piece framed AWS as targeting cost per useful unit of customer compute, not benchmark wins.

2026 reality: A capacity strategy across CPU and AI

Graviton5 became large-scale AWS capacity strategy with 192 cores and price-performance leadership claims, while Trainium3 and Trn3 UltraServers extended the same vertical-integration playbook into 3 nm AI chips, 144-chip UltraServers, and HBM3e at fleet scale.

2015: AWS acquires Annapurna Labs
In-house silicon design becomes part of AWS, setting up Nitro, Graviton, and Trainium over the next decade[1].

2017-2019: Nitro becomes core AWS infrastructure
Nitro Cards and Nitro Hypervisor offload VPC, EBS, instance storage, security, and virtualization from the host CPU[2].

2021: Graviton3 launches with chiplets, DDR5 and PCIe 5.0
Seven-die chiplet design, BGA packaging, three sockets per server with one Nitro card, designed for fleet TCO[1].

2024: Google Axion and Microsoft Cobalt validate the trend
Custom Arm CPUs at Google and Microsoft, paired with Titanium and Azure-native offloads, follow the AWS pattern[6][7][8].

2025: Graviton5 introduced
M9g instances, 192 cores per chip, 5x larger cache, with AWS framing it as the most powerful Graviton yet for broad cloud workloads[3].

2025: AWS says Graviton powers more than half of new CPU capacity for three years
Graviton becomes the default rather than the exception inside AWS's net-new capacity story[3].

2025: Project Rainier uses 500,000+ Trainium2 chips
Trainium2 fully subscribed with 1.4M chips landed; combined Trainium and Graviton run rate above $10B and growing triple-digit percentages YoY[4].

2026: Trainium3 and Trn3 UltraServers push the model into AI scale
First 3 nm AWS AI chip, up to 144 Trainium3 chips per UltraServer, 362 PFLOPs FP8/MXFP8, 20.7 TB HBM3e, 706 TB/s aggregate memory bandwidth[5].


Section 13 Risks and limits

The argument above blends the 2021 SemiAnalysis frame, AWS materials, and other hyperscaler announcements. It is worth being explicit about where the case can break.

Risk 01: Graviton is strong where software is portable and price-performance matters, but x86 remains important for many workloads.

Risk 02: Some workloads depend on x86-specific software, licensing, tuning, or vendor support that does not translate cleanly to Arm.

Risk 03: AWS official performance claims are company claims and should not be treated as independent benchmarks.

Risk 04: Trainium adoption depends on compiler maturity, framework support, model support, developer experience, and customer willingness to move from NVIDIA.

Risk 05: Custom silicon increases capex and execution risk; the strategy is not free.

Risk 06: Vertical integration reduces supplier dependence but increases responsibility for every layer.

Risk 07: Hyperscaler CPUs are not necessarily available outside their clouds, which limits some workload portability.

Risk 08: Google Axion and Microsoft Cobalt show the strategy is not unique to AWS, which compresses differentiation over time.

Risk 09: Intel and AMD can still compete with better products, packaging, accelerators, and ecosystem support.

Risk 10: This essay is industry analysis, not investment advice; hyperscaler unit economics depend on factors beyond silicon.

The point is not that AWS made merchant CPUs obsolete. The point is that AWS made the CPU subordinate to cloud economics.


Section 14 Final verdict

The 2021 Graviton3 article was right because it saw the system. AWS was not just building a cheaper Arm CPU. It was vertically integrating the cloud computer: Nitro for offload, custom SSD control for storage consistency, Graviton for general-purpose compute, BGA and server design for density, Trainium for AI, Neuron for software, and AWS services for workload placement. The old CPU market was about selling chips. The hyperscaler CPU market is about renting useful compute.

The CPU did not disappear. It became part of the hyperscaler control plane.


Section 15 Evidence ledger and source notes

Evidence ledger — load-bearing claims with sources
Source: SemiAnalysis (2021)
Claim: Annapurna 2015; Nitro offload; 7-die Graviton3 chiplets; ~55B transistors; DDR5 and PCIe 5.0; BGA; 3 sockets per server with 1 Nitro card; C7g 64 cores at 2.6 GHz.
Why it matters: Anchors the system-level reading of Graviton3.

Source: AWS Nitro page
Claim: Nitro Cards offload VPC, EBS, instance storage; Nitro Security Chip; locked-down admin model; Nitro Hypervisor near bare-metal.
Why it matters: Confirms Nitro as the control-plane offload layer.

Source: AWS Graviton5 release
Claim: M9g up to 25% higher perf vs previous gen; 192 cores; 5x larger cache; >50% of new CPU capacity Graviton-powered for 3 years; 98% of top 1,000 EC2 customers benefit.
Why it matters: Quantifies Graviton's move from experiment to default.

Source: Amazon FY2025 results
Claim: AWS revenue $128.7B; AWS operating income $45.6B; Trainium+Graviton run rate >$10B at triple-digit growth; Trainium2 fully subscribed with 1.4M chips landed; Project Rainier 500,000+ Trainium2; ~$200B Amazon capex in 2026; Graviton at 90%+ of top 1,000 customers, up to 40% better price-performance vs leading x86.
Why it matters: Confirms the AI extension of the playbook and the capex commitment.

Source: AWS Trn3 UltraServers
Claim: Trainium3 first 3 nm AWS AI chip; up to 144 Trainium3 per UltraServer; 362 PFLOPs FP8/MXFP8; 144 GB HBM3e per chip; 20.7 TB HBM3e and 706 TB/s aggregate; NeuronSwitch and NeuronLink.
Why it matters: Shows where the vertical-integration playbook lands in AI hardware.

Source: Google Axion blog
Claim: Custom Arm-based datacenter CPU; framing of general-purpose compute remaining critical alongside accelerators.
Why it matters: Validates the hyperscaler custom CPU trend at Google.

Source: Google C4A / Axion docs
Claim: C4A on Axion paired with Titanium offloads for networking, security, and storage.
Why it matters: Reinforces the Nitro-style offload pattern at Google.

Source: Microsoft Cobalt 100
Claim: Custom Arm-based Azure CPU; Cobalt 100 VMs up to 50% better price-performance vs previous Azure Arm VMs; optimization across silicon, servers, and services.
Why it matters: Validates the pattern across the third major hyperscaler.

Footnotes & sources

  1. SemiAnalysis, “Amazon Graviton 3 Uses Chiplets & Advanced Packaging To Commoditize High Performance CPUs — The First PCIe 5.0 And DDR5 Server CPU,” 2021 (PDF supplied by author). Source for the Annapurna 2015 acquisition framing, Nitro offload model, custom SSD controller logic, the seven-die Graviton3 chiplet design, ~55B transistors, +25% per-core vs Graviton2, +50% DDR bandwidth, DDR5 and PCIe Gen5 tiles, BGA packaging, three sockets per server with one Nitro card, the C7g system at 64 cores and 2.6 GHz with 300 GB/s memory bandwidth, and the server- and rack-level TCO framing.
  2. AWS, “AWS Nitro System,” aws.amazon.com/ec2/nitro. Source for Nitro Cards offloading and accelerating IO for VPC, EBS, and instance storage, the Nitro Security Chip's offload of virtualization and security functions, the locked-down security model that prohibits administrative access including by Amazon employees, and the Nitro Hypervisor's near bare-metal performance positioning.
  3. AWS, “Introducing AWS Graviton5,” aboutamazon.com/news/aws/aws-graviton-5-cpu-amazon-ec2. Source for Graviton5-based M9g delivering up to 25% higher performance than the previous generation, the 192-core configuration, the 5x larger cache claim, the "more than half of new CPU capacity Graviton-powered for three years in a row" statement, and the "98% of the top 1,000 EC2 customers have benefited from Graviton price-performance" framing.
  4. Amazon Investor Relations, “Amazon.com Announces Fourth Quarter Results (FY2025),” ir.aboutamazon.com/…/Amazon-com-Announces-Fourth-Quarter-Results. Source for AWS FY2025 revenue of $128.7B, AWS operating income of $45.6B, the Trainium and Graviton combined annual revenue run rate above $10B with triple-digit YoY growth, Trainium2 being fully subscribed with 1.4M chips landed, Project Rainier using 500,000+ Trainium2 chips, the ~$200B Amazon capex outlook for 2026, and the statements that Graviton is used by over 90% of the top 1,000 AWS customers and can be up to 40% more price-performant than leading x86 processors.
  5. AWS, “Amazon EC2 Trn3 Instances,” aws.amazon.com/ec2/instance-types/trn3. Source for Trainium3 as AWS's first 3 nm AI chip, Trn3 UltraServers scaling up to 144 Trainium3 chips, up to 362 FP8 or MXFP8 PFLOPs, 144 GB HBM3e per chip, 4.9 TB/s per-chip memory bandwidth, 20.7 TB HBM3e and 706 TB/s aggregate memory bandwidth at UltraServer scale, NeuronSwitch and NeuronLink, and the agentic, reasoning, and video-generation workload positioning.
  6. Google Cloud, “Introducing Google's New Arm-Based CPU,” cloud.google.com/blog/…/introducing-googles-new-arm-based-cpu. Source for Google Axion as the first custom Arm-based Google datacenter CPU and for the general-purpose-compute-still-matters framing.
  7. Google Cloud, “Arm-based Compute Engine instances,” docs.cloud.google.com/compute/docs/instances/arm-on-compute. Source for C4A as the Axion-based instance family and for the Titanium offload pairing across networking, security, and storage.
  8. Microsoft Azure, “Azure Cobalt 100-based Virtual Machines Are Now Generally Available,” azure.microsoft.com/…/azure-cobalt-100-based-virtual-machines-are-now-generally-available. Source for Cobalt 100 as Microsoft's fully custom Arm-based Azure CPU, the up to 50% better price-performance claim vs previous Azure Arm-based VMs, and the optimization-across-silicon-servers-and-services framing.