Essay No. 021 · AI Infrastructure · Melbourne, Australia

Tags: AI Infrastructure · Google Argos VCU · Custom Silicon · Hyperscalers · TPUs · Axion · Ironwood · Meta MSVP · AWS Trainium · Microsoft Maia · Cloud Infrastructure

The Custom Silicon Flywheel.

Original analysis · Not investment advice

How Google’s Argos VCU became the blueprint for the domain-specific cloud.

Pugalenthi Magendran

March 2026 · Melbourne, Australia

12 min read

Google’s Argos VCU was not just a YouTube video chip. It was an early warning that at hyperscaler scale, massive workloads eventually become silicon. In 2026, that logic has moved from video to CPUs to AI inference.

In 2021, Google’s Argos VCU looked like a strange niche chip. It was not a CPU. It was not a GPU. It was not a TPU. It was a video coding unit, built for one of the most specific workloads on the internet: encoding video at warehouse scale.

A 2021 SemiAnalysis piece argued that Argos could replace millions of Intel CPUs because YouTube video transcoding had become too large, too repetitive, and too expensive to run on general-purpose processors. Under the article’s assumptions, Google could avoid roughly 10 million Intel Skylake CPU purchases by using Argos VCUs instead.¹ That sounded dramatic in 2021. In 2026, it looks like the beginning of a much bigger story.

Argos was not really about video. Argos was about the end of the purely general-purpose cloud.

The old cloud was built around CPUs. Buy servers, virtualize them, rent compute, scale horizontally. That worked when workloads were broad, messy, and not always large enough to justify custom silicon. Hyperscalers are different. Google does not have one workload. It has planet-scale workloads: Search, YouTube, Gmail, Photos, Maps, Ads, Android, Cloud, DeepMind, Gemini, and Workspace. When one of those workloads becomes large enough, the economics change.

A small efficiency gain becomes massive at Google scale. A codec improvement saves storage and bandwidth. A better accelerator reduces power. A custom CPU reduces cloud operating cost. A custom AI chip improves inference economics. That is the custom silicon flywheel.

Key idea

The correct claim is not that custom silicon replaces every CPU and GPU. The correct claim is that hyperscalers selectively replace merchant silicon wherever workload scale, software control, and TCO justify the design cost. Argos was the video version. TPU was the AI version. Axion is the CPU version. Ironwood is the inference version.

I. The 2021 thesis

In June 2021, Dylan Patel published a SemiAnalysis piece on Google’s Argos chip, which the article calls a VPU and Google calls a VCU. The framing was concrete. Google built a Video Coding Unit for YouTube-scale transcoding because VP9 reduced bandwidth and storage but was much harder for CPUs to encode. The TCO comparison favored a domain-specific chip. The article estimated the avoided Intel CPU purchases at roughly ten million units under its assumptions and walked through Google’s chip-to-cluster design from encoder cores up through PCIe cards, server nodes, clusters, and regions.¹

I revisited that piece because the analytical move it made aged into the dominant pattern of cloud computing. The video number was striking. The structural argument was the real takeaway.

2021 thesis

When a hyperscaler owns a workload that is huge, repetitive, expensive, and vertically controlled, general-purpose CPUs become economically wrong. The workload becomes a chip.

II. Why video became silicon

A creator uploads one video. Google has to turn it into many versions for many devices, networks, and quality levels. That fan-out is the workload that made Argos make sense.

Diagram 01 · One upload, many outputs

Single upload: original master file

240p · H.264
480p · H.264
720p · VP9
1080p · VP9
4K · VP9 / AV1
Audio · Opus / AAC

VP9 saves bandwidth and storage versus H.264 but costs much more CPU effort. AV1 saves more again, at higher compute complexity. At YouTube scale, those compute differences become billions of CPU hours.
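The fan-out in Diagram 01 can be sketched as a small cost model. This is a minimal illustration, not Google’s method: the per-codec effort multipliers are hypothetical placeholders, the frame sizes are common 16:9 resolutions, encode effort is assumed to scale linearly with pixel count, and the “4K · VP9 / AV1” rung is treated as two separate renditions. Audio is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    label: str
    codec: str
    pixels_per_frame: int


# Hypothetical relative encode effort per pixel, normalised to H.264 = 1.0.
# Illustrative placeholders only, not measurements.
ASSUMED_CODEC_EFFORT = {"H.264": 1.0, "VP9": 5.0, "AV1": 10.0}

# Video rungs from Diagram 01 (frame sizes are assumed 16:9 resolutions).
LADDER = [
    Rendition("240p", "H.264", 426 * 240),
    Rendition("480p", "H.264", 854 * 480),
    Rendition("720p", "VP9", 1280 * 720),
    Rendition("1080p", "VP9", 1920 * 1080),
    Rendition("4K", "VP9", 3840 * 2160),
    Rendition("4K", "AV1", 3840 * 2160),
]


def relative_cost(rendition: Rendition) -> float:
    """Encode effort of one rendition, assuming effort scales with pixels."""
    return rendition.pixels_per_frame * ASSUMED_CODEC_EFFORT[rendition.codec]


def ladder_cost(ladder: list[Rendition]) -> float:
    """Total effort to produce every rendition of a single upload."""
    return sum(relative_cost(r) for r in ladder)


def cost_vs_single_h264_1080p(ladder: list[Rendition]) -> float:
    """How many single 1080p H.264 encodes the whole ladder is worth."""
    baseline = 1920 * 1080 * ASSUMED_CODEC_EFFORT["H.264"]
    return ladder_cost(ladder) / baseline


if __name__ == "__main__":
    ratio = cost_vs_single_h264_1080p(LADDER)
    print(f"Ladder costs about {ratio:.1f}x one 1080p H.264 encode")
```

Whatever the real multipliers are, the structure is the point: every upload pays for the whole ladder, and the costliest codecs sit on the largest frames.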

Argos made sense because Google controlled both sides: the incoming video workload and the serving infrastructure. The chip’s job was not to be a great CPU. Its job was to be a great YouTube encoder.

Google did not build Argos because CPUs were bad. It built Argos because YouTube was too large for “good enough” CPUs to remain economically good enough.

III. The real lesson was system co-design

Argos was not just a chip. Google’s own research paper on Warehouse-Scale Video Acceleration frames it as co-design and deployment in the wild, emphasising balanced systems at data-center scale and co-design with distributed software systems.² The chip would have been a curiosity without the software, the scheduler, the deployment fabric, and the fleet that surrounded it.

Diagram 02 · Workload to product co-design

Workload → Software stack → Scheduler → Accelerator card → Server → Cluster & region → Product experience

A custom chip is only as powerful as the system that can keep it utilized. Hyperscaler silicon wins because hyperscalers own every layer of this chain.

Hyperscaler silicon works when the company controls the workload, the software stack, the compiler and runtime, the scheduler, the deployment environment, fleet utilisation, data-center power and cooling, and the product requirements that define what good even means. Anyone missing two or three of those pieces ends up building a museum piece.
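The utilisation point can be made concrete with a short sketch. The model is an assumption for illustration: cost per unit of useful work equals hourly cost divided by throughput times utilisation, and every number in the example is hypothetical.

```python
def cost_per_unit_work(hourly_cost: float, throughput: float, utilisation: float) -> float:
    """Cost of one unit of useful work for a chip that is busy only part of the time."""
    if not 0.0 < utilisation <= 1.0:
        raise ValueError("utilisation must be in (0, 1]")
    if hourly_cost <= 0 or throughput <= 0:
        raise ValueError("hourly_cost and throughput must be positive")
    return hourly_cost / (throughput * utilisation)


def breakeven_utilisation(
    custom_hourly_cost: float,
    custom_throughput: float,
    merchant_hourly_cost: float,
    merchant_throughput: float,
    merchant_utilisation: float,
) -> float:
    """Utilisation the custom chip needs to match the merchant chip's cost per unit of work."""
    merchant_cost = cost_per_unit_work(
        merchant_hourly_cost, merchant_throughput, merchant_utilisation
    )
    return custom_hourly_cost / (custom_throughput * merchant_cost)


if __name__ == "__main__":
    # Hypothetical: the custom chip is 3x faster at the same hourly cost.
    merchant = cost_per_unit_work(1.0, 1.0, 0.60)
    idle_custom = cost_per_unit_work(1.0, 3.0, 0.15)
    busy_custom = cost_per_unit_work(1.0, 3.0, 0.80)
    print(f"merchant: {merchant:.2f}, custom idle: {idle_custom:.2f}, custom busy: {busy_custom:.2f}")
    print(f"break-even utilisation: {breakeven_utilisation(1.0, 3.0, 1.0, 1.0, 0.60):.2f}")
```

Under these made-up numbers, a chip that is three times faster still loses to the merchant part when the scheduler keeps it busy only 15% of the time. That is why scheduler and fleet control sit alongside the silicon in the list above.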

IV. The custom silicon flywheel

The deeper pattern in Argos was a repeatable cycle that Google has been running across multiple workloads ever since.

Diagram 03 · The custom silicon flywheel

Own a workload: massive, vertically controlled, growing.
Measure the wall: CPU / GPU cost, power, latency limits.
Build domain silicon: designed for the specific job.
Deploy at scale: warehouse-scale rollouts.
Co-design software: compiler, runtime, scheduler.
Improve TCO & product: cost, power, latency, quality.
Reuse as cloud lever: sell capacity or experience.
Pick next workload: repeat across video, CPU, AI, etc.

↻ The flywheel turns once. It then turns again, faster.

Argos was the video turn of this loop. TPU was the AI turn. Axion is the CPU turn. Ironwood is the inference-era AI turn.

V. Argos was the video version. Ironwood is the AI version.

The same loop that produced Argos in 2021 produced Ironwood by 2026. Ironwood is Google’s seventh-generation TPU, framed as built for the age of inference. Google says the platform scales to 9,216 chips per pod and reaches 42.5 exaflops at the pod level, with 192 GiB HBM3E per chip and 7.4 TB/s peak HBM bandwidth. A full pod has 1.77 PB of directly accessible HBM.⁴⁵
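The pod-level figures can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above. The variable names are illustrative, and it computes the HBM total under both a decimal-gigabyte and a binary-gibibyte reading of the 192 figure, because the two give different pod totals.

```python
# Figures as quoted in the text above (Google's published Ironwood numbers).
CHIPS_PER_POD = 9_216
POD_EXAFLOPS = 42.5
HBM_PER_CHIP = 192        # quoted as GiB per chip
HBM_BW_TB_PER_S = 7.4     # peak HBM bandwidth per chip

# Per-chip compute implied by the pod-level figure (1 exaflop = 1,000 petaflops).
per_chip_pflops = POD_EXAFLOPS * 1_000 / CHIPS_PER_POD

# Pod HBM capacity in petabytes if 192 is read as decimal gigabytes (10^9 bytes).
pod_hbm_pb_if_gb = CHIPS_PER_POD * HBM_PER_CHIP * 10**9 / 10**15

# Pod HBM capacity in petabytes if 192 is read as gibibytes (2^30 bytes).
pod_hbm_pb_if_gib = CHIPS_PER_POD * HBM_PER_CHIP * 2**30 / 10**15

# Aggregate peak HBM bandwidth across the pod, in petabytes per second.
pod_hbm_bw_pb_per_s = CHIPS_PER_POD * HBM_BW_TB_PER_S / 1_000

if __name__ == "__main__":
    print(f"Per-chip compute: {per_chip_pflops:.2f} PFLOP/s")
    print(f"Pod HBM (decimal GB reading): {pod_hbm_pb_if_gb:.2f} PB")
    print(f"Pod HBM (GiB reading): {pod_hbm_pb_if_gib:.2f} PB")
    print(f"Pod peak HBM bandwidth: {pod_hbm_bw_pb_per_s:.1f} PB/s")
```

The implied per-chip figure is about 4.6 petaflops. The 1.77 PB pod total matches 9,216 × 192 when the per-chip capacity is read in decimal gigabytes. Read strictly as GiB, the same product comes to about 1.90 PB, so the quoted per-chip and per-pod numbers likely use slightly different unit conventions.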

The shape of the bet is the same as Argos. Own the workload (inference for Gemini, Search, AdSense, YouTube ML, and Cloud TPU customers). Measure the wall (GPU pricing, power, and supply at hyperscaler scale). Build the domain chip. Co-design the compiler and the scheduler. Deploy across regions. Improve token economics. Reuse the capacity as cloud leverage. Move on to the next workload.

The question moved from “how do we encode YouTube cheaper?” to “how do we generate tokens cheaper?”

VI. Axion shows the CPU layer is also being rebuilt

The third turn of the flywheel is the boring layer: general-purpose CPUs. Google introduced Axion as its first custom Arm-based CPU for the data center, explicitly placing it in a long line of custom Google silicon that includes TPUs, VCUs, and Tensor. Google claims Axion offers up to 30% better performance than the fastest general-purpose Arm-based cloud instances and up to 50% better performance with up to 60% better energy efficiency than comparable current-generation x86-based instances.³
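It helps to see what the two x86 comparison figures imply when taken together. The sketch below rests on two assumptions not stated in Google’s claim: that “energy efficiency” means performance per watt, and that both “up to” figures hold for the same workload. The function names are illustrative.

```python
def implied_power_ratio(perf_gain: float, efficiency_gain: float) -> float:
    """Power draw relative to the baseline when running flat out.

    With efficiency defined as performance per watt,
    power = performance / efficiency.
    """
    return (1.0 + perf_gain) / (1.0 + efficiency_gain)


def energy_per_unit_work_ratio(efficiency_gain: float) -> float:
    """Energy spent per unit of work relative to the baseline."""
    return 1.0 / (1.0 + efficiency_gain)


if __name__ == "__main__":
    power = implied_power_ratio(perf_gain=0.50, efficiency_gain=0.60)
    energy = energy_per_unit_work_ratio(efficiency_gain=0.60)
    print(f"Power at full load vs x86 baseline: {power:.4f}x")
    print(f"Energy per unit of work vs x86 baseline: {energy:.3f}x")
```

Under those assumptions the chip does 1.5 times the work at roughly 94% of the power, or 62.5% of the energy per unit of work. Across a fleet, the second number is the one that shows up in the power bill.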

The point is not that CPUs vanish. The point is that even general-purpose compute is being customised by the largest cloud operators.

The cloud is not abandoning CPUs. It is making CPUs hyperscaler-specific.

VII. Google is not alone

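The same pattern is visible across the other hyperscalers, each adapted to that company’s biggest internal workloads. Meta, for example, built the Meta Scalable Video Processor (MSVP), its first in-house ASIC for video processing, for video-on-demand and live-streaming workloads.⁶ Google’s own portfolio shows how far a single operator has taken the idea.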

Diagram 04 · Google’s custom silicon portfolio

Video · Argos / VCU: Warehouse-scale video encoding for YouTube and other media workloads.²
AI training · TPU family: Multi-generation Tensor Processing Units for ML training and inference.
AI inference · Ironwood TPU: 7th-gen TPU built for the age of inference, scaling to 9,216 chips per pod.⁴⁵
CPU · Axion: Custom Arm-based data-center CPU for general cloud workloads.³
Mobile · Tensor: SoC for Pixel devices and on-device AI features.
Security · Titan: Secure microcontroller family used across Google infrastructure.

Six domains, six chip families. Same flywheel. Same playbook.

Table · Hyperscaler custom silicon by domain

Hyperscaler | AI / Accelerator | CPU | Video / Media | Security / Misc
Google | TPU / Ironwood⁴ | Axion³ | VCU / Argos² | Titan · Tensor (mobile)
AWS | Trainium + Inferentia⁷⁸ | Graviton⁹ | Nitro media offload | Nitro security chip
Microsoft | Maia AI Accelerator¹⁰¹¹ | Cobalt Arm CPU¹⁰ | Azure media services HW | Pluton security

VIII. Why merchant silicon still matters

This is not a story about Intel, AMD, and Nvidia disappearing. It is a story about where the biggest workloads sit, and who can justify the design cost to absorb them.

Merchant silicon wins when…

Broad markets & flexibility

Workloads are broad across many customers.
Software ecosystem and standards matter (CUDA, x86).
Workloads change quickly over short cycles.
Utilisation is uncertain for a given customer.
Buyers don’t own enough scale to justify ASIC design cost.
Time-to-market matters more than per-unit cost.

Custom silicon wins when…

Single buyer, huge scale

Workload is huge at one operator’s fleet.
Workload is stable enough to design hardware around.
Software and deployment are controlled end-to-end.
TCO savings justify multi-year design cost.
Power and latency are first-order constraints.
Utilisation can be engineered by the operator.

Custom silicon does not replace merchant silicon everywhere. It eats the parts of the cloud where the workload is large enough to become its own market.
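The “TCO savings justify multi-year design cost” condition reduces to a break-even count. The sketch below is a deliberately simple model under stated assumptions: a one-off design cost, a fixed lifetime TCO per chip, and a fixed number of merchant units displaced by each custom chip. All names and example numbers are hypothetical.

```python
import math
from typing import Optional


def breakeven_units(
    design_cost: float,
    merchant_tco_per_unit: float,
    custom_tco_per_unit: float,
    replacement_ratio: float = 1.0,
) -> Optional[int]:
    """Custom chips needed before cumulative savings cover the design cost.

    replacement_ratio is how many merchant units one custom chip displaces.
    Returns None when a custom chip saves nothing per unit deployed.
    """
    saving_per_chip = merchant_tco_per_unit * replacement_ratio - custom_tco_per_unit
    if saving_per_chip <= 0:
        return None
    return math.ceil(design_cost / saving_per_chip)


if __name__ == "__main__":
    # Hypothetical inputs: 500M design cost, each custom chip displaces
    # five merchant units costing 10k each, and itself costs 15k over its life.
    units = breakeven_units(500e6, 10_000, 15_000, replacement_ratio=5.0)
    print(f"Break-even at {units} custom chips")
```

With these made-up inputs the design cost is recovered after about 14,000 chips. That is trivial for a fleet that would otherwise buy millions of CPUs and out of reach for a buyer who needs only a few hundred accelerators.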

IX. Strategic pressure on Intel, AMD, and Nvidia

The right way to read this is not as a death blow to merchant vendors but as a quiet cap on their largest accounts. Intel feels it through fewer generic CPU cycles inside hyperscalers as Axion, Graviton, and Cobalt absorb general-purpose workloads. AMD competes well in CPUs and GPUs but faces the same hyperscaler internal pressure on its largest customers. Nvidia remains dominant in AI, but TPU, Trainium, Maia, and other custom accelerators reduce complete dependence on Nvidia inside the operators that control the most compute.

A custom chip can be worse in general and still better for the exact workload that matters.

X. The physical base layer

Custom silicon does not escape the physical world. Every Argos, TPU, Axion, Maia, and Trainium is fabricated, packaged, and powered using the same supply chain that builds Nvidia’s systems. TSMC’s 2025 annual report frames robust AI-related demand throughout 2025 and only mild recovery in non-AI markets, with advanced nodes and packaging investment as the structural drivers.¹² ASML’s 2025 strategic report reinforces the same picture from the lithography side: AI requires leading-edge processor chips and a significant increase in DRAM compared with traditional compute architectures.¹³

Hyperscalers can design their own chips, but they still depend on the same physical semiconductor bottlenecks.

XI. What could break the thesis

A serious piece needs counterarguments. The custom silicon flywheel has plausible failure modes.

Risks & counterarguments

Design cost. Custom silicon is expensive and slow to design, verify, and deploy. Cost overruns can crush ROI.
Workload drift. Workloads can change faster than chips can be designed. A bad bet ages quickly.
Merchant catch-up. GPUs and CPUs may improve enough to reduce the custom ROI gap, especially as Nvidia ships new generations.
Software moat. Nvidia’s CUDA ecosystem remains hard to displace at the workload level, even when custom silicon is faster on paper.
Scale floor. Smaller cloud providers cannot justify the design cost. The flywheel needs scale to spin.
Utilisation risk. Underused custom silicon can be worse than flexible merchant silicon. The chip is only as good as the scheduler.
Supply chain. Custom chips still compete for HBM, CoWoS, substrates, and advanced nodes. The base layer remains shared.
External adoption. Internal chips may be great for internal workloads but unattractive to external cloud customers used to merchant ecosystems.
Compiler / runtime maturity. A great chip without a great software toolchain is a museum piece.
People. Hyperscaler silicon teams compete for the same chip-design talent as merchant vendors and each other.

The mistake is not believing in custom silicon. The mistake is believing every workload deserves a custom chip.

XII. The domain-specific cloud

Google Argos was not just a YouTube chip. It was a warning. At hyperscaler scale, a workload does not stay software forever. If it is big enough, expensive enough, stable enough, and strategically important enough, it becomes silicon.

That was true for video. It became true for AI. It is becoming true for CPUs. It will become true for networking, security, inference, search, recommendation, and media generation.

The cloud is not becoming less specialized. It is becoming more specialized. The old cloud was general purpose. The new cloud is a portfolio of domain-specific machines.

That is the custom silicon flywheel.

¹ Patel, D. (Jun 2021). Google New Custom Silicon Replaces 10 Million Intel CPUs | Google Argos VPU. SemiAnalysis. Historical anchor for the Argos / VCU thesis, the 10-million Intel CPU estimate under the article’s assumptions, VP9 / H.264 workload framing, and chip-to-cluster co-design argument. Used as inspiration only. No content, structure, or charts reproduced.

² Google Research. Warehouse-Scale Video Acceleration: Co-design and Deployment in the Wild. Official Google research framing of the VCU as a warehouse-scale video acceleration platform, with co-design across distributed software systems.

³ Google Cloud. Introducing Google’s new Arm-based CPU (Axion). Google framing of Axion as the company’s first Arm-based data-center CPU, situating it alongside TPUs, VCUs, and Tensor as part of a long line of custom Google silicon, with up to 30% better performance versus fastest general-purpose Arm-based cloud instances and up to 50% performance plus 60% energy-efficiency gains versus comparable current-generation x86 instances.

⁴ Google. Ironwood TPU: built for the age of inference. Ironwood as the 7th-generation TPU framed around inference workloads, with pod-scale headline numbers.

⁵ Google Cloud. Inside the Ironwood TPU and co-designed AI stack. 192 GiB HBM3E per chip, 7.4 TB/s peak HBM bandwidth, 9,216 chips per pod, 42.5 exaflops per pod, and 1.77 PB of directly accessible HBM per pod.

⁶ Meta. Meta Scalable Video Processor (MSVP). Meta’s first in-house ASIC for video processing, targeting video-on-demand and live-streaming workloads, with dedicated hardware framed as the best solution for compute power and efficiency at Meta scale.

⁷ AWS. AWS Trainium. Trainium as a purpose-built AI accelerator family for high-performance, cost-efficient training and inference.

⁸ AWS. AWS Inferentia. Inference-focused AWS custom silicon for deep learning and generative AI workloads.

⁹ AWS. AWS Silicon Innovation. Portfolio context for Graviton, Trainium, and Inferentia.

¹⁰ Microsoft. In-house chips: silicon to service. Microsoft Azure Maia AI Accelerator and Cobalt Arm CPU framed as part of a silicon-to-service systems approach.

¹¹ Microsoft. Maia 200: the AI accelerator built for inference. Maia 200 framing as an inference-focused AI accelerator in the Azure custom silicon portfolio.

¹² TSMC. 2025 Annual Report. Robust AI-related demand, mild non-AI recovery, advanced-node and packaging investment as structural drivers.

¹³ ASML. 2025 Annual Report, strategic report section. AI requires leading-edge, high-performance processor chips and a significant increase in DRAM compared with traditional compute architectures.