The Custom Silicon Flywheel
Original analysis. Not investment advice.
Google’s Argos VCU was not just a YouTube video chip. It was an early warning that at hyperscaler scale, massive workloads eventually become silicon. In 2026, that logic has moved from video to CPUs to AI inference.
In 2021, Google’s Argos VCU looked like a strange niche chip. It was not a CPU. It was not a GPU. It was not a TPU. It was a video coding unit, built for one of the most specific workloads on the internet: encoding video at warehouse scale.
The SemiAnalysis piece argued that Argos could replace millions of Intel CPUs because YouTube video transcoding had become too large, too repetitive, and too expensive to run on general-purpose processors. Under the article’s assumptions, Google could avoid roughly 10 million Intel Skylake CPU purchases by using Argos VCUs instead.[1] That sounded dramatic in 2021. In 2026, it looks like the beginning of a much bigger story.
Argos was not really about video. Argos was about the end of the purely general-purpose cloud.
The old cloud was built around CPUs. Buy servers, virtualize them, rent compute, scale horizontally. That worked when workloads were broad, messy, and not always large enough to justify custom silicon. Hyperscalers are different. Google does not have one workload. It has planet-scale workloads: Search, YouTube, Gmail, Photos, Maps, Ads, Android, Cloud, DeepMind, Gemini, and Workspace. When one of those workloads becomes large enough, the economics change.
A small efficiency gain becomes massive at Google scale. A codec improvement saves storage and bandwidth. A better accelerator reduces power. A custom CPU reduces cloud operating cost. A custom AI chip improves inference economics. That is the custom silicon flywheel.
The correct claim is not that custom silicon replaces every CPU and GPU. The correct claim is that hyperscalers selectively replace merchant silicon wherever workload scale, software control, and TCO justify the design cost. Argos was the video version. TPU was the AI version. Axion is the CPU version. Ironwood is the inference version.
I. The 2021 thesis
In June 2021, Dylan Patel published a SemiAnalysis piece on Google’s Argos, which the article’s title calls a VPU and Google calls a VCU. The framing was concrete. Google built a Video Coding Unit for YouTube-scale transcoding because VP9 reduced bandwidth and storage but was much harder for CPUs to encode. The TCO comparison favored a domain-specific chip. The article estimated the avoided Intel CPU purchases at roughly ten million units under its assumptions and walked through Google’s chip-to-cluster design from encoder cores up through PCIe cards, server nodes, clusters, and regions.[1]
I revisited that piece because the analytical move it made aged into the dominant pattern of cloud computing. The video number was striking. The structural argument was the real takeaway.
When a hyperscaler owns a workload that is huge, repetitive, expensive, and vertically controlled, general-purpose CPUs become economically wrong. The workload becomes a chip.
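The break-even logic behind that claim can be sketched in a few lines. This is a minimal illustration, not the SemiAnalysis model: the `SiliconCase` fields and every number below are hypothetical placeholders chosen only to make the arithmetic visible.

```python
from dataclasses import dataclass


@dataclass
class SiliconCase:
    """Hypothetical inputs for a custom-versus-merchant comparison."""
    design_cost: float          # one-off cost to design and verify the chip
    merchant_unit_tco: float    # lifetime TCO of one merchant server
    custom_unit_tco: float      # lifetime TCO of one custom-accelerated server
    merchant_per_custom: float  # merchant servers displaced by one custom server


def saving_per_custom_unit(case: SiliconCase) -> float:
    """TCO avoided each time one custom server replaces its merchant equivalents."""
    return case.merchant_per_custom * case.merchant_unit_tco - case.custom_unit_tco


def break_even_units(case: SiliconCase) -> float:
    """Custom servers needed before cumulative savings cover the design cost."""
    saving = saving_per_custom_unit(case)
    if saving <= 0:
        return float("inf")  # the custom part never pays back
    return case.design_cost / saving


# Placeholder numbers (assumptions, not figures from any source).
case = SiliconCase(
    design_cost=500e6,
    merchant_unit_tco=20_000,
    custom_unit_tco=60_000,
    merchant_per_custom=10,
)
print(f"Saving per custom server: ${saving_per_custom_unit(case):,.0f}")
print(f"Break-even fleet size: {break_even_units(case):,.0f} custom servers")
```

The shape matters more than the numbers. The design cost is fixed, the saving scales with fleet size, so the same chip that is uneconomic for a small operator can pay back quickly for one that deploys it by the tens of thousands.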
II. Why video became silicon
A creator uploads one video. Google has to turn it into many versions for many devices, networks, and quality levels. That fan-out is the workload that made Argos make sense.
Argos made sense because Google controlled both sides: the incoming video workload and the serving infrastructure. The chip’s job was not to be a great CPU. Its job was to be a great YouTube encoder.
Google did not build Argos because CPUs were bad. It built Argos because YouTube was too large for “good enough” CPUs to remain economically good enough.
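The fan-out is easy to picture as a small calculation. The sketch below assumes a hypothetical rendition ladder and codec set; real ladders differ by service and by video, and the upload count is a round placeholder rather than a YouTube statistic.

```python
from itertools import product

# Hypothetical rendition ladder and codec set (assumptions for illustration).
RESOLUTIONS = ["144p", "240p", "360p", "480p", "720p", "1080p", "1440p", "2160p"]
CODECS = ["H.264", "VP9"]


def renditions(resolutions, codecs):
    """Every (resolution, codec) output produced from one uploaded video."""
    return list(product(resolutions, codecs))


def daily_encode_jobs(uploads_per_day, resolutions, codecs):
    """Total encode jobs per day when each upload fans out to every rendition."""
    return uploads_per_day * len(renditions(resolutions, codecs))


per_upload = len(renditions(RESOLUTIONS, CODECS))
print(f"Outputs per upload: {per_upload}")
print(f"Jobs for 1,000,000 uploads: {daily_encode_jobs(1_000_000, RESOLUTIONS, CODECS):,}")
```

Adding one more codec to the set multiplies the whole job count, which is why a more compute-hungry codec changes the hardware question rather than just the software one.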
III. The real lesson was system co-design
Argos was not just a chip. Google’s own research paper on Warehouse-Scale Video Acceleration frames it as co-design and deployment in the wild, emphasising balanced systems at data-center scale and co-design with distributed software systems.[2] The chip would have been a curiosity without the software, the scheduler, the deployment fabric, and the fleet that surrounded it.
Hyperscaler silicon works when the company controls the workload, the software stack, the compiler and runtime, the scheduler, the deployment environment, fleet utilisation, data-center power and cooling, and the product requirements that define what good even means. Anyone missing two or three of those pieces ends up building a museum piece.
IV. The custom silicon flywheel
The deeper pattern in Argos was a repeatable cycle that Google has been running across multiple workloads ever since.
- Own a workload. Massive, vertically controlled, growing.
- Measure the wall. CPU / GPU cost, power, latency limits.
- Build domain silicon. Designed for the specific job.
- Deploy at scale. Warehouse-scale rollouts.
- Co-design software. Compiler, runtime, scheduler.
- Improve TCO & product. Cost, power, latency, quality.
- Reuse as cloud lever. Sell capacity or experience.
- Pick next workload. Repeat across video, CPU, AI, etc.
V. Argos was the video version. Ironwood is the AI version.
The same loop that produced Argos in 2021 produced Ironwood by 2026. Ironwood is Google’s seventh-generation TPU, framed as built for the age of inference. Google says the platform scales to 9,216 chips per pod and reaches 42.5 exaflops at the pod level, with 192 GiB HBM3E per chip and 7.4 TB/s peak HBM bandwidth. A full pod has 1.77 PB of directly accessible HBM.[4][5]
The shape of the bet is the same as Argos. Own the workload (inference for Gemini, Search, AdSense, YouTube ML, and Cloud TPU customers). Measure the wall (GPU pricing, power, and supply at hyperscaler scale). Build the domain chip. Co-design the compiler and the scheduler. Deploy across regions. Improve token economics. Reuse the capacity as cloud leverage. Move on to the next workload.
The question moved from “how do we encode YouTube cheaper?” to “how do we generate tokens cheaper?”
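The Ironwood pod figures quoted above hang together arithmetically, and a short check makes the scale concrete. This sketch uses only the numbers cited in this section. It assumes the per-chip HBM figure is scaled with decimal prefixes, which is the reading that reproduces the 1.77 PB pod total. The per-chip compute and aggregate bandwidth values are derived here, not quoted from Google.

```python
# Figures as quoted in this essay's Ironwood paragraph.
CHIPS_PER_POD = 9_216
POD_EXAFLOPS = 42.5
HBM_PER_CHIP = 192          # assumption: scaled with decimal prefixes below
HBM_BW_PER_CHIP_TBPS = 7.4

# Derived values (simple division and multiplication, not vendor claims).
pod_hbm_pb = CHIPS_PER_POD * HBM_PER_CHIP / 1e6                 # giga -> peta
per_chip_pflops = POD_EXAFLOPS * 1_000 / CHIPS_PER_POD          # exa -> peta
pod_hbm_bw_pbps = CHIPS_PER_POD * HBM_BW_PER_CHIP_TBPS / 1_000  # tera -> peta

print(f"Pod HBM capacity: {pod_hbm_pb:.2f} PB")
print(f"Implied compute per chip: {per_chip_pflops:.2f} PFLOPS")
print(f"Aggregate peak HBM bandwidth: {pod_hbm_bw_pbps:.1f} PB/s")
```

Headline pod numbers are peak figures at whatever precision the vendor chose to quote, so the per-chip value is best read as an upper bound rather than a sustained rate.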
VI. Axion shows the CPU layer is also being rebuilt
The third turn of the flywheel is the boring layer: general-purpose CPUs. Google introduced Axion as its first custom Arm-based CPU for the data center, explicitly placing it in a long line of custom Google silicon that includes TPUs, VCUs, and Tensor. Google claims Axion offers up to 30% better performance than the fastest general-purpose Arm-based cloud instances and up to 50% better performance with up to 60% better energy efficiency than comparable current-generation x86-based instances.[3]
The point is not that CPUs vanish. The point is that even general-purpose compute is being customised by the largest cloud operators.
The cloud is not abandoning CPUs. It is making CPUs hyperscaler-specific.
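It helps to translate "X% better" claims into what a fleet would need for the same work. The sketch below takes Google's "up to" figures at face value as best cases and assumes that energy efficiency means work per unit of energy; both are assumptions, and `relative_need` is a helper introduced here for illustration.

```python
def relative_need(gain_fraction: float) -> float:
    """Resource needed for the same work after an 'X% better' gain, relative to baseline."""
    if gain_fraction <= -1:
        raise ValueError("gain_fraction must be greater than -1")
    return 1.0 / (1.0 + gain_fraction)


# Best-case readings of the quoted Axion claims versus comparable x86 instances.
instances_needed = relative_need(0.50)  # "up to 50% better performance"
energy_needed = relative_need(0.60)     # "up to 60% better energy efficiency"

print(f"Instances for the same work: {instances_needed:.0%} of baseline")
print(f"Energy for the same work: {energy_needed:.1%} of baseline")
```

A 50% performance gain is a one-third reduction in instances, not a halving. At fleet scale that is still a large number, which is the point of customising the boring layer.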
VII. Google is not alone
The same pattern is visible across the other hyperscalers, each adapted to that company’s biggest internal workloads.
Google: TPU family, Ironwood TPU, Tensor, Titan.
VIII. Why merchant silicon still matters
This is not a story about Intel, AMD, and Nvidia disappearing. It is a story about where the biggest workloads sit, and who can justify the design cost to absorb them.
Where merchant silicon wins: broad markets and flexibility
- Workloads are broad across many customers.
- Software ecosystem and standards matter (CUDA, x86).
- Workloads change quickly over short cycles.
- Utilisation is uncertain for a given customer.
- Buyers don’t own enough scale to justify ASIC design cost.
- Time-to-market matters more than per-unit cost.
Where custom silicon wins: single buyer, huge scale
- Workload is huge at one operator’s fleet.
- Workload is stable enough to design hardware around.
- Software and deployment are controlled end-to-end.
- TCO savings justify multi-year design cost.
- Power and latency are first-order constraints.
- Utilisation can be engineered by the operator.
Custom silicon does not replace merchant silicon everywhere. It eats the parts of the cloud where the workload is large enough to become its own market.
IX. Strategic pressure on Intel, AMD, and Nvidia
The right way to read this is not as a death blow to merchant vendors but as a quiet cap on their largest accounts. Intel feels it through fewer generic CPU cycles inside hyperscalers as Axion, Graviton, and Cobalt absorb general-purpose workloads. AMD competes well in CPUs and GPUs but faces the same in-house pressure at its largest customers. Nvidia remains dominant in AI, but TPU, Trainium, Maia, and other custom accelerators loosen the grip Nvidia would otherwise have on the operators that control the most compute.
A custom chip can be worse in general and still better for the exact workload that matters.
X. The physical base layer
Custom silicon does not escape the physical world. Every Argos, TPU, Axion, Maia, and Trainium is fabricated, packaged, and powered using the same supply chain that builds Nvidia’s systems. TSMC’s 2025 annual report frames robust AI-related demand throughout 2025 and only mild recovery in non-AI markets, with advanced nodes and packaging investment as the structural drivers.[12] ASML’s 2025 strategic report reinforces the same picture from the lithography side: AI requires leading-edge processor chips and a significant increase in DRAM compared with traditional compute architectures.[13]
Hyperscalers can design their own chips, but they still depend on the same physical semiconductor bottlenecks.
Quick terms
- ASIC. Application-specific integrated circuit.
- VCU. Video coding unit. An accelerator for video encoding and transcoding.
- TPU. Tensor processing unit. Google’s custom AI accelerator family.
- CPU. General-purpose processor.
- GPU. Parallel accelerator, originally for graphics, now also for AI.
- TCO. Total cost of ownership.
- Codec. Format / algorithm used to compress and decompress video.
- H.264. Widely supported video codec.
- VP9. Google-backed codec, better compression than H.264 at higher compute cost.
- AV1. Royalty-free codec with even better compression, higher compute complexity.
- Hyperscaler. Large-scale cloud / platform operator (Google, AWS, Microsoft, Meta).
- Domain-specific. A chip built for a narrow workload rather than general compute.
- Utilisation. How much of the hardware is actually kept busy.
- HBM. High-bandwidth memory.
- Exaflop. One quintillion floating-point operations per second.
XI. What could break the thesis
A serious piece needs counterarguments. The custom silicon flywheel has plausible failure modes.
- Design cost. Custom silicon is expensive and slow to design, verify, and deploy. Cost overruns can crush ROI.
- Workload drift. Workloads can change faster than chips can be designed. A bad bet ages quickly.
- Merchant catch-up. GPUs and CPUs may improve enough to reduce the custom ROI gap, especially as Nvidia ships new generations.
- Software moat. Nvidia’s CUDA ecosystem remains hard to displace at the workload level, even when custom silicon is faster on paper.
- Scale floor. Smaller cloud providers cannot justify the design cost. The flywheel needs scale to spin.
- Utilisation risk. Underused custom silicon can be worse than flexible merchant silicon. The chip is only as good as the scheduler.
- Supply chain. Custom chips still compete for HBM, CoWoS, substrates, and advanced nodes. The base layer remains shared.
- External adoption. Internal chips may be great for internal workloads but unattractive to external cloud customers used to merchant ecosystems.
- Compiler / runtime maturity. A great chip without a great software toolchain is a museum piece.
- People. Hyperscaler silicon teams compete for the same chip-design talent as merchant vendors and each other.
The mistake is not believing in custom silicon. The mistake is believing every workload deserves a custom chip.
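The utilisation risk in particular reduces to a simple ratio: what matters is cost per hour of useful work, not cost per hour of installed hardware. The sketch below assumes both hourly costs have already been normalised to the same throughput. The function names and all numbers are hypothetical.

```python
def cost_per_useful_hour(hourly_cost: float, utilisation: float) -> float:
    """Effective cost of one hour of real work at a given utilisation (0 < u <= 1)."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    return hourly_cost / utilisation


def break_even_utilisation(custom_hourly: float, merchant_hourly: float,
                           merchant_utilisation: float) -> float:
    """Utilisation at which custom silicon matches merchant cost per useful hour."""
    return custom_hourly * merchant_utilisation / merchant_hourly


# Placeholder inputs (assumptions): custom is cheaper per hour when kept busy.
CUSTOM_HOURLY, MERCHANT_HOURLY, MERCHANT_UTIL = 0.40, 1.00, 0.70

threshold = break_even_utilisation(CUSTOM_HOURLY, MERCHANT_HOURLY, MERCHANT_UTIL)
print(f"Custom silicon must stay above {threshold:.0%} utilisation")
print(f"Custom at 25% busy: {cost_per_useful_hour(CUSTOM_HOURLY, 0.25):.2f} per useful hour")
print(f"Merchant at 70% busy: {cost_per_useful_hour(MERCHANT_HOURLY, MERCHANT_UTIL):.2f} per useful hour")
```

In this toy case a chip that is 60% cheaper per hour still loses once it sits idle three quarters of the time. Flexible merchant hardware can be kept busy with other work, while a narrow accelerator depends entirely on its scheduler.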
XII. The domain-specific cloud
Google Argos was not just a YouTube chip. It was a warning. At hyperscaler scale, a workload does not stay software forever. If it is big enough, expensive enough, stable enough, and strategically important enough, it becomes silicon.
That was true for video. It became true for AI. It is becoming true for CPUs. It will become true for networking, security, inference, search, recommendation, and media generation.
That is the custom silicon flywheel.
1 Patel, D. (Jun 2021). Google New Custom Silicon Replaces 10 Million Intel CPUs | Google Argos VPU. SemiAnalysis. Historical anchor for the Argos / VCU thesis, the 10-million Intel CPU estimate under the article’s assumptions, VP9 / H.264 workload framing, and chip-to-cluster co-design argument. Used as inspiration only. No content, structure, or charts reproduced.
2 Google Research. Warehouse-Scale Video Acceleration: Co-design and Deployment in the Wild. Official Google research framing of the VCU as a warehouse-scale video acceleration platform, with co-design across distributed software systems.
3 Google Cloud. Introducing Google’s new Arm-based CPU (Axion). Google framing of Axion as the company’s first Arm-based data-center CPU, situating it alongside TPUs, VCUs, and Tensor as part of a long line of custom Google silicon, with up to 30% better performance versus fastest general-purpose Arm-based cloud instances and up to 50% performance plus 60% energy-efficiency gains versus comparable current-generation x86 instances.
4 Google. Ironwood TPU: built for the age of inference. Ironwood as the 7th-generation TPU framed around inference workloads, with pod-scale headline numbers.
5 Google Cloud. Inside the Ironwood TPU and co-designed AI stack. 192 GiB HBM3E per chip, 7.4 TB/s peak HBM bandwidth, 9,216 chips per pod, 42.5 exaflops per pod, and 1.77 PB of directly accessible HBM per pod.
6 Meta. Meta Scalable Video Processor (MSVP). Meta’s first in-house ASIC for video processing, targeting video-on-demand and live-streaming workloads, with dedicated hardware framed as the best solution for compute power and efficiency at Meta scale.
7 AWS. AWS Trainium. Trainium as a purpose-built AI accelerator family for high-performance, cost-efficient training and inference.
8 AWS. AWS Inferentia. Inference-focused AWS custom silicon for deep learning and generative AI workloads.
9 AWS. AWS Silicon Innovation. Portfolio context for Graviton, Trainium, and Inferentia.
10 Microsoft. In-house chips: silicon to service. Microsoft Azure Maia AI Accelerator and Cobalt Arm CPU framed as part of a silicon-to-service systems approach.
11 Microsoft. Maia 200: the AI accelerator built for inference. Maia 200 framing as an inference-focused AI accelerator in the Azure custom silicon portfolio.
12 TSMC. 2025 Annual Report. Robust AI-related demand, mild non-AI recovery, advanced-node and packaging investment as structural drivers.
13 ASML. 2025 Annual Report, strategic report section. AI requires leading-edge, high-performance processor chips and a significant increase in DRAM compared with traditional compute architectures.
This is Essay No. 021. The topics: intelligence, AI, systems, knowledge, and the questions underneath the questions everyone else is asking. If you read this far and disagreed with any part of it, write to me. I read everything.