Essay No. 061 · Advanced Packaging & AI Scaling
The Package Became the Computer
Original analysis · Not investment advice
In 2021, advanced packaging looked like a way to escape pad-limited designs and improve chiplet economics. In 2026, the bigger story is clear: AI scaling now depends on CoWoS, HBM, SoIC, hybrid bonding, die-to-die IO, thermals, and packaging capacity as much as on transistor scaling.
The package is no longer the thing around the chip. The package is the computer. Moore's Law did not disappear; it moved into the package.
For decades, the semiconductor industry had a simple story.
Shrink the transistor, put more transistors on the chip, improve performance, reduce power, lower cost per transistor, repeat. That story is not dead, but it is no longer enough. The modern AI chip is no longer just a die. It is a packaged system of logic dies, HBM stacks, interposers, bridges, substrates, power delivery, thermal solutions, die-to-die IO, and, increasingly, optical IO.
The package became the computer.
Section 01 What the 2021 advanced packaging article got right
The 2021 SemiAnalysis piece on advanced packaging is the historical anchor for this essay[1]. It argued that the industry was moving toward advanced packaging because transistor scaling alone was no longer enough. It also argued that real chip density had not improved as quickly as transistor density, because chips were increasingly limited by IO, SRAM scaling, power delivery, and heat. Its charts of datacenter network bandwidth scaling and IO switch limits (page 3) framed the problem: transistor density historically improved much faster than IO data rates, which doubled roughly every four years.
The piece walked through the packaging baseline. Traditional flip-chip packaging used bump pitches around 150 to 200 microns. TSMC N7 fine-pitch flip chip improved to around 130 microns. Intel 10 nm improved to around 100 microns. The article argued these packaging improvements were tiny relative to transistor count increases. The core-limited versus pad-limited design diagram on page 7, the AMD "while costs continue to increase" chart on page 18, and the chiplet yield-economics table on page 23 comparing AMD Milan and Intel Ice Lake all reinforced the same point: the cost and IO walls were rising faster than transistor density could compensate, and advanced packaging was the escape route[1].
- Real chip density limited by IO, SRAM scaling, power delivery, and heat.
- Transistor density improved much faster than IO data rates (~doubling every 4 years).
- Traditional flip-chip bump pitches around 150-200 microns.
- TSMC N7 fine-pitch flip chip ~130 microns; Intel 10 nm ~100 microns.
- Core-limited vs pad-limited design diagram on page 7.
- Rising yielded cost per area at advanced nodes (AMD chart, page 18).
- Chiplet yield economics table comparing AMD Milan and Intel Ice Lake (page 23).
- Fan-out at ~90-60 micron pitch; 2.5D at ~55-50 micron pitch.
- AMD 3D V-Cache at ~17 micron pitch; Sony image sensors reaching ~6.3 micron pitch.
- HBM described as a step-function increase in memory bandwidth, designed for in-package co-location.
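The second bullet describes a growth gap, and a small compounding sketch makes it concrete. The ~4-year doubling for IO data rates is the figure quoted from the 2021 piece. The 2-year doubling for transistor density is an assumption, a classic Moore's Law cadence used only for illustration, because the essay says only that transistor density improved "much faster".

```python
"""Compound-growth gap between transistor density and IO data rates.

Assumption (illustrative, not from the essay): transistor density doubles
every 2 years. The ~4-year doubling for IO data rates is the figure quoted
from the 2021 piece.
"""

TRANSISTOR_DOUBLING_YEARS = 2.0  # assumed cadence, for illustration only
IO_DOUBLING_YEARS = 4.0          # quoted in the 2021 piece


def growth_factor(years: float, doubling_period: float) -> float:
    """Multiplicative growth after `years` for a given doubling period."""
    return 2.0 ** (years / doubling_period)


def io_gap(years: float) -> float:
    """How many times more transistor density grew than IO data rate."""
    transistors = growth_factor(years, TRANSISTOR_DOUBLING_YEARS)
    io = growth_factor(years, IO_DOUBLING_YEARS)
    return transistors / io


if __name__ == "__main__":
    for years in (4, 8, 12):
        t = growth_factor(years, TRANSISTOR_DOUBLING_YEARS)
        i = growth_factor(years, IO_DOUBLING_YEARS)
        print(f"{years:>2} years: transistors x{t:.0f}, IO x{i:.0f}, gap x{io_gap(years):.0f}")
```

Under these assumptions the gap itself doubles every four years, since 2^(t/2) / 2^(t/4) = 2^(t/4). After twelve years, transistor density has grown 8x more than IO data rates. That widening ratio is the pressure behind pad-limited designs.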
The 2021 article was right that advanced packaging was not a back-end footnote. It was the escape route from IO limits, pad limits, yield limits, and rising die costs.
Section 02 Pad-limited designs, the quiet bottleneck
In a core-limited design, the logic core determines die size. In a pad-limited design, IO pads determine die size. Even when the logic shrinks on a new process node, the chip cannot shrink as much because it still needs physical connections to the outside world. The 2021 piece noted that pad-limited designs are common. It also explained why moving older chips to newer nodes does not always work economically: IO pads can stop the die from shrinking enough to justify the cost of the move[1].
Core-limited: logic decides the die. Gate count and routing determine die size, so the chip can shrink on a new node because the IO interface fits comfortably in the available perimeter.
Pad-limited: IO decides the die. IO pad count determines die size even after the core shrinks. Moving to a new node saves logic area but leaves the die size set by the pad ring, which often kills the economics of the shrink.
The transistor can shrink, but the chip still has to talk to the world.
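A toy die-size model shows why a pad-limited chip refuses to shrink. The model assumes a square die and a single ring of IO pads around the perimeter at a fixed pitch, and it ignores pad-ring depth and corner cells. The core areas, the pad count, and the 60 micron pad pitch are hypothetical values, not figures from the 2021 piece.

```python
"""Toy model: core-limited vs pad-limited die size.

Assumptions (hypothetical): a square die, one ring of IO pads around the
perimeter at a fixed pitch, no pad-ring depth or corner cells.
"""
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    core_area_mm2: float   # logic area needed on this node
    io_pads: int           # number of perimeter IO pads
    pad_pitch_um: float    # centre-to-centre pad spacing


def core_side_mm(d: Design) -> float:
    """Die side if only the logic mattered."""
    return math.sqrt(d.core_area_mm2)


def pad_side_mm(d: Design) -> float:
    """Die side needed to fit every pad along the four edges."""
    perimeter_mm = d.io_pads * d.pad_pitch_um / 1000.0
    return perimeter_mm / 4.0


def die_area_mm2(d: Design) -> float:
    """Die area is set by whichever constraint needs the larger side."""
    side = max(core_side_mm(d), pad_side_mm(d))
    return side * side


def limiter(d: Design) -> str:
    return "pad-limited" if pad_side_mm(d) > core_side_mm(d) else "core-limited"


if __name__ == "__main__":
    old_node = Design(core_area_mm2=300.0, io_pads=1000, pad_pitch_um=60.0)
    new_node = Design(core_area_mm2=150.0, io_pads=1000, pad_pitch_um=60.0)
    for name, d in (("old node", old_node), ("new node", new_node)):
        print(f"{name}: {die_area_mm2(d):.0f} mm^2, {limiter(d)}")
```

With these hypothetical numbers, the logic shrinks by half (300 to 150 mm²) but the die shrinks only by a quarter (300 to 225 mm²). The reason is that 1,000 pads at 60 microns need 60 mm of perimeter, which means a 15 mm side on any node.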
Section 03 IO became the real scaling wall
Computation is useless without data movement. AI accelerators need HBM bandwidth. GPUs need memory and interconnect. CPUs need cache and memory. Networking chips need huge IO. Chiplets need dense die-to-die links. The package becomes the place where those connections happen.
The 2021 piece framed it bluntly. IO is the lifeblood of computation. Bringing memory on-die helps, but is limited. Chips need more points of communication to keep up. The last major step-change in package IO was flip-chip packaging in the 1990s, more than two decades before this conversation became urgent[1]. Modern chips do not fail only because they lack arithmetic units. They fail because data movement becomes too expensive.
Section 04 More cache and heterogeneous compute were partial answers
One answer to IO limits is to keep more data on chip. AMD used Infinity Cache to reduce memory bandwidth requirements. Apple used large on-chip caches and specialized blocks. Heterogeneous compute added accelerators for specific workloads, with Google TPU, Amazon custom silicon, and Intel's heterogeneous taxonomy all pointing in the same direction[1].
Cache and accelerators reduce data movement, but they do not eliminate the need for dense packaging. They postpone the IO problem. They do not solve it.
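A simple model shows why cache postpones the IO problem rather than solving it. The model assumes every cache miss becomes off-chip traffic and every hit does not, and it ignores writebacks and prefetching. The 500 GB/s off-chip budget and the demand levels are hypothetical.

```python
"""Off-chip bandwidth left over after an on-chip cache absorbs hits.

Simplification: every miss becomes off-chip traffic and every hit does not;
writebacks, prefetching and line-size effects are ignored. Numbers are
hypothetical.
"""


def off_chip_gb_s(demand_gb_s: float, hit_rate: float) -> float:
    """Traffic that still has to cross the package boundary."""
    return demand_gb_s * (1.0 - hit_rate)


def required_hit_rate(demand_gb_s: float, off_chip_budget_gb_s: float) -> float:
    """Hit rate needed so off-chip traffic stays within the budget."""
    return max(0.0, 1.0 - off_chip_budget_gb_s / demand_gb_s)


if __name__ == "__main__":
    budget = 500.0  # hypothetical fixed off-chip bandwidth, GB/s
    for demand in (1000.0, 2000.0, 4000.0, 8000.0):
        needed = required_hit_rate(demand, budget)
        print(f"demand {demand:>6.0f} GB/s -> hit rate needed {needed:.4f}")
```

Each doubling of demand halves the allowed miss rate. To stay within the same off-chip bandwidth, the hit rate has to climb from 50 to 75 to 87.5 to 93.75 percent. Bigger caches can chase that curve for a while, but each step typically costs more SRAM area for a smaller gain.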
Section 05 Why giant monolithic dies hit the wall
Another answer is to make the chip bigger. But reticle limits, yield, and cost block that path. The 2021 piece noted several constraints:
- Nvidia and Intel datacenter lineups had been near the reticle limit for more than five years.
- Die shrinks had slowed.
- Chip sizes could not grow much larger.
- Designs were becoming pad-limited.
Its charts showed Moore's Law slowing, yielded cost per area rising at advanced nodes, and N5 wafer cost exceeding N7, with cost-per-transistor trends reversing[1].
The monolithic die became too expensive, too risky, and too hard to feed with data.
Section 06 Chiplets changed the yield equation
A defect on a large die can destroy a huge amount of silicon. Splitting the design into smaller chiplets improves yield economics because smaller dies are less likely to contain defects, and chiplets let companies reuse designs across multiple products. The 2021 piece illustrated this with chiplet yield economics comparing AMD Milan to Intel Ice Lake and argued that AMD could use a small number of chiplet designs across a large portion of the market[1].
Chiplets are not only a technical architecture. They are an economic response to rising defect risk.
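The yield argument can be sketched with a simple Poisson yield model, Y = exp(-A * D0), a standard first-order approximation. This sketch is not a reconstruction of the Milan versus Ice Lake table in the 2021 piece. The 600 mm² die, the eight 75 mm² chiplets, and the defect density of 0.1 per cm² are all hypothetical. The sketch also assumes chiplets are tested before assembly, so a defect scraps one small die instead of the whole product.

```python
"""Silicon consumed per good product: one large die vs. several chiplets.

Assumptions (hypothetical): a simple Poisson yield model Y = exp(-A * D0),
a defect density D0 of 0.1 defects/cm^2, and chiplets that are tested before
assembly so only bad chiplets are discarded. Die-to-die area overhead,
packaging cost and wafer-edge loss are ignored.
"""
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects under a Poisson model."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_product_mm2(num_dies: int, die_area_mm2: float,
                                 defects_per_cm2: float) -> float:
    """Expected silicon fabricated to obtain one good set of dies."""
    return num_dies * die_area_mm2 / poisson_yield(die_area_mm2, defects_per_cm2)


if __name__ == "__main__":
    d0 = 0.1  # hypothetical defects per cm^2
    mono = silicon_per_good_product_mm2(1, 600.0, d0)
    chiplets = silicon_per_good_product_mm2(8, 75.0, d0)
    print(f"600 mm^2 monolithic: yield {poisson_yield(600.0, d0):.1%}, "
          f"{mono:.0f} mm^2 per good product")
    print(f"8 x 75 mm^2 chiplets: yield {poisson_yield(75.0, d0):.1%} each, "
          f"{chiplets:.0f} mm^2 per good product")
```

With these inputs, the monolithic die yields about 55 percent and consumes roughly 1,090 mm² of fabricated silicon per good product. The chiplet version needs roughly 650 mm². The gap widens as total area or defect density grows, which is the economic pull toward chiplets.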
Section 07 But chiplets create a new IO problem
Chiplets solve yield and reuse problems, but they create more die-to-die communication. If the interconnect between chiplets is too slow, too power-hungry, or too large, the chiplet architecture loses its advantage. The 2021 piece addressed this directly. Chiplets are great but not sufficient in isolation, and they can themselves become pad-limited because they need more IO to interface with other chips. The piece asked "what's the solution?" and answered with advanced packaging[1].
Chiplets make advanced packaging necessary.
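The power cost of chiplet traffic follows from a one-line formula: link power equals bits per second times energy per bit. The 1 TB/s of cross-chiplet traffic and both energy-per-bit values are hypothetical placeholders chosen to show the scaling. They are not measured figures for any interconnect.

```python
"""Die-to-die link power from bandwidth and energy per bit.

All energy-per-bit values are hypothetical placeholders chosen to show the
scaling, not measured figures for any specific interconnect or product.
"""

TB_PER_S = 1e12  # bytes per second in one terabyte per second


def link_power_watts(bandwidth_bytes_per_s: float, energy_pj_per_bit: float) -> float:
    """Power = bits per second x joules per bit (1 pJ = 1e-12 J)."""
    bits_per_s = bandwidth_bytes_per_s * 8.0
    return bits_per_s * energy_pj_per_bit * 1e-12


if __name__ == "__main__":
    # Hypothetical: 1 TB/s of traffic crossing a chiplet boundary.
    for label, pj in (("dense in-package link (hypothetical)", 0.5),
                      ("longer-reach link (hypothetical)", 5.0)):
        print(f"{label}: {link_power_watts(1 * TB_PER_S, pj):.1f} W")
```

At the same bandwidth, a 10x difference in energy per bit becomes a 10x difference in watts: about 4 W versus 40 W here. Every watt spent moving bits between dies is a watt not available for compute.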
Section 08 Fan-out, 2.5D and 3D, the packaging ladder
Advanced packaging is best understood as a ladder.
Fan-out places a die on a reconstituted wafer or panel and spreads its IO outward, enabling denser bumps than traditional flip chip. The 2021 piece pegged fan-out at roughly 90 to 60 micron pitch, with around 8x higher bump density than traditional flip chip.
2.5D places active chips on a passive silicon interposer or bridge that contains no transistors and serves mostly as routing. It is the basis for HBM-attached high-end GPUs. The 2021 piece put 2.5D at roughly 55 to 50 micron pitch, with around 16x higher bump density.
3D stacks active dies vertically using TSVs and bonding. AMD 3D V-Cache used a 17 micron pitch (~138x higher IO density than standard flip chip), and certain Sony image sensors had reached a 6.3 micron pitch (~567x higher IO density)[1].
Each rung in this ladder buys more IO density per unit of package area. None of these rungs replaces the others; modern AI accelerators use several of them in the same package, with different rungs solving different problems.
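The density multipliers in this ladder follow from geometry. If bumps sit on a square grid, density per mm² scales as (1000 / pitch in microns)², so the ratio between two pitches is (baseline / pitch)². The sketch applies that assumption to the pitches quoted from the 2021 piece. Real layouts with power bumps, keep-out zones, or hexagonal grids will differ in absolute terms.

```python
"""Bump (IO) density versus pitch for an area-array bump grid.

Assumption: bumps sit on a square grid, so density per mm^2 is
(1000 / pitch_um)^2 and the ratio between two pitches is (baseline / pitch)^2.
"""


def density_per_mm2(pitch_um: float) -> float:
    """Bumps per square millimetre on a square grid at the given pitch."""
    return (1000.0 / pitch_um) ** 2


def density_ratio(baseline_pitch_um: float, pitch_um: float) -> float:
    """How many times denser `pitch_um` is than `baseline_pitch_um`."""
    return (baseline_pitch_um / pitch_um) ** 2


# Pitches quoted from the 2021 piece, in microns.
LADDER = [
    ("2.5D, fine end", 50.0),
    ("AMD 3D V-Cache", 17.0),
    ("Sony image sensor", 6.3),
]

if __name__ == "__main__":
    for name, pitch in LADDER:
        print(f"{name:<18} {pitch:>5.1f} um  {density_per_mm2(pitch):>8.0f} bumps/mm^2  "
              f"x{density_ratio(200.0, pitch):.0f} vs 200 um  "
              f"x{density_ratio(150.0, pitch):.0f} vs 150 um")
```

Under this model, the quoted ~16x (50 microns) and ~138x (17 microns) figures line up with a 200 micron flip-chip baseline. The ~567x figure for Sony's 6.3 microns lines up with a 150 micron baseline instead. The multipliers therefore appear to be measured against different ends of the 150 to 200 micron flip-chip range, so they are best read as orders of magnitude rather than as one consistent scale.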
Section 09 CoWoS became AI infrastructure
The TSMC 2026 Technology Symposium made the AI-packaging linkage explicit. TSMC is producing 5.5-reticle-size CoWoS and plans 14-reticle-size CoWoS in 2028, with the larger version expected to integrate around 10 large compute dies and 20 HBM stacks. TSMC is extending SoIC, with A14-to-A14 SoIC targeted for production in 2029 at 1.8x higher die-to-die IO density than N2-on-N2 SoIC, and the COUPE co-packaged optics solution is moving toward production[2]. TSMC's official 3DFabric portfolio frames this stack as a system-level design and mini-chip integration capability built around SoIC, CoWoS, and InFO[3].
This is not normal packaging. This is package-scale system construction. A future AI accelerator is a compute-memory-interconnect system built inside the package. CoWoS became one of the most important words in AI infrastructure because AI chips are memory-bandwidth machines.
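Converting the reticle multiples into area gives a sense of scale. The sketch assumes one reticle equals the standard 26 mm × 33 mm scanner field (858 mm²); TSMC's own multiples may be defined against a slightly different usable area. The comparison with a 300 mm wafer is only a yardstick, because the essay does not say how the largest interposers are manufactured.

```python
"""Interposer area implied by "N-reticle" CoWoS.

Assumption: one reticle = the standard 26 x 33 mm scanner exposure field
(858 mm^2). TSMC's own multiples may use a slightly different usable area,
so treat the results as approximate.
"""
import math

RETICLE_MM2 = 26.0 * 33.0  # 858 mm^2 (assumed field size)


def reticle_multiple_area_mm2(multiple: float, reticle_mm2: float = RETICLE_MM2) -> float:
    """Area of a package spanning `multiple` reticle fields."""
    return multiple * reticle_mm2


def wafer_area_mm2(diameter_mm: float = 300.0) -> float:
    """Gross area of a round wafer, ignoring edge exclusion."""
    return math.pi * (diameter_mm / 2.0) ** 2


if __name__ == "__main__":
    for multiple in (1.0, 5.5, 14.0):
        area = reticle_multiple_area_mm2(multiple)
        share = area / wafer_area_mm2()
        print(f"{multiple:>4}x reticle: {area:>8.0f} mm^2 ({share:.1%} of a 300 mm wafer's area)")
```

Under that assumption, 5.5-reticle CoWoS covers about 4,700 mm² and 14-reticle CoWoS about 12,000 mm². The larger figure is roughly 17 percent of a 300 mm wafer's area, for the interposer of a single package.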
Section 10 HBM forced the package to become the system
HBM gives huge bandwidth by placing stacked DRAM close to compute inside the package. HBM works because it uses a very wide bus over short distances, which creates enormous IO density requirements. The 2021 piece described HBM as a step-function increase in memory bandwidth above traditional DRAM, designed from the ground up to be co-located in the same package[1]. The current AI generation has run with that premise: Nvidia GPUs use CoWoS with HBM, AMD MI300 integrates CPU chiplets, GPU chiplets, and HBM, and TSMC's 14-reticle CoWoS plan calls for around 20 HBM stacks per package[2].
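Simple arithmetic checks the wide-and-short logic: bandwidth per stack equals interface width times per-pin data rate. The sketch assumes a 1024-bit stack interface, the width used by HBM generations through HBM3E, and an HBM3-class per-pin rate of 6.4 Gb/s. TSMC's 14-reticle plan does not name an HBM generation, so the 20-stack total only illustrates scale.

```python
"""Per-stack HBM bandwidth from interface width and per-pin data rate.

Assumptions: a 1024-bit stack interface (HBM generations through HBM3E) and a
6.4 Gb/s per-pin rate (HBM3-class). The 20-stack total is illustrative only.
"""


def stack_bandwidth_gb_s(bus_width_bits: int = 1024, pin_rate_gbps: float = 6.4) -> float:
    """GB/s per stack = width (bits) x per-pin rate (Gb/s) / 8 bits per byte."""
    return bus_width_bits * pin_rate_gbps / 8.0


def pin_rate_needed_gbps(target_gb_s: float, bus_width_bits: int) -> float:
    """Per-pin rate a bus of the given width needs to hit the target bandwidth."""
    return target_gb_s * 8.0 / bus_width_bits


if __name__ == "__main__":
    per_stack = stack_bandwidth_gb_s()
    print(f"one stack: {per_stack:.1f} GB/s")
    print(f"20 stacks: {20 * per_stack / 1000:.1f} TB/s")
    # Why wide-and-short: a narrow 64-bit bus would need a far faster pin rate.
    print(f"64-bit bus would need {pin_rate_needed_gbps(per_stack, 64):.1f} Gb/s per pin")
```

At these settings, one stack delivers about 819 GB/s and 20 stacks about 16.4 TB/s. Matching one stack over a 64-bit interface would take about 102 Gb/s per pin. This is why HBM goes wide and short, and why it needs an interposer or bridge with enough wiring density to carry more than a thousand data signals per stack.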
Without advanced packaging, HBM is not an AI scaling layer. It is just memory with nowhere close enough to go.
The modern AI package brings these pieces together:
- Compute dies
- HBM stacks
- Interposer or bridge
- Substrate
- Power delivery
- Thermal solution
- Die-to-die IO
- Co-packaged optics
Section 11 AMD MI300, the thesis in product form
AMD's MI300A is the thesis in product form. AMD describes MI300A as a 3D chiplet design that integrates Zen 4 CPU chiplets, CDNA 3 GPU chiplets, and HBM in one package[6]. The AMD CDNA 3 white paper describes MI300 as integrating accelerator complex dies, IO dies, and HBM stacks into a heterogeneous package, with the architecture treating the package as a system rather than a substrate[7].
MI300 is not just a GPU. It is a heterogeneous compute system packaged together. That is exactly where the industry is going.
Section 12 Intel EMIB and Foveros, another route to the same destination
Intel describes EMIB and Foveros as advanced packaging technologies for heterogeneous systems[9]. EMIB provides high-density die-to-die interconnect through embedded bridges, and Foveros enables 3D stacking. Intel's Data Center GPU Max used many active tiles and multiple process nodes inside a single product, framing the package as the unit of design.
Intel's packaging route differs from TSMC's, but the destination is the same. The industry is building systems out of many dies, and both approaches lead to package-scale computers.
Section 13 ASML's framing, 2D plus 3D
ASML's 2026 AGM material makes the macro framing explicit. ASML says AI compute demand has outpaced Moore's Law, that Moore's Law alone is not sufficient to meet future AI training compute requirements, and that future scaling depends on 2D scaling plus 3D integration, with advanced packaging exposure tools and 3D applications included in the equipment roadmap[4].
Even the lithography ecosystem now frames scaling as more than shrinking transistors. EUV still matters, but the system also needs 3D integration and advanced packaging. The scaling roadmap became two-dimensional and three-dimensional at the same time.
Section 14 Nvidia and the CoWoS bottleneck
Reuters reported in January 2025 that Nvidia's advanced packaging needs at TSMC were changing, not disappearing. Blackwell uses more CoWoS-L while Hopper continues to use CoWoS-S, and packaging continued to act as a bottleneck even as capacity improved[5]. The reporting is careful and the supply detail is dynamic, but the larger point is durable.
AI bottlenecks are not only wafer starts. They are also HBM supply, packaging capacity, substrates, thermal solutions, and assembly and test throughput. The AI supply chain bottleneck moved from fabs into packages.
The old package connected one chip to a board. The new package connects many chips to each other, to HBM, to power and eventually to optics.
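The bottleneck list above can be expressed as a minimum over supply stages: finished accelerators are capped by whichever input runs out first. Every stage name and number in the sketch is hypothetical, chosen only to show how the binding constraint moves. None is a real capacity figure for Nvidia, TSMC, or any HBM supplier.

```python
"""Toy supply chain: finished accelerators are capped by the scarcest input.

Every stage name and number is hypothetical, chosen only to show how the
binding constraint moves. None is a real capacity figure.
"""
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    supply: float           # units available per quarter (hypothetical)
    per_accelerator: float  # units one finished accelerator consumes

    @property
    def accelerators(self) -> float:
        """How many accelerators this stage alone could support."""
        return self.supply / self.per_accelerator


def bottleneck(stages: List[Stage]) -> Tuple[str, float]:
    """Return the limiting stage and the accelerator count it allows."""
    limiting = min(stages, key=lambda s: s.accelerators)
    return limiting.name, limiting.accelerators


def with_supply(stages: List[Stage], name: str, supply: float) -> List[Stage]:
    """Copy of `stages` with one stage's supply changed."""
    return [replace(s, supply=supply) if s.name == name else s for s in stages]


STAGES = [
    Stage("logic dies", supply=240_000, per_accelerator=2),
    Stage("HBM stacks", supply=800_000, per_accelerator=8),
    Stage("CoWoS packaging", supply=80_000, per_accelerator=1),
    Stage("assembly and test", supply=110_000, per_accelerator=1),
]

if __name__ == "__main__":
    print("baseline:        ", bottleneck(STAGES))
    print("+50% logic dies: ", bottleneck(with_supply(STAGES, "logic dies", 360_000)))
    print("CoWoS relieved:  ", bottleneck(with_supply(STAGES, "CoWoS packaging", 130_000)))
```

In this toy setup, adding 50 percent more logic dies ships no extra accelerators, because CoWoS packaging is the binding constraint. Relieving CoWoS simply moves the constraint to HBM.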
Section 15 Packaging became geopolitical
Reuters reported in April 2026 that TSMC plans CoWoS and 3D-IC capability in Arizona before 2029[10]. That matters because logic wafers made in the United States still need advanced packaging to become AI chips. Without packaging, onshore wafer production is incomplete.
Semiconductor sovereignty is not only about fabs. It is also about advanced packaging capacity. A wafer fab without advanced packaging is not the full AI chip supply chain.
Section 16 What people got wrong in 2021
The weak interpretation in 2021 was that advanced packaging was a back-end manufacturing detail. The better interpretation is that advanced packaging is the new scaling layer. Packaging determines memory bandwidth, die-to-die IO, power delivery, package size, thermal limits, chiplet economics, reticle-limit escape, heterogeneous integration, and ultimately the system size of an AI accelerator.
Then (2021), an escape route from pad limits: advanced packaging solves pad limits, chiplet yield economics, and IO bottlenecks. Fan-out, 2.5D, and 3D are tools in a back-end ladder. CoWoS, HBM, and 3D V-Cache are early proof points.
Now (2026), the main AI scaling layer: advanced packaging became the main AI scaling layer across CoWoS, HBM, SoIC, hybrid bonding, thermals, substrates, and packaging capacity. The package is part of the architecture, not just the back end.
The progression: monolithic die → chiplets → fan-out → 2.5D interposer → HBM in-package → 3D stacking → SoIC or hybrid bonding → co-packaged optics.
Milestones along that path:
- Flip chip becomes a major step-change in package IO: the last major package-level IO transition, in the 1990s, before the modern advanced-packaging cycle[1].
- Fan-out and 2.5D scale in mobile and HPC: InFO-class fan-out and CoWoS-class 2.5D move advanced packaging from labs to production.
- Sony image sensors demonstrate very dense 3D integration: Sony reaches ~6.3 micron pitch in stacked sensors, proving extreme IO density is feasible[1].
- AMD chiplets and 3D V-Cache prove the economics: chiplet yield economics and 3D V-Cache at ~17 micron pitch demonstrate the path[1][8].
- SemiAnalysis frames advanced packaging as the escape route: pad limits, IO walls, and yield economics push the industry toward a packaging-led design renaissance[1].
- AMD MI300 shows heterogeneous CPU, GPU, and HBM integration: a single package combines CPU chiplets, GPU chiplets, and HBM as a coherent compute system[6][7].
- CoWoS capacity becomes an AI bottleneck: AI demand outruns packaging capacity even as that capacity improves, and packaging joins HBM as a supply gate[5].
- TSMC produces 5.5-reticle CoWoS and outlines 14-reticle CoWoS: the roadmap commits the industry to package-scale system construction[2].
- 14-reticle CoWoS with ~10 compute dies and 20 HBM stacks: TSMC's stated target makes the package the integration unit of the future AI accelerator[2].
- TSMC targets A14-to-A14 SoIC production at 1.8x higher die-to-die IO density: the 3D path continues to scale die-to-die bandwidth inside the package[2].
Section 17 Risks and limits
The argument above blends 2021 SemiAnalysis history with TSMC, AMD, Intel, ASML, and Reuters materials. It is worth being explicit about where the case can break.
- Advanced packaging does not replace leading-edge transistor scaling; it amplifies it.
- CoWoS is not the only packaging technology; Intel EMIB and Foveros, Samsung 2.5D, and others matter.
- TSMC company roadmap claims are targets, not guaranteed outcomes.
- Packaging capacity is hard to expand quickly because it depends on equipment, substrates, HBM, yield, thermal design, assembly, and test.
- HBM supply and packaging supply are related but separate bottlenecks and can move independently.
- Hybrid bonding is powerful but difficult to manufacture at scale.
- Chiplets create software, interconnect, validation, and yield-management challenges that take years to absorb.
- More dies in one package can improve die-level yield economics but complicate package-level yield and test.
- Thermal density can become the limiting factor even when IO improves.
- This essay is industry analysis, not investment advice; the packaging ecosystem can reorder quickly with one supply or yield surprise.
The point is not that packaging solves every scaling problem. The point is that scaling no longer works without it.
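The package-level yield risk in the list above compounds with die count. The sketch assumes independent failures and that one bad die or failed attach scraps the whole package. It uses hypothetical per-die figures: 99.5 percent known-good-die quality after pre-assembly test and 99.8 percent attach yield. The 30-die case loosely mirrors the ~10 compute dies and 20 HBM stacks in TSMC's 14-reticle plan.

```python
"""Compounding package-level yield as die count grows.

Hypothetical inputs: `die_ok` is the probability that a die which passed
pre-assembly test is actually good (known-good-die quality); `attach_ok` is
the per-die bonding/attach yield. Both values below are illustrative only.
"""


def package_yield(num_dies: int, die_ok: float, attach_ok: float) -> float:
    """Probability that every die is good and every attach succeeds.

    Assumes independent failures and that one bad die scraps the package.
    """
    return (die_ok * attach_ok) ** num_dies


if __name__ == "__main__":
    for n, label in ((2, "one logic die + one HBM stack"),
                     (9, "one logic die + eight HBM stacks"),
                     (30, "~10 compute dies + 20 HBM stacks")):
        print(f"{n:>2} dies ({label}): {package_yield(n, 0.995, 0.998):.1%}")
```

With these inputs, package yield falls from about 99 percent for two dies to about 81 percent for thirty, even though every individual die and bond is better than 99 percent. That is why known-good-die screening and test coverage become first-order economics as packages grow.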
Section 18 Final verdict
The 2021 article was right. The industry was already running into pad limits, IO limits, yield limits, reticle limits, and cost-per-transistor limits. Chiplets helped. Fan-out helped. 2.5D helped. 3D stacking helped. But in 2026, AI turned advanced packaging from an engineering workaround into the center of the hardware roadmap. The package now connects logic, memory, IO, power, and cooling into one system. The package is no longer around the computer. The package is the computer.
Moore's Law did not disappear. It moved into the package.
Section 19 Evidence ledger and source notes
| Source | Claim | Why it matters |
|---|---|---|
| SemiAnalysis (2021) | IO scaling lags transistor density; flip chip 150-200 microns; N7 ~130, Intel 10 nm ~100; fan-out ~90-60; 2.5D ~55-50; 3D V-Cache ~17 micron; Sony ~6.3 micron. | Anchors the historical pad-limit and IO-density argument. |
| TSMC 2026 Symposium | 5.5-reticle CoWoS in production; 14-reticle CoWoS by 2028 with ~10 compute dies and 20 HBM stacks; A14-to-A14 SoIC 2029 at 1.8x IO density vs N2-on-N2; COUPE CPO toward production. | Commits the industry to package-scale system construction. |
| TSMC 3DFabric | 3DFabric portfolio includes SoIC, CoWoS, and InFO, framed as system-level design and mini-chip integration. | TSMC's official packaging taxonomy. |
| ASML 2026 AGM | AI compute demand outpaces Moore's Law; future scaling needs 2D scaling plus 3D integration, including advanced packaging. | Macro framing from the lithography ecosystem. |
| Reuters (Jan 2025) | Nvidia's advanced packaging needs are changing; Blackwell uses more CoWoS-L; Hopper continues CoWoS-S; packaging remained a bottleneck. | Real-world bottleneck context for AI packaging. |
| AMD MI300 blog | MI300A is a 3D chiplet design integrating Zen 4 CPU chiplets, CDNA 3 GPU chiplets, and HBM in one package. | Product proof of heterogeneous package-level compute. |
| AMD CDNA 3 white paper | MI300 integrates accelerator complex dies, IO dies, and HBM stacks into a heterogeneous package. | Technical support for the package-as-system framing. |
| AMD 3D V-Cache | Cache stacked on logic at ~17 micron pitch. | 3D stacking and cache-on-logic proof. |
| Intel Advanced Packaging | EMIB high-density die-to-die bridges; Foveros 3D stacking; heterogeneous tile-based products. | Bridge-based and 3D alternatives to TSMC's ecosystem. |
| Reuters (Apr 2026) | TSMC plans CoWoS and 3D-IC capability in Arizona before 2029. | Packaging is now part of semiconductor sovereignty. |
Footnotes & sources
[1] SemiAnalysis, “Advanced Packaging Part 1 — Pad Limited Designs, Breakdown Of Economic Semiconductor Scaling, Heterogeneous Compute, and Chiplets,” 2021 (PDF supplied by author). Source for the IO-vs-transistor density framing, the 150-200 micron flip-chip baseline, ~130 micron N7 and ~100 micron Intel 10 nm, the core-limited vs pad-limited diagram, the AMD yielded-cost-per-area chart, the AMD Milan vs Intel Ice Lake chiplet yield table, the fan-out 90-60 micron class, 2.5D 55-50 micron class, AMD 3D V-Cache 17 micron pitch and ~138x IO density, and Sony image sensors at 6.3 micron pitch and ~567x IO density.
[2] TSMC, “TSMC 2026 Technology Symposium,” pr.tsmc.com/english/news/3302. Source for 5.5-reticle CoWoS in production today, 14-reticle CoWoS planned by 2028 with around 10 compute dies and 20 HBM stacks, SoIC extension, A14-to-A14 SoIC production target in 2029 with 1.8x higher die-to-die IO density vs N2-on-N2 SoIC, and COUPE co-packaged optics moving toward production.
[3] TSMC, “Advanced Packaging Services,” tsmc.com/…/services/advanced-packaging. Source for the 3DFabric portfolio including SoIC, CoWoS, and InFO, and the framing of advanced packaging as enabling system-level design and mini-chip integration.
[4] ASML, 2026 AGM Presentation, ourbrand.asml.com/…/2026_-AGM-_presentation.pdf. Source for the statement that AI compute demand has outpaced Moore's Law, that Moore's Law alone is not sufficient to meet future training compute requirements, and that future scaling combines 2D scaling with 3D integration including advanced packaging.
[5] Reuters, “Nvidia CEO Says Its Advanced Packaging Technology Needs Are Changing,” January 2025, reuters.com. Source for Nvidia's evolving advanced packaging needs at TSMC, Blackwell using more CoWoS-L, Hopper continuing CoWoS-S, and packaging remaining a bottleneck even as capacity improved. Treated as reporting on Nvidia and TSMC dynamics, not as a static supply claim.
[6] AMD, “Introducing the AMD Instinct MI300 Series Accelerators,” amd.com/en/blogs/2023/…. Source for MI300A as a 3D chiplet design and for the integration of Zen 4 CPU chiplets, CDNA 3 GPU chiplets, and HBM in one package.
[7] AMD, “AMD CDNA 3 White Paper,” amd.com/…/amd-cdna-3-white-paper.pdf. Source for the description of MI300 integrating accelerator complex dies, IO dies, and HBM stacks into a heterogeneous package.
[8] AMD, “3D V-Cache Technology,” amd.com/…/3d-v-cache. Source for AMD's 3D V-Cache as a 3D stacking and cache-on-logic proof point. Used with care; exact pitch and bonding details are taken from the 2021 SemiAnalysis article.
[9] Intel, “Intel Advanced Packaging,” intel.com/…/foundry/packaging. Source for EMIB as a high-density die-to-die interconnect via embedded bridges, Foveros for 3D stacking, and the framing of Intel's heterogeneous tile-based products.
[10] Reuters, “TSMC Plans to Open Chip Packaging Plant in Arizona by 2029,” April 2026, reuters.com. Source for the plan to bring CoWoS and 3D-IC packaging capability to Arizona before 2029 as part of the U.S. advanced-packaging footprint.