Essay No. 012  ·  AI Infrastructure  ·  Melbourne, Australia

The AI Memory Wall

Why DRAM, HBM, packaging, and semicap became the new center of computing.
Pugalenthi Magendran
March 2026  ·  Melbourne, Australia

The AI boom is usually described as a GPU shortage. That is only half true. The deeper bottleneck is memory: bandwidth, capacity, power, packaging, and the equipment needed to manufacture it.

A GPU without enough memory bandwidth is like a supercar with a blocked fuel line. The engine can be powerful, but the system cannot feed it fast enough. As models grow, as inference becomes always-on infrastructure, as context windows expand, as agents call tools repeatedly, and as data centers become power constrained, the bottleneck shifts from pure compute to the full memory system around compute.

This is the AI memory wall.

Key idea

The AI race is described as a race for GPUs. The deeper race is for memory bandwidth, advanced packaging, lithography, and the equipment that produces all three. Moore’s Law did not die. It fragmented across the stack.


I. The 2020 thesis still holds. AI multiplied it.

In May 2020, Dylan Patel published a piece on SemiAnalysis arguing that Moore’s Law was effectively over for DRAM, and that this was excellent news for semiconductor capital equipment makers.[1] The argument was tidy. Cell shrink in DRAM had slowed. Each new node was buying less density than the previous one. Yet demand for memory was not slowing. The math left only one place for the missing bits to come from: more wafers, more cleanroom space, more equipment.

I revisited that piece because it has aged better than almost any other infrastructure call from that period. AI did not just confirm it. AI multiplied it. Memory went from a background commodity to a strategic bottleneck within two years.

2020 thesis

When DRAM density scaling slows, bit growth becomes more dependent on wafers, fabs, cleanroom space, and semiconductor capital equipment.

The thesis was conservative in 2020. It assumed steady AI demand. The actual 2024 to 2026 cycle delivered an AI capex wave large enough to reorder the priorities of every memory maker, every foundry, and every advanced packaging line on the planet.


II. From shrink economics to system economics

The old semiconductor story was simple. Shrink transistors. Increase density. Reduce cost per function. Repeat every two years. Most of the industry, most of the writing about the industry, and most of the strategy work around the industry assumed this would continue.

The new AI infrastructure story has more parts. The accelerator still matters. So does the memory next to it. So does the package that connects them. So does the power that feeds the rack. So does the equipment that produces every layer below. The economics moved from one variable to a system.

Old scaling model: cheaper transistors
Logic shrink → better compute → lower cost per transistor. One axis. One target.

New AI infrastructure model: feeding compute
Compute + memory bandwidth + packaging + power + interconnect + equipment capacity. Many axes. One system goal: tokens per second per watt per dollar.

III. Why DRAM scaling is hard

DRAM stores each bit as a small amount of charge on a capacitor next to a transistor. Shrink the cell and you fight five forces at once. Leakage gets worse because thinner dielectrics let charge escape. Refresh gets more expensive because cells must be re-energized more often to avoid losing data. Noise margins shrink. Reliability gets harder because every process variation matters more at small dimensions. Yield gets harder because more masks, more steps, and more EUV exposures stack up.

Density gains are still happening. They are just incremental, expensive, and capital intensive. Each new node consumes more equipment per bit than the last. That is the missing context behind most coverage of the memory cycle: bit growth has become a wafer story, not a node story.
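The wafer-versus-node point can be made concrete with a little arithmetic. The sketch below is a minimal illustration, assuming bit supply is simply wafer starts multiplied by bits per wafer. The function name and the growth rates are hypothetical placeholders, not figures from any source cited here.

```python
def required_wafer_growth(bit_demand_growth: float, density_growth: float) -> float:
    """Wafer-start growth needed so that wafers x bits-per-wafer keeps up with demand.

    Assumes bit supply = wafer starts * bits per wafer, a simplification that
    ignores yield, die size changes, and product mix.
    """
    if density_growth <= -1.0:
        raise ValueError("density_growth must be greater than -100%")
    return (1.0 + bit_demand_growth) / (1.0 + density_growth) - 1.0


# Hypothetical inputs for illustration only.
demand = 0.20  # bit demand grows 20% per year
for density in (0.20, 0.10, 0.05):
    wafers = required_wafer_growth(demand, density)
    print(f"density +{density:.0%}/yr -> wafer starts must grow {wafers:+.1%}/yr")
```

When density growth matches demand growth, no extra wafers are needed. Every point of density growth that goes missing has to show up as wafer growth instead, which is the mechanism the 2020 thesis describes.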


IV. AI exposed the memory wall

An AI accelerator can perform trillions of operations per second. To do that, it needs to move model weights, activations, embeddings, and KV cache through its memory system. When the calculation is fast and the data movement is slow, raw FLOPS stop mattering. The bottleneck becomes bandwidth and power per bit moved. This is not new physics. It is what every systems engineer has tried to explain for years. AI just made the cost of getting it wrong large enough to show up on a balance sheet.
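One common way to formalize this is the roofline model, which this essay does not otherwise rely on: attainable throughput is the lower of the compute peak and the memory bandwidth multiplied by the work done per byte moved. The sketch below assumes a hypothetical accelerator. The peak and bandwidth figures are placeholders, not the specifications of any real product.

```python
def attainable_flops(peak_flops: float, bandwidth_bytes_per_s: float,
                     flops_per_byte: float) -> float:
    """Roofline bound: the lower of the compute roof and the bandwidth roof."""
    return min(peak_flops, bandwidth_bytes_per_s * flops_per_byte)


# Hypothetical accelerator, not a real product.
PEAK_FLOPS = 1.0e15   # 1 PFLOP/s of peak compute
BANDWIDTH = 4.0e12    # 4 TB/s of memory bandwidth

# Ridge point: the intensity at which the chip stops being bandwidth-bound.
ridge = PEAK_FLOPS / BANDWIDTH

for intensity in (2.0, 50.0, 250.0, 1000.0):
    achieved = attainable_flops(PEAK_FLOPS, BANDWIDTH, intensity)
    bound = "memory-bound" if intensity < ridge else "compute-bound"
    print(f"{intensity:7.1f} FLOP/byte -> {achieved / PEAK_FLOPS:6.1%} of peak ({bound})")
```

Under these assumed numbers, a workload that does only a few operations per byte fetched uses a small fraction of the chip's peak. Adding FLOPS does nothing for it. Adding bandwidth does.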

Inference economics makes this worse than training. Training is a finite project. Inference is a recurring tax on every product. Every additional user, every longer context, every retrieval, every multimodal input, every tool call from an agent: each one is a memory event.

The future of AI infrastructure is not only about how fast we can calculate. It is about how cheaply and reliably we can move data into the calculation.

V. HBM is not just better DRAM

HBM stacks DRAM dies vertically, wires them together through silicon, and places the resulting tower millimeters away from the accelerator on an advanced package. The result is a wide, short, parallel memory path that no DDR5 module on a motherboard can match. The product is not faster DRAM. It is a different topology.

Announcements from the three HBM4 leaders entering 2026 made the topology jump explicit:

  • Samsung began commercial HBM4 shipment with a 4nm logic base die, a per-pin transfer speed of 11.7 Gbps, configurations of up to 13 Gbps, and a stack bandwidth of up to 3.3 TB/s.[7]
  • Micron went into high-volume production of 36 GB 12-high HBM4 designed into NVIDIA’s Vera Rubin platform, citing more than 2.8 TB/s per stack and 20% better power efficiency than its prior generation.[5]
  • SK hynix completed HBM4 development with 2,048 I/O terminals, more than 10 Gbps operating speed, and more than a 40% power efficiency improvement, and prepared for mass production.[9]
  • Samsung and AMD expanded their collaboration so HBM4 sits inside the AMD Instinct MI455X accelerator and DDR5 sits inside EPYC at rack scale, with new partnerships announced for next-generation AI memory.[8]
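The headline figures above are linked by simple arithmetic: peak per-stack bandwidth is interface width multiplied by per-pin data rate, divided by eight bits per byte. The sketch below assumes a 2,048-bit interface for every vendor. That width is quoted here only for SK hynix, so applying it to the others is an assumption, and the function name is hypothetical.

```python
def stack_bandwidth_tb_s(io_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak per-stack bandwidth in decimal TB/s: pins x Gbit/s per pin / 8 bits per byte."""
    return io_width_bits * pin_rate_gbps / 8 / 1000


IO_WIDTH = 2048  # I/O terminals per stack, as quoted for SK hynix; assumed for the others

for label, rate in [("10 Gbps", 10.0), ("11.7 Gbps", 11.7), ("13 Gbps", 13.0)]:
    print(f"{label:>9} per pin -> {stack_bandwidth_tb_s(IO_WIDTH, rate):.2f} TB/s per stack")
```

Under that assumption, 13 Gbps per pin works out to roughly 3.3 TB/s, which lines up with Samsung's quoted ceiling, and 10 Gbps gives about 2.56 TB/s. The width of the interface does as much work as the speed of each pin.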

None of those numbers come free. Each requires advanced packaging, through-silicon vias, base die logic, thermal management, careful testing, and yield learning that took years to build. HBM is not a memory chip. It is a system integration product.

Diagram 01 · The HBM topology
[Diagram: an accelerator (GPU / AI ASIC) compute die on an advanced package, flanked by eight HBM stacks (stacks 1 to 4 and stacks 5 to 8), all sitting on an interposer · CoWoS / advanced packaging substrate.]

Bandwidth comes from proximity, parallelism, and packaging. The memory does not sit on a board. It sits on the substrate, next to the accelerator, behind thousands of short, wide connections.

VI. Packaging became part of Moore’s Law

For decades, Moore’s Law felt like a transistor story. Smaller, denser, cheaper, with the action happening at the wafer. In AI infrastructure, the action moved into the package and into the system.

TSMC’s CoWoS family integrates the logic die and the HBM stacks onto an interposer that handles the high-bandwidth wiring between them. CoWoS-L extends the approach with larger interposers and more advanced interconnects for HPC products. TSMC has also pushed System-on-Wafer work that integrates logic and HBM at wafer scale for data center compute, blurring the line between packaging and silicon.[6]

The implication is structural. The new scaling stack is no longer just “the chip.” It is a layered system where the bottleneck can be anywhere from the model down to the grid.

Diagram 02 · The new scaling stack
08 · Application / model · Demand
07 · Inference serving · Token econ
06 · Accelerator (GPU / ASIC) · Compute
05 · HBM & high-bandwidth memory · Bandwidth
04 · CoWoS / advanced packaging · Integration
03 · Lithography & process (EUV) · Print
02 · Semicap tools (WFE, packaging, test) · Production
01 · Data center power, cooling, grid · Energy

The bottleneck can sit anywhere in this stack. In 2026 it sits mostly in layers 02 to 05: HBM, packaging, lithography, and the equipment that produces them.

VII. Semicap is the hidden beneficiary

If memory demand rises faster than density scaling, the equipment industry has to absorb the difference. Every missing bit of density has to be repaid in wafers, process steps, lithography exposures, deposition cycles, etch passes, metrology checks, packaging operations, and tester time. That is exactly the structural picture the 2020 thesis predicted, just larger.

Three concrete signals from the equipment side:

  • SEMI projects DRAM equipment sales to grow 15.4% to $22.5 billion in 2025, then 15.1% in 2026, then 7.8% in 2027, driven by HBM ramps and advanced DRAM nodes for AI and data center demand.[3]
  • ASML expects DRAM EUV lithography spending to compound at 15% to 25% per year from 2025 through 2030, citing AI-driven advanced logic and DRAM as the structural driver.[2]
  • Applied Materials extended partnerships with Micron and SK hynix on next-generation DRAM, HBM, NAND, materials, process integration, and 3D packaging through its EPIC Center.[10][11]

Chart 01 · DRAM equipment growth, SEMI projections
2025: +15.4% · 2026: +15.1% · 2027: +7.8%

Year-over-year DRAM equipment sales growth: two years of double-digit expansion followed by one of high single-digit expansion, driven by HBM and advanced nodes. Source: SEMI World Fab Forecast and equipment outlook releases.[3]

If you take the 2025 baseline of $22.5 billion and compound the projected growth, the implied level for 2027 is roughly $27.9 billion in DRAM equipment alone ($22.5 billion × 1.151 × 1.078). That is one slice of the broader wafer fab equipment (WFE) market. The same logic applies, with different coefficients, to advanced logic equipment, packaging tools, EUV lithography, and test.
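The compounding is easy to check. The sketch below applies the SEMI growth path quoted above to the 2025 baseline, and then applies the same helper to ASML's 15% to 25% range over the five steps from 2025 to 2030. The helper name is hypothetical, and treating each percentage as a simple year-over-year multiplier is my assumption about how to read the forecasts.

```python
def compound(base: float, yearly_growth: list[float]) -> list[float]:
    """Return the value after each year of growth applied in sequence."""
    values = []
    current = base
    for g in yearly_growth:
        current *= 1.0 + g
        values.append(current)
    return values


# SEMI path quoted above: $22.5B in 2025, then +15.1% and +7.8%.
dram_2026, dram_2027 = compound(22.5, [0.151, 0.078])
print(f"2026: ${dram_2026:.1f}B, 2027: ${dram_2027:.1f}B")

# ASML range quoted above: 15% to 25% per year from 2025 through 2030 (five steps).
low = compound(1.0, [0.15] * 5)[-1]
high = compound(1.0, [0.25] * 5)[-1]
print(f"2030 DRAM EUV spend vs 2025: {low:.2f}x to {high:.2f}x")
```

Running it prints the 2026 and 2027 levels implied by the SEMI path and the 2030 multiple implied by ASML's range. Swapping in a different baseline or growth path is a one-line change.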


VIII. Inference makes this worse

Training is episodic. A frontier lab spends a fixed amount of capital, trains a model, and writes off the cost over the model’s useful life. Inference is the opposite. It is continuous. Once AI is embedded into search, coding, agents, customer support, enterprise software, robots, and scientific workflows, every generated token becomes an infrastructure event.

Each lever pushes in the same direction:

  • More users means more inference per second.
  • Longer context windows mean larger KV caches per request.
  • Agents mean more repeated calls per task.
  • Retrieval-augmented generation means more storage and more memory pressure.
  • Multimodal models mean larger inputs and larger activations.
  • Real-time systems mean latency pressure, which favors more memory capacity and higher bandwidth over deeper batching.
  • Data center power is the hard cap on top of all of this.
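The context-window lever in the list above is the easiest to quantify. For a standard decoder-only transformer, the KV cache stores one key and one value vector per layer, per KV head, per token. The sketch below is a minimal estimate under that assumption. The model shape (80 layers, 8 KV heads, head dimension 128, 16-bit values) is hypothetical and chosen only for illustration.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   context_tokens: int, batch_size: int = 1,
                   bytes_per_value: int = 2) -> int:
    """Bytes of KV cache for a standard decoder-only transformer.

    The leading 2 accounts for storing both keys and values at every layer.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * context_tokens * batch_size * bytes_per_value)


# Hypothetical model shape, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128

for context in (8_192, 131_072, 1_048_576):
    gib = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, context) / 2**30
    print(f"{context:>9,} tokens -> {gib:7.1f} GiB of KV cache per request")
```

The cache grows linearly with context length and with the number of concurrent requests, on top of the memory already taken by the weights. That is why longer contexts and more users land on the memory system before they land on the compute units.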

Micron made this explicit in its FY2026 Q2 prepared remarks. AI is driving the DRAM and NAND data center bit TAM (total addressable market) above 50% of total industry TAM for the first time in calendar 2026. AI and traditional server demand are running into inadequate DRAM and NAND supply. And HBM, LPDRAM, DDR, and SSDs are increasingly designed in as a portfolio for AI inference architectures optimized around token economics.[4]

When data center demand, pulled by AI workloads, accounts for more than half of the industry's memory bits, memory is no longer a commodity input. It is part of the product.


IX. Who benefits

This is a value-chain map, not investment advice. The point is to show that the AI boom is not one company’s story. It is a stack story, and the value spreads across that stack in proportion to how scarce each layer becomes.

Diagram 03 · The value chain, layer by layer
Memory makers: Micron, Samsung, SK hynix. They turn the HBM topology into a product line.
Foundries & packaging: TSMC, Samsung Foundry. CoWoS and System-on-Wafer integration sit here.
Lithography: ASML. EUV intensity rises for both advanced logic and advanced DRAM.
Wafer fab equipment: Applied Materials, Lam Research, Tokyo Electron, KLA. Deposition, etch, metrology.
Packaging & test: ASE, Amkor, Teradyne, Advantest. HBM bring-up, KGD test, and final packaging.
AI accelerators: NVIDIA, AMD, custom ASIC teams at hyperscalers and startups.
Data center operators: Microsoft, Google, Amazon, Meta, Oracle, CoreWeave and the new build-to-suit AI data center class.
Power & cooling: Liquid cooling, electrical distribution, grid capacity. Often the rate-limiting step.

Each layer can become a bottleneck on its own. The AI memory wall is not a single shortage. It is a shifting list of which layer is tightest this quarter.

X. What could break the thesis

A serious piece needs counterarguments. The case for memory as the new center of computing has several honest failure modes.

  • AI capex could slow. If frontier training plateaus or unit economics disappoint, HBM and packaging capacity that was built for a 2027 demand curve could meet a 2026 demand curve. Memory is famously cyclical. Every up-cycle has been followed by inventory pain.
  • Models could become more memory-efficient. Quantization, sparsity, mixture-of-experts routing, better KV cache management, and learned compression could reduce memory per token. Some of this is already happening.
  • Custom ASICs could rewire the architecture. If hyperscaler chips converge on different memory topologies, the HBM cluster that benefits today may face design changes that shift demand to other forms of memory.
  • Attention could change. More efficient attention mechanisms or new architectures could reduce KV cache pressure, which is one of the largest drivers of HBM scaling.
  • Geopolitics could distort demand. Export controls, on-shoring incentives, and regional fab build-outs can pull supply and demand apart in ways that obscure the underlying trend.
  • Packaging bottlenecks could resolve. CoWoS capacity has been the headline constraint for two years. If it eases, the chain rebalances and another link becomes the bottleneck.
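The memory-efficiency counterargument in the list above is worth putting numbers on, because the effect of quantization on weight storage is close to linear. The sketch below counts only the bytes needed to hold the weights at a given precision. The 70-billion-parameter model is hypothetical, and the estimate ignores activations, KV cache, and any accuracy cost of lower precision.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Decimal gigabytes needed to hold the weights alone at a given precision."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 70e9  # hypothetical 70-billion-parameter model

for name, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{name:>6}: {weight_footprint_gb(PARAMS, bits):6.1f} GB of weights")
```

Halving the bits per weight halves both the capacity needed and the bytes that must cross the memory interface for each pass over the weights. Whether that reduces total memory demand or simply makes larger models and longer contexts affordable is the open question behind this failure mode.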

None of these are reasons to ignore the thesis. They are reasons to hold it loosely and watch the signals.


XI. The final thesis

The old world was about cheaper transistors. The new world is about feeding compute.

Moore’s Law did not simply die. It fragmented. Part of it moved into EUV. Part of it moved into HBM. Part of it moved into CoWoS. Part of it moved into power delivery. Part of it moved into semicap. Part of it moved into the economics of every generated token.

The AI memory wall is not one shortage. It is a permanent reshuffling of where value sits in the computing stack.

That is why memory may be one of the most important infrastructure stories of 2026. Not the chips in the headlines. The bandwidth, packaging, lithography, and equipment under them. If you want to understand where the AI economy actually runs, look two layers below the GPU and one layer above the grid.


[1] Patel, D. (May 2020). Moore’s Law is Dead for DRAM, and That is Great for SemiCap. SemiAnalysis. Historical anchor for the thesis that slower DRAM density scaling shifts bit growth onto wafer capacity and semiconductor capital equipment. Used as inspiration only. Original work, no charts or wording copied.

[2] ASML (2025). 2025 Annual Report, financial performance section. AI-driven advanced logic and DRAM, DRAM EUV lithography spending CAGR of 15% to 25% from 2025 to 2030, lithography intensity discussion.

[3] SEMI (2025). Global Semiconductor Equipment Sales Projected to Reach a Record of $156 Billion in 2027. Source for the global equipment forecast and the DRAM equipment growth path of 15.4% to $22.5 billion in 2025, 15.1% in 2026, and 7.8% in 2027, with HBM and AI/data center demand cited as the drivers.

[4] Micron Technology (2026). FY2026 Q2 prepared remarks. AI demand driving DRAM and NAND data center bit TAM above 50% of total industry TAM in calendar 2026; AI and traditional server demand constrained by inadequate DRAM and NAND supply; HBM, LPDRAM, DDR DRAM, and SSDs framed as a portfolio for AI inference architectures optimized for token economics.

[5] Micron Technology (2026). Micron Begins High-Volume Production of HBM4 Designed into NVIDIA Vera Rubin. 36 GB 12-high HBM4, more than 2.8 TB/s bandwidth per stack, 20% better power efficiency than the previous generation.

[6] TSMC. Annual Reports. CoWoS packaging integrating SoCs and HBM stacks, CoWoS-L for large HPC products using advanced interposer and interconnect approaches, and System-on-Wafer work that integrates logic and HBM at wafer scale for data center computing.

[7] Samsung Electronics (2025). Samsung Ships Industry-First Commercial HBM4 with Ultimate Performance for AI Computing. Mass production HBM4 at 11.7 Gbps transfer speed, up to 13 Gbps configurations, up to 3.3 TB/s per stack, on a 4nm logic base die.

[8] Samsung Electronics (2026). Samsung and AMD Expand Strategic Collaboration on Next-Generation AI Memory Solutions. HBM4 for the AMD Instinct MI455X, DDR5 for EPYC, rack-scale AI platform memory needs.

[9] SK hynix (2025). SK hynix Completes World’s First HBM4 Development and Readies Mass Production. 2,048 I/O terminals, more than 10 Gbps operating speed, more than 40% power efficiency improvement, mass production readiness.

[10] Applied Materials (2026). Applied Materials and Micron Partner to Advance U.S. Innovation in Next-Generation AI Memory. Next-generation DRAM, HBM, NAND research and development, EPIC Center collaboration, AI memory process innovation.

[11] Reuters (2026). Applied Materials, SK hynix partner on next-generation AI memory development. Applied Materials partnerships with Micron and SK hynix, EPIC Center context, 3D packaging and process integration.

*   *   *

This is Essay No. 012. The topics: intelligence, AI, systems, knowledge, and the questions underneath the questions everyone else is asking. If you read this far and disagreed with any part of it, write to me. I read everything.

Pugalenthi Magendran