Essay No. 038 · AI Infrastructure · Melbourne, Australia

The Wafer-Scale Training Bet.

Original analysis · Not investment advice

How Tesla’s Dojo showed that AI scaling is a packaging, power, cooling, and software problem, not just a chip problem.

Pugalenthi Magendran

April 2026 · Melbourne, Australia

Dojo was not just a custom chip. It was Tesla’s extreme bet that fan-out wafer packaging, dense interconnect, SRAM-heavy compute, water cooling, power delivery, and custom software could create a training computer tuned for autonomy. In 2026, the dedicated Dojo path looks less central as Tesla shifts toward AI5 and AI6. But the core lesson aged well: AI scaling is moving from chip-level performance to package-level and rack-level integration.

In 2021, Tesla teased a strange object. It did not look like a normal GPU. It did not look like a normal accelerator card. It looked like a compute slab. A cold plate. A carrier. A 5×5 grid of chips. Connectors. Power delivery. BGA pads. No visible HBM.

The SemiAnalysis article looked at that image and made the important guess: this might not be a normal chip package at all. It might be Tesla’s first move toward a system-on-wafer training engine, possibly using TSMC’s InFO_SoW packaging technology.¹

That was the real insight. Not “Tesla made a chip.” The deeper point was that Tesla was trying to turn packaging into a training architecture.

Key idea

Tesla’s Dojo was an early, extreme version of the problem every AI hardware company now faces: scaling AI compute is not just about faster chips. It is about packaging, interconnect, memory, power, cooling, software, and workload fit. The dedicated Dojo path looks less central after Tesla shifted focus toward AI5 and AI6, but the system-level lesson aged well. AI hardware is becoming package-scale and rack-scale, and Tesla saw that early.

I. The 2021 thesis was about the package

In August 2021, Dylan Patel published a SemiAnalysis piece reading the Tesla AI Day teaser image as a package-scale system rather than a normal chip. The visible structure showed a 5×5 chip grid, a cold plate, connectors, a carrier, BGA pads, and power delivery. The piece compared it to TSMC’s InFO_SoW research and argued that the fan-out wafer-scale structure could serve as the carrier itself, enabling close-packed chip arrays with lower latency, higher bandwidth density, and lower power-delivery impedance than CoWoS-style interposers or flip-chip MCMs. It also noted the absence of visible HBM or DRAM and speculated the design relied heavily on on-die SRAM, somewhat in the spirit of Cerebras.¹

2021 thesis

Dojo was not trying to build one huge chip. It was trying to make many chips behave like one huge training surface.

Diagram · Schematic of a 25-die training tile (original visual). Training tile: 25 dies as one surface, with the cold plate, fan-out tile, and power delivery labelled.

25 known-good D1 dies on a fanout-wafer process; 9 PFLOPS; 36 TB/s off-tile bandwidth (Tesla / Cadence framing).²

A schematic, original visual. The Tesla teaser image is not reproduced; cell counts and frame elements are stylised to communicate the package-as-system idea.

II. InFO_SoW, explained simply

InFO_SoW stands for Integrated Fan-Out System-on-Wafer. Instead of mounting many chips onto a normal package substrate or PCB, the wafer-scale fan-out structure becomes part of the system. Chips can be placed close together, connected densely, powered more efficiently, and cooled as one integrated structure. The 2021 SemiAnalysis piece was speculating from the teaser image, but later Dojo technical coverage confirmed the important direction: fan-out wafer-scale integration.¹²

Diagram · Substrate / PCB vs fan-out wafer-scale

Traditional · Substrate + PCB. Many dies mounted on a normal package substrate and PCB. Chip-to-chip paths are longer, power-delivery impedance is higher, and cooling is added on top of an already-complex stack.¹

Fan-out wafer-scale · Tile as carrier. The fan-out structure becomes the carrier itself. Chip-to-chip paths get shorter, bandwidth density rises, PDN impedance drops, and the tile is cooled as one integrated system.¹²

A simplified, original split. The exact InFO_SoW label matters less than the architecture — fan-out wafer-scale integration as a package-as-system bet.

Tesla did not need the exact InFO_SoW label for the thesis to be right. The important part was fan-out wafer-scale integration.

III. Why normal multi-chip packages hit limits

AI training wants many chips to behave like one machine. Normal packaging has tradeoffs.

Diagram · CoWoS / interposer vs flip-chip MCM vs fan-out wafer-scale

Approach 01 · CoWoS / interposer (reticle-bound). High bandwidth and strong HBM integration, but interposer size and stitching become constraints as systems grow.

Approach 02 · Flip-chip MCM (density-limited). Avoids interposer limits, but offers lower wire density and a higher power cost per chip-to-chip transfer.

Approach 03 · Fan-out wafer-scale (tile-as-system). Keeps chips close and communication dense, and reduces power and latency penalties, but creates packaging, yield, and thermal challenges.¹

A simplified, original three-column compare.

The packaging problem is really an interconnect problem. The interconnect problem becomes a training problem.

IV. Tesla later confirmed the important part

Cadence’s technical summary of Tesla’s AI Day material describes 25 known-good D1 dies integrated onto a fanout-wafer process that preserves bandwidth between adjacent D1 chips, delivering 9 PFLOPS per training tile with 36 TB/s of off-tile bandwidth.² The exact brand-name path (InFO_SoW or another flavour) matters less than the architecture: many dies connected as one training tile.
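The tile figures above imply a per-die share of compute and a compute-to-bandwidth balance point that are easy to check. What follows is a minimal sketch, assuming the 9 PFLOPS and 36 TB/s figures are peak values and that compute is spread evenly across the 25 dies. Both are assumptions made for illustration, and the function names are hypothetical.

```python
# Back-of-envelope checks on the training-tile figures quoted above.
# Assumptions (not from the cited summary): the figures are peak values
# and compute is spread evenly across the dies.

TILE_PFLOPS = 9.0         # peak compute per training tile, PFLOPS
DIES_PER_TILE = 25        # known-good D1 dies per tile
OFF_TILE_TB_PER_S = 36.0  # off-tile bandwidth, TB/s


def per_die_tflops(tile_pflops: float, dies: int) -> float:
    """Even share of tile compute per die, in TFLOPS."""
    return tile_pflops * 1000.0 / dies


def flops_per_off_tile_byte(tile_pflops: float, off_tile_tb_per_s: float) -> float:
    """Peak FLOPs the tile can perform for each byte moved off-tile."""
    return (tile_pflops * 1e15) / (off_tile_tb_per_s * 1e12)


if __name__ == "__main__":
    die = per_die_tflops(TILE_PFLOPS, DIES_PER_TILE)
    balance = flops_per_off_tile_byte(TILE_PFLOPS, OFF_TILE_TB_PER_S)
    print(f"per-die compute: {die:.0f} TFLOPS")
    print(f"balance point: {balance:.0f} FLOP per off-tile byte")
```

Under these assumptions each die contributes about 360 TFLOPS, and a workload that does fewer than roughly 250 FLOPs per byte leaving the tile would be limited by off-tile bandwidth instead of compute.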

The package was the computer.

V. Power and cooling were not afterthoughts

The 2021 SemiAnalysis piece emphasised the cold plate, cited InFO_SoW’s ability to support ~7,000W of power, and compared that with Nvidia A100 configurations as high as 500W, noting multiple inlets and outlets consistent with water cooling.¹ Cadence’s summary reports Tesla’s training tile taking 52V DC, drawing 18,000 A, dissipating 15kW of heat, and delivering 9 PFLOPS in less than one cubic foot.²
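The three tile power figures do not multiply out directly, so it helps to work the arithmetic. The sketch below uses only P = V × I. One plausible reading is that 52 V is the input voltage and 18,000 A is the current after conversion down to a sub-1 V die supply. That reading is an inference from the numbers, not something the cited summary states, and the function names are hypothetical.

```python
# Sanity-check the tile power figures with P = V * I.
# Assumption (an inference, not from the source): 52 V is the tile input
# voltage and 18,000 A is the current at the low die-side voltage.

INPUT_VOLTS = 52.0       # quoted DC input voltage
QUOTED_AMPS = 18_000.0   # quoted current draw
TILE_POWER_W = 15_000.0  # quoted heat dissipation


def naive_power_kw(volts: float, amps: float) -> float:
    """Power if the quoted voltage and current applied at the same point."""
    return volts * amps / 1000.0


def current_a(power_w: float, volts: float) -> float:
    """Current needed to deliver power_w at the given voltage."""
    return power_w / volts


def voltage_v(power_w: float, amps: float) -> float:
    """Voltage implied by delivering power_w at the given current."""
    return power_w / amps


if __name__ == "__main__":
    print(f"52 V x 18,000 A = {naive_power_kw(INPUT_VOLTS, QUOTED_AMPS):.0f} kW")
    print(f"15 kW at 52 V needs about {current_a(TILE_POWER_W, INPUT_VOLTS):.0f} A")
    print(f"15 kW at 18,000 A implies about {voltage_v(TILE_POWER_W, QUOTED_AMPS):.2f} V")
```

Multiplying the input voltage by the quoted current gives 936 kW, far above 15 kW. Delivering 15 kW at 18,000 A implies roughly 0.83 V, which is consistent with the current being measured at the dies.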

Diagram · D1 dies → tile → cabinet

D1 dies · 25 known-good D1²
Fan-out tile · 9 PF / 36 TB/s²
Power delivery · 52V, 18,000A²
Cold plate · 15kW dissipated²
Cabinet / ExaPOD · training system

A simplified, original 5-step flow. Power, cooling, and the tile are the same architecture.

At Dojo scale, the chip is not the product. The powered, cooled, connected tile is the product.

VI. SRAM instead of HBM was the other clue

The 2021 piece noted no visible HBM or DRAM in the teaser and speculated the design relied heavily on on-die SRAM, somewhat like Cerebras.¹ Most leading AI accelerators lean on HBM because training workloads are memory-hungry. Tesla’s Dojo design appeared to make a different bet: keep compute tiles tightly connected, use large on-die local memory, stream data through the training fabric, and optimise around Tesla’s specific vision workload.

Diagram · HBM-heavy vs SRAM-heavy memory strategy

Mainstream · HBM-heavy. High-bandwidth memory stacks beside the accelerator. Broad workload fit, deep ecosystem, dominant for general-purpose training.

Dojo-style · SRAM-heavy. Large on-die / on-tile SRAM with tight inter-die fabric. Narrower workload fit, but lower memory-movement penalty if the workload maps cleanly to the tile.¹

A simplified, original split. Architectural emphasis, not absolute absence of external memory.

Dojo was not a generic GPU clone. It was a workload-specific memory and interconnect bet.

VII. Dojo was Tesla’s most extreme vertical-integration bet

Dojo made sense in Tesla logic. Tesla has fleet video, autonomy training, perception models, planning models, robotics ambitions, closed-loop data generation, and the ability to tune hardware and software together. The dream was lower dependence on Nvidia, lower training cost, faster model iteration, and a system tuned to Tesla’s own data loop. But this required Tesla to own or coordinate silicon, packaging, power delivery, cooling, compiler, runtime, training software, data-center system design, reliability engineering, supply chain, and multiple generations of hardware.

Vertical integration gives control. It also gives you every problem.

VIII. The 2026 update is not a clean Dojo victory lap

Reuters, citing Bloomberg, reported that Tesla was disbanding its Dojo supercomputer team. Musk said it did not make sense for Tesla to divide resources across two very different AI chip designs, and that Tesla would focus on AI5, AI6, and later chips, which he described as excellent for inference and at least pretty good for training.³

Reading the shift

Dojo’s reorganisation is not a verdict on packaging.

The reporting describes a streamlining of Tesla’s AI chip teams toward AI5 / AI6, not a rejection of fan-out wafer-scale ideas industry-wide. It tells you about Tesla’s priorities. It does not tell you that the package-as-system thesis is wrong; TSMC’s SoW-X roadmap suggests the opposite direction.⁷

Dojo showed the architecture problem. AI5 and AI6 show the product-priority problem.

IX. AI5 and AI6 are the new center of gravity

Reuters reported that Tesla signed a roughly $16.5B chip-supply deal with Samsung, and that Musk said Samsung’s Taylor, Texas factory would make Tesla’s next-generation AI6 chip. Samsung currently makes Tesla’s AI4 chips, and according to Musk, TSMC was slated to make AI5, first in Taiwan and then in Arizona.⁴ Reuters also reported Musk saying Tesla may tape out AI6 in December 2026, while a Samsung executive said Tesla chips built on Samsung’s advanced 2nm process were planned for production in the second half of 2027.⁵

Diagram · Dojo → AI5 / AI6 strategy shift

2021 · Dojo teaser · 5×5 tile, fan-out direction.¹
2022–24 · Dojo build-out · 25 D1 dies, 9 PF tile, ExaPOD.²
2024–25 · Cost / priority test · Custom path competes with GPU spend.
2025 · Dojo streamlined · Resources move to AI5 / AI6.³
2026–27 · AI5 / AI6 · TSMC + Samsung; inference-first chips.⁴⁵

A simplified, original timeline. Years are approximate; key milestones per cited reporting.

Dojo was a training-supercomputer bet. AI5 and AI6 are deployment-scale AI-chip bets.

X. Why the packaging thesis aged better than the Dojo thesis

TSMC’s 2026 North America Technology Symposium materials describe a packaging roadmap with 5.5-reticle CoWoS today, 14-reticle CoWoS by 2028 supporting roughly 10 large compute dies and 20 HBM stacks, and a 40-reticle SoW-X System-on-Wafer technology targeted for 2029, alongside SoIC 3D stacking and COUPE co-packaged optics.⁷
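To get a feel for what these reticle multiples mean physically, they can be converted into approximate silicon area. This is a rough sketch, assuming the common 26 mm × 33 mm lithography field (858 mm²) as one "reticle" and a 300 mm wafer. TSMC's own definition of reticle size for packaging may differ, so the outputs are order-of-magnitude only, and the names are illustrative.

```python
import math

# Convert "N-reticle" package sizes into approximate area.
# Assumptions (not from the cited roadmap): one reticle is a
# 26 mm x 33 mm exposure field, and the wafer is 300 mm in diameter.
RETICLE_W_MM = 26.0
RETICLE_H_MM = 33.0
WAFER_DIAMETER_MM = 300.0

ROADMAP = {"CoWoS 5.5R": 5.5, "CoWoS 14R": 14.0, "SoW-X 40R": 40.0}


def package_area_mm2(reticles: float) -> float:
    """Approximate package area for a given reticle multiple."""
    return reticles * RETICLE_W_MM * RETICLE_H_MM


def wafer_area_mm2(diameter_mm: float = WAFER_DIAMETER_MM) -> float:
    """Area of a circular wafer, ignoring edge exclusion."""
    return math.pi * (diameter_mm / 2.0) ** 2


if __name__ == "__main__":
    for name, reticles in ROADMAP.items():
        area = package_area_mm2(reticles)
        share = area / wafer_area_mm2()
        print(f"{name}: ~{area:,.0f} mm^2, ~{share:.0%} of a 300 mm wafer")
```

Under these assumptions a 40-reticle structure covers a little under half of a 300 mm wafer's area, which is why the roadmap labels it system-on-wafer.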

Diagram · TSMC packaging roadmap, simplified

Today · CoWoS 5.5R · compute + HBM⁷
2028 · CoWoS 14R · ~10 dies + 20 HBM⁷
Cross-cut · SoIC + COUPE · 3D stacking, optics⁷
2029 · SoW-X 40R · system-on-wafer⁷

A simplified, original visual of TSMC’s public roadmap framing. The package-as-system idea Tesla previewed is now the industry direction.

Tesla may not have won the Dojo war, but it was early to the packaging battlefield.

XI. The package is becoming the computer

AI scaling used to look like: faster chip → faster training. Now it looks like: chip + HBM + package + interconnect + power + cooling + rack + software partitioning. The bottlenecks are chip-to-chip bandwidth, HBM capacity and bandwidth, package size, reticle limits, substrate / interposer complexity, power delivery, cooling, optical I/O, rack networking, compiler / runtime, and workload partitioning.
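One way to make this shift concrete is a toy step-time model in which a training step takes as long as its slowest resource. This is an illustrative sketch, not a model from any cited source. The `Step` type, the function names, and the example workload are hypothetical, and it assumes compute and communication overlap perfectly.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """Hypothetical per-step workload: total FLOPs and bytes moved between chips."""
    flops: float
    off_chip_bytes: float


def step_time_s(step: Step, peak_flops: float, link_bytes_per_s: float) -> float:
    """Step time when compute and communication overlap perfectly (an assumption)."""
    compute_s = step.flops / peak_flops
    comm_s = step.off_chip_bytes / link_bytes_per_s
    return max(compute_s, comm_s)


def limiting_resource(step: Step, peak_flops: float, link_bytes_per_s: float) -> str:
    """Name whichever resource sets the step time."""
    compute_s = step.flops / peak_flops
    comm_s = step.off_chip_bytes / link_bytes_per_s
    return "compute" if compute_s >= comm_s else "interconnect"


if __name__ == "__main__":
    # Made-up workload: 9e15 FLOPs and 72e12 bytes exchanged per step.
    step = Step(flops=9e15, off_chip_bytes=72e12)
    base = step_time_s(step, peak_flops=9e15, link_bytes_per_s=36e12)
    faster_chip = step_time_s(step, peak_flops=18e15, link_bytes_per_s=36e12)
    faster_link = step_time_s(step, peak_flops=9e15, link_bytes_per_s=72e12)
    print(limiting_resource(step, 9e15, 36e12), base, faster_chip, faster_link)
```

In this made-up case doubling chip throughput leaves the step time unchanged, while doubling link bandwidth halves it. That is the sense in which a faster chip no longer guarantees faster training.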

Diagram · Old scaling lens vs new scaling lens

Old lens

Chip → server

Bottleneck · single-chip performance.
Scaling unit · one accelerator.
Constraint · transistor count.
Result · servers full of independent chips.

New lens

Die → package → tile → rack

Bottleneck · chip-to-chip bandwidth, power, cooling.
Scaling unit · package + tile + rack.⁷
Constraint · interconnect, memory, energy.
Result · AI factory designed as one system.

A simplified, original split. The unit of AI scaling moved from chip to package to rack.

The package is becoming the unit of AI scaling.

XII. Dojo vs Nvidia: the real comparison

The Dojo-vs-Nvidia comparison is not “custom chip vs GPU.” It is platform vs vertical integration.

| Dimension | Nvidia path | Tesla Dojo path |
| --- | --- | --- |
| Hardware | Mature GPU + HBM + NVLink + DGX rack-scale systems. | Custom D1 dies + fan-out training tile + custom cabinet.² |
| Memory | HBM-heavy. | SRAM-heavy / on-tile memory bet.¹ |
| Networking | NVLink + InfiniBand + Spectrum-X. | On-tile fabric + custom interconnect.² |
| Software | CUDA + libraries + ecosystem. | Tesla compiler / runtime / training stack. |
| Workload fit | Broad, flexible. | Optimised for Tesla’s vision data loop. |
| Iteration | Through external supplier scale. | Through internal control. |

Nvidia sells a platform. Dojo was Tesla trying to build a machine for one company’s data loop.

XIII. The software problem

A beautiful tile is not enough. To make Dojo work, Tesla needed a compiler, runtime, training-framework support, model partitioning, debugging tools, reliability management, fault tolerance, scheduling, data pipeline, developer productivity, and a migration path from existing GPU workflows. None of those are easy. None of them get cheaper at custom-hardware scale.

A custom accelerator fails quietly when the software team cannot make the hardware convenient.

XIV. The business lesson

Hardware ambition is not enough. The architecture has to map to business leverage. Dojo had a clear technical reason: Tesla-specific training. But the business test is whether it reduced cost enough, sped training enough, justified a separate team, kept up with Nvidia’s roadmap, justified custom software, scaled reliably, and helped cars and robots ship faster. Those are a lot of bars to clear in parallel.

Dojo’s enemy was not only Nvidia. Dojo’s enemy was the cost of becoming Nvidia, TSMC packaging, a compiler company, and a data-center operator at the same time.

XV. What could break the thesis?

Dojo showed technical ambition, but not enough proven business leverage to keep the dedicated path central.

Bear case · what could break the thesis

Custom too narrow. Dojo may have been too custom for Tesla’s evolving AI roadmap.³
GPU still wins on cost. External GPUs and AI5 / AI6 may be better uses of Tesla resources.
Training ↔ inference shift. Dedicated training chips can become obsolete if inference dominates.³
Schedule risk. Samsung 2nm or TSMC AI5 schedules may slip.⁵
Nvidia ecosystem. Nvidia’s software / network / cloud stack remains very hard to beat.
Software burden. Custom stacks are expensive to maintain.
Fan-out hard problems. Yield and thermal challenges scale with package size.¹
Compiler ceiling. A beautiful package is wasted if the compiler / runtime is not productive.
Autonomy ≠ compute-bound. Tesla’s autonomy progress may not be bottlenecked by training hardware alone.

XVI. What could break the bear case?

Even if Dojo changes form, the lesson survives.

Bull case · what could break the bear

Early lesson advantage. Tesla learned package-scale AI hardware earlier than most.¹
Reusable IP. Dojo lessons feed AI5, AI6, robotics, autonomy, and future infrastructure.
Data + workload control. Tesla has unique data and workload control.
Training ↔ inference convergence. Future chips may serve both well.³
Package-scale mainstreams. SoW-X / CoWoS 14R / SoIC validate the direction.⁷
Power / cooling reusable. Lessons travel even if the tile architecture changes form.²
Robotics scale. Optimus and physical AI add new training-compute demand.
Vertical integration still rare. Few companies can co-design data, model, and silicon.

Even if Dojo changes form, the lesson survives: AI compute is constrained by bandwidth, power, cooling, packaging, and software.

XVII. What to watch

Whether Tesla continues using Dojo systems internally.³
AI5 tape-out and production timing.⁴
AI6 tape-out timing.⁵
Samsung 2nm yield and schedule.⁵
TSMC AI5 production in Taiwan / Arizona.⁴
Tesla’s actual training-compute spending.
Nvidia usage inside Tesla.
AMD usage inside Tesla, if any.
Whether Dojo software survives in AI5 / AI6 workflows.
Optimus training requirements.
Autonomy model size and data growth.
Tesla’s FSD training cadence.
Package-scale AI hardware trends.⁷
TSMC SoW-X roadmap.⁷
CoWoS capacity and package size evolution.⁷
HBM dependence vs SRAM-heavy architectures.
Power and cooling architecture for AI clusters.²
Whether custom accelerators beat GPU clusters in narrow workloads.
Vertical-integration economics across hyperscalers and OEMs.
Optical I/O and COUPE adoption.⁷

Glossary

A short reference for the vocabulary used above. Definitions are simplified.

Dojo: Tesla’s custom AI training supercomputer project.
D1: Tesla’s Dojo training chip.
Training tile: Package-level unit containing multiple D1 dies.
Known-good die: A chip die tested before integration.
Fan-out wafer: Packaging process that redistributes connections outward from dies to enable dense integration.
InFO_SoW: TSMC Integrated Fan-Out System-on-Wafer technology.
CoWoS: TSMC advanced packaging technology using interposers, often for AI / HPC chips with HBM.
Interposer: Intermediate layer connecting dies and memory at high bandwidth.
Reticle limit: Maximum lithography exposure field size.
MCM: Multi-chip module.
HBM: High-bandwidth memory.
SRAM: Fast on-chip memory.
PDN: Power delivery network.
Cold plate: Liquid-cooling structure used to remove heat.
AI5 / AI6: Tesla’s next-generation AI chip roadmap, per public reporting.
Tape-out: Final chip design handoff before manufacturing.
System-on-wafer: Approach where many chips or chiplets are integrated across wafer-scale structures.
Rack-scale AI: AI compute designed at rack or data-center scale rather than single-chip scale.

XVIII. The wafer-scale training bet

Dojo was not just a chip.

It was a bet that Tesla could make a training computer out of packaging, power delivery, cooling, interconnect, SRAM, and software.

In 2026, the dedicated Dojo story looks weaker because Tesla shifted toward AI5 and AI6. But the core insight aged well: AI scaling is moving from chip-level performance to package-level and rack-level integration. The package is becoming the computer.

The Dojo path may not have produced Tesla’s Nvidia killer. It did produce an early proof that AI hardware is a system problem, not a chip problem. Tesla’s reorganisation around AI5 and AI6 is a priority shift, not a refutation of the packaging thesis. TSMC’s CoWoS, SoIC, COUPE, and SoW-X roadmap continues exactly where the 2021 Tesla teaser pointed: toward larger packages, denser interconnect, and rack-level integration.

The companies that win the next AI cycle will be the ones that treat the package, the rack, and the software stack as one design problem. Tesla saw that early. Others are catching up now.

That is the wafer-scale training bet.

¹ Patel, D. (Aug 2021). Tesla AI Day Supercomputer Chip Teaser | Is This The First Deployment Of TSMC InFO_SoW? SemiAnalysis. Historical anchor for the package-scale framing: 5×5 chip array, cold plate, carrier, BGA pads, power delivery, InFO_SoW speculation, fan-out wafer-scale framing, comparison with CoWoS-style interposers and flip-chip MCMs, reticle-limit context, ~7,000W vs Nvidia A100 ~500W power framing, water-cooling implications, and SRAM-heavy speculation. Used as inspiration only. No content, structure, or charts reproduced.

² Cadence. Not chips: Tesla’s Dojo. Technical summary covering 25 known-good D1 dies on a fanout-wafer process, ~9 PFLOPS per training tile, ~36 TB/s off-tile bandwidth, ~52V DC, ~18,000A current draw, ~15kW heat dissipation, and the tile / cabinet / ExaPOD framing.

³ Reuters (Aug 2025). Tesla to streamline its AI chip design work, Musk says. Bloomberg-via-Reuters reporting that Tesla was disbanding its Dojo supercomputer team, with Musk saying it did not make sense to divide resources across two very different AI chip designs and that Tesla would focus on AI5, AI6, and later chips described as excellent for inference and at least pretty good for training.

⁴ Reuters (Jul 2025). Tesla’s $16.5B Samsung chip supply deal. Samsung’s Taylor, Texas factory framed to make Tesla’s AI6; Samsung currently making AI4; TSMC slated to make AI5 first in Taiwan and then Arizona per Musk; self-driving / Optimus / broader AI context.

⁵ Reuters (Mar 2026). Musk says Tesla may tape out AI6 in December 2026. Samsung executive cited saying Tesla chips based on Samsung’s advanced 2nm process were planned for production in the second half of 2027.

⁶ Tesla AI Day 2021 and subsequent Tesla technical presentations remain the original sources for D1 chip, training tile, cabinet, ExaPOD, and software stack framing. This essay relies on the Cadence technical summary (fn2) and SemiAnalysis 2021 framing (fn1) for the specific numbers used; no Tesla material is reproduced.

⁷ TSMC (2026). 2026 North America Technology Symposium. CoWoS expansion (5.5-reticle today, 14-reticle by 2028, ~10 dies + 20 HBM at 14R), SoIC 3D stacking, COUPE co-packaged optics, and 40-reticle SoW-X System-on-Wafer targeted for 2029.

⁸ Public Hot Chips and other credible Dojo technical coverage is referenced in this essay only at the level the Cadence summary (fn2) and SemiAnalysis 2021 framing (fn1) already disclose. Specific microarchitecture claims are made only where verifiable.

⁹ Public Nvidia developer / DGX materials are referenced only as comparative platform context. This essay does not reproduce Nvidia visuals or marketing.

¹⁰ TSMC packaging materials (CoWoS, InFO, SoIC, SoW) provide additional context for the system-level packaging direction. The essay uses the 2026 symposium framing (fn7) as the primary citation rather than restating individual product pages.