The Wafer-Scale Training Bet
Original analysis · Not investment advice
Dojo was not just a custom chip. It was Tesla’s extreme bet that fan-out wafer packaging, dense interconnect, SRAM-heavy compute, water cooling, power delivery, and custom software could create a training computer tuned for autonomy. In 2026, the dedicated Dojo path looks less central as Tesla shifts toward AI5 and AI6. But the core lesson aged well: AI scaling is moving from chip-level performance to package-level and rack-level integration.
In 2021, Tesla teased a strange object. It did not look like a normal GPU. It did not look like a normal accelerator card. It looked like a compute slab. A cold plate. A carrier. A 5×5 grid of chips. Connectors. Power delivery. BGA pads. No visible HBM.
The SemiAnalysis article looked at that image and made the important guess: this might not be a normal chip package at all. It might be Tesla’s first move toward a system-on-wafer training engine, possibly using TSMC’s InFO_SoW packaging technology.1
That was the real insight. Not “Tesla made a chip.” The deeper point was that Tesla was trying to turn packaging into a training architecture.
Tesla’s Dojo was an early, extreme version of the problem every AI hardware company now faces: scaling AI compute is not just about faster chips. It is about packaging, interconnect, memory, power, cooling, software, and workload fit. The dedicated Dojo path looks less central after Tesla shifted focus toward AI5 and AI6, but the system-level lesson aged well. AI hardware is becoming package-scale and rack-scale, and Tesla saw that early.
I. The 2021 thesis was about the package
In August 2021, Dylan Patel published a SemiAnalysis piece reading the Tesla AI Day teaser image as a package-scale system rather than a normal chip. The visible structure showed a 5×5 chip grid, a cold plate, connectors, a carrier, BGA pads, and power delivery. The piece compared it to TSMC’s InFO_SoW research and argued that the fan-out wafer-scale structure could serve as the carrier itself, enabling close-packed chip arrays with lower latency, higher bandwidth density, and lower power-delivery impedance than CoWoS-style interposers or flip-chip MCMs. It also noted the absence of visible HBM or DRAM and speculated the design relied heavily on on-die SRAM, somewhat in the spirit of Cerebras.1
Dojo was not trying to build one huge chip. It was trying to make many chips behave like one huge training surface.
II. InFO_SoW, explained simply
InFO_SoW stands for Integrated Fan-Out System-on-Wafer. Instead of mounting many chips onto a normal package substrate or PCB, the wafer-scale fan-out structure becomes part of the system. Chips can be placed close together, connected densely, powered more efficiently, and cooled as one integrated structure. The 2021 SemiAnalysis piece was speculating from the teaser image, but later Dojo technical coverage confirmed the important direction: fan-out wafer-scale integration.1,2
Substrate + PCB
Many dies mounted on a normal package substrate and PCB. Chip-to-chip paths are longer, power-delivery impedance is higher, and cooling is added on top of an already-complex stack.1
Tesla did not need the exact InFO_SoW label for the thesis to be right. The important part was fan-out wafer-scale integration.
III. Why normal multi-chip packages hit limits
AI training wants many chips to behave like one machine. Each conventional packaging approach gives something up to get there.
CoWoS / interposer · reticle-bound
High bandwidth, strong HBM integration, but interposer-size and stitching constraints as systems grow.
Flip-chip MCM · density-limited
Avoids interposer limits, but lower wire density and higher power cost per chip-to-chip transfer.
Fan-out wafer-scale · tile-as-system
Keeps chips close and communication dense; reduces power and latency penalties. Creates packaging, yield, and thermal challenges.1
The packaging problem is really an interconnect problem. The interconnect problem becomes a training problem.
IV. Tesla later confirmed the important part
Cadence’s technical summary of Tesla’s AI Day material describes 25 known-good D1 dies integrated onto a fanout-wafer process that preserves bandwidth between adjacent D1 chips, delivering 9 PFLOPS per training tile with 36 TB/s of off-tile bandwidth.2 The exact brand-name path (InFO_SoW or another flavour) matters less than the architecture: many dies connected as one training tile.
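The tile figures above are easier to feel once they are divided back down to a single die. The sketch below is only arithmetic on the quoted numbers (25 dies, 9 PFLOPS, 36 TB/s). The even four-edge split of off-tile bandwidth and the nearest-neighbour view of the 5×5 grid are assumptions made for illustration, not details taken from the Cadence summary.

```python
# Figures quoted above from the Cadence summary (fn2).
DIES_PER_TILE = 25      # 5x5 grid of known-good D1 dies
TILE_PFLOPS = 9.0       # compute per training tile
OFF_TILE_TBPS = 36.0    # off-tile bandwidth per training tile
GRID = 5                # dies per side of the tile


def per_die_tflops(tile_pflops=TILE_PFLOPS, dies=DIES_PER_TILE):
    """Average compute per die, in TFLOPS, if the tile total is split evenly."""
    return tile_pflops * 1000.0 / dies


def adjacent_pairs(n=GRID):
    """Count horizontal plus vertical neighbouring die pairs in an n x n grid."""
    return 2 * n * (n - 1)


def perimeter_dies(n=GRID):
    """Count dies that sit on the outer edge of an n x n grid."""
    return n * n - (n - 2) ** 2


def off_tile_tbps_per_edge(total=OFF_TILE_TBPS, edges=4):
    """ASSUMPTION: off-tile bandwidth is spread evenly over four tile edges."""
    return total / edges


if __name__ == "__main__":
    print("TFLOPS per die:", per_die_tflops())
    print("adjacent die pairs:", adjacent_pairs())
    print("perimeter dies:", perimeter_dies())
    print("TB/s per tile edge (assumed even split):", off_tile_tbps_per_edge())
```

On these numbers each die averages 360 TFLOPS, the grid has 40 neighbouring die pairs, and 16 of the 25 dies sit on the tile edge. That is the sense in which the tile, not the die, is the unit that has to be wired, powered, and cooled.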
The package was the computer.
V. Power and cooling were not afterthoughts
The 2021 SemiAnalysis piece emphasised the cold plate, cited InFO_SoW’s ability to support ~7,000W of power, and compared that with Nvidia A100 configurations as high as 500W, noting multiple inlets and outlets consistent with water cooling.1 Cadence’s summary reports Tesla’s training tile taking 52V DC, drawing 18,000A, dissipating 15kW of heat, and delivering 9 PFLOPS in less than one cubic foot.2
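The power figures reward a quick sanity check, because 52V and 18,000A cannot both describe the same point in the circuit if the tile dissipates 15kW. The sketch below works through the arithmetic. It assumes that heat dissipated roughly equals electrical power in, and that the 18,000A figure is die-side current after on-tile voltage conversion. Both are readings of the quoted numbers, not statements from the sources.

```python
# Figures quoted above (fn1, fn2). Everything derived is arithmetic on them.
TILE_INPUT_VOLTS = 52.0
TILE_CURRENT_AMPS = 18_000.0
TILE_HEAT_WATTS = 15_000.0
DIES_PER_TILE = 25
INFO_SOW_WATTS = 7_000.0   # SemiAnalysis figure for InFO_SoW
A100_WATTS = 500.0         # high-end A100 configuration cited in the same piece


def naive_power_kw():
    """Power if the full current were drawn at the 52V input (it is not plausible)."""
    return TILE_INPUT_VOLTS * TILE_CURRENT_AMPS / 1000.0


def implied_die_side_volts():
    """ASSUMPTION: 15kW is delivered at 18,000A, which implies a sub-1V rail."""
    return TILE_HEAT_WATTS / TILE_CURRENT_AMPS


def implied_input_amps():
    """ASSUMPTION: 15kW enters at 52V, ignoring conversion losses."""
    return TILE_HEAT_WATTS / TILE_INPUT_VOLTS


def watts_per_die():
    """Average heat per die if the tile total is split evenly across 25 dies."""
    return TILE_HEAT_WATTS / DIES_PER_TILE


def info_sow_vs_a100():
    """Ratio of the quoted InFO_SoW power envelope to the quoted A100 figure."""
    return INFO_SOW_WATTS / A100_WATTS


if __name__ == "__main__":
    print(f"52V x 18,000A would be {naive_power_kw():.0f} kW")
    print(f"implied die-side voltage: {implied_die_side_volts():.2f} V")
    print(f"implied current at 52V: {implied_input_amps():.0f} A")
    print(f"average per die: {watts_per_die():.0f} W")
    print(f"InFO_SoW vs A100 power: {info_sow_vs_a100():.0f}x")
```

Read this way, the tile takes in a few hundred amps at 52V and hands the dies roughly 18,000A at well under one volt, about 600W per die on average. Moving that much current over short distances is exactly the power-delivery problem the packaging is meant to solve.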
At Dojo scale, the chip is not the product. The powered, cooled, connected tile is the product.
VI. SRAM instead of HBM was the other clue
The 2021 piece noted no visible HBM or DRAM in the teaser and speculated the design relied heavily on on-die SRAM, somewhat like Cerebras.1 Most leading AI accelerators lean on HBM because training workloads are memory-hungry. Tesla’s Dojo design appeared to make a different bet: keep compute tiles tightly connected, use large on-die local memory, stream data through the training fabric, and optimise around Tesla’s specific vision workload.
HBM-heavy
High-bandwidth memory stacks beside the accelerator. Broad workload fit, deep ecosystem, dominant for general-purpose training.
SRAM-heavy
Large on-die / on-tile SRAM with tight inter-die fabric. Narrower workload fit, but lower memory-movement penalty if the workload maps cleanly to the tile.1
Dojo was not a generic GPU clone. It was a workload-specific memory and interconnect bet.
VII. Dojo was Tesla’s most extreme vertical-integration bet
Dojo made sense in Tesla logic. Tesla has fleet video, autonomy training, perception models, planning models, robotics ambitions, closed-loop data generation, and the ability to tune hardware and software together. The dream was lower dependence on Nvidia, lower training cost, faster model iteration, and a system tuned to Tesla’s own data loop. But this required Tesla to own or coordinate silicon, packaging, power delivery, cooling, compiler, runtime, training software, data-center system design, reliability engineering, supply chain, and multiple generations of hardware.
Vertical integration gives control. It also gives you every problem.
VIII. The 2026 update is not a clean Dojo victory lap
In August 2025, Reuters, citing Bloomberg, reported that Tesla was disbanding its Dojo supercomputer team. Musk said it did not make sense for Tesla to divide resources across two very different AI chip designs, and that Tesla would focus on AI5, AI6, and later chips, which he described as excellent for inference and at least pretty good for training.3
Dojo’s reorganisation is not a verdict on packaging.
The reporting describes a streamlining of Tesla’s AI chip teams toward AI5 / AI6, not a rejection of fan-out wafer-scale ideas industry-wide. It tells you about Tesla’s priorities. It does not tell you that the package-as-system thesis is wrong; TSMC’s SoW-X roadmap suggests the opposite direction.7
Dojo showed the architecture problem. AI5 and AI6 show the product-priority problem.
IX. AI5 and AI6 are the new center of gravity
In July 2025, Reuters reported that Tesla signed a roughly $16.5B chip-supply deal with Samsung. Musk said Samsung’s Taylor, Texas factory would make Tesla’s next-generation AI6 chip. At the time, Samsung was making Tesla’s AI4 chips, and Musk said TSMC would make AI5, first in Taiwan and then in Arizona.4 In March 2026, Reuters reported Musk saying Tesla may tape out AI6 in December 2026, with a Samsung executive saying Tesla chips based on Samsung’s advanced 2nm process were planned for production in the second half of 2027.5
Cost / priority test
Dojo was a training-supercomputer bet. AI5 and AI6 are deployment-scale AI-chip bets.
X. Why the packaging thesis aged better than the Dojo thesis
TSMC’s 2026 North America Technology Symposium materials describe a packaging roadmap with 5.5-reticle CoWoS today, 14-reticle CoWoS by 2028 supporting roughly 10 large compute dies and 20 HBM stacks, and a 40-reticle SoW-X System-on-Wafer technology targeted for 2029, alongside SoIC 3D stacking and COUPE co-packaged optics.7
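Reticle multiples are abstract until they are converted to area. The sketch below assumes the standard full-field lithography reticle of 26 mm × 33 mm (858 mm²) and a 300 mm wafer. TSMC may define its reticle-equivalent sizes slightly differently, so treat the outputs as rough scale, not as package specifications.

```python
import math

# ASSUMPTION: standard full-field reticle of 26 mm x 33 mm; not taken from fn7.
RETICLE_MM2 = 26.0 * 33.0          # 858 mm^2
# ASSUMPTION: 300 mm wafer, ignoring edge exclusion and the round-vs-square mismatch.
WAFER_DIAMETER_MM = 300.0

# Reticle multiples quoted above from the 2026 symposium materials (fn7).
ROADMAP = {
    "CoWoS today": 5.5,
    "CoWoS 2028": 14.0,
    "SoW-X 2029": 40.0,
}


def package_area_mm2(reticles):
    """Approximate package area for a given reticle multiple."""
    return reticles * RETICLE_MM2


def wafer_area_mm2(diameter_mm=WAFER_DIAMETER_MM):
    """Geometric area of a round wafer."""
    return math.pi * (diameter_mm / 2.0) ** 2


def wafer_fraction(reticles):
    """Share of a full wafer's geometric area that the package would occupy."""
    return package_area_mm2(reticles) / wafer_area_mm2()


if __name__ == "__main__":
    for name, reticles in ROADMAP.items():
        area = package_area_mm2(reticles)
        print(f"{name}: ~{area:,.0f} mm^2, ~{wafer_fraction(reticles):.0%} of a 300 mm wafer")
```

Under these assumptions the steps are roughly 4,700 mm², 12,000 mm², and 34,000 mm². The last is close to half the geometric area of a 300 mm wafer, which is why "system-on-wafer" is a fair description and not marketing shorthand.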
Tesla may not have won the Dojo war, but it was early to the packaging battlefield.
XI. The package is becoming the computer
AI scaling used to look like: faster chip → faster training. Now it looks like: chip + HBM + package + interconnect + power + cooling + rack + software partitioning. The bottlenecks are chip-to-chip bandwidth, HBM capacity and bandwidth, package size, reticle limits, substrate / interposer complexity, power delivery, cooling, optical I/O, rack networking, compiler / runtime, and workload partitioning.
Chip → server
- Bottleneck · single-chip performance.
- Scaling unit · one accelerator.
- Constraint · transistor count.
- Result · servers full of independent chips.
Die → package → tile → rack
- Bottleneck · chip-to-chip bandwidth, power, cooling.
- Scaling unit · package + tile + rack.7
- Constraint · interconnect, memory, energy.
- Result · AI factory designed as one system.
The package is becoming the unit of AI scaling.
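One way to see why the bottleneck moves from the chip to the package is a toy step-time model: a training step takes as long as the slower of compute and data movement. The function and every number below are hypothetical, chosen only to show the shape of the argument. The model assumes compute and communication overlap perfectly, which real systems only approximate.

```python
def step_time_s(flops, bytes_moved, peak_flops, link_bytes_per_s):
    """Toy model: a step is bound by the slower of compute and interconnect.

    ASSUMPTION: perfect overlap of compute and communication.
    Returns (seconds, name of the limiting resource).
    """
    compute_s = flops / peak_flops
    comm_s = bytes_moved / link_bytes_per_s
    if compute_s >= comm_s:
        return compute_s, "compute"
    return comm_s, "interconnect"


if __name__ == "__main__":
    # HYPOTHETICAL workload: 1e15 FLOPs and 4e12 bytes exchanged per step.
    work = dict(flops=1e15, bytes_moved=4e12)

    # Hypothetical baseline: 1 PFLOPS of compute, 1 TB/s of chip-to-chip bandwidth.
    print(step_time_s(**work, peak_flops=1e15, link_bytes_per_s=1e12))

    # Doubling compute alone changes nothing while the link is the limit.
    print(step_time_s(**work, peak_flops=2e15, link_bytes_per_s=1e12))

    # Raising link bandwidth 8x moves the limit back to compute.
    print(step_time_s(**work, peak_flops=1e15, link_bytes_per_s=8e12))
```

In the hypothetical baseline the step is interconnect-bound, so a faster chip buys nothing until the links between chips improve. That is the chip-to-chip bandwidth bottleneck from the list above in its simplest form.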
XII. Dojo vs Nvidia: the real comparison
The Dojo-vs-Nvidia comparison is not “custom chip vs GPU.” It is platform vs vertical integration.
Nvidia sells a platform. Dojo was Tesla trying to build a machine for one company’s data loop.
XIII. The software problem
A beautiful tile is not enough. To make Dojo work, Tesla needed a compiler, runtime, training-framework support, model partitioning, debugging tools, reliability management, fault tolerance, scheduling, data pipeline, developer productivity, and a migration path from existing GPU workflows. None of those are easy. None of them get cheaper at custom-hardware scale.
A custom accelerator fails quietly when the software team cannot make the hardware convenient.
XIV. The business lesson
Hardware ambition is not enough. The architecture has to map to business leverage. Dojo had a clear technical reason: Tesla-specific training. But the business test is whether it reduced cost enough, sped training enough, justified a separate team, kept up with Nvidia’s roadmap, justified custom software, scaled reliably, and helped cars and robots ship faster. Those are a lot of bars to clear in parallel.
Dojo’s enemy was not only Nvidia. Dojo’s enemy was the cost of becoming Nvidia, TSMC packaging, a compiler company, and a data-center operator at the same time.
XV. What could break the thesis?
Dojo showed technical ambition, but not enough proven business leverage to keep the dedicated path central.
- Custom too narrow. Dojo may have been too custom for Tesla’s evolving AI roadmap.3
- GPU still wins on cost. External GPUs and AI5 / AI6 may be better uses of Tesla resources.
- Training ↔ inference shift. Dedicated training chips can become obsolete if inference dominates.3
- Schedule risk. Samsung 2nm or TSMC AI5 schedules may slip.5
- Nvidia ecosystem. Nvidia’s software / network / cloud stack remains very hard to beat.
- Software burden. Custom stacks are expensive to maintain.
- Fan-out hard problems. Yield and thermal challenges scale with package size.1
- Compiler ceiling. A beautiful package is wasted if the compiler / runtime is not productive.
- Autonomy ≠ compute-bound. Tesla’s autonomy progress may not be bottlenecked by training hardware alone.
XVI. What could break the bear case?
The bear case on Dojo has weak points of its own.
- Early lesson advantage. Tesla learned package-scale AI hardware earlier than most.1
- Reusable IP. Dojo lessons feed AI5, AI6, robotics, autonomy, and future infrastructure.
- Data + workload control. Tesla has unique data and workload control.
- Training ↔ inference convergence. Future chips may serve both well.3
- Package-scale mainstreams. SoW-X / CoWoS 14R / SoIC validate the direction.7
- Power / cooling reusable. Lessons travel even if the tile architecture changes form.2
- Robotics scale. Optimus and physical AI add new training-compute demand.
- Vertical integration still rare. Few companies can co-design data, model, and silicon.
Even if Dojo changes form, the lesson survives: AI compute is constrained by bandwidth, power, cooling, packaging, and software.
XVII. What to watch
- Whether Tesla continues using Dojo systems internally.3
- AI5 tape-out and production timing.4
- AI6 tape-out timing.5
- Samsung 2nm yield and schedule.5
- TSMC AI5 production in Taiwan / Arizona.4
- Tesla’s actual training-compute spending.
- Nvidia usage inside Tesla.
- AMD usage inside Tesla, if any.
- Whether Dojo software survives in AI5 / AI6 workflows.
- Optimus training requirements.
- Autonomy model size and data growth.
- Tesla’s FSD training cadence.
- Package-scale AI hardware trends.7
- TSMC SoW-X roadmap.7
- CoWoS capacity and package size evolution.7
- HBM dependence vs SRAM-heavy architectures.
- Power and cooling architecture for AI clusters.2
- Whether custom accelerators beat GPU clusters in narrow workloads.
- Vertical-integration economics across hyperscalers and OEMs.
- Optical I/O and COUPE adoption.7
Glossary
A short reference for the vocabulary used above. Definitions are simplified.
- Dojo
- Tesla’s custom AI training supercomputer project.
- D1
- Tesla’s Dojo training chip.
- Training tile
- Package-level unit containing multiple D1 dies.
- Known-good die
- A chip die tested before integration.
- Fan-out wafer
- Packaging process that redistributes connections outward from dies to enable dense integration.
- InFO_SoW
- TSMC Integrated Fan-Out System-on-Wafer technology.
- CoWoS
- TSMC advanced packaging technology using interposers, often for AI / HPC chips with HBM.
- Interposer
- Intermediate layer connecting dies and memory at high bandwidth.
- Reticle limit
- Maximum lithography exposure field size.
- MCM
- Multi-chip module.
- HBM
- High-bandwidth memory.
- SRAM
- Fast on-chip memory.
- PDN
- Power delivery network.
- Cold plate
- Liquid-cooling structure used to remove heat.
- AI5 / AI6
- Tesla’s next-generation AI chip roadmap, per public reporting.
- Tape-out
- Final chip design handoff before manufacturing.
- System-on-wafer
- Approach where many chips or chiplets are integrated across wafer-scale structures.
- Rack-scale AI
- AI compute designed at rack or data-center scale rather than single-chip scale.
XVIII. The wafer-scale training bet
Dojo was not just a chip.
It was a bet that Tesla could make a training computer out of packaging, power delivery, cooling, interconnect, SRAM, and software.
The Dojo path may not have produced Tesla’s Nvidia killer. It did produce an early proof that AI hardware is a system problem, not a chip problem. Tesla’s reorganisation around AI5 and AI6 is a priority shift, not a refutation of the packaging thesis. TSMC’s CoWoS, SoIC, COUPE, and SoW-X roadmap continues exactly where the 2021 Tesla teaser pointed: toward larger packages, denser interconnect, and rack-level integration.
The companies that win the next AI cycle will be the ones that treat the package, the rack, and the software stack as one design problem. Tesla saw that early. Others are catching up now.
That is the wafer-scale training bet.
1 Patel, D. (Aug 2021). Tesla AI Day Supercomputer Chip Teaser | Is This The First Deployment Of TSMC InFO_SoW? SemiAnalysis. Historical anchor for the package-scale framing: 5×5 chip array, cold plate, carrier, BGA pads, power delivery, InFO_SoW speculation, fan-out wafer-scale framing, comparison with CoWoS-style interposers and flip-chip MCMs, reticle-limit context, ~7,000W vs Nvidia A100 ~500W power framing, water-cooling implications, and SRAM-heavy speculation. Used as inspiration only. No content, structure, or charts reproduced.
2 Cadence. Not chips: Tesla’s Dojo. Technical summary covering 25 known-good D1 dies on a fanout-wafer process, ~9 PFLOPS per training tile, ~36 TB/s off-tile bandwidth, ~52V DC, ~18,000A current draw, ~15kW heat dissipation, and the tile / cabinet / ExaPOD framing.
3 Reuters (Aug 2025). Tesla to streamline its AI chip design work, Musk says. Bloomberg-via-Reuters reporting that Tesla was disbanding its Dojo supercomputer team, with Musk saying it did not make sense to divide resources across two very different AI chip designs and that Tesla would focus on AI5, AI6, and later chips described as excellent for inference and at least pretty good for training.
4 Reuters (Jul 2025). Tesla’s $16.5B Samsung chip supply deal. Samsung’s Taylor, Texas factory framed to make Tesla’s AI6; Samsung currently making AI4; TSMC slated to make AI5 first in Taiwan and then Arizona per Musk; self-driving / Optimus / broader AI context.
5 Reuters (Mar 2026). Musk says Tesla may tape out AI6 in December 2026. Samsung executive cited saying Tesla chips based on Samsung’s advanced 2nm process were planned for production in the second half of 2027.
6 Tesla AI Day 2021 and subsequent Tesla technical presentations remain the original sources for D1 chip, training tile, cabinet, ExaPOD, and software stack framing. This essay relies on the Cadence technical summary (fn2) and SemiAnalysis 2021 framing (fn1) for the specific numbers used; no Tesla material is reproduced.
7 TSMC (2026). 2026 North America Technology Symposium. CoWoS expansion (5.5-reticle today, 14-reticle by 2028, ~10 dies + 20 HBM at 14R), SoIC 3D stacking, COUPE co-packaged optics, and 40-reticle SoW-X System-on-Wafer targeted for 2029.
8 Public Hot Chips and other credible Dojo technical coverage is referenced in this essay only at the level the Cadence summary (fn2) and SemiAnalysis 2021 framing (fn1) already disclose. Specific microarchitecture claims are made only where verifiable.
9 Public Nvidia developer / DGX materials are referenced only as comparative platform context. This essay does not reproduce Nvidia visuals or marketing.
10 TSMC packaging materials (CoWoS, InFO, SoIC, SoW) provide additional context for the system-level packaging direction. The essay uses the 2026 symposium framing (fn7) as the primary citation rather than restating individual product pages.
- The Wafer-Scale Latency Bet. Cerebras and the case for removing chip boundaries (Cerebras vs Dojo contrast).
- The Foundry Toll Road. Why TSMC’s pricing power got stronger in the AI era.
- The GAA Credibility Test. Samsung Foundry’s 2nm comeback as a trust test, not a transistor story.
- When AI Runs Out of Copper. Optical I/O, co-packaged optics, and the race to replace copper with light.
- The Custom Silicon Flywheel. Hyperscalers turning their biggest workloads into chips.
- Nvidia Built the AI Factory Anyway. Vertical system integration as the new moat.
- The Bubble That Became Infrastructure. Why Nvidia’s 2021 overvaluation story turned into the AI factory thesis.
- The AI Memory Wall. DRAM, HBM, packaging, and semicap as the new centre of computing.
- The Power Efficiency Layer. Why Power Integrations sits at the quiet intersection of GaN, EVs, grids, and AI data-center power.
- The Networked AI Bet. Tenstorrent’s open, Ethernet-native attack on the AI compute stack.
- The AI Chip Software Wall. Why specialised silicon alone was not enough to beat Nvidia.
- The AI Field Manual. Reference layer for the AI stack: hardware, memory, models, agents, safety, economics.
This is Essay No. 038. The topics: intelligence, AI, systems, knowledge, and the questions underneath the questions everyone else is asking. If you read this far and disagreed with any part of it, write to me. I read everything.