Essay No. 041  ·  AI Infrastructure  ·  Melbourne, Australia

The Custom AI Hardware Trap.
Original analysis  ·  Not investment advice

Why Tesla Dojo proved that impressive silicon is not enough.
Pugalenthi Magendran
April 2026  ·  Melbourne, Australia
12 min read

Dojo was impressive, but the uploaded 2021 SemiAnalysis critique aged well. Tesla’s real challenge was never building a beautiful D1 chip or one powerful training tile. It was making memory, interconnect, software, power, cooling, and economics work as a full production training system. In 2026, Tesla’s move toward AI5 and AI6 suggests the standalone Dojo path became less central. But Dojo still matters because it revealed the future of AI hardware: the chip is no longer enough. The system wins.

Tesla Dojo was impressive. That was never the question.

The D1 chip had huge bandwidth. The training tile was exotic. The power density was wild. The system vision was bold.

But the uploaded 2021 SemiAnalysis article asked the question that mattered more.1 Can this become a usable production training system? Not whether Tesla could build an impressive chip. Whether Tesla could solve memory, interconnect, software, packaging, cooling, power delivery, and economics at the same time.

In 2026, that question looks even more important. Tesla has shifted attention from a standalone Dojo training-chip path toward AI5 and AI6.[3][4][5] But the lesson is not “Dojo was stupid.” The lesson is sharper. Custom AI hardware is not won at the chip-spec level. It is won at the system level.

The chip-spec sheet is the easy part. The production system is the trap.


1. The 2021 critique was about the system

The uploaded SemiAnalysis piece on Dojo was not a hit piece on Tesla.[1] It did not deny that the D1 chip was technically interesting. It did not argue that the training tile was a bad idea. It argued something more disciplined.

It argued that a custom AI chip is not validated by peak TFLOPS or by a glossy reveal video. It is validated when memory, interconnect, compiler, power, cooling, and economics work together in production. The article identified the hard walls Tesla would have to cross. It was system-level discipline, not anti-Tesla noise.

2021 thesis

A custom AI chip is not validated by peak TFLOPS. It is validated when memory, interconnect, compiler, power, cooling, and economics work together in production.

Five years on, the framing still holds. Dojo’s silicon was the visible artefact. The invisible artefacts were the harder ones: memory balance, software stack, exotic packaging, custom interconnect, cooling design, and a business case narrow enough that only autonomy could justify the spend.

2. The four walls of custom AI hardware

The cleanest way to read the 2021 critique in 2026 is to organise it as four walls. Each wall is independent. A custom AI training system has to cross all four. Failure on any one is enough to keep the whole effort from reaching production scale.

Figure 1  ·  The four walls a custom AI training system has to cross
  • Wall 01 · Memory: SRAM and DRAM per unit of compute. Locality. Activation, gradient, optimizer state, comms buffers.
  • Wall 02 · Interconnect & packaging: SerDes pin count, package I/O, fan-out wafer, cooling, manufacturability, yield, serviceability.
  • Wall 03 · Software: Compiler, runtime, framework integration, graph partitioning, memory placement, debugging, ergonomics.
  • Wall 04 · Economics: NRE versus deployment volume. Cost of exotic packaging. Workload focus narrow enough to justify spend.
Original framing. The uploaded SemiAnalysis 2021 article identified each of these areas as live risks for Dojo.[1]

Peak TFLOPS get you to wall one. The rest of the project is the walls.

3. The memory wall came first

The uploaded article’s sharpest early observation was about memory per unit of compute.[1] Each Dojo functional unit had around 1.25 MB of SRAM and roughly 1 TFLOP of FP16 / CFP8 compute. D1 had 354 such units. The article estimated that the full ExaPOD would have only around 1.33 TB of total SRAM behind just over an exaflop of FP16-class compute.

That is a lot of compute. It is not a lot of memory.

The article’s argument was not that this number is wrong. It was that this ratio is uncomfortable. CFP8 helped stretch memory by reducing precision per value, but it did not change the underlying balance. Dojo was compute-rich and memory-tight relative to the size of the models the industry was already aiming at.1
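The ratio can be checked with a few lines of arithmetic. The sketch below uses only the per-unit figures quoted above; the die count of 3,000 is an assumption taken from the deployment figure the same article cites, and all units are decimal (1 TB = 10^12 bytes).

```python
# Figures as quoted in this essay from the 2021 critique; treat them as estimates.
SRAM_PER_UNIT_MB = 1.25     # SRAM per functional unit
TFLOPS_PER_UNIT = 1.0       # roughly 1 TFLOP of FP16 / CFP8 per functional unit
UNITS_PER_D1 = 354          # functional units per D1 die
D1_DIES_IN_EXAPOD = 3_000   # assumption: the ~3,000 dies cited for deployment


def exapod_balance(dies: int = D1_DIES_IN_EXAPOD) -> dict:
    """Return total SRAM, total compute, and their ratio for a given die count."""
    sram_per_die_mb = SRAM_PER_UNIT_MB * UNITS_PER_D1
    total_sram_tb = sram_per_die_mb * dies / 1e6                   # MB -> TB
    total_eflops = TFLOPS_PER_UNIT * UNITS_PER_D1 * dies / 1e6     # TFLOP -> EFLOP
    # Bytes of SRAM available per FLOP/s of peak compute.
    bytes_per_flops = (total_sram_tb * 1e12) / (total_eflops * 1e18)
    return {
        "sram_per_die_mb": sram_per_die_mb,
        "total_sram_tb": total_sram_tb,
        "total_eflops": total_eflops,
        "bytes_per_flops": bytes_per_flops,
    }


if __name__ == "__main__":
    for name, value in exapod_balance().items():
        print(f"{name}: {value:.6g}")
```

Under these assumptions the totals land at roughly 1.33 TB of SRAM and a little over 1 EFLOP, which matches the article's estimate as quoted here.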

  • D1 functional unit: 1.25 MB SRAM, ~1 TFLOP
  • ExaPOD: ~1.33 TB SRAM, >1 EFLOP FP16-class
SRAM and compute estimates from the uploaded SemiAnalysis 2021 article.[1]

Dojo’s first wall was not compute. It was memory per unit of compute.

4. Why memory balance matters

Training large neural networks does not just need compute. It needs the system to feed that compute. The memory side of the budget is broad, and most of it grows with model size.

What training memory has to hold
  • Model parameters
  • Forward activations for backprop
  • Gradients
  • Optimizer state (momentum, second-moment estimates)
  • Intermediate tensors and reductions
  • Communication buffers (all-reduce, all-gather)
  • KV cache when used during training-time evaluation
  • Slack for fragmentation and out-of-place ops
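A rough sketch of how the first few items on this list add up. It assumes mixed-precision training with an Adam-style optimizer at 16 bytes per parameter (2 for half-precision weights, 2 for gradients, 12 for an FP32 master copy plus two optimizer moments). That recipe is an assumption for illustration, not a description of what Tesla ran, and it deliberately ignores activations and buffers, which only make the picture tighter.

```python
# Assumption: a common mixed-precision Adam recipe, 16 bytes per parameter.
BYTES_WEIGHTS = 2      # half-precision weights
BYTES_GRADS = 2        # half-precision gradients
BYTES_OPTIMIZER = 12   # FP32 master weights + momentum + second-moment estimate

# ~1.33 TB of total SRAM at ExaPOD scale, as quoted from the 2021 critique.
EXAPOD_SRAM_BYTES = 1.33e12


def training_state_bytes(n_params: float) -> float:
    """Bytes for weights, gradients, and optimizer state (no activations)."""
    return n_params * (BYTES_WEIGHTS + BYTES_GRADS + BYTES_OPTIMIZER)


def fits_in_sram(n_params: float, sram_bytes: float = EXAPOD_SRAM_BYTES) -> bool:
    """True if the training state alone would fit in the given SRAM budget."""
    return training_state_bytes(n_params) <= sram_bytes


if __name__ == "__main__":
    for n in (1e9, 10e9, 100e9):
        gb = training_state_bytes(n) / 1e9
        print(f"{n / 1e9:.0f}B params: {gb:,.0f} GB of state, fits: {fits_in_sram(n)}")
```

Under this assumption, a 100-billion-parameter model needs about 1.6 TB for training state alone, which is already more than the SRAM estimate before a single activation is stored.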

When memory per chip is tight, the system has to either keep computation local, partition the model carefully, or move data between chips and nodes. Each option has a cost.

If model partitioning is forced, the compiler has to place operations near their data. If data movement increases, interconnect bandwidth becomes the binding constraint. If utilisation falls, the headline TFLOPS becomes marketing instead of throughput. None of this means a compute-rich, SRAM-heavy design cannot work. It means the software, the compiler, and the engineering team have to do more work to extract real performance.2
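The link between data movement and utilisation can be sketched with a simple roofline-style bound. This is a simplification: it treats the roughly 8 TB/s of off-die I/O quoted from the 2021 article as the only bandwidth limit and 354 TFLOPS (354 units at about 1 TFLOP each) as peak. Work that stays inside on-die SRAM is not limited this way, so read the result as a bound for traffic that has to leave the die.

```python
def attainable_tflops(peak_tflops: float, bandwidth_tb_s: float,
                      flops_per_byte: float) -> float:
    """Roofline bound: throughput is capped by compute or by data movement."""
    return min(peak_tflops, bandwidth_tb_s * flops_per_byte)


def ridge_point(peak_tflops: float, bandwidth_tb_s: float) -> float:
    """Arithmetic intensity (FLOPs per byte moved) needed to reach peak."""
    return peak_tflops / bandwidth_tb_s


# Assumed per-die figures derived from numbers quoted in this essay.
D1_PEAK_TFLOPS = 354.0
D1_OFF_DIE_TB_S = 8.0

if __name__ == "__main__":
    print("ridge point:", ridge_point(D1_PEAK_TFLOPS, D1_OFF_DIE_TB_S), "FLOPs/byte")
    for intensity in (10, 44.25, 100):
        got = attainable_tflops(D1_PEAK_TFLOPS, D1_OFF_DIE_TB_S, intensity)
        print(f"intensity {intensity}: {got:.1f} TFLOPS "
              f"({100 * got / D1_PEAK_TFLOPS:.0f}% of peak)")
```

A kernel that performs only 10 FLOPs per byte moved off-die would be capped at 80 TFLOPS under these assumptions, well under a quarter of the headline number.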

Peak compute is easy to market. Feeding that compute is the hard part.

5. The bandwidth wall became a packaging problem

The uploaded article was equally pointed on I/O.1 It said D1 used 112G SerDes lanes. It said D1 had roughly 576 SerDes lanes. It said the chip reached around 8 TB/s of off-die I/O. It argued that normal organic substrates could not expose this much I/O cleanly. The escape valve, the article said, was exotic packaging: TSMC’s Integrated Fan-Out System-on-Wafer (InFO_SoW).
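The three I/O figures are consistent with each other, as a quick check shows. The sketch assumes raw line rate with no encoding or protocol overhead, and does not distinguish per-direction from aggregate bandwidth; both simplifications are mine, not the article's.

```python
def aggregate_tb_per_s(lanes: int, gbit_per_lane: float) -> float:
    """Raw aggregate bandwidth in TB/s for a number of SerDes lanes."""
    gbit_total = lanes * gbit_per_lane   # gigabits per second
    return gbit_total / 8 / 1000         # -> gigabytes -> terabytes per second


# Figures as quoted from the 2021 article.
D1_LANES = 576
D1_GBIT_PER_LANE = 112

if __name__ == "__main__":
    print(f"{aggregate_tb_per_s(D1_LANES, D1_GBIT_PER_LANE):.3f} TB/s")
```

At raw line rate this comes to about 8.06 TB/s, in line with the "around 8 TB/s" quoted above.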

That framing aged well. Cadence’s independent technical summary later described how Tesla’s training tile uses 25 known-good D1 dies on a fan-out wafer process at the package level, with 9 PFLOPS of BF16 / CFP8 compute per tile and 36 TB/s of off-tile bandwidth.2 Whatever you call the package, the architectural point is the same. Dojo solved chip-to-chip bandwidth by making the package part of the architecture.

Normal package

Bandwidth is bounded by ball-out

  • Limited pin count on the package balls
  • Long, high-power off-package links
  • PCB and connector loss
  • Cooling per chip, not per tile
  • Serviceability is per socket
Fan-out wafer

Bandwidth becomes a package property

  • Dense interconnect on the wafer-like carrier
  • Many short, lower-energy die-to-die links
  • Bandwidth scales with package area, not pins
  • Cooling and power delivery designed for the tile
  • Serviceability becomes tile-level, not chip-level

The architectural elegance is real. The tradeoff is real too. Exotic packaging means tighter coupling with one foundry, harder yield management, more capital tied to specialist tools, and more complex repair stories. Tesla’s answer was a bet that the architectural payoff was worth the manufacturing complexity, at least for a workload it controlled.

Dojo attacked the bandwidth wall by making the package exotic.

6. The tile was beautiful, but the tile was not enough

The training tile is the photograph that travels. It is also the unit that tells the truth about why custom AI hardware is hard.

Tesla’s Hot Chips 34 materials describe the tile as the unit of scale, with around 9 PFLOPS BF16 / CFP8 of compute, around 36 TB/s of off-tile bandwidth, and around 11 GB of high-speed ECC SRAM at tile level.6 Cadence’s summary reaches similar numbers from outside Tesla.2 Tesla materials also describe Dojo Interface Processors with 32 GB of HBM per DIP, 800 GB/s of memory bandwidth, 160 GB of DRAM at the tile edge, and on the order of 13 TB of high-bandwidth DRAM at ExaPOD scale.6

  • DIP: 32 GB HBM, 800 GB/s [6]
  • Training tile: 25 D1 dies [2], ~9 PFLOPS [6], 36 TB/s off-tile [6], ~11 GB ECC SRAM [6]
  • Tile-edge DRAM: ~160 GB / tile [6], ~13 TB / ExaPOD
The tile was the centre of gravity, but it was never a stand-alone computer. Surrounding interface processors and DRAM existed so the SRAM-heavy compute surface could be fed.
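The tile-level numbers can be cross-checked against the per-unit figures from the 2021 critique. This is a consistency check across the sources cited in this essay, not a Tesla specification; the count of interface processors per tile edge is derived here from the two memory figures rather than stated in the text.

```python
# Per-unit figures quoted from the 2021 critique.
SRAM_PER_UNIT_MB = 1.25
TFLOPS_PER_UNIT = 1.0
UNITS_PER_D1 = 354

# Tile-level figures quoted from the Cadence and Hot Chips materials.
DIES_PER_TILE = 25
HBM_PER_DIP_GB = 32
DRAM_PER_TILE_EDGE_GB = 160

# Derived values (decimal units).
tile_sram_gb = SRAM_PER_UNIT_MB * UNITS_PER_D1 * DIES_PER_TILE / 1000
tile_pflops = TFLOPS_PER_UNIT * UNITS_PER_D1 * DIES_PER_TILE / 1000
dips_per_tile_edge = DRAM_PER_TILE_EDGE_GB / HBM_PER_DIP_GB  # derived, not quoted

if __name__ == "__main__":
    print(f"tile SRAM: {tile_sram_gb:.2f} GB")      # compare with ~11 GB
    print(f"tile compute: {tile_pflops:.2f} PFLOPS")  # compare with ~9 PFLOPS
    print(f"DIPs per tile edge: {dips_per_tile_edge:.0f}")
```

The derived SRAM and compute figures sit close to the quoted ~11 GB and ~9 PFLOPS, which suggests the independent sources describe the same design point.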

This is the part of the 2021 critique that aged most cleanly. The tile is beautiful. The tile is not a computer. Around the tile sit interface processors, HBM, edge DRAM, host integration, custom protocols, fault management, and the software that holds it together. The uploaded article was correct to focus on the system, not the photogenic surface.

The tile was the centre. The system made it usable.

7. The software wall was the real test

If the tile is the part you see, software is the part you trip on. The uploaded article was openly skeptical here.1 It said Tesla had not convincingly shown automatic placement and routing of mini-tensor operations across the architecture. It warned that AI hardware companies often struggle with software for years after the silicon exists. It was not arguing that Tesla had no engineers. It was arguing that custom-hardware software stacks have a long, brutal middle.

The list of things a custom training accelerator’s software has to do is long.

  1. Framework integration. PyTorch / JAX / TF graphs in, ops mapped out. Failure: researchers must rewrite models instead of running stock code.
  2. Compiler. Lowering, fusion, tiling, scheduling. Failure: utilisation collapses for unusual shapes or new model patterns.
  3. Graph partitioning. Model parallel, data parallel, pipeline. Failure: strategy has to be hand-tuned per model and per cluster size.
  4. Memory placement. SRAM, HBM, DRAM, host RAM. Failure: out-of-memory errors show up only at scale and only sometimes.
  5. Tile-to-tile routing. Collectives, all-reduce, peer links. Failure: communication becomes the bottleneck while compute idles.
  6. Execution & fault tolerance. Checkpoint, recover, isolate bad tiles. Failure: one bad tile takes down a long run or hides as silent corruption.
  7. Debugging & profiling. What is slow, what is wrong, why. Failure: tools cannot answer “why is this slow?” for the people who write models.

None of these are exotic problems. They are the daily problems of running a training cluster. The uploaded article’s deeper point was that the silicon is the easy half, even when the silicon is hard. The software is where most custom AI hardware projects either pay a long, expensive tax or quietly stop being used.
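To make the partitioning and memory-placement problem concrete, here is a deliberately naive sketch: pack consecutive layers onto devices until each device's memory budget is full. The function name and the byte sizes are hypothetical, and nothing here reflects Tesla's actual compiler. A production partitioner would also balance compute time and communication volume, which is exactly where the hard work lives.

```python
from typing import List


def partition_layers(layer_bytes: List[int], device_capacity: int) -> List[List[int]]:
    """Greedily assign consecutive layers to pipeline stages by memory budget.

    Returns a list of stages, each a list of layer indices. Raises ValueError
    if a single layer cannot fit on one device, since this naive scheme never
    splits a layer across devices.
    """
    stages: List[List[int]] = []
    current: List[int] = []
    used = 0
    for idx, size in enumerate(layer_bytes):
        if size > device_capacity:
            raise ValueError(f"layer {idx} needs {size} bytes, capacity is {device_capacity}")
        if used + size > device_capacity:
            # Current device is full: close this stage and start a new one.
            stages.append(current)
            current, used = [], 0
        current.append(idx)
        used += size
    if current:
        stages.append(current)
    return stages


if __name__ == "__main__":
    # Hypothetical layer sizes (arbitrary units) and a per-device budget of 10.
    print(partition_layers([4, 4, 4, 4, 4], device_capacity=10))
```

Even this toy version shows the failure mode from the list: one oversized layer stops the whole plan, and the fix (splitting the layer itself) is a different and much harder algorithm.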

A beautiful chip is useless if researchers cannot easily make models run on it.

8. Nvidia’s moat was the default path

Dojo should not be compared only against Nvidia GPU silicon. The fair comparison is against Nvidia’s system. CUDA, cuDNN, NCCL, TensorRT, profilers, DGX systems, NVLink, networking, distributed-training recipes, hyperscaler integration, framework compatibility, and developer habit are all part of what you buy when you buy an Nvidia GPU.[8]

Nvidia’s rack-scale systems make the contrast even clearer. Nvidia says its GB200 NVL72 connects 36 Grace CPUs and 72 Blackwell GPUs into a single 72-GPU NVLink domain with around 130 TB/s of low-latency GPU communication.[8] Whether or not those numbers are independently verified for any specific workload, they exist as a credible default. A team can put a PyTorch model on an Nvidia cluster on Monday and have it training on Tuesday.
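For a sense of scale, the advertised aggregate can be turned into a per-GPU share and a rough collective time. The sketch uses the standard bandwidth term for a ring all-reduce, in which each device sends about 2(N-1)/N times the payload. It ignores latency, topology, and overlap with compute, and it is not a description of how NCCL schedules collectives on this system.

```python
NVL72_GPUS = 72
NVL72_AGGREGATE_TB_S = 130.0   # Nvidia's advertised figure, as quoted above


def per_gpu_tb_s(aggregate_tb_s: float = NVL72_AGGREGATE_TB_S,
                 gpus: int = NVL72_GPUS) -> float:
    """Even share of the advertised aggregate bandwidth per GPU."""
    return aggregate_tb_s / gpus


def ring_allreduce_seconds(payload_bytes: float, devices: int,
                           link_bytes_per_s: float) -> float:
    """Idealised bandwidth-only time for a ring all-reduce."""
    if devices < 2:
        return 0.0  # nothing to reduce across
    return 2 * (devices - 1) / devices * payload_bytes / link_bytes_per_s


if __name__ == "__main__":
    share = per_gpu_tb_s()
    print(f"per-GPU share: {share:.2f} TB/s")
    # Hypothetical payload: 10 GB of gradients reduced across all 72 GPUs.
    t = ring_allreduce_seconds(10e9, NVL72_GPUS, share * 1e12)
    print(f"idealised all-reduce of 10 GB: {t * 1000:.1f} ms")
```

The point is less the exact milliseconds than the fact that a buyer gets this estimate, the library that implements it, and the profiler that checks it, all from one vendor.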

Axis | Nvidia (default path) | Tesla Dojo (custom path)
Software stack | CUDA, cuDNN, NCCL, TensorRT, broad ecosystem [8] | Custom compiler and runtime, Tesla-owned
Framework support | Stock PyTorch, JAX, TF run unmodified for most cases | Models must target the Dojo path explicitly
Interconnect | NVLink + InfiniBand / Ethernet, broad cluster recipes [8] | Tile fabric + custom protocols + DIPs [6]
Rack-scale system | GB200 NVL72 advertised as 72-GPU NVLink domain [8] | Training tile + ExaPOD design, internal use
Customer reach | Every cloud, every lab, every startup | Tesla’s own training and autonomy workloads
Developer habit | Years of CUDA muscle memory across the field | Dojo-specific tooling, narrower user base

Nvidia wins through ecosystem-scale integration. Dojo tried to win through workload-specific vertical integration. Both can be legitimate strategies. They are not the same strategy. The custom path only wins if the workload is narrow enough and the team is deep enough to outweigh the years of compounding the default path has already done.

Nvidia is not only a GPU vendor. It is the default software path.

9. The economics only worked if autonomy worked

The uploaded article was explicit on the business case.1 It described around 3,000 large 645 mm² 7 nm dies committed for deployment. By the standards of normal chip economics, that is a small volume for the kind of NRE Dojo needed. Exotic packaging added cost. Custom interconnect added cost. A custom software stack added cost. The article framed the only credible payoff as a meaningful acceleration of Tesla’s autonomy and robotaxi programmes.

That framing is sharp and worth repeating. Dojo did not need to become a profitable standalone chip business. It needed to make autonomy arrive faster than it otherwise would have. Anything less than that was an expensive science project.
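The volume argument is easiest to see as amortisation. In the sketch below, the NRE and unit-cost values are placeholders chosen only to show the shape of the curve. They are not estimates of Tesla's actual spend, which this essay does not know; only the 3,000-die volume comes from the cited article.

```python
def amortised_cost_per_die(nre_usd: float, unit_cost_usd: float, volume: int) -> float:
    """Per-die cost once one-off engineering (NRE) is spread over the volume."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return nre_usd / volume + unit_cost_usd


# PLACEHOLDERS for illustration only, not estimates of any real programme.
PLACEHOLDER_NRE_USD = 300e6
PLACEHOLDER_UNIT_COST_USD = 5_000

if __name__ == "__main__":
    for volume in (3_000, 30_000, 300_000):
        cost = amortised_cost_per_die(PLACEHOLDER_NRE_USD,
                                      PLACEHOLDER_UNIT_COST_USD, volume)
        print(f"{volume:>7,} dies: ${cost:,.0f} per die")
```

Whatever the real inputs are, the shape holds: at a few thousand dies the one-off engineering cost dominates the per-die figure, which is why the payoff had to come from somewhere other than the chips themselves.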

The only axis that justified Dojo
  • Sell chips: not the plan
  • Cloud rental: not the plan
  • Cost vs Nvidia: possible, hard to verify
  • Custom workload speedup: internal benefit
  • Faster autonomy: the only big payoff

Dojo’s business case did not need a standalone chip P&L. It needed autonomy and robotaxi capability to arrive faster than it would have without it.1

Dojo’s economics only made sense if it made autonomy arrive faster.

10. The 2026 update: Dojo became less central

Reuters, citing Bloomberg, reported in August 2025 that Tesla was streamlining its AI chip design work and disbanding the Dojo supercomputer team.[3] The report added that Peter Bannon, who had led Dojo work, was leaving. Per Reuters, Elon Musk said it did not make sense to divide Tesla’s resources across two very different AI chip designs, and that the effort would focus on AI5, AI6, and subsequent chips. Musk described the new chips, per Reuters, as excellent for inference and at least pretty good for training.

Caution

Read this as a strategy shift, not a verdict on the silicon.

The Reuters / Bloomberg framing is about resource allocation and team structure, not about Dojo silicon failing. Tesla has not publicly retracted any specific D1 or tile claim. The shift is best read as a decision that the standalone Dojo training path was no longer the best use of Tesla’s AI hardware effort.3

The right reading is not “Dojo was a failure.” The right reading is that the standalone Dojo training path became too expensive to keep central. The 2021 critique was about whether memory, software, interconnect, packaging, cooling, and economics could work as one system. The 2025 reorganisation is the first publicly visible answer. The system path Tesla chose was AI5 and AI6, not a continued bet on Dojo as a separate training cluster.

Dojo taught Tesla the system problem. AI5 and AI6 became the product path.

11. AI5 and AI6 are the new centre of gravity

The 2026 picture of Tesla’s AI hardware programme is built on two threads.

First, Reuters reported in early 2026 that Musk said Tesla may tape out AI6 in December, with Samsung executives saying Tesla chips based on Samsung’s 2 nm process are planned for production in the second half of 2027.4 AI6, per the Reuters framing, is likely to be used in self-driving cars and humanoid robots.

Second, Reuters reported in July 2025 that Tesla signed a roughly USD 16.5 billion supply deal with Samsung.[5] Musk was quoted saying Samsung’s Taylor, Texas factory would manufacture AI6, while TSMC was slated to manufacture AI5 first in Taiwan and then in Arizona.[5] Samsung currently makes AI4, the current in-car generation, and AI5 sits between AI4 and AI6 in this roadmap.[5]

  • 2021 · Dojo unveiled: D1, training tile, ExaPOD design and skeptical 2021 critique.[1]
  • 2022–24 · Dojo iteration: Hot Chips 34 materials, tile and DIP details, internal use by Tesla.[6]
  • Jul 2025 · Samsung deal: ~USD 16.5B supply deal; Samsung Taylor TX for AI6, TSMC for AI5.[5]
  • Aug 2025 · Dojo streamlined: Reuters / Bloomberg report Dojo team disbanded; focus on AI5 / AI6.[3]
  • Dec 2026 · AI6 tape-out target: Musk says Tesla may tape out AI6 in December; Samsung 2 nm in H2 2027.[4]

Tesla did not abandon custom AI silicon. The shift is from a separate training-cluster bet to deployment-scale AI chips that go into the products themselves.

Dojo was the training-cluster bet. AI5 and AI6 are the deployment-scale AI-chip bets.

12. The industry moved toward Dojo’s problem

The most interesting move is not what happened inside Tesla. It is what happened around it. Even as Dojo as a standalone training cluster became less central, the rest of the AI hardware world moved toward the same set of problems Dojo had been trying to solve.

TSMC’s 2026 North America Technology Symposium press materials describe its CoWoS roadmap reaching a 5.5-reticle interposer in production today, a 14-reticle interposer planned for 2028 capable of integrating around 10 large compute dies and roughly 20 HBM stacks, and a 40-reticle System-on-Wafer technology (SoW-X) expected in 2029.7 The same materials describe SoIC for 3D stacking and COUPE for co-packaged optics.7

  • In production · CoWoS 5.5R: 5.5-reticle interposer, current AI accelerators.[7]
  • 2028 plan · CoWoS 14R: ~10 compute dies + ~20 HBM stacks.[7]
  • Stacking + optics · SoIC + COUPE: 3D logic on logic, co-packaged optics.[7]
  • 2029 plan · SoW-X: 40-reticle System-on-Wafer.[7]
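The reticle multiples are easier to compare with Dojo once converted to area. The sketch assumes one reticle is the standard maximum lithography field of 26 mm by 33 mm (858 mm²); TSMC's own definition of a "reticle" for packaging purposes may differ slightly. The Dojo comparison counts D1 silicon area only (25 dies of 645 mm²), not the full package footprint.

```python
# Assumption: one reticle = the standard 26 mm x 33 mm maximum exposure field.
RETICLE_MM2 = 26 * 33   # 858 mm^2

# Figures quoted in this essay.
D1_DIE_MM2 = 645
D1_DIES_PER_TILE = 25


def reticles_to_mm2(reticles: float) -> float:
    """Convert a reticle multiple into square millimetres."""
    return reticles * RETICLE_MM2


def mm2_to_reticles(area_mm2: float) -> float:
    """Convert an area into equivalent reticle fields."""
    return area_mm2 / RETICLE_MM2


if __name__ == "__main__":
    for name, reticles in (("CoWoS 5.5R", 5.5), ("CoWoS 14R", 14), ("SoW-X", 40)):
        print(f"{name}: {reticles_to_mm2(reticles):,.0f} mm^2")
    tile_silicon = D1_DIE_MM2 * D1_DIES_PER_TILE
    print(f"Dojo tile silicon: {tile_silicon:,} mm^2 "
          f"(~{mm2_to_reticles(tile_silicon):.1f} reticles)")
```

On these assumptions the D1 silicon on one 2021-era training tile already exceeded the area of the 14-reticle interposer planned for 2028, which is one way to read how early the tile was.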

This is the Dojo problem at industry scale. Memory, interconnect, packaging, power, cooling, optics, and rack systems are converging into a single design surface. Dojo may have become less central inside Tesla. The class of problem Dojo was trying to solve became universal.

Dojo became less central. The Dojo problem became universal.

13. Custom AI hardware is a system trap

The trap is not that custom AI hardware is a bad idea. The trap is that the chip-spec sheet looks easier than it is, and the production system is harder than it looks. Many internal chip teams and AI hardware startups have fallen into the same failure mode.

The shape of the trap
Looks easy at the start

The spec-sheet half

  • Peak TFLOPS at a target precision
  • Die-to-die bandwidth on paper
  • Power per die
  • Process node and area
  • One impressive demo workload
Actually decides the outcome

The system half

  • Memory balance and locality at scale
  • Compiler that does not need a PhD per model
  • Tile-to-tile and rack-to-rack interconnect
  • Cooling, power delivery, serviceability
  • Researcher productivity and debugging
  • Ecosystem migration cost
  • Economics over actual deployment volume

Dojo did not fall into every part of this trap. Tesla genuinely advanced the state of the art on packaging, power delivery, and on what a training tile could look like. The trap is the gap. The chip-spec sheet is the easy part. The production system is what makes or breaks the project.

The chip-spec sheet is the easy part. The production system is the trap.

14. What could break the thesis

The thesis here is that Dojo’s 2021 critique aged well, and that custom AI hardware is decided at the system level. There are honest reasons that reading could be wrong.

Bear case  ·  reasons custom AI hardware is harder than the bull case admits
  1. Memory balance. SRAM-heavy architectures can struggle to absorb large parameter counts without painful partitioning.1
  2. Compiler debt. Custom compilers take years to mature, and most teams underestimate the work.
  3. Interconnect cost. Tile-to-tile fabrics are expensive to design, expensive to maintain, and hard to extend.
  4. Exotic packaging. Fan-out wafer and SoW-class packaging create yield, capacity, and serviceability risks.7
  5. Default path strength. Nvidia’s software and rack-scale system are extremely hard to beat.8
  6. Workload bottlenecks. Autonomy progress may be bottlenecked by data and methods, not by training compute alone.
  7. Team continuity. A reorganisation like the August 2025 Dojo streamlining can reduce institutional memory.3
  8. Roadmap dilution. Splitting effort across training-cluster and product-chip programmes can slow both.
  9. Cost of being early. Being early on package-scale compute does not always convert into product leverage.

15. What could break the bear case

There are equally honest reasons the bear case here could be too dark.

Bull case  ·  reasons the system lessons still pay off
  1. System literacy. Dojo forced Tesla to learn package-scale AI hardware before most of the industry.
  2. Unique data. Tesla controls a fleet-video workload that no other AI lab has at comparable scale.
  3. Hardware-software co-design. Tesla can tune AI5 / AI6 around its own data and autonomy loops.
  4. Inference-first chips. Reuters reporting describes Musk framing AI5 / AI6 as excellent for inference and at least pretty good for training.3
  5. Inherited lessons. AI5 and AI6 may inherit Dojo’s lessons on packaging, power, and software.
  6. Industry tailwind. Package-scale and rack-scale AI are now mainstream design surfaces.7
  7. Power and cooling reuse. Power-delivery and thermal lessons translate to AI factories more broadly.
  8. Workload growth. Robotics and autonomy workloads may grow large enough to justify custom silicon again.

Even if Dojo changes form, the lesson survives: AI compute is constrained by bandwidth, memory, power, cooling, packaging, and software.

16. What to watch

The most honest way to read Dojo in 2026 is as an unfinished experiment whose outcome will be readable over the next 18 to 36 months. These are the signals worth tracking.

Signals to track  ·  Dojo, AI5 / AI6, packaging, and the system path
  • Whether Tesla continues using Dojo systems internally
  • Dojo software survival inside AI5 / AI6 workflows
  • AI5 tape-out and production timing [5]
  • AI6 tape-out timing [4]
  • Samsung 2 nm yield and schedule [4]
  • TSMC AI5 production in Taiwan and Arizona [5]
  • Tesla’s actual training-compute spending
  • Tesla’s ongoing Nvidia usage
  • Tesla’s AMD usage, if any
  • Optimus training requirements
  • FSD training cadence
  • Model size and data growth at Tesla
  • Package-scale AI hardware trends at peers
  • TSMC SoW-X roadmap and customer alignment [7]
  • CoWoS capacity and package size cadence [7]
  • HBM-heavy versus SRAM-local architecture choices
  • AI data-centre power and cooling architecture
  • Whether custom accelerators beat GPU clusters on narrow workloads

17. The custom AI hardware trap

Dojo was impressive, but the uploaded article’s caution aged well. Tesla’s real challenge was not building a beautiful D1 chip or one powerful training tile. It was making memory, interconnect, software, power, cooling, and economics work as a full production training system.

In 2026, Tesla’s move toward AI5 and AI6 suggests the standalone Dojo path became less central.[3][4][5] But Dojo still matters because it revealed the future of AI hardware. The chip is no longer enough. The system wins.

“Custom AI hardware is not won at the spec-sheet level. It is won at the system level.”

18. Glossary

Quick definitions used in this essay
  • Dojo: Tesla’s custom AI training supercomputer project.
  • D1: Tesla’s Dojo training chip.
  • Training tile: Package-level unit containing multiple D1 dies.
  • SRAM: Fast on-chip memory.
  • HBM: High-bandwidth memory stack used near AI accelerators.
  • SerDes: Serializer / deserializer links used for high-speed data movement.
  • InFO_SoW: TSMC Integrated Fan-Out System-on-Wafer packaging.
  • Fan-out wafer: Packaging method that redistributes chip connections through wafer-like structures.
  • Compiler: Software that maps model operations to hardware.
  • Runtime: Software that manages execution on hardware.
  • Graph partitioning: Splitting neural network computation across hardware units.
  • Model parallelism: Splitting a single model across multiple compute devices.
  • Data parallelism: Splitting batches of data across compute devices.
  • TCO: Total cost of ownership of a system over its useful life.
  • ExaPOD: Tesla’s larger Dojo system concept.
  • CoWoS: TSMC advanced packaging technology for AI / HPC chips.
  • SoIC: TSMC 3D stacking technology.
  • SoW-X: TSMC System-on-Wafer roadmap technology.
  • NVLink: Nvidia high-speed GPU interconnect.
  • Rack-scale AI: AI compute designed at rack or data-centre scale rather than single-chip scale.

This piece is original 2026 analysis. It uses the uploaded 2021 SemiAnalysis article only as a cited historical anchor for the 2021 critique. It uses Hot Chips and Cadence materials as a public technical baseline. It uses Reuters reporting as a frame around what Tesla and Musk have said publicly. It is not investment advice. No specific Tesla, Nvidia, Samsung, TSMC, or supplier security is being recommended.

1 Uploaded SemiAnalysis PDF, Dylan Patel (SemiAnalysis), 2021. Skeptical Dojo analysis, framed in this essay as the 2021 anchor. Used only as historical thesis / inspiration, not reproduced. The 2021 piece argued Dojo had a memory problem (~1.25 MB SRAM and ~1 TFLOP FP16 / CFP8 per functional unit, 354 units in D1, an estimated ~1.33 TB of total SRAM at ExaPOD scale behind >1 EFLOP of FP16-class compute), that D1 used 112G SerDes (576 lanes, ~8 TB/s I/O) and needed exotic fan-out wafer packaging (TSMC InFO_SoW), that Tesla’s software claims (placement / routing of mini-tensor operations) were unproven, and that economics relied on ~3,000 large 645 mm² 7 nm dies being justified by the autonomy / robotaxi payoff.

2 Cadence Breakfast Bytes (Paul McLellan). Not chips: Tesla’s Dojo. Independent technical summary of the D1 chip and training tile; the essay uses the Cadence framing for the 25-die training tile on a fan-out wafer process, ~9 PFLOPS, ~36 TB/s of off-tile bandwidth, and power-delivery / cooling design points.

3 Reuters (Aug 2025). Tesla to streamline its AI chip design work, Musk says. Bloomberg-via-Reuters reporting that Tesla was disbanding its Dojo supercomputer team, with Peter Bannon described as leaving, and Musk saying it did not make sense to divide resources across two very different AI chip designs and that Tesla’s effort was focused on AI5, AI6, and subsequent chips, described as excellent for inference and at least pretty good for training. Framed in this essay as a strategy shift, not a verdict on Dojo silicon.

4 Reuters (Mar 2026). Musk says Tesla may tape out next-generation AI6 chips in December. Reuters reporting Musk saying Tesla may tape out AI6 in December, with Samsung executives saying Tesla chips based on Samsung’s 2 nm process are planned for production in the second half of 2027. AI6 likely to be used in self-driving cars and humanoid robots, per the Reuters framing.

5 Reuters (Jul 2025). Tesla / Samsung ~USD 16.5B supply deal. Reuters reporting that Tesla signed a roughly USD 16.5 billion supply deal with Samsung; Musk was quoted saying Samsung’s Taylor, Texas factory would manufacture AI6, while TSMC was slated to make AI5 first in Taiwan and then in Arizona; Samsung currently makes AI4.

6 Hot Chips 34, Dojo System materials. Hot Chips 34 conference deck, Dojo System. Used for training tile as unit of scale, ~9 PFLOPS BF16 / CFP8 if verified, ~36 TB/s off-tile bandwidth, ~11 GB high-speed ECC SRAM, Dojo Interface Processor with 32 GB HBM, ~800 GB/s memory bandwidth, ~160 GB DRAM per tile edge, and ~13 TB high-bandwidth DRAM at ExaPOD scale. Conference materials cited as a public technical baseline; no Tesla AI Day or Hot Chips images are reproduced.

7 TSMC (Apr 2026). TSMC 2026 North America Technology Symposium press release. Used for the CoWoS 5.5-reticle / 14-reticle (2028 plan) / SoW-X 40-reticle (2029 plan) roadmap framing, the ~10 large compute dies + ~20 HBM stacks figure for 14-reticle CoWoS if verified, and the SoIC + COUPE co-packaged optics positioning. No TSMC diagrams reproduced.

8 Nvidia, GB200 NVL72 platform page. GB200 NVL72. Used for the comparative description of 36 Grace CPUs + 72 Blackwell GPUs forming a 72-GPU NVLink domain with ~130 TB/s of low-latency GPU communication, framed as Nvidia’s rack-scale “default path” system in this essay; numbers are Nvidia’s and are not independent benchmarks.
