The Networked AI Bet
Original analysis · Not investment advice
Tenstorrent’s 2021 Wormhole idea was not really about one chip. It was about making scale-out AI look like one programmable mesh. In 2026, that idea has become the Networked AI bet: open software, RISC-V IP, Ethernet-native scaling, and switch-light AI systems.
In 2021, Tenstorrent Wormhole looked like one of the more interesting AI-chip ideas outside Nvidia. Not because it had the biggest die. Not because it used the most advanced node. Not because it copied the GPU.
The idea was different.
Tenstorrent wanted to make AI compute look like a mesh.
The 2021 SemiAnalysis article described Wormhole as a scale-out architecture built around Tensix cores, packetized mini-tensors, an internal network-on-chip, GDDR6 memory, and 16 ports of 100Gb Ethernet.[1] The real claim was that Tenstorrent could extend the chip’s internal fabric across chips, servers, and racks so software could see one large mesh of cores instead of a painful hierarchy of GPUs, NICs, switches, and manually partitioned models.
That was the 2021 bet.
In 2026, the same idea has a clearer name: Networked AI.
The correct thesis is not “Tenstorrent will kill Nvidia.” The correct thesis is that Tenstorrent is one of the more interesting attacks on Nvidia because it is not trying to copy Nvidia. It is attacking the system differently: open software, RISC-V IP, Ethernet-native scale-out, switch-light fabrics, and cost-efficient AI serving.
I. The 2021 thesis
In June 2021, Dylan Patel published a SemiAnalysis piece arguing that Wormhole was not interesting for its FLOPS. It was interesting for its scale-out architecture. The chip integrated compute, memory, network-on-chip, and 16×100GbE links into one die, and Tensix cores carried not only compute but also routing and packet-management logic. Data moved as packetized mini-tensors. Nebula and Galaxy were the early system-level expressions of this idea, and the goal was to make communication across cores, chips, servers, and racks look uniform to software.[1]
The 2021 essay was excited but explicitly skeptical about whether the compiler could really place and route work efficiently across the mesh without congestion. That skepticism aged well. The point still holds five years later.
Wormhole was not just an AI chip. It was a bet that scale-out AI could be made easier by turning chips, servers, and racks into one programmable mesh of Tensix cores.
II. The network is the architecture
Most AI systems are hierarchical. Inside the chip, data moves one way. Across accelerators, it moves another way. Across servers, it moves another way. Across racks, it crosses expensive networking. The software has to understand all of that.
Tenstorrent’s idea is to flatten the hierarchy. The SemiAnalysis 2021 description was that mini-tensor packets move through the mesh, cores include router and packet-manager logic, and sending data between cores on the same chip should look similar to sending data across chips when the network-on-chip extends over Ethernet. The compiler then maps work across that mesh.[1]
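A toy model helps make the flattening concrete. The sketch below is not Tenstorrent’s routing logic. It assumes a hypothetical 8×8 grid of cores per chip and uses plain dimension-ordered (XY) routing, a textbook mesh technique swapped in here for illustration. The names `Core`, `xy_route`, and `link_kinds` are invented for this example. The point is only that the sender addresses a destination by global mesh coordinates, and whether a hop crosses the on-chip network or an Ethernet link falls out of the coordinates rather than out of the calling code.

```python
from dataclasses import dataclass

# Hypothetical chip size in cores per edge. Not a Tenstorrent figure.
CHIP_W, CHIP_H = 8, 8


@dataclass(frozen=True)
class Core:
    """A core addressed by global mesh coordinates, not by (chip, local id)."""
    x: int
    y: int

    @property
    def chip(self) -> tuple:
        # Which chip this core sits on, derived from its global position.
        return (self.x // CHIP_W, self.y // CHIP_H)


def xy_route(src: Core, dst: Core) -> list:
    """Dimension-ordered route: walk along x first, then along y."""
    path = [src]
    x, y = src.x, src.y
    while x != dst.x:
        x += 1 if dst.x > x else -1
        path.append(Core(x, y))
    while y != dst.y:
        y += 1 if dst.y > y else -1
        path.append(Core(x, y))
    return path


def link_kinds(path: list) -> list:
    """Label each hop by the physical link it would cross in this toy model."""
    return [
        "noc" if a.chip == b.chip else "ethernet"
        for a, b in zip(path, path[1:])
    ]


# Same call whether or not the route leaves the chip.
route = xy_route(Core(6, 0), Core(9, 1))
print(link_kinds(route))  # ['noc', 'ethernet', 'noc', 'noc']
```

In a hierarchical system the caller would pick a different API for the second hop. Here the only difference is the label attached to it, and hiding that difference is exactly the job the 2021 essay doubted the compiler could do without congestion.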
[Diagram: hierarchy versus mesh]
The network was not an accessory. The network was the architecture.
III. Why this matters more in 2026
AI changed after 2021. The market is no longer only about training bigger models. It is about serving them.
Production AI cares about tokens per dollar, power per token, memory per user, latency, time to first token, video-generation throughput, model bring-up speed, model churn, deployment control, private AI, sovereign AI, and avoiding vendor lock-in. ASML’s 2025 Annual Report describes AI as requiring leading-edge processors and a significant increase in DRAM relative to traditional compute. TSMC describes AI demand as the dominant driver of advanced-node and advanced-packaging usage.[13][14]
That demand is enormous, and it is starting to feel concentrated.
AI infrastructure is becoming too expensive, too closed, and too dependent on one vendor. That is Tenstorrent’s opening.
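Two of the serving metrics named in this section, tokens per dollar and power per token, reduce to simple ratios. The sketch below is a back-of-envelope model, not a pricing method from any vendor. The function names and every parameter (amortisation period, power draw, electricity price, aggregate throughput, utilisation) are assumptions a reader has to supply, and the model ignores hosting, staff, networking, and financing costs.

```python
def cost_per_million_tokens(
    system_price_usd: float,
    amortization_years: float,
    power_kw: float,
    usd_per_kwh: float,
    tokens_per_second: float,
    utilization: float,
) -> float:
    """Rough serving cost in USD per one million generated tokens.

    Assumes straight-line amortisation of the hardware price and constant
    power draw. tokens_per_second is the aggregate rate across all users.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if tokens_per_second <= 0 or amortization_years <= 0:
        raise ValueError("throughput and amortization period must be positive")

    hours_owned = amortization_years * 365 * 24
    capex_per_hour = system_price_usd / hours_owned
    energy_per_hour = power_kw * usd_per_kwh
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return (capex_per_hour + energy_per_hour) / tokens_per_hour * 1_000_000


def joules_per_token(power_kw: float, tokens_per_second: float) -> float:
    """Energy per generated token at full load (1 kW = 1000 J/s)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return power_kw * 1000 / tokens_per_second


# Purely illustrative inputs, none taken from a datasheet.
example = cost_per_million_tokens(
    system_price_usd=100_000,
    amortization_years=4,
    power_kw=10,
    usd_per_kwh=0.10,
    tokens_per_second=5_000,
    utilization=0.5,
)
print(f"${example:.2f} per million tokens")
```

Utilisation sits in the denominator, so an idle system costs as much per hour as a busy one and every unserved hour raises the cost of the tokens that do get served. That is one reason vendor throughput figures alone do not settle a cost-per-token comparison.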
IV. Blackhole is the first real test
Many AI-chip startups never escape slides. Tenstorrent has shipped. In April 2025, the company announced its Blackhole developer products at Tenstorrent Dev Day, including the Blackhole p100 card starting at $999, the p150 card starting at $1,399, and a TT-QuietBox workstation with four Blackhole processors starting at $11,999. Tenstorrent describes Blackhole as a second-generation Tensix architecture built on a 6nm-class process with a faster NoC, higher memory density, integrated RISC-V cores, and an open-source software stack.[2]
The first job of an AI-chip startup is not beating Nvidia. The first job is shipping.
V. Galaxy is Wormhole turned into a system
The clearest 2026 expression of the Networked AI idea is Galaxy. Tenstorrent describes Galaxy Blackhole as a system with 32 Blackhole ASICs, 23 PFLOPS Block FP8 compute, 6.2GB of SRAM at 2.9PB/s, 1TB of GDDR6 at 16TB/s, 10×400GbE links per ASIC, and up to 56×800GbE QSFP-DD scale-out ports. The company lists Galaxy Blackhole at $110,000 and a four-Galaxy supercluster starting around $440,000.[3]
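The listed system figures are easier to reason about per ASIC. The short calculation below only divides the numbers quoted above and adds no new data. Two readings are assumptions: “1TB” is taken as 1024 GB, and Ethernet figures are raw line rate in bits, not delivered bandwidth. The variable names are invented for this sketch.

```python
# Figures as listed by Tenstorrent for Galaxy Blackhole (see paragraph above).
ASICS = 32
BLOCK_FP8_PFLOPS = 23
SRAM_GB = 6.2
GDDR6_GB = 1024            # "1TB", read here as 1024 GB (assumption)
GDDR6_TB_PER_S = 16
LINKS_PER_ASIC = 10
LINK_GBIT_PER_S = 400
SCALE_OUT_PORTS = 56
SCALE_OUT_GBIT_PER_S = 800
PRICE_USD = 110_000

per_asic = {
    "pflops": BLOCK_FP8_PFLOPS / ASICS,
    "sram_mb": SRAM_GB * 1000 / ASICS,
    "gddr6_gb": GDDR6_GB / ASICS,
    "gddr6_gb_per_s": GDDR6_TB_PER_S * 1000 / ASICS,
    # Raw Ethernet line rate per ASIC, in terabits per second.
    "ethernet_tbit_per_s": LINKS_PER_ASIC * LINK_GBIT_PER_S / 1000,
}

# System-level ratios derived from the same list.
usd_per_pflop = PRICE_USD / BLOCK_FP8_PFLOPS
scale_out_tbit_per_s = SCALE_OUT_PORTS * SCALE_OUT_GBIT_PER_S / 1000
supercluster_price_usd = 4 * PRICE_USD

for name, value in per_asic.items():
    print(f"{name:>20}: {value:,.2f}")
print(f"USD per Block FP8 PFLOP: {usd_per_pflop:,.0f}")
print(f"Scale-out line rate: {scale_out_tbit_per_s:.1f} Tbit/s")
print(f"Four-Galaxy list price: ${supercluster_price_usd:,}")
```

The four-Galaxy price is exactly four times the single-system price, so on these list figures the supercluster carries no visible volume discount or switching premium.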
That product is what Wormhole was always pointing at: compute, memory, and networking integrated as one fabric, not assembled out of separate proprietary parts. Tenstorrent’s TT-Deploy framing describes Galaxy as production hardware engineered for AI inference at scale, with Ethernet-native interconnect designed to keep switching simple.[4]
Tenstorrent is not trying to win by building the biggest GPU. It is trying to build a cheaper, open, Ethernet-native AI fabric.
VI. The benchmark claims show direction, not verdict
Tenstorrent’s performance announcement says Galaxy can reach 350+ tokens/sec/user on DeepSeek-R1-0528 671B at 100K context with sub-4-second time-to-first-token, plus a roughly 10× video-generation speedup with Prodia, including 720p 81-frame video generated in about 2.4 seconds.[5]
These are Tenstorrent claims, not independent proof.
Treat every number above as a vendor benchmark. Independent reproduction with comparable models, contexts, batch sizes, networking, and software releases is what would turn these into infrastructure-grade evidence. Until then, they show the workloads Tenstorrent wants to compete on, not the workloads it has demonstrably won.
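Taken at face value, the two latency figures combine into a rough per-request response time. The sketch below uses a common first-order approximation: total time is time to first token plus the remaining tokens divided by the decode rate. It treats 4 seconds as the ceiling implied by “sub-4-second” and 350 tokens/sec as the floor implied by “350+”. The function name and the answer lengths are illustrative assumptions, and the model ignores queueing, batching effects, and network latency.

```python
def response_seconds(
    ttft_s: float, output_tokens: int, tokens_per_second: float
) -> float:
    """First-order estimate of wall-clock time for one streamed response.

    The first token arrives at ttft_s. Each later token is assumed to
    arrive at a steady decode rate of tokens_per_second.
    """
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return ttft_s + (output_tokens - 1) / tokens_per_second


# Vendor-claimed bounds from the paragraph above, not independent results.
CLAIMED_TTFT_S = 4.0
CLAIMED_TOKENS_PER_S = 350.0

for n in (1, 351, 2_000):  # hypothetical answer lengths
    t = response_seconds(CLAIMED_TTFT_S, n, CLAIMED_TOKENS_PER_S)
    print(f"{n:>5} tokens -> about {t:.1f} s")
```

On these claimed figures a short answer is dominated by time to first token, while a long reasoning trace is dominated by decode rate. An independent benchmark would therefore need to report both, at the same context length and concurrency.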
The strategic point is not the exact benchmark number. It is where Tenstorrent wants to compete: production inference, video generation, latency-sensitive serving, and cost per token, not necessarily the hardest training workloads.
The proof will be independent benchmarks, customer deployments, uptime, model coverage, and real cost per token.
VII. Open software is the anti-CUDA argument
Tenstorrent’s software docs describe a layered open-source stack. TT-Metalium is the low-level programming model and SDK for Tensix hardware. TT-NN is a Python/C++ neural-network operator library built on top of Metalium. TT-Forge is an MLIR-based compiler stack with frontends including TT-Torch, TT-XLA, and TT-Forge-ONNX, plus a shared TT-MLIR backend that lowers into Metalium.[7][8][9]
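The layering described above can be written down as a small lookup table. This is only a picture of the stack as this essay describes it, not Tenstorrent code and not a call into any Tenstorrent API. The table `LOWERS_TO` and the helper `lowering_path` are invented for the sketch, and the real projects may connect in more ways than the single path shown for each entry point.

```python
# Each component maps to the layer it sits on, per the description above.
LOWERS_TO = {
    "TT-Torch": "TT-MLIR",
    "TT-XLA": "TT-MLIR",
    "TT-Forge-ONNX": "TT-MLIR",
    "TT-MLIR": "TT-Metalium",
    "TT-NN": "TT-Metalium",
    "TT-Metalium": "Tensix hardware",
}


def lowering_path(entry_point: str) -> list:
    """Follow the table from an entry point down to the hardware."""
    if entry_point not in LOWERS_TO:
        raise KeyError(f"unknown stack component: {entry_point}")
    path = [entry_point]
    while path[-1] in LOWERS_TO:
        path.append(LOWERS_TO[path[-1]])
    return path


for entry in ("TT-XLA", "TT-NN"):
    print(" -> ".join(lowering_path(entry)))
# TT-XLA -> TT-MLIR -> TT-Metalium -> Tensix hardware
# TT-NN -> TT-Metalium -> Tensix hardware
```

Every path ends in the same two layers. That is the structural claim behind the open-stack pitch: a developer can enter at the framework level or drop down to the operator or kernel level without leaving public code.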
Nvidia’s moat is CUDA. But CUDA is not just syntax. CUDA is libraries, debugging, profiling, cloud support, enterprise trust, documentation, production experience, and a decade of ecosystem memory.
What you actually get with CUDA
- Libraries · cuDNN, cuBLAS, NCCL, TensorRT, Triton ecosystem.
- Tooling · Nsight, profilers, debuggers, observability.
- Cloud · first-class availability on every major cloud.
- Enterprise · trusted procurement and support paths.
- Ecosystem memory · a decade of production battle-testing.
What the open stack offers instead
- Open source · readable, inspectable, modifiable stack.
- Architecture simplicity · mesh + Ethernet, fewer hidden layers.
- Lower friction · affordable dev kits, public docs, public repos.
- Less lock-in · portable across deployments and partners.
- Custom silicon path · IP licensing for those who want their own chip.
Nvidia’s moat is CUDA. Tenstorrent’s counterargument is open software plus architecture-level simplicity.
VIII. The IP business may matter as much as the boxes
Tenstorrent’s December 2024 Series D announcement says the company raised over US$693M at a US$2B pre-money valuation, with strategic investors including Samsung Securities, AFW Partners, LG Technology Ventures, Hyundai Motor Group, Fidelity, Baillie Gifford, and Bezos Expeditions. The same materials describe Tenstorrent’s product line as both AI computers and licensable AI/RISC-V IP.[10]
EE Times has reported separately that Tenstorrent is productising its RISC-V CPU and AI cores as licensable IP, that LG and Hyundai are IP licensees, and that the majority of bookings to date came from IP deals rather than systems sales.[11]
Tenstorrent is not only competing with Nvidia boxes. It is also competing for the future of custom AI silicon.
IX. Sovereign AI makes the story stronger
Reuters reported in November 2024 that Japan partnered with Tenstorrent on a $50M program to train up to 200 Japanese chip designers over five years, connected to the country’s Rapidus ecosystem and broader semiconductor revival.[12]
That program is small in dollar terms and large in symbolic terms. Countries are increasingly explicit about wanting control over AI infrastructure: local chip-design skills, custom silicon, supply-chain optionality, open architectures, fewer black boxes, alternatives to Nvidia dependency, and domestic capability.
Sovereign AI is not only about models. It is about who controls the chips, tools, and skills underneath the models.
X. The execution risk is real
None of this matters if the boring product realities do not hold. The clearest example sits in Tenstorrent’s own firmware release notes. Starting January 2026, Blackhole p150 cards ship with 120 Tensix cores instead of the original 140, and firmware v19.5.0 changes existing cards to expose 120 cores to unify the developer interface. Tenstorrent says typical workloads see only a 1–2% performance difference, but developers using grid-size-dependent code may need to update their applications.[15]
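What “grid-size-dependent code” means in practice can be shown without any Tenstorrent API. The sketch below is generic Python with invented names. It contrasts a work split that bakes in the original 140-core count with one that takes the core count as a parameter. How a real application discovers the core count is deliberately left out, because that call is not described in the source.

```python
ORIGINAL_CORES = 140   # core count before the change, per the release notes
CURRENT_CORES = 120    # core count exposed after firmware v19.5.0


def core_for_tile_hardcoded(tile: int) -> int:
    """Fragile: assumes 140 cores exist, so it can name a missing core."""
    return tile % 140


def core_for_tile(tile: int, num_cores: int) -> int:
    """Portable: the core count is an input, not a constant."""
    return tile % num_cores


def split_tiles(num_tiles: int, num_cores: int) -> list:
    """Number of tiles assigned to each core, balanced to within one tile."""
    if num_cores <= 0:
        raise ValueError("num_cores must be positive")
    base, extra = divmod(num_tiles, num_cores)
    return [base + 1 if i < extra else base for i in range(num_cores)]


tiles = 1_000  # hypothetical workload size

# The hard-coded mapping targets core ids that no longer exist.
bad = {core_for_tile_hardcoded(t) for t in range(tiles)}
print(max(bad) >= CURRENT_CORES)  # True

# The parameterised mapping stays in range on either configuration.
good = {core_for_tile(t, CURRENT_CORES) for t in range(tiles)}
print(max(good) < CURRENT_CORES)  # True

print(max(split_tiles(tiles, ORIGINAL_CORES)),
      max(split_tiles(tiles, CURRENT_CORES)))  # 8 9
```

The fix is a one-line change in a toy like this. In a production kernel with tuned tile shapes, the same change can mean re-tuning, which is why a small core-count revision still counts as an execution risk.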
The boring product realities of AI hardware.
Yield. Firmware. Driver compatibility. Application updates. Documentation freshness. Thermal reliability. Procurement confidence. Long-term roadmap trust. Support quality. These are the layers that decide whether a smart architecture turns into a deployable platform. They do not appear in benchmark slides.
Open, developer-friendly hardware still has to survive boring product realities.
XI. Where Nvidia still wins
Be fair to Nvidia. The advantages are real and they are not only chips. Nvidia owns CUDA maturity, developer trust, training dominance, inference maturity, the NVLink / InfiniBand / Spectrum-X stack, cloud availability, enterprise procurement trust, model compatibility, ecosystem tooling, support, performance on the hardest workloads, and a decade of software optimisation depth. ASML and TSMC both describe AI demand as the dominant pull on advanced logic, memory, and packaging, and that pull lands most heavily on Nvidia silicon today.[13][14]
Tenstorrent may have a smart architecture, but Nvidia has the most proven AI infrastructure machine in the world.
XII. What could break the thesis?
The strongest bear case is that Nvidia is not just a chip company. It is the default AI operating environment.
- CUDA stays too strong. A decade of libraries, tooling, and developer habit does not unwind quickly.
- Software immaturity. TT-Metalium, TT-NN, and TT-Forge need to keep up with frontier model churn.[7]
- Benchmark non-replication. Company tokens/sec and TTFT figures need independent reproduction.[5]
- Model coverage lag. New open and closed models appear faster than ports.
- Safety preference. Customers may pay more for Nvidia simply because it is safer to defend internally.
- Networking gap. Ethernet-native scale-out may not match NVLink/IB at the hardest training scale.
- Hyperscaler in-house. Cloud vendors increasingly prefer their own silicon.
- AMD wakes up. A stronger ROCm and MI roadmap could absorb the “Nvidia alternative” slot.
- Reliability. Support quality and uptime are where many startups die.
- Yield and revisions. p150 120-core firmware adjustments are minor but symbolic of the risk.[15]
- Porting friction. Open source does not erase the real cost of moving production workloads.
- Mindshare. Developer attention is concentrated, and concentration compounds.
- IP company outcome. Tenstorrent may end up valued more as an IP business than a systems company.[11]
XIII. What could break the bear case?
The strongest bull case is that AI is becoming too large, too expensive, and too politically important for one closed stack to satisfy every customer.
- Workloads keep changing. Inference, agentic systems, and video generation reward flexibility over peak training throughput.
- Cost per token wins. As AI scales, every unit of inference cost compounds.
- Customers want alternatives. Procurement teams do not like single-vendor risk.
- Sovereign AI grows. National buyers want control they cannot get from a single foreign vendor.[12]
- Open software compounds. Inspectable stacks are more valuable as agents and regulations mature.
- RISC-V adoption rises. The base of open IP grows across CPUs, NPUs, and SoCs.[11]
- AI coding tools reduce porting friction. Model porting becomes cheaper to attempt.
- Ethernet-native systems integrate well. Most data centers already speak Ethernet.[3]
- Niche wins are enough. Inference, video, private AI, and custom silicon are large markets.
- IP business is high-leverage. A licensing engine pays even if systems do not displace Nvidia.[11]
AI is becoming too large, too expensive, and too politically important for one closed stack to satisfy every customer.
XIV. What to watch
If the Networked AI bet is real, certain signals should keep showing up across customer announcements, benchmarks, and roadmap notes. If it is fragile, the cracks will appear in the same places first.
- Independent Galaxy benchmarks.
- Real customer deployments at scale.
- DeepSeek, Llama, Qwen, and open-model coverage.
- Cost per token vs Nvidia and AMD.
- Time-to-first-token under real load.
- Video-generation throughput in production.
- Uptime and reliability across long runs.
- Firmware stability and release cadence.
- Blackhole p150 120-core transition impact.[15]
- TT-Metalium, TT-NN, TT-Forge maturity.
- vLLM and serving-stack support.
- Cloud availability of Tenstorrent systems.
- Developer community growth.
- IP licensing revenue trajectory.[11]
- LG, Hyundai, and automotive traction.
- Sovereign AI partnerships beyond Japan.[12]
- Rapidus and broader RISC-V design ecosystem progress.
- Samsung Foundry and TSMC manufacturing roadmap.
- Nvidia’s software and pricing response.
- AMD’s software progress.
Glossary
A short reference for the vocabulary used above. Definitions are simplified.
- Tensix core: Tenstorrent’s AI compute core combining compute, SRAM, routing, and packet-management logic.
- Mini-tensor: A smaller packetised tensor unit used inside Tenstorrent’s mesh architecture.
- NoC: Network-on-chip; a communication fabric inside a chip.
- Ethernet scale-out: Using Ethernet links to connect many accelerators or systems together.
- SRAM: Fast memory placed close to compute on the die.
- GDDR6: Graphics memory used by some AI accelerators for bandwidth.
- FP8 / Block FP8: Low-precision numerical formats used for AI compute throughput.
- CUDA: Nvidia’s software platform for GPU computing and AI development.
- RISC-V: An open instruction-set architecture used for CPUs and custom silicon.
- IP licensing: Selling reusable CPU or AI core designs to other chip designers.
- TT-Metalium: Tenstorrent’s low-level programming model and SDK.
- TT-NN: Tenstorrent’s neural-network operator library.
- TT-Forge: Tenstorrent’s MLIR-based compiler stack.
- Sovereign AI: Nationally controlled AI infrastructure, skills, and supply chains.
- Cost per token: The cost of producing AI model outputs.
- Time to first token: The latency before a model starts generating output.
XV. The Networked AI bet
Tenstorrent’s 2021 Wormhole idea was not really about one chip. It was about making scale-out AI look like one programmable mesh. In 2026, that idea has become the Networked AI bet: open software, RISC-V IP, Ethernet-native scaling, and switch-light AI systems.
Nvidia still owns the default AI stack. That is not changing overnight. CUDA, libraries, networking, cloud availability, developer trust, and a decade of production battle-testing are not things you replace with a slide.
If customers keep wanting lower cost, more openness, more deployment control, and alternatives to a single closed ecosystem, Tenstorrent becomes worth watching. If sovereign AI buyers keep wanting local capability and inspectable stacks, the IP side of the business may matter as much as the box side. If open-source AI hardware can hold up under boring product realities, including yield, firmware, and support, then the Networked AI bet stops being a story about one company and becomes a story about how AI compute gets organised in the next decade.
The proof is still ahead. Independent benchmarks. Real deployments. Reliability. Roadmap discipline. Software maturity. Customer growth.
But the direction is clear. AI hardware is no longer only about who builds the biggest chip. It is about who builds the most useful network.
[1] Patel, D. (Jun 2021). Tenstorrent Wormhole Analysis — A Scale Out Architecture for Machine Learning That Could Put Nvidia On Their Back Foot. SemiAnalysis. Historical anchor for the 2021 Wormhole thesis, including Tensix cores, mini-tensor packets, network-on-chip, GDDR6 memory, 16×100GbE links, Nebula and Galaxy topology, and the uniform-mesh software model. Used as inspiration only. No content, structure, or charts reproduced.
[2] Tenstorrent (Apr 2025). Tenstorrent launches Blackhole developer products at Tenstorrent Dev Day. Blackhole p100 from $999, p150 from $1,399, TT-QuietBox from $11,999, plus framing of the 6nm-class node, NoC, memory, RISC-V cores, and open software stack.
[3] Tenstorrent. Galaxy. Galaxy Blackhole with 32 Blackhole ASICs, 23 PFLOPS Block FP8, 6.2GB SRAM at 2.9PB/s, 1TB GDDR6 at 16TB/s, 10×400GbE per ASIC, up to 56×800GbE QSFP-DD scale-out ports, with pricing of $110,000 for one Galaxy Blackhole and from $440,000 for the four-Galaxy supercluster.
[4] Tenstorrent. TT-Deploy. Galaxy production framing, integrated compute / SRAM / DRAM / networking story, and Tenstorrent’s positioning around AI inference at scale.
[5] Tenstorrent. Tenstorrent enables AI at scale with industry-leading performance. Company claims of 350+ tokens/sec/user on DeepSeek-R1-0528 671B, 100K context, sub-4-second time-to-first-token, and a roughly 10× Prodia video-generation speedup with 720p 81-frame video in about 2.4 seconds. Treated in this essay as vendor benchmarks, not independent results.
[6] Tenstorrent. Tenstorrent software stack overview. TT-Metalium, TT-NN, and TT-Forge described as the main stack components.
[7] Tenstorrent. Software stack getting-started docs. Layered model with TT-Metalium at the bottom, TT-NN as the neural-network library, and TT-Forge as the compiler-level entry point.
[8] Tenstorrent. TT-Metalium documentation. Low-level programming model and SDK for Tensix hardware.
[9] Tenstorrent. TT-Forge documentation. MLIR-based compiler stack with TT-Torch, TT-XLA, and TT-Forge-ONNX frontends.
[10] Tenstorrent (Dec 2024). Tenstorrent closes $693M Series D. Over US$693M raised at a US$2B pre-money valuation, with strategic investors including Samsung Securities, AFW Partners, LG Technology Ventures, Hyundai Motor Group, Fidelity, Baillie Gifford, and Bezos Expeditions, plus AI and RISC-V IP licensing framing.
[11] Brown, S. Tenstorrent productises RISC-V CPU and AI IP. EE Times. IP-business framing including LG and Hyundai licensees and the reported share of bookings coming from IP deals.
[12] Mukherjee, S. and Mukherjee, S. (Nov 2024). Japan taps US chip startup Tenstorrent to help train new wave of engineers. Reuters. $50M program, up to 200 Japanese chip designers, five-year horizon, Rapidus and RISC-V context.
[13] ASML (2025). 2025 Annual Report, strategic report section. AI requires leading-edge high-performance processors and a significant increase in DRAM relative to traditional compute architectures.
[14] TSMC. 2025 Annual Report. Robust AI-related demand, advanced packaging and 3D stacking investment, and the role of advanced logic and packaging for AI/HPC.
[15] Tenstorrent. tt-zephyr-platforms release notes 19.5. Starting January 2026, Blackhole p150 cards ship with 120 Tensix cores instead of 140. Firmware v19.5.0 exposes 120 cores on existing cards for a unified interface, with typical workloads seeing a 1–2% performance difference and possible application updates for grid-size-dependent code.
- The Foundry Toll Road. Companion essay on why TSMC’s pricing power got stronger in the AI era.
- The Custom Silicon Flywheel. Hyperscalers turning their biggest workloads into chips.
- The Inference Efficiency War. Qualcomm AI200/AI250 and cost-per-token inference infrastructure.
- The Other Leading Edge. GlobalFoundries and the specialty foundry layer of AI infrastructure.
- When AI Runs Out of Copper. Optical I/O, co-packaged optics, and the race to replace copper with light.
- The AI Memory Wall. DRAM, HBM, packaging, and semicap as the new centre of computing.
- The AI Memory Tax. AI servers repricing DRAM, NAND, and consumer electronics.
- The Density Illusion. Why Moore’s Law became a system problem.
- Nvidia Built the AI Factory Anyway. Vertical system integration as the new moat.
- Nvidia’s Earnings Quality Test. AI capex, customer concentration, and durability of revenue.
- MediaTek and the Fragmented Compute War. A neutral fabless platform in a bifurcated compute world.
- The AI Field Manual. Reference layer for the AI stack: hardware, memory, models, agents, safety, economics.
This is Essay No. 027. The topics: intelligence, AI, systems, knowledge, and the questions underneath the questions everyone else is asking. If you read this far and disagreed with any part of it, write to me. I read everything.