The AI Chip Software Wall
Original analysis. Not investment advice.
Graphcore’s IPU was not a stupid idea. But AI hardware is not only silicon. It is software, kernels, frameworks, benchmarks, cloud access, developer trust, customer support, and capital. Graphcore failed as an independent Nvidia challenger because it could not turn specialised silicon into a broad AI platform fast enough. Under SoftBank, it gets a second life as part of a larger AI compute ecosystem.
In 2021, Graphcore wanted to be the Nvidia alternative. The architecture was different. The branding was strong. The ambition was huge. The story was clean: GPUs were built for graphics first, but Graphcore’s IPU was built for machine intelligence from the ground up.
Then came MLPerf.
The SemiAnalysis article argued that Graphcore’s first major MLPerf showing exposed the problem. Graphcore submitted selectively, used different software stacks for different models, and marketed price/performance comparisons that looked much weaker once adjusted for silicon count, memory capacity, system configuration, and software maturity.1
The point was not only that Graphcore was slower on a few benchmarks. The point was that Graphcore looked incomplete as a platform.
That is the real lesson. In AI hardware, the chip is only the beginning.
Graphcore is the cleanest case study in the AI chip software wall. The IPU was architecturally interesting, but Nvidia’s advantage was not only GPU silicon. It was the full platform: CUDA, libraries, kernels, networking, cloud availability, developer trust, support, and capital. Graphcore failed as an independent Nvidia challenger because it could not turn specialised silicon into a broad AI platform fast enough. Under SoftBank, the company gets a second life, but as part of a larger AI compute ecosystem, not as a standalone GPU replacement.
I. The 2021 thesis
In July 2021, Dylan Patel published a SemiAnalysis piece on Graphcore’s MLPerf v1.0 submission. The piece noted that Graphcore submitted only four closed-division results, one per system, covering only ResNet-50 and BERT, with TensorFlow SDK used for ResNet-50 and PopART used for BERT. The argument was that the selective submission and split software stacks suggested benchmark cherry-picking and heavy hand tuning. The piece also walked through the IPU-POD16 vs DGX A100 marketing comparison, flagging that the IPU-POD16 used 16 IPUs against 8 A100s, had less memory capacity, was slower, leaned on the higher-priced 80GB A100 configuration, and chose DGX instead of lower-cost third-party A100 systems for the cost comparison. The deeper conclusion was that specialised AI silicon is not enough without a broad software platform.1
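The adjustment described above is simple arithmetic, and a minimal sketch makes the framing problem concrete. Every name and number below is a hypothetical placeholder chosen only to show the calculation; none of it is taken from MLPerf, Graphcore, Nvidia, or the SemiAnalysis piece. The sketch assumes two systems with different accelerator counts and prices, then compares them per chip and per dollar.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SystemResult:
    """One system-level benchmark result. All values used below are placeholders."""
    name: str
    accelerators: int        # number of chips in the system
    minutes_to_train: float  # wall-clock time to reach the target quality
    price_usd: float         # system price used in the comparison


def chip_minutes(r: SystemResult) -> float:
    """Accelerator-minutes consumed; lower means each chip did more work."""
    return r.accelerators * r.minutes_to_train


def dollar_minutes(r: SystemResult) -> float:
    """Price multiplied by time-to-train; a crude cost-of-result figure, lower is better."""
    return r.price_usd * r.minutes_to_train


# Hypothetical inputs, not real benchmark data.
system_a = SystemResult("system_a", accelerators=16, minutes_to_train=40.0, price_usd=150_000.0)
system_b = SystemResult("system_b", accelerators=8, minutes_to_train=30.0, price_usd=300_000.0)

# Ratios above 1.0 mean system_a is worse than system_b on that measure.
per_chip_ratio = chip_minutes(system_a) / chip_minutes(system_b)
per_dollar_ratio = dollar_minutes(system_a) / dollar_minutes(system_b)

# Re-price system_b as a cheaper (hypothetical) third-party configuration.
system_b_cheaper = replace(system_b, price_usd=180_000.0)
per_dollar_ratio_repriced = dollar_minutes(system_a) / dollar_minutes(system_b_cheaper)

print(f"per-chip ratio:              {per_chip_ratio:.2f}")
print(f"per-dollar ratio:            {per_dollar_ratio:.2f}")
print(f"per-dollar ratio (repriced): {per_dollar_ratio_repriced:.2f}")
```

With these placeholder inputs, system_a looks better per dollar against the expensive configuration, worse per chip, and worse per dollar once the comparison system is re-priced. The same result can support opposite headlines depending on which denominator and which price the marketer picks.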
Graphcore’s MLPerf problem was not just benchmark speed. It was benchmark breadth, software maturity, scaling, and platform trust.
II. The IPU was interesting. The platform was the problem.
Graphcore’s IPU was not a dumb idea. The architectural argument made sense. AI workloads involve graphs, parallelism, irregular memory access, and rapid change. A processor designed around graph execution and fine-grained parallelism could theoretically be valuable.
But AI customers do not buy architectural beauty. They buy working systems. They need PyTorch, TensorFlow, compilers, optimised kernels, runtime support, distributed training, debugging tools, profiling tools, cloud availability, documentation, support engineers, stable roadmaps, and predictable performance.
[Figure: a chip plus an SDK versus an operating environment]
Graphcore had an accelerator. Nvidia had an operating environment.
III. The benchmark problem was really a software problem
The 2021 SemiAnalysis piece interpreted Graphcore’s narrow submission and the split between TensorFlow SDK and PopART as evidence of heavy hand tuning and fragmented software, not as a one-off MLPerf strategy. The benchmark story weakened further when the comparison framing changed.1
A platform cannot rely on heroic engineering for every model. The models keep changing: ResNet, BERT, GPT-style transformers, diffusion, mixture-of-experts, long context, multimodal models, agents. A platform has to absorb all of them without rewriting kernels by hand for each one.
The enemy was not one benchmark. The enemy was model churn.
IV. Nvidia’s advantage was breadth
Nvidia’s MLPerf v1.0 write-up emphasised broad benchmark coverage and software-level optimisations across the stack, including CUDA Graphs, SHARP, fused attention, distributed LAMB, optimised collectives, DALI, and communication / computation overlap.2
Those advantages do not show up in one benchmark. They show up across every benchmark, year after year, as new models arrive.
A chip wins one workload. A platform absorbs new workloads.
V. Graphcore improved, but the market moved faster
Graphcore’s own 2022 communications reported MLPerf improvements: the Bow Pod16 was claimed to be roughly 31% faster on ResNet-50 than the Nvidia DGX-A100 640GB, and BERT performance was claimed to have improved by roughly 37% over the previous MLPerf round, with Baidu’s PaddlePaddle submission cited alongside.5
These are Graphcore claims, not independent comparisons.
Graphcore’s 2022 improvement figures are useful direction-of-travel signals. They depend on the system configuration, software stack, and comparison set chosen by Graphcore. They are not universal performance evidence and they do not address the broader ecosystem and capital concerns that decided commercial outcomes.5
Graphcore was catching up to old benchmarks while the market was moving to new models. Benchmark improvement did not solve ecosystem weakness, and the workload mix kept shifting.
VI. MLPerf moved from BERT to Llama and Flux
MLCommons’ MLPerf Training v5.1 results emphasised newer generative-AI workloads, with Llama 3.1 8B replacing BERT and Flux.1 replacing Stable Diffusion v2, alongside broad system and organisation diversity.3,4 Graphcore is not listed among the submitting organisations in the v5.1 result announcement.
[Figure: ResNet, BERT, GPT-style, Diffusion, Multimodal + agents]
The strongest AI hardware companies keep showing up as the benchmark changes. Weak platforms disappear from the scoreboard.
VII. The business outcome exposed the capital problem
In July 2024, Reuters reported that SoftBank had acquired Graphcore on undisclosed terms, noting that Graphcore had struggled to secure the investment needed to compete, that the company had been valued at $2.77B at the end of 2020, and that it had cut headcount by about a fifth and closed operations in Norway, Japan, and South Korea.6
AI hardware is expensive. You need silicon teams, compiler teams, kernel teams, systems teams, networking teams, customer engineers, cloud partnerships, manufacturing access, developer relations, and multiple chip generations. Sustained capital is not optional.
AI hardware startups do not only need product-market fit. They need capital-market fit.
VIII. Graphcore’s second life is SoftBank
Graphcore framed its July 2024 SoftBank announcement around becoming a wholly owned subsidiary, continuing to operate under the Graphcore name, and being backed for next-generation AI compute.7 SoftBank’s AI Computing segment page describes the segment as built around Arm, Ampere, and Graphcore, with Ampere positioned as energy-efficient Arm CPUs for AI-driven workloads and Graphcore as a team with experience in AI-specialised chips.8
Graphcore’s second chance is not to be Britain’s Nvidia. It is to become a useful silicon layer inside SoftBank’s AI compute strategy.
IX. The India campus is a rebuild signal
Graphcore announced a new AI Engineering Campus in Bengaluru, with an investment of up to £1B over the next decade and the creation of 500 new semiconductor jobs, framing the campus around SoftBank’s artificial super intelligence platform ambitions. The first roles cover silicon logical design, physical design, verification, characterisation, and bring-up.9
Graphcore’s second chance is not a benchmark result. It is a capitalisation and ecosystem reset.
X. Why CUDA was the wall
CUDA is Nvidia’s programming platform for GPU computing. But CUDA is not only syntax. CUDA is a mental model, libraries, kernels, documentation, debugging tools, profiling tools, university teaching, StackOverflow answers, cloud instances, benchmark recipes, enterprise deployments, a hiring market, and years of trust.
The hardest thing for an AI chip startup is not proving that its chip is clever. It is convincing developers to leave the default path.
XI. The full-stack AI hardware checklist
The cleanest way to read the Graphcore story is to apply a simple checklist to any AI hardware challenger. Beating Nvidia in a serious workload needs all of the following, not just the first one or two items.
A chip can be designed in a few years. A platform compounds over a decade.
XII. What Graphcore still has
Being fair to Graphcore matters. The company still has AI chip architecture experience, compiler and runtime experience, systems-architecture knowledge, engineering talent, accumulated IPU learnings, SoftBank capital, and potential Arm / Ampere ecosystem fit. The UK base and the new Indian campus add engineering capacity that most challengers lack today.9
The IPU may not have beaten Nvidia, but the people, tools, and lessons may still be valuable.
XIII. What could break the new thesis?
The strongest bear case is that Graphcore may have been acquired because its talent was valuable, not because the IPU platform was commercially viable.
- Heavy investment, weak product. SoftBank may invest substantially without producing a competitive product.
- Legacy software. Graphcore’s old software burden may persist into the next generation.
- CUDA compounds. Nvidia’s CUDA moat may keep widening, year after year.2
- Crowded competitors. AMD, Google TPU, AWS Trainium, Microsoft Maia, Groq, Cerebras, and Tenstorrent are all chasing non-Nvidia workloads.
- Sprawl. SoftBank’s AI strategy may become too broad to execute well.
- Integration friction. Arm, Ampere, and Graphcore may not integrate cleanly.
- Talent over product. Graphcore may end up as talent infrastructure, not product infrastructure.
- Niche architecture. IPU-style architectures may remain too specialised.
- Default GPU pull. Developers may keep preferring CUDA and widely available cloud GPUs.
XIV. What could break the bear case?
The bull case is structural: AI compute is too large for one vendor to serve everything, and sovereign and enterprise AI buyers want alternatives.
- Market size. AI compute is too large for a single vendor to satisfy.
- Sovereign AI. National buyers want alternatives to a single foreign vendor.
- Capital. SoftBank has the balance sheet to fund a multi-year rebuild.6
- Arm reach. Arm gives ecosystem reach into mobile, automotive, and data centre.8
- Ampere CPU. Energy-efficient Arm CPUs for AI-adjacent workloads.8
- Compiler talent. Graphcore’s compiler and accelerator experience is real.
- Custom AI infra. Custom AI infrastructure may want specialised silicon.
- Vertical stack. A coordinated Arm + Ampere + Graphcore stack could matter.
- Lessons. IPU-era lessons can shape the next architecture.
Graphcore does not need to replace Nvidia to matter. It needs to become useful inside a larger AI compute strategy.
XV. What to watch
- Graphcore’s next announced product.
- Whether the IPU remains central or is replaced.
- SoftBank AI Computing segment updates.8
- Arm / Ampere / Graphcore integration evidence.
- Bengaluru hiring progress.9
- UK headcount growth.
- Compiler and runtime announcements.
- PyTorch coverage milestones.
- MLPerf re-entry, if it happens.3
- Inference benchmarks beyond MLPerf.
- Cloud availability of Graphcore systems.
- Customer announcements with named workloads.
- Sovereign AI partnerships and pilots.
- SoftBank data-centre strategy and disclosures.
- Whether Graphcore becomes product or talent infrastructure.
- How clearly the Arm+Ampere+Graphcore story is told to customers.
Glossary
A short reference for the vocabulary used above. Definitions are simplified.
- IPU. Intelligence Processing Unit, Graphcore’s AI accelerator.
- MLPerf. Benchmark suite for measuring machine-learning training and inference performance.
- ResNet-50. Classic image-classification model.
- BERT. Transformer model used for NLP benchmarks.
- Llama. Modern large-language-model family used in newer benchmark rounds.
- Flux.1. Modern text-to-image model used in newer benchmark rounds.
- CUDA. Nvidia’s GPU programming platform.
- cuDNN. Nvidia deep-learning library.
- NCCL. Nvidia communication library for multi-GPU systems.
- Kernel. Optimised low-level operation used by AI frameworks.
- Compiler. Software that maps model operations onto hardware.
- Runtime. Software layer that executes compiled work on hardware.
- Distributed training. Training a model across multiple chips or systems.
- Model churn. Rapid change in AI model architectures and workload requirements.
- Capital-market fit. Ability to raise enough capital to survive multiple hardware generations.
XVI. The AI chip software wall
Graphcore’s lesson is not that custom AI chips are impossible.
The lesson is that AI hardware is a full-stack war.
The IPU was interesting silicon. But Nvidia had the platform: CUDA, libraries, kernels, cloud availability, developer trust, support, and capital. Graphcore failed as an independent Nvidia challenger because it could not turn specialised hardware into a broad software ecosystem fast enough.
Hardware startups often hear that they only need product-market fit. The Graphcore story is a counter-example. They also need software-market fit, model-market fit, cloud-market fit, support-market fit, and capital-market fit. The chip is the start. The platform is the verdict.
That is the AI chip software wall.
1 Patel, D. (Jul 2021). Graphcore Looks Like A Complete Failure In Machine Learning Training Performance. SemiAnalysis. Historical anchor for the MLPerf v1.0 criticism, including the narrow Graphcore submission (four closed-division results, one per system, only ResNet-50 and BERT), the use of TensorFlow SDK for ResNet-50 and PopART for BERT, the IPU-POD16 vs DGX A100 comparison (16 IPUs vs 8 A100s, memory and time gaps, 80GB A100 pricing, DGX vs third-party systems), and the deeper argument that specialised AI silicon is not enough without a broad software platform. Used as inspiration only. No content, structure, or charts reproduced.
2 NVIDIA Developer (2021). MLPerf v1.0 Training Benchmarks: Insights Into A Record-Setting Performance. Broad benchmark coverage and software optimisations including CUDA Graphs, SHARP, fused attention, distributed LAMB, optimised collectives, DALI, and communication / computation overlap.
3 MLCommons (Nov 2025). MLPerf Training v5.1 results. Newer generative-AI workloads, broader system and organisation diversity. Used as the basis for noting that Graphcore is not listed among the submitting organisations in v5.1.
4 MLCommons. MLPerf Training benchmark archive. Benchmark evolution including the replacement of BERT with Llama 3.1 8B and Stable Diffusion v2 with Flux.1, plus newer LLM and reasoning workloads.
5 Graphcore (2022). Baidu results underscore big MLPerf gains for Graphcore. Graphcore’s own framing of Bow Pod16 ResNet-50 improvements (~31% vs Nvidia DGX-A100 640GB), BERT improvement (~37%) versus the previous MLPerf round, and the Baidu PaddlePaddle submission context. Treated here as Graphcore claims, not independent benchmarks.
6 Reuters (Jul 2024). Japan’s SoftBank acquires British AI chipmaker Graphcore. SoftBank’s acquisition with undisclosed terms, Graphcore’s difficulty securing the investment needed to compete, ~$2.77B valuation context at end of 2020, headcount cut of about a fifth, and closure of operations in Norway, Japan, and South Korea.
7 Graphcore (Jul 2024). Graphcore joins SoftBank Group to build next generation of AI compute. Graphcore becoming a wholly owned subsidiary of SoftBank, continuing under the Graphcore name, with the next-generation AI compute framing.
8 SoftBank. SoftBank AI Computing segment. The Arm / Ampere / Graphcore positioning, Ampere as energy-efficient Arm CPUs for AI-driven data-center workloads, and Graphcore as a team with experience in AI-specialised chips.
9 Graphcore. Graphcore to invest £1B in India, creating 500 semiconductor jobs. Up to £1B investment over the next decade, 500 new semiconductor jobs at the Bengaluru AI Engineering Campus, with first roles spanning silicon logical design, physical design, verification, characterisation, and bring-up.
This is Essay No. 032. The topics: intelligence, AI, systems, knowledge, and the questions underneath the questions everyone else is asking. If you read this far and disagreed with any part of it, write to me. I read everything.