Essay No. 032  ·  AI Infrastructure  ·  Melbourne, Australia
AI Infrastructure Graphcore Nvidia CUDA IPU MLPerf SoftBank Arm Ampere AI Accelerators Semiconductor Startups ML Training

The AI Chip Software Wall
Original analysis  ·  Not investment advice

Why Graphcore’s IPU shows that great silicon is not enough to beat Nvidia.
Pugalenthi Magendran
April 2026  ·  Melbourne, Australia
12 min read

Graphcore’s IPU was not a stupid idea. But AI hardware is not only silicon. It is software, kernels, frameworks, benchmarks, cloud access, developer trust, customer support, and capital. Graphcore failed as an independent Nvidia challenger because it could not turn specialised silicon into a broad AI platform fast enough. Under SoftBank, it gets a second life as part of a larger AI compute ecosystem.

In 2021, Graphcore wanted to be the Nvidia alternative. The architecture was different. The branding was strong. The ambition was huge. The story was clean: GPUs were built for graphics first, but Graphcore’s IPU was built for machine intelligence from the ground up.

Then came MLPerf.

A 2021 SemiAnalysis article argued that Graphcore’s first major MLPerf showing exposed the problem. Graphcore submitted selectively, used different software stacks for different models, and marketed price/performance comparisons that looked much weaker once adjusted for silicon count, memory capacity, system configuration, and software maturity.1

The point was not only that Graphcore was slower on a few benchmarks. The point was that Graphcore looked incomplete as a platform.

That is the real lesson. In AI hardware, the chip is only the beginning.

Key idea

Graphcore is the cleanest case study in the AI chip software wall. The IPU was architecturally interesting, but Nvidia’s advantage was not only GPU silicon. It was the full platform: CUDA, libraries, kernels, networking, cloud availability, developer trust, support, and capital. Graphcore failed as an independent Nvidia challenger because it could not turn specialised silicon into a broad AI platform fast enough. Under SoftBank, the company gets a second life, but as part of a larger AI compute ecosystem, not as a standalone GPU replacement.


I. The 2021 thesis

In July 2021, Dylan Patel published a SemiAnalysis piece on Graphcore’s MLPerf v1.0 submission. The piece noted that Graphcore submitted just four closed-division results, one per system, covering only ResNet-50 and BERT, and that it used the TensorFlow SDK for ResNet-50 but PopART for BERT. The argument was that the selective submission and split software stacks suggested benchmark cherry-picking and heavy hand tuning. The piece also walked through the IPU-POD16 vs DGX A100 marketing comparison. It flagged that the IPU-POD16 used 16 IPUs against 8 A100s, had less memory capacity, and was slower; that the cost comparison leaned on the higher-priced 80GB A100 configuration; and that it chose DGX rather than lower-cost third-party A100 systems. The deeper conclusion was that specialised AI silicon is not enough without a broad software platform.1
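Why chip count matters can be shown with a rough per-accelerator normalisation. This is an illustrative sketch, not SemiAnalysis’s own calculation: it assumes both systems train the same benchmark to the same quality target, treats throughput as inversely proportional to time-to-train t, and ignores die size, power, memory, and price. The symbols t_POD16 and t_DGX are introduced here for illustration.

```latex
\[
\frac{\text{per-IPU throughput}}{\text{per-A100 throughput}}
  = \frac{1 / (16\, t_{\mathrm{POD16}})}{1 / (8\, t_{\mathrm{DGX}})}
  = \frac{1}{2} \cdot \frac{t_{\mathrm{DGX}}}{t_{\mathrm{POD16}}}
\]
```

If the IPU-POD16 is slower, so that t_POD16 > t_DGX, the ratio falls below one half: in this simplified view each IPU does less than half the work of each A100, before memory capacity or system pricing enter the picture.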

2021 thesis

Graphcore’s MLPerf problem was not just benchmark speed. It was benchmark breadth, software maturity, scaling, and platform trust.

Diagram · MLPerf v1.0 training — Graphcore submission coverage
Submitted by Graphcore: ResNet-50, BERT
Not submitted: SSD, Mask R-CNN, DLRM, RNN-T, 3D U-Net, MiniGo
Eight closed-division training benchmarks. A simplified, original visual based on the 2021 SemiAnalysis description of Graphcore’s coverage.1

II. The IPU was interesting. The platform was the problem.

Graphcore’s IPU was not a dumb idea. The architectural argument made sense. AI workloads involve graphs, parallelism, irregular memory access, and rapid change. A processor designed around graph execution and fine-grained parallelism could theoretically be valuable.

But AI customers do not buy architectural beauty. They buy working systems. They need PyTorch, TensorFlow, compilers, optimised kernels, runtime support, distributed training, debugging tools, profiling tools, cloud availability, documentation, support engineers, stable roadmaps, and predictable performance.

Diagram · Graphcore stack vs Nvidia stack
Graphcore 2021

A chip plus an SDK

IPU silicon
Poplar / PopART / SDK
Selected model kernels
Cherry-picked benchmarks
Nvidia 2021

An operating environment

GPU silicon
CUDA
cuDNN / NCCL / TensorRT
DGX + networking
Cloud availability
Developer ecosystem
Broad model coverage
A simplified, original split. Not a Graphcore or Nvidia chart.

Graphcore had an accelerator. Nvidia had an operating environment.


III. The benchmark problem was really a software problem

The 2021 SemiAnalysis piece interpreted Graphcore’s narrow submission and the split between TensorFlow SDK and PopART as evidence of heavy hand tuning and fragmented software, not as a one-off MLPerf strategy. The benchmark story weakened further once the marketing comparison was adjusted for chip count, memory capacity, and system pricing.1

A platform cannot rely on heroic engineering for every model. The models keep changing — ResNet, BERT, GPT-style transformers, diffusion, mixture-of-experts, long context, multimodal models, agents. A platform has to absorb all of them without rewriting kernels by hand for each one.

The enemy was not one benchmark. The enemy was model churn.


IV. Nvidia’s advantage was breadth

Nvidia’s MLPerf v1.0 write-up emphasised broad benchmark coverage and software-level optimisations across the stack, including CUDA Graphs, SHARP, fused attention, distributed LAMB, optimised collectives, DALI, and communication / computation overlap.2

Those advantages do not show up in one benchmark. They show up across every benchmark, year after year, as new models arrive.

A chip wins one workload. A platform absorbs new workloads.


V. Graphcore improved, but the market moved faster

Graphcore’s own 2022 communications reported MLPerf improvements: Bow Pod16 was claimed to be roughly 31% faster than the Nvidia DGX-A100 640GB on ResNet-50, BERT was claimed to have improved by roughly 37% over the previous MLPerf round, and Baidu’s PaddlePaddle submission was cited alongside.5

Reading the claims

These are Graphcore claims, not independent comparisons.

Graphcore’s 2022 improvement figures are useful direction-of-travel signals. They depend on the system configuration, software stack, and comparison set chosen by Graphcore. They are not universal performance evidence and they do not address the broader ecosystem and capital concerns that decided commercial outcomes.5

Graphcore was catching up to old benchmarks while the market was moving to new models. Benchmark improvement did not solve ecosystem weakness, and the workload mix kept shifting.


VI. MLPerf moved from BERT to Llama and Flux

MLCommons’ Training v5.1 results refresh emphasised newer generative-AI workloads, with Llama 3.1 8B replacing BERT and Flux.1 replacing Stable Diffusion v2, alongside broad system and organisation diversity.3,4 Graphcore is not listed among the submitting organisations in the v5.1 result announcement.

Diagram · The model churn that benchmarks track
2015–19

ResNet

image classification
2018–21

BERT

transformers, NLP
2020–22

GPT-style

large language models
2022–24

Diffusion

image / video gen
2024–26

Llama + Flux

open LLMs, text-to-image4
2025+

Multimodal + agents

tool use, reasoning
A simplified, original timeline of AI workloads and the benchmarks that track them. Years are approximate.

The strongest AI hardware companies keep showing up as the benchmark changes. Weak platforms disappear from the scoreboard.


VII. The business outcome exposed the capital problem

In July 2024, Reuters reported that SoftBank had acquired Graphcore on undisclosed terms, noting that Graphcore had struggled to secure the investment needed to compete, that the company had been valued at $2.77B at the end of 2020, and that it had cut headcount by about a fifth and closed operations in Norway, Japan, and South Korea.6

AI hardware is expensive. You need silicon teams, compiler teams, kernel teams, systems teams, networking teams, customer engineers, cloud partnerships, manufacturing access, developer relations, and multiple chip generations. Sustained capital is not optional.

AI hardware startups do not only need product-market fit. They need capital-market fit.


VIII. Graphcore’s second life is SoftBank

Graphcore framed its July 2024 SoftBank announcement around becoming a wholly owned subsidiary, continuing to operate under the Graphcore name, and being backed for next-generation AI compute.7 SoftBank’s AI Computing segment page describes the segment as built around Arm, Ampere, and Graphcore, with Ampere positioned as energy-efficient Arm CPUs for AI-driven workloads and Graphcore as a team with experience in AI-specialised chips.8

Diagram · SoftBank AI Computing segment, simplified
Architecture & IP

Arm

CPU IP, mobile and data-center ecosystem, standards influence.8
CPUs

Ampere

Energy-efficient Arm CPUs for AI-driven data-center workloads.8
AI accelerators

Graphcore

AI-specialised chip team, compiler talent, systems experience.8
A simplified, original framing of the SoftBank AI Computing segment based on SoftBank’s own segment page. Not an official org chart.

Graphcore’s second chance is not to be Britain’s Nvidia. It is to become a useful silicon layer inside SoftBank’s AI compute strategy.


IX. The India campus is a rebuild signal

Graphcore announced a new AI Engineering Campus in Bengaluru, with an investment of up to £1B over the next decade and the creation of 500 new semiconductor jobs, framing the campus around SoftBank’s artificial super intelligence platform ambitions. The first roles cover silicon logical design, physical design, verification, characterisation, and bring-up.9

Diagram · Graphcore arc — IPU hype to SoftBank rebuild
2016–20

IPU hype

Strong branding, big rounds, “built for AI” story.
2021

MLPerf criticism

Narrow submission; software fragmentation flagged.1
2022–23

Improvements

Bow Pod16 / BERT gains; market shifts to generative AI.5
2024

SoftBank deal

Wholly owned subsidiary; capital-market fit reset.6,7
2025+

India + UK rebuild

Bengaluru campus, £1B / 500 jobs framing.9
A simplified, original timeline. Dates are approximate; key milestones per the cited sources.

Graphcore’s second chance is not a benchmark result. It is a capitalisation and ecosystem reset.


X. Why CUDA was the wall

CUDA is Nvidia’s programming platform for GPU computing. But CUDA is not only syntax. CUDA is a mental model, libraries, kernels, documentation, debugging tools, profiling tools, university teaching, Stack Overflow answers, cloud instances, benchmark recipes, enterprise deployments, a hiring market, and years of trust.

Diagram · What CUDA actually includes
Libraries
cuDNN / cuBLAS / NCCL
Optimised primitives and collectives.
Tooling
Nsight + profilers
Debug, profile, optimise at scale.
Cloud
Ubiquitous instances
First-class availability on every major cloud.
Trust
Enterprise + procurement
Default safe choice for IT and risk teams.
Knowledge
Docs + Q&A + courses
A decade of materials, examples, and answers.
Hiring
CUDA-fluent talent
Large engineering pool, easier to staff.
Models
Model coverage
Frontier and open models run on day one.
Benchmarks
Recipe culture
Shared MLPerf recipes and tuning playbooks.2
A simplified, original map of what CUDA actually delivers beyond syntax. Each tile is a separate switching cost for a challenger.

The hardest thing for an AI chip startup is not proving that its chip is clever. It is convincing developers to leave the default path.


XI. The full-stack AI hardware checklist

The cleanest way to read the Graphcore story is to apply a simple checklist to any AI hardware challenger. Beating Nvidia in a serious workload needs all of the following, not just the first one or two items.

Card · The full-stack AI hardware checklist
01
Competitive silicon
02
Stable compiler
03
Kernel library coverage
04
PyTorch integration
05
Distributed training
06
Inference serving stack
07
Debug + profiling tools
08
Benchmark breadth
09
Cloud availability
10
Customer support
11
Developer community
12
Multiple HW generations
13
Capital to keep going
14
Clear workload wedge
A simplified, original framework. Graphcore had interesting silicon and SDK pieces; the weak rows were breadth, model coverage, cloud, capital, and a clear sustained wedge.

A chip can be designed in a few years. A platform compounds over a decade.


XII. What Graphcore still has

Being fair to Graphcore matters. The company still has AI chip architecture experience, compiler and runtime experience, systems-architecture knowledge, engineering talent, accumulated IPU learnings, SoftBank capital, and potential Arm / Ampere ecosystem fit. The UK base and the new Indian campus add engineering capacity that does not exist at most challengers today.9

The IPU may not have beaten Nvidia, but the people, tools, and lessons may still be valuable.


XIII. What could break the new thesis?

The strongest bear case is that Graphcore may have been acquired because its talent was valuable, not because the IPU platform was commercially viable.

Bear case · what could break the new thesis
  1. Heavy investment, weak product. SoftBank may invest substantially without producing a competitive product.
  2. Legacy software. Graphcore’s old software burden may persist into the next generation.
  3. CUDA compounds. Nvidia’s CUDA moat may keep widening, year after year.2
  4. Crowded competitors. AMD, Google TPU, AWS Trainium, Microsoft Maia, Groq, Cerebras, and Tenstorrent are all chasing non-Nvidia workloads.
  5. Sprawl. SoftBank’s AI strategy may become too broad to execute well.
  6. Integration friction. Arm, Ampere, and Graphcore may not integrate cleanly.
  7. Talent over product. Graphcore may end up as talent infrastructure, not product infrastructure.
  8. Niche architecture. IPU-style architectures may remain too specialised.
  9. Default GPU pull. Developers may keep preferring CUDA and widely available cloud GPUs.

XIV. What could break the bear case?

The bull case is structural: AI compute is too large for one vendor to serve everything, and sovereign and enterprise AI buyers want alternatives.

Bull case · what could break the bear
  1. Market size. AI compute is too large for a single vendor to satisfy.
  2. Sovereign AI. National buyers want alternatives to a single foreign vendor.
  3. Capital. SoftBank has the balance sheet to fund a multi-year rebuild.6
  4. Arm reach. Arm gives ecosystem reach into mobile, automotive, and data centre.8
  5. Ampere CPU. Energy-efficient Arm CPUs for AI-adjacent workloads.8
  6. Compiler talent. Graphcore’s compiler and accelerator experience is real.
  7. Custom AI infra. Custom AI infrastructure may want specialised silicon.
  8. Vertical stack. A coordinated Arm + Ampere + Graphcore stack could matter.
  9. Lessons. IPU-era lessons can shape the next architecture.

Graphcore does not need to replace Nvidia to matter. It needs to become useful inside a larger AI compute strategy.


XV. What to watch

What to watch
  • Graphcore’s next announced product.
  • Whether the IPU remains central or is replaced.
  • SoftBank AI Computing segment updates.8
  • Arm / Ampere / Graphcore integration evidence.
  • Bengaluru hiring progress.9
  • UK headcount growth.
  • Compiler and runtime announcements.
  • PyTorch coverage milestones.
  • MLPerf re-entry, if it happens.3
  • Inference benchmarks beyond MLPerf.
  • Cloud availability of Graphcore systems.
  • Customer announcements with named workloads.
  • Sovereign AI partnerships and pilots.
  • SoftBank data-center strategy and disclosures.
  • Whether Graphcore becomes product or talent infrastructure.
  • How clearly the Arm+Ampere+Graphcore story is told to customers.

Glossary

A short reference for the vocabulary used above. Definitions are simplified.

IPU
Intelligence Processing Unit, Graphcore’s AI accelerator.
MLPerf
Benchmark suite for measuring machine-learning training and inference performance.
ResNet-50
Classic image-classification model.
BERT
Transformer model used for NLP benchmarks.
Llama
Modern large-language-model family used in newer benchmark rounds.
Flux.1
Modern text-to-image model used in newer benchmark rounds.
CUDA
Nvidia’s GPU programming platform.
cuDNN
Nvidia deep-learning library.
NCCL
Nvidia communication library for multi-GPU systems.
Kernel
Optimised low-level operation used by AI frameworks.
Compiler
Software that maps model operations onto hardware.
Runtime
Software layer that executes compiled work on hardware.
Distributed training
Training a model across multiple chips or systems.
Model churn
Rapid change in AI model architectures and workload requirements.
Capital-market fit
Ability to raise enough capital to survive multiple hardware generations.

XVI. The AI chip software wall

Graphcore’s lesson is not that custom AI chips are impossible.

The lesson is that AI hardware is a full-stack war.

The IPU was interesting silicon. But Nvidia had the platform: CUDA, libraries, kernels, cloud availability, developer trust, support, and capital. Graphcore failed as an independent Nvidia challenger because it could not turn specialised hardware into a broad software ecosystem fast enough.

In 2026, its second life is inside SoftBank. The question is no longer whether Graphcore replaces GPUs. The question is whether it becomes a useful silicon layer in a much larger AI compute strategy.

Hardware startups often hear that they only need product-market fit. The Graphcore story is a counter-example. They also need software-market fit, model-market fit, cloud-market fit, support-market fit, and capital-market fit. The chip is the start. The platform is the verdict.

That is the AI chip software wall.


1 Patel, D. (Jul 2021). Graphcore Looks Like A Complete Failure In Machine Learning Training Performance. SemiAnalysis. Historical anchor for the MLPerf v1.0 criticism, including the narrow Graphcore submission (four closed-division results, one per system, only ResNet-50 and BERT), the use of TensorFlow SDK for ResNet-50 and PopART for BERT, the IPU-POD16 vs DGX A100 comparison (16 IPUs vs 8 A100s, memory and time gaps, 80GB A100 pricing, DGX vs third-party systems), and the deeper argument that specialised AI silicon is not enough without a broad software platform. Used as inspiration only. No content, structure, or charts reproduced.

2 NVIDIA Developer (2021). MLPerf v1.0 Training Benchmarks: Insights Into A Record-Setting Performance. Broad benchmark coverage and software optimisations including CUDA Graphs, SHARP, fused attention, distributed LAMB, optimised collectives, DALI, and communication / computation overlap.

3 MLCommons (Nov 2025). MLPerf Training v5.1 results. Newer generative-AI workloads, broader system and organisation diversity. Used as the basis for noting that Graphcore is not listed among the submitting organisations in v5.1.

4 MLCommons. MLPerf Training benchmark archive. Benchmark evolution including the replacement of BERT with Llama 3.1 8B and Stable Diffusion v2 with Flux.1, plus newer LLM and reasoning workloads.

5 Graphcore (2022). Baidu results underscore big MLPerf gains for Graphcore. Graphcore’s own framing of Bow Pod16 ResNet-50 improvements (~31% vs Nvidia DGX-A100 640GB), BERT improvement (~37%) versus the previous MLPerf round, and the Baidu PaddlePaddle submission context. Treated here as Graphcore claims, not independent benchmarks.

6 Reuters (Jul 2024). Japan’s SoftBank acquires British AI chipmaker Graphcore. SoftBank’s acquisition with undisclosed terms, Graphcore’s difficulty securing the investment needed to compete, ~$2.77B valuation context at end of 2020, headcount cut of about a fifth, and closure of operations in Norway, Japan, and South Korea.

7 Graphcore (Jul 2024). Graphcore joins SoftBank Group to build next generation of AI compute. Graphcore becoming a wholly owned subsidiary of SoftBank, continuing under the Graphcore name, with the next-generation AI compute framing.

8 SoftBank. SoftBank AI Computing segment. The Arm / Ampere / Graphcore positioning, Ampere as energy-efficient Arm CPUs for AI-driven data-center workloads, and Graphcore as a team with experience in AI-specialised chips.

9 Graphcore. Graphcore to invest £1B in India, creating 500 semiconductor jobs. Up to £1B investment over the next decade, 500 new semiconductor jobs at the Bengaluru AI Engineering Campus, with first roles spanning silicon logical design, physical design, verification, characterisation, and bring-up.

*   *   *

This is Essay No. 032. The topics: intelligence, AI, systems, knowledge, and the questions underneath the questions everyone else is asking. If you read this far and disagreed with any part of it, write to me. I read everything.

Pugalenthi Magendran