AI Taxonomy

The AI Map

A practical view of AI, from energy and chips to models and real-world applications.

AI is not one thing. It is a stack. From energy and silicon to infrastructure, models, and applications, each layer enables the one above it. This page is my attempt to map the field clearly.

The five-layer stack.

Strategic note · framing the stack

Does energy decide who wins AI?

Energy is the hard floor. But the winner is whoever turns electricity into useful intelligence most efficiently.

The claim "whoever has the most energy wins AI" is partly true, but incomplete. A better version: whoever can turn electricity into useful intelligence at the lowest cost, fastest deployment speed and largest scale has the advantage.

AI is becoming an industrial-scale electricity problem, so energy matters. But energy alone does not win. The real edge is intelligence-per-watt, intelligence-per-dollar, intelligence-per-chip, and distribution into real products. Lack of energy can make you lose. Having energy does not guarantee you win.

The better equation

Three framings of the same question, in order of usefulness.

Old: Most energy wins.

Better: Most useful intelligence per watt wins.

Best: Useful intelligence per watt × chips × infrastructure × models × applications.

Where each layer matters

Five compact reads on what each layer actually decides — colour-coded to the cake-stack above.

Energy

Energy sets the ceiling

Power decides how much compute can physically run. Cheap, reliable, fast-to-connect electricity becomes a moat once data centres reach hundreds of megawatts or gigawatt scale. But raw energy without chips, infra, models and customers is just unused capacity.

Chips

Chips change the conversion rate

Better accelerators turn the same electricity into more tokens, lower latency and cheaper training or inference. A chip advantage can beat a pure energy advantage — the question is not megawatts alone, but useful compute per megawatt.

Infrastructure

Infra decides utilisation

Poor infrastructure wastes expensive chips and electricity. Scheduling, networking, cooling, serving, batching, caching, observability and distributed training decide whether hardware becomes reliable AI output or idle capital.

Models

Models decide capability per compute

Better architectures, better data, better training, distillation, sparsity, MoE, state-space models and reasoning systems can change how much capability you get from the same compute budget.

Applications

Applications decide value

Even frontier capability is not automatically a business. Applications turn model output into workflows, revenue, productivity, trust and distribution. The best stack still loses if it does not solve a real problem.

What if chips, infra or models get much better?

Three scenarios that move the bottleneck.

01 If chips get better

Energy per token falls.
Total demand may still rise — cheaper AI creates more usage.
Jevons-style effect: efficiency can expand consumption instead of reducing it.

Conclusion: Better chips weaken the energy bottleneck per task, but may increase total AI deployment.
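The Jevons-style effect above comes down to one division. The sketch below is illustrative only: the function name and the 2× and 3× figures are assumptions chosen for the example, not numbers from any source cited on this page.

```python
def total_energy_change(efficiency_gain: float, demand_growth: float) -> float:
    """Return new total energy use divided by old total energy use.

    efficiency_gain: factor by which energy per token falls (2.0 = half the energy).
    demand_growth:   factor by which tokens served grows (3.0 = three times the tokens).
    """
    if efficiency_gain <= 0 or demand_growth < 0:
        raise ValueError("efficiency_gain must be > 0 and demand_growth >= 0")
    return demand_growth / efficiency_gain


# Hypothetical scenario: chips become 2x more efficient, usage grows 3x.
ratio = total_energy_change(efficiency_gain=2.0, demand_growth=3.0)
print(f"Total energy changes by a factor of {ratio:.2f}")  # 1.50
```

Total energy only falls when demand grows more slowly than efficiency improves.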

02 If infrastructure gets better

Higher GPU utilisation.
Better batching, caching, routing and serving.
A smaller well-run cluster can outperform a larger but poorly run one.

Conclusion: Infra is what turns raw power and chips into actual throughput.
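A rough way to see why a smaller, well-run cluster can beat a larger one is to compare GPUs doing useful work rather than GPUs installed. The cluster sizes and utilisation figures below are hypothetical, and the model ignores everything except utilisation.

```python
def effective_gpus(gpu_count: int, utilisation: float) -> float:
    """GPUs' worth of useful work, given average utilisation as a fraction (0 to 1)."""
    if not 0.0 <= utilisation <= 1.0:
        raise ValueError("utilisation must be between 0 and 1")
    return gpu_count * utilisation


# Hypothetical clusters: the smaller one is better scheduled and better fed.
small_well_run = effective_gpus(8_000, 0.75)
large_poorly_run = effective_gpus(12_000, 0.40)

print(f"Small, well run:   {small_well_run:,.0f} effective GPUs")
print(f"Large, poorly run: {large_poorly_run:,.0f} effective GPUs")
```

In this example the 8,000-GPU cluster delivers more useful compute than the 12,000-GPU one, while drawing less power and tying up less capital.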

03 If models get better

Better architectures can reduce brute-force scaling.
Transformers dominate today, but they are not guaranteed to be the final architecture.
Efficiency patterns: MoE, state-space models like Mamba, retrieval, distillation, quantisation, sparsity, routing, small specialists.

Conclusion: A large architecture breakthrough could shift the bottleneck, but until compute demand disappears, energy still matters.

Bottleneck shifts

The winning layer is usually the current bottleneck.

Energy → Chips → Infrastructure → Models → Applications

When power is scarce, energy wins.
When GPUs are scarce, chips win.
When utilisation is poor, infrastructure wins.
When capability is weak, models win.
When adoption is weak, applications win.
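One simple way to read the list above is as a weakest-link model: score each layer and the lowest score is the current bottleneck. This is only a sketch of that reading. The 0 to 1 readiness scale, the function name and the example scores are all assumptions made for illustration.

```python
# Layers in stack order, bottom to top.
LAYERS = ["energy", "chips", "infrastructure", "models", "applications"]


def current_bottleneck(readiness: dict[str, float]) -> str:
    """Return the layer with the lowest readiness score (0 = blocked, 1 = abundant).

    Ties go to the lower layer in the stack, because min() keeps the first minimum.
    """
    missing = [layer for layer in LAYERS if layer not in readiness]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    return min(LAYERS, key=lambda layer: readiness[layer])


# Hypothetical organisation: plenty of power, short on accelerators.
scores = {
    "energy": 0.9,
    "chips": 0.4,
    "infrastructure": 0.7,
    "models": 0.8,
    "applications": 0.6,
}
print(current_bottleneck(scores))  # chips
```

Raising the lowest score moves the bottleneck to the next-weakest layer, which is the shift the arrows describe.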

My read

For the next few years, energy is one of the major bottlenecks in frontier AI because data centres are becoming industrial power projects. But energy is not the whole game. The winner is not simply the country or company with the most electricity. The winner is the one that combines cheap reliable power, advanced chips, high-utilisation infrastructure, efficient models, and applications that turn capability into real economic value.

Energy will not automatically win the AI war. But lack of energy can make you lose it.

Sources

IEA — Key Questions on Energy and AI · global data-centre electricity use is projected to roughly double from 485 TWh in 2025 to 950 TWh in 2030, with AI-focused data-centre electricity tripling in that window.
Vaswani et al. — Attention Is All You Need · introduced the Transformer, an attention-based architecture that displaced recurrent and convolutional sequence models.
Gu & Dao — Mamba · alternative sequence-model architecture; original paper claims linear scaling in sequence length and higher inference throughput than Transformers.
Mistral AI — Mixtral of Experts · sparse MoE example: many total parameters, fewer active per token.

These are evidence that architecture and efficiency research is active — not proof that Transformers are obsolete.

Energy layer · deep dive

Energy: the hard floor under AI.

Every AI system begins as an electricity problem. Chips turn electricity into computation; data centres turn grid capacity into intelligence. The countries and companies that can secure cheap, reliable, scalable power will have a structural advantage in AI.

latest available

From power plant to token

Energy 101 — the mental model

Eight short cards. Read them once and you will understand most newspaper energy stories better than the journalist writing them.

Misconceptions to drop

The fastest way to think clearly about AI infrastructure is to stop carrying these around.

Global electricity dashboard — 2025 update

Latest full-year picture of how much electricity the world generated in 2025, what it came from, and how data centres fit into the global mix.

Global electricity generation

Renewables + nuclear share

Sources: Ember Global Electricity Review 2026; IEA Global Energy Review 2026; IEA Key Questions on Energy and AI 2026.

Country comparison

Twenty-one countries and regions, ordered by relevance to AI infrastructure planning.

Country / region	AI power-readiness	Generation, TWh	Peak, GW	Mix (rounded)	Data-centre relevance	AI power-readiness read	Main bottleneck

Strategic read — six countries that matter most

For each: what is strong, what is weak, why it matters for AI, and the one bottleneck to fix first.

AI data-centre power economics

A 100 MW data centre is already a major industrial load. A 1 GW AI campus is power-plant scale. Electricity turns into tokens; the price you pay flows through to every API call.

Why size matters

100 MW data centre = roughly the load of a small city.
1 GW AI campus = power-plant scale; planning has to happen with utilities directly.
Total facility power = IT load × PUE. A PUE of 1.5 means every 1 kWh of compute costs 1.5 kWh of grid power.
Modern liquid-cooled AI campuses target a PUE of 1.10–1.20; older air-cooled sites are 1.4–1.7.

The real cost stack

Electricity price (energy + capacity + transmission charges).
Grid connection — substations and high-voltage drop-points.
Transformer and substation lead times (often 18–36+ months).
Backup generation, batteries and resilience zoning (N+1, 2N).
Cooling water, chillers, dry coolers or liquid loops.
Long-term PPA structure, often 10–20 years.
Land, permits and political acceptance.
Time-to-power: under 24–36 months is now a competitive moat.

Data-centre annual energy + cost calculator

Annual energy = IT load × PUE × utilisation × 8,760 h. Pick a preset or edit the inputs. Numbers are illustrative; not a quote.

Inputs: IT load (MW), PUE, utilisation (%), price ($/kWh).

Outputs: total facility load (MW), annual energy (TWh), annual electricity cost (per year).
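The calculator's formula, annual energy = IT load × PUE × utilisation × 8,760 h, can be sketched in a few lines. The function and field names below are assumptions, utilisation is passed as a fraction rather than a percentage, and the example inputs are illustrative rather than a quote for any real site.

```python
HOURS_PER_YEAR = 8_760


def annual_power_bill(it_load_mw: float, pue: float,
                      utilisation: float, price_per_kwh: float) -> dict:
    """Annual energy = IT load x PUE x utilisation x 8,760 h.

    utilisation is a fraction between 0 and 1; price_per_kwh is in dollars.
    """
    facility_mw = it_load_mw * pue                      # total facility load
    annual_mwh = facility_mw * utilisation * HOURS_PER_YEAR
    return {
        "facility_mw": facility_mw,
        "annual_twh": annual_mwh / 1_000_000,           # 1 TWh = 1,000,000 MWh
        "annual_cost_usd": annual_mwh * 1_000 * price_per_kwh,  # 1 MWh = 1,000 kWh
    }


# Hypothetical 100 MW site: PUE 1.2, 80% utilisation, $0.08 per kWh.
result = annual_power_bill(it_load_mw=100, pue=1.2, utilisation=0.8, price_per_kwh=0.08)
print(f"Facility load: {result['facility_mw']:.0f} MW")
print(f"Annual energy: {result['annual_twh']:.2f} TWh")
print(f"Annual cost:   ${result['annual_cost_usd']:,.0f}")
```

With these inputs the site draws about 120 MW, uses roughly 0.84 TWh a year, and pays on the order of $67 million for electricity.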

Where would a 500 MW AI campus be easier to power?

Qualitative score (1–10) on four axes: available megawatts now, time-to-power, electricity price, and grid cleanliness. These are reads on the country / corridor, not a guarantee for any specific site.

AI’s Grid Problem: Concentrated Power

AI is turning electricity from a background input into a frontline strategic constraint. The real issue is not only that AI uses more electricity. It is concentrated demand, grid-connection delays, cooling, permits, and time-to-power becoming a moat.

2024 data-centre electricity use: ~1.5% of global electricity

2030 projection: ~945 TWh / yr

Main growth driver: AI accelerated servers

Grid delay risk: ~20% of planned capacity

2024 concentration: US 45% · CN 25% · EU 15%

Strategic lesson: Time-to-power is an AI moat

The AI energy problem is not only about how much electricity data centres consume. It is about where, when, and how that electricity is drawn from the grid. Globally, data centres still look like a small share of total demand. Locally, they can behave like a new industrial city appearing on the grid overnight. The constraint is no longer just chips. It is megawatts, substations, cooling, permits, and connection timelines.

By 2030, the IEA projects data-centre electricity consumption to roughly double to around 945 TWh per year — close to Japan’s current annual consumption — with AI processing the most significant single driver. Around 20% of planned data-centre projects could face delays if grid constraints are not addressed. Natural gas and renewables are expected to take the lead in powering data centres in the near term; the tech sector may also help bring forward new nuclear, SMR, and geothermal capacity. None of these alone solves the problem.

This is why time-to-power is becoming a strategic moat. The next AI winners will not only be the companies with the best models or the most GPUs. They will be the companies and regions that can secure reliable power, cooling, land, grid access, and long-term energy contracts before everyone else does.

What this means

AI infrastructure is becoming energy infrastructure.
Data-centre location strategy now depends on grid capacity, not just latency.
Power contracts, transmission access, and cooling are becoming competitive advantages.
Countries with cheap, reliable, scalable electricity may become AI hubs.
Regions that cannot connect new loads fast enough may lose AI investment.

Where the pressure shows up

United States: largest share of data-centre electricity (~45% in 2024), with data centres on course to account for almost half of US electricity-demand growth between now and 2030.
China: second-largest share (~25% in 2024) and major AI infrastructure buildout, with its own grid and siting playbook.
Europe: ~15% in 2024 — constrained more by grid connection queues, permitting, and energy-policy complexity than by raw generation.
Japan: data centres could account for more than half of national electricity-demand growth.
Malaysia / Southeast Asia: Singapore spillover and Johor / Southern Malaysia growth could make data centres around one fifth of national demand growth.

Source: IEA — Energy and AI report; IEA Global Energy Review 2026. Percentages and projections are IEA estimates and will be revised; treat them as direction, not precision.

Renewable integration for compute

Solar and wind are cheap and scalable but variable. AI clusters need 24/7 reliable power. Hourly-matched clean energy is much harder than buying annual renewable certificates.

What "matching" actually means

Annual matching: total clean kWh procured ≥ total kWh consumed over a year. Easy, weak.
Hourly matching (24/7 carbon-free): every hour of consumption is matched by clean generation in the same hour and grid region. Hard, real.
Firm clean sources matter: nuclear, hydro, geothermal, biomass with carbon capture, batteries with long-duration storage.
Demand response: shifting flexible workloads (some training, batch inference) to clean-rich hours.
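The gap between annual and hourly matching is easiest to see with numbers. The sketch below uses a single hypothetical day in place of a full year: a flat 100 kWh load every hour, covered on paper by a solar contract that delivers only around midday. The function names and figures are assumptions for illustration.

```python
def annual_match(consumption_kwh: list, clean_kwh: list) -> float:
    """Share of consumption covered when totals are compared over the whole period."""
    return min(1.0, sum(clean_kwh) / sum(consumption_kwh))


def hourly_match(consumption_kwh: list, clean_kwh: list) -> float:
    """Share of consumption covered when clean energy only counts in the hour it is generated."""
    if len(consumption_kwh) != len(clean_kwh):
        raise ValueError("both series need one value per hour")
    matched = sum(min(load, clean) for load, clean in zip(consumption_kwh, clean_kwh))
    return matched / sum(consumption_kwh)


# Hypothetical day: flat 100 kWh load, solar delivering 400 kWh in each of 6 midday hours.
load = [100] * 24
solar = [0] * 9 + [400] * 6 + [0] * 9

print(f"Period matching: {annual_match(load, solar):.0%}")   # 100%
print(f"Hourly matching: {hourly_match(load, solar):.0%}")   # 25%
```

The same contract looks fully clean on a totals basis and covers a quarter of actual consumption hour by hour. Firm clean sources, storage and demand response are the ways to close that gap.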

Inverters, frequency, inertia

Solar and batteries output DC. The grid runs on AC at 50 or 60 Hz. Inverters convert and synchronise.
Grid-following inverters need a stable grid frequency to lock onto. They cannot start a dead grid alone.
Grid-forming inverters can help create and support frequency. Critical as the share of variable renewables rises.
Synchronous machines (turbines spinning) give the grid inertia. Inverter-based grids need new ways to keep frequency steady.
Reactive power and fault ride-through keep voltage stable when something trips. Modern equipment must be tuned for both.

Before an AI campus can exist, these must be true

A practical pre-flight checklist. Any single missing item can kill a project for years. Cheap GPUs do not help if the substation does not exist.

Time-to-power is now a moat

Seven stages between a green-field site and the first electron flowing into a GPU rack. Times are ranges, not promises; they vary by jurisdiction, utility and equipment availability.

Strategic takeaways

Sources & methodology

Figures across this section are anchored on the latest reliable year widely available in primary sources (typically 2024 actuals reported in 2025–early 2026 editions). Numbers are rounded; do not treat them as precise. Where exact 2025 values are not yet final, the latest available year is used.

Chips layer · deep dive

Chips: where electricity becomes intelligence.

Chips are not just GPUs. The AI chip layer is the full compute engine: accelerator architecture, memory, packaging, interconnects, networking, software, manufacturing and supply chain. The countries and companies that secure all of it together will define who scales AI.

latest available

Electricity → tokens

Why AI Chips Are Splitting by Workload

AI chips are not just “faster computers”. They are specialised machines for turning electricity into matrix multiplication at trillion-scale. CPUs, GPUs, and TPUs exist because different workloads need different hardware. As AI matures, training and inference are splitting into different chip-design problems.

CPU

General-purpose control

A handful of sophisticated cores with deep control logic, large caches, branch prediction, and flexibility. Excellent for operating systems, orchestration, databases, application logic, and serial tasks. Not enough parallel arithmetic units to drive trillion-scale neural-network matrix multiplication.

GPU

Parallel computation

Thousands of simpler cores that apply the same operation across many pieces of data at once. Originally built for graphics, where millions of pixels can be processed in parallel. Neural networks rely on the same parallel matrix math, which made GPUs the accidental engine of deep learning.

TPU / AI accelerator

Specialised matrix engine

Sacrifices general-purpose flexibility for efficiency on tensor operations. Systolic-array designs push data through grids of simple compute units, doing matrix multiplication with very little wasted control overhead. Higher performance per watt on AI workloads; less useful for arbitrary code.

Why GPUs won AI

Graphics and neural networks look unrelated, but they share the same deeper pattern: huge amounts of parallel math. Graphics applies the same transformations across millions of pixels and vertices. Neural networks apply multiply-add operations across enormous matrices. Hardware built for one turned out to be ideal for the other.

Analogy · CPU

Jumbo jet

Fast, flexible, many routes, many cargo types. The right vehicle when each trip is different and the itinerary matters more than the volume moved.

Analogy · GPU

Cargo ship

Less flexible. Slower for any single trip. But moves enormous volume of the same cargo at once. The right vehicle when you have huge amounts of similar work to do in parallel.

What is actually inside a modern GPU

A GPU is not a faster CPU. It trades flexibility for parallel throughput by stacking thousands of simple arithmetic units, feeding them data through very wide memory pipes.

Many simple arithmetic cores. Each one is much less capable than a CPU core. The workhorse operation is fused multiply-add — multiplying two numbers and adding a third in a single step.
Tensor cores. Specialised units that perform matrix multiplication and addition directly — the central operation of neural networks. This is what made modern GPUs especially good for AI rather than just generally parallel.
Same instruction, many data points (SIMD / SIMT). The thousands of cores execute the same operation across different pieces of data at the same time. Graphics, mining, and deep learning all fit this shape, which is why one chip family serves all three.
Very high memory bandwidth. Thousands of cores starve without data. The AI chip race is partly a memory-bandwidth race: raw compute is wasted if the chip cannot be fed fast enough.
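To make the multiply-add point concrete, here is a deliberately naive matrix multiplication in plain Python that counts its multiply-add steps. It is a teaching sketch, not how a GPU is programmed: Python performs the multiply and the add as two separate operations, whereas hardware fuses them, and a GPU runs thousands of these steps at once instead of looping.

```python
def matmul_counting_fma(a: list, b: list) -> tuple:
    """Multiply a (m x k) by b (k x n), returning the result and the number of multiply-add steps."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    out = [[0.0] * n for _ in range(m)]
    steps = 0
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc = a[i][p] * b[p][j] + acc  # one multiply-add step
                steps += 1
            out[i][j] = acc
    return out, steps


product, steps = matmul_counting_fma([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)  # [[19.0, 22.0], [43.0, 50.0]]
print(steps)    # 8, which is m * n * k
```

The step count grows as m × n × k, and every output cell is independent of the others, which is why this workload maps so well onto thousands of simple parallel units.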

Training and inference are different workloads

The same silicon family runs both, but the optimisations diverge. Training builds the model. Inference runs it. Two different bottlenecks, two different chip-design problems.

Training

Build the model

Compute-bound and interconnect-heavy. Needs raw floating-point throughput, large batch processing, high-bandwidth memory, and very fast chip-to-chip communication so gradients can be shared across many accelerators each step. Rewards scale and cluster networking.

Inference

Run the model

Often memory-bound and latency-sensitive. The system must load weights, manage the KV cache for context, generate tokens quickly, and minimise cost per token. Agentic AI raises the stakes — one user task can trigger many sequential model and tool calls, so inference efficiency drives the unit economics.
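A back-of-envelope calculation shows why inference is often memory-bound. The KV cache stores one key and one value vector per token, per layer, per KV head, and each generated token has to read the weights and that cache from memory. The model shape, weight size and bandwidth figure below are hypothetical, and the throughput estimate is a rough ceiling for a single sequence, not a benchmark.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1, bytes_per_value: int = 2) -> int:
    """Size of the KV cache. The leading 2 covers keys plus values."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch_size * bytes_per_value


def decode_tokens_per_sec_ceiling(weight_bytes: float, cache_bytes: float,
                                  bandwidth_bytes_per_sec: float) -> float:
    """Rough upper bound when decoding is memory-bound.

    Assumes each new token reads all weights and the whole cache once.
    """
    return bandwidth_bytes_per_sec / (weight_bytes + cache_bytes)


# Hypothetical model: 32 layers, 8 KV heads of dimension 128, 16-bit values, 8,192-token context.
cache = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=8_192)
print(f"KV cache: {cache / 2**30:.1f} GiB per sequence")  # 1.0 GiB

# Hypothetical hardware: 16 GB of weights, 2 TB/s of memory bandwidth.
ceiling = decode_tokens_per_sec_ceiling(16e9, cache, 2e12)
print(f"Decode ceiling: about {ceiling:.0f} tokens/s for one sequence")
```

Neither number depends on peak FLOPS. Longer contexts and larger batches grow the cache linearly, which is why KV-cache efficiency and memory bandwidth show up directly in cost per token.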

Strategic read

AI hardware is moving from one-chip-fits-all to workload-specific design. The frontier training cluster wants maximum throughput and interconnect. The production inference fleet wants low latency, memory efficiency, reliability, and cost control. This is why the AI chip market is splitting into training accelerators, inference accelerators, hyperscaler custom silicon, edge NPUs, and specialised AI processors.

Chips are no longer just about peak FLOPS — utilisation, memory and interconnect decide real throughput.
Memory bandwidth and chip-to-chip interconnect are becoming as important as raw compute.
Training rewards scale, throughput, and cluster networking.
Inference rewards latency, utilisation, KV-cache efficiency, and cost per generated token.
Agentic AI increases demand for efficient inference because one task may require many model and tool calls.
Hyperscalers design custom silicon because workload control can become a structural cost advantage.

Beyond Nvidia: the rise of custom AI silicon

AI compute is no longer a single market. Nvidia GPUs became the default engine because they are flexible, programmable, and excellent at large-scale parallel math — and that flexibility still matters, especially for frontier training. But production AI is increasingly an inference problem: serving tokens quickly, reliably, and cheaply across millions of requests. As that side of the workload grows, specialised silicon becomes attractive — a buyer who knows the workload can trade general-purpose flexibility for better latency, utilisation, power efficiency, and cost per token.

General-purpose AI GPU

Flexible training and inference

Best ecosystem, broad programmability, deep software moat — CUDA, kernels, libraries, model code — and useful across many AI and non-AI workloads. Examples: Nvidia data-centre GPUs. The default platform when the workload is varied or not yet stable.

Hyperscaler ASIC

Custom cloud-scale AI compute

Useful when a cloud provider controls the workload end-to-end and wants lower cost, better performance per watt, and less dependence on external chip supply. Examples: Google TPU, AWS Trainium and Inferentia, Microsoft Maia, Meta MTIA.

Wafer-scale AI chip

Very-large-chip compute

Pushes more compute and memory onto one very large piece of silicon to cut chip-to-chip communication overhead. Targets training and inference where interconnect is the binding constraint. Examples: Cerebras-style wafer-scale engines.

Inference accelerator

Fast token generation

Optimised for low-latency inference, high throughput, and cost-efficient serving. Trades training generality for serving speed and tokens-per-dollar. Examples: Groq-style LPUs, SambaNova, d-Matrix, and other inference-focused ASICs.

Edge NPU

Local inference on device

Moves smaller AI workloads onto phones, laptops, cameras, and embedded devices — lower latency, better privacy, offline use, and reduced cloud cost. Examples: Apple Neural Engine, Qualcomm Hexagon, Intel / AMD / Arm NPUs.

Read it as workload fragmentation, not Nvidia versus everyone

The durable lesson is not “Nvidia is losing” or that any single challenger will win. The durable lesson is that AI compute is fragmenting by workload. Frontier training clusters, cloud inference fleets, enterprise agents, consumer-device AI, and specialised scientific workloads will not all use the same hardware forever. Even Nvidia has shown interest in inference-specific technology through major licensing, partnership, and talent-related moves — the incumbent itself treats inference as its own category.

Nvidia remains the default platform for flexible AI compute, especially for training.
Custom ASICs become more attractive when the workload is predictable and high-volume.
Inference may become the largest economic battleground — models have to be served continuously, agents call them in long chains.
Cost per token, latency, and utilisation become as important as peak FLOPS.
Memory bandwidth, networking, and software tooling matter as much as the silicon.
Hyperscalers have a structural advantage because they control both the demand and the infrastructure.
Startups compete by specialising for a workload, not by copying the Nvidia stack head-on.

Chip types — what does each one actually do?

CPUs, GPUs, TPUs, custom hyperscaler silicon, AMD Instinct, Intel Gaudi, wafer-scale, LPUs and edge NPUs — nine cards, each with its best use, its weakness, and a concrete example.

Which chip for which job?

A practical first cut, not a benchmark. Always validate on your workload, your software stack and your latency target.

Misconceptions to drop

The fastest way to think clearly about AI chips is to stop carrying these around.

The AI chip stack — eight layers

From transistors to global supply chain. Each layer enables the one above it, and any single layer can become the binding constraint.

Hardware benchmark cards

Vendor-published specs for the major AI accelerators. Use these to compare at a glance — not as a benchmark substitute.

Read this before reading the cards

Peak FLOPS are marketing numbers unless paired with memory pressure, interconnect, software stack, utilisation and the actual workload. Two chips with the same peak BF16 number can have very different tokens-per-second on the same model.

Training builds the model. Inference runs the business.

Same silicon family, different optimisations. Below: the two workloads as visual flows, then a side-by-side read across seven axes that decide which chip you actually want.

Training

Dataset + chips + time → model weights

Inference

User query + weights + KV cache → tokens

Strategic read

Who actually makes AI chips?

No single country can ship a frontier AI chip end-to-end. The map below traces the value chain; the cards below it read each major jurisdiction.

Country reads

For each: what is strong, what is weak, why it matters for AI, and the bottleneck to fix first.

Advanced AI chips are strategic assets

Geopolitical bottleneck map

Each major jurisdiction with its role in the chain, where its leverage sits, and where its risk sits. Most leverage points are concentrated in fewer than ten countries.

Export-controls timeline

A high-level read of the post-2022 export-control cycle. Use as a directional summary, not as legal guidance.

Real cost is not just chip price

Twelve line items decide the actual dollars-per-token of an AI workload.

Total cost stack

Training vs inference

Training builds the model. Inference runs the business. The chip you want for one is rarely the chip you want for the other.

Why it matters

AI hardware cost intuition

Pick a preset or edit the inputs. Numbers are illustrative; not a quote.

Inputs: accelerators, average power per accelerator (W), utilisation (%), price ($/kWh), allocated GPU-hour ($).

Outputs: total IT power (MW), annual energy (TWh), annual energy cost (per year), total run-rate, energy + allocated (per year).

Strategic summary · what chips mean for AI power

Energy decides how much compute can run. Chips decide how efficiently it becomes intelligence.

The Chips layer is the conversion layer of AI: it turns electricity into computation, and computation into model capability.

Sources & methodology

Hardware specs are taken from official vendor product pages, datasheets and architecture briefs (NVIDIA, AMD, Google Cloud TPU docs, AWS Neuron docs, Intel Gaudi, Cerebras, Groq, Apple). Geopolitical and supply-chain reads draw on TSMC + ASML disclosures, SIA / CSIS reports, Reuters coverage, and the official Singapore EDB / Malaysia MIDA sector pages. "Latest available" labels apply throughout.

Infrastructure layer · deep dive

Infrastructure: the operating system of AI.

Infrastructure is where chips become usable systems. It is the layer that turns raw accelerators into reliable AI products through distributed computing, networking, storage, scheduling, serving, MLOps, observability, security and data-centre operations.

latest available

Energy → application

Infrastructure 101 — what each primitive actually does

Ten foundational concepts. Read these once and the rest of the section makes more sense.

The AI infrastructure ladder

From a laptop to a hyperscale AI factory. Each rung adds an order of magnitude of complexity.

AI infrastructure maturity model

Where is your team today? Six levels from "demo" to "AI factory", with the giveaway tells for each.

Decision matrix — what should you actually build?

Pick the row that matches your context. Each card shows the recommended pattern, the tools likely involved, the bottleneck to plan for, and the obvious things to avoid.

Misconceptions to drop

The fastest way to think clearly about AI infrastructure is to stop carrying these around.

The AI infrastructure stack — ten layers

From the building to the audit log. Any one layer can become the binding constraint on a real workload.

Training job lifecycle — 13 stages of a real run

What actually happens when a distributed training job kicks off. Each stage is a place where something can fail; each is a place where good infra earns its keep.

Training vs inference — they need different infrastructure

Eleven dimensions, two columns. The most common architecture mistake is assuming the training stack is also the serving stack.

Prompt-to-token: what really happens when a user sends a message

Seventeen stages between the typed prompt and the streamed answer. The model is one step. Everything else is the production system around it — and it decides speed, reliability, safety and cost.

What causes slow AI responses? — diagnostic guide

Four symptom families, each with the most common causes. Use as a triage checklist when latency or cost regress.

Observability control room

The metrics an on-call engineer actually watches at 3am. Logs, metrics and traces are the three pillars; OpenTelemetry standardises all three.

Read this first

Do not confuse workflow orchestration (Airflow / Dagster / Prefect), container orchestration (Kubernetes / Nomad), and GPU job scheduling (Slurm / Volcano / KubeRay / Ray). They solve different problems. Mature AI orgs use more than one — and confusing them is one of the most common sources of "Kubernetes can't run my training" pain.

Reference architectures — four blueprints by scale

Concrete starting points. Pick the row that matches your context, then steal the building blocks. None are universally optimal; they are honest defaults.

The economics of AI infrastructure

Real cost is not just GPUs. It is silicon + servers + network + storage + power + cooling + space + ops + utilisation losses. The calculator below is an intuition tool — pick a preset, edit any field.

AI infrastructure cost intuition

Pick a preset or edit the inputs. Energy = IT load × PUE × utilisation × 8,760 h. Numbers are illustrative; not a quote.

Inputs: accelerators, average power per GPU (W), utilisation (%), PUE, price ($/kWh), hardware cost per GPU ($), depreciation (years), tokens per second per GPU.

Outputs: IT power (MW), facility power, IT power × PUE (MW), annual energy (TWh), annual electricity (per year), annual depreciation (per year), cost per GPU-hour ($), cost per million tokens ($).

These are intuition tools, not quotes. Real costs depend on workload, contracts, utilisation, hardware, cooling and region.
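The sketch below shows one way these fields could combine. Only the energy formula (IT load × PUE × utilisation × 8,760 h) comes from this page. The rest is an assumption made for illustration: straight-line depreciation, cost per GPU-hour taken as electricity plus depreciation spread over utilised GPU-hours, and no allowance for network, storage, space or staff. All names and example inputs are hypothetical.

```python
HOURS_PER_YEAR = 8_760


def infra_cost(accelerators: int, watts_per_gpu: float, utilisation: float, pue: float,
               price_per_kwh: float, hw_cost_per_gpu: float,
               depreciation_years: float, tokens_per_sec_per_gpu: float) -> dict:
    """Energy, depreciation and unit costs for a GPU fleet (utilisation as a fraction)."""
    it_mw = accelerators * watts_per_gpu / 1_000_000
    facility_mw = it_mw * pue
    annual_kwh = facility_mw * 1_000 * utilisation * HOURS_PER_YEAR
    electricity = annual_kwh * price_per_kwh
    # Assumption: straight-line depreciation of the accelerators only.
    depreciation = accelerators * hw_cost_per_gpu / depreciation_years
    # Assumption: costs are spread over utilised GPU-hours, not wall-clock hours.
    busy_gpu_hours = accelerators * utilisation * HOURS_PER_YEAR
    cost_per_gpu_hour = (electricity + depreciation) / busy_gpu_hours
    tokens_per_gpu_hour = tokens_per_sec_per_gpu * 3_600
    return {
        "it_mw": it_mw,
        "facility_mw": facility_mw,
        "annual_twh": annual_kwh / 1e9,
        "annual_electricity_usd": electricity,
        "annual_depreciation_usd": depreciation,
        "cost_per_gpu_hour_usd": cost_per_gpu_hour,
        "cost_per_million_tokens_usd": cost_per_gpu_hour / tokens_per_gpu_hour * 1_000_000,
    }


# Hypothetical fleet: 1,000 GPUs at 1 kW, 50% utilised, PUE 1.25, $0.10/kWh,
# $30,000 per GPU written off over 4 years, 1,000 tokens/s per GPU.
fleet = infra_cost(1_000, 1_000, 0.5, 1.25, 0.10, 30_000, 4, 1_000)
for name, value in fleet.items():
    print(f"{name}: {value:,.4f}")
```

With these inputs depreciation is more than ten times the electricity bill, so doubling utilisation would cut cost per GPU-hour far more than a cheaper power contract would.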

Where AI infrastructure breaks — diagnostic table

A four-column on-call diagnostic. Symptom → likely layer → possible cause → first thing to check. Read it left-to-right when something is on fire.

Build vs buy

Bad AI infrastructure patterns

Thirteen anti-patterns. Most outages, blown budgets and bad postmortems trace back to one of these. Knowing them is most of the work.

Strategic summary · what infrastructure means for AI power

Infrastructure is the coordination layer of AI: it turns chips into clusters, clusters into services, and services into products.

Without it, model capability stays as a research result. With it, model capability becomes product performance.

Sources & methodology

Tool descriptions are functional summaries of official documentation (Kubernetes, Slurm, Ray, PyTorch Distributed, JAX, NCCL, vLLM, NVIDIA TensorRT-LLM, SGLang, HuggingFace TGI, NVIDIA Triton, MLflow, W&B, Kubeflow, OpenTelemetry, Prometheus, Grafana). Vendor-claimed performance is flagged as such; benchmarks should be cross-checked against MLPerf and your own workload.

Models layer · deep dive

Models: where compute becomes capability.

Energy gives AI power. Chips convert power into computation. Infrastructure coordinates computation. Models turn computation into language, reasoning, vision, code, planning, memory and action.

latest available

Data → deployment loop

Models 101 — what each primitive actually means

Ten foundational terms. Read these once and the rest of the section makes more sense.

Common confusions

Five mix-ups that block clear thinking about models. Learn them, drop them.

Model maturity model

Where is your team today? Six levels from "demo" to "model operating system" with the giveaway tells for each.

Misconceptions to drop

The longer myth-truth list. Most bad model decisions trace back to one of these.

The AI model stack — eight layers

From data to deployment. Capability is downstream of every layer above; weakness in any layer caps the model that emerges.

Transformer mental model

Architecture uncertainty

Transformers are dominant, not final.

The Transformer unlocked modern AI by replacing recurrent and convolution-heavy sequence systems with attention-based architectures. But it should not be treated as the final form. New architectures and efficiency patterns matter because they can change the compute curve. If an architecture gives a 2–5× efficiency gain, energy still matters. If it gives a 100× gain in useful capability per watt, the stack reshuffles.

Efficiency improvement | Strategic meaning
2–5× | Lower cost, but energy still matters.
10–50× | Chips, infra and architecture become more decisive.
100×+ | The bottleneck could shift dramatically.
Architecture + chips + infra together | The whole AI map changes.
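
To make the efficiency tiers above concrete, here is a minimal sketch of the arithmetic, assuming energy for a fixed workload scales as one over the efficiency gain. The baseline figure and the function name are hypothetical, chosen only for illustration.

```python
# Illustrative only: BASELINE_MWH is a made-up number, not a measurement.
BASELINE_MWH = 1_000.0


def energy_for_same_workload(baseline_mwh: float, efficiency_gain: float) -> float:
    """Energy needed to serve the same workload after an efficiency gain.

    Assumption: energy scales as 1 / gain, with everything else held constant.
    """
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    return baseline_mwh / efficiency_gain


if __name__ == "__main__":
    for gain in (2, 5, 10, 50, 100):
        mwh = energy_for_same_workload(BASELINE_MWH, gain)
        share = 100.0 / gain  # percent of the baseline energy still required
        print(f"{gain:>4}x gain -> {mwh:7.1f} MWh ({share:.0f}% of baseline)")
```

Under this assumption a 2–5× gain still leaves 20–50% of the original energy bill, which is why energy still matters in that tier. At 100× the same workload needs 1% of the baseline, so the constraint is likely to move elsewhere in the stack.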

Model families

Different shapes of model for different jobs. The serious products use more than one.

How models are made — the 14-stage pipeline

A modern frontier model is not one training run. It is a pipeline. Each stage is a place where capability or safety is shaped.

Reasoning models

Multimodal models

Open-weight vs closed frontier

Three deployment shapes for the model layer. Each has real strengths and real costs — pick the one that matches your context, not the one that matches your taste.

Decision guide

When to choose each path. Most serious products end up hybrid.

Frontier landscape — by role, not vendor

Ten roles the frontier landscape actually breaks into. Use this as a living framework when picking a model — not a leaderboard.

Frontier model landscape — by ecosystem

For reference. The ecosystem read for each major closed lab, open-weight ecosystem, and specialist category.

From prompt to product outcome

Fourteen stages between a user goal and the improvement loop. The model is one stage. The surrounding loop decides reliability, cost, safety and usefulness.

Model system patterns

Seven canonical shapes of AI product. Most real products are one of these or a composition.

Which model should I use? — model selection matrix

Fifteen common workloads, each with a model-type recommendation, key capability, sensitivity ratings, evaluation method and what to avoid. The best AI product rarely uses one model — it uses a model system.

Giving models external knowledge

Five ways the model can reach beyond its weights. They are complementary, not competitors.

Agents and tool use

Reference architectures

Seven concrete blueprint cards from chatbot to autonomous workflow agent. Steal the building blocks for your context.

Concrete architecture examples

Six real product flows as horizontal pipelines. Steal the lane that looks like your product.

Bad model system patterns

Seventeen anti-patterns. Most blown budgets, bad postmortems and sad demos trace back to one of these.

Where model systems break — 4-column diagnostic

Symptom → likely cause → first thing to check → fix pattern. Fifteen rows. The on-call triage list when a deployed model misbehaves.

The model economics stack

Model cost depends on input + output + reasoning tokens, context length, model size, batching, cache hit rate, tool calls, retrieval, reranking, modality processing, self-host vs API, utilisation, and the evaluation + monitoring overhead. The calculator below is an intuition tool — pick a preset, edit any field.

Model cost intuition

Pick a preset or edit the inputs. Numbers are illustrative; not a quote.

Inputs: input tokens / req, output tokens / req, requests / day, input $ / M tok, output $ / M tok, reasoning ×, cache hit %, retrieval $ / req, human review %.

Outputs: daily input tokens, daily output tokens, monthly cost, $ per active user (/1k DAU), $ per task.

These are intuition tools, not quotes. Real cost depends on provider pricing, caching, routing, latency, model mix, retries and evaluation overhead.
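
For readers who want the arithmetic behind the calculator fields, the following is a rough sketch and not the page's actual formula. It assumes the reasoning multiplier scales billed output tokens, that cache hits discount input tokens, and a 30-day month. The names `cached_input_discount`, `review_usd_per_item` and `daily_active_users` are hypothetical additions that are not fields on this page.

```python
from dataclasses import dataclass


@dataclass
class ModelCostInputs:
    input_tokens_per_req: float
    output_tokens_per_req: float
    requests_per_day: float
    input_usd_per_mtok: float
    output_usd_per_mtok: float
    reasoning_multiplier: float = 1.0   # assumption: scales billed output tokens
    cache_hit_rate: float = 0.0         # fraction in [0, 1]
    cached_input_discount: float = 1.0  # hypothetical: 1.0 means cached input is free
    retrieval_usd_per_req: float = 0.0
    human_review_rate: float = 0.0      # fraction of requests reviewed by a person
    review_usd_per_item: float = 0.0    # hypothetical: cost of one human review
    daily_active_users: float = 1000.0  # hypothetical: the "/1k DAU" basis
    days_per_month: int = 30            # assumption


def estimate_model_cost(inp: ModelCostInputs) -> dict:
    """Return daily token volumes and rough cost figures for one workload."""
    daily_input_tokens = inp.input_tokens_per_req * inp.requests_per_day
    daily_output_tokens = (
        inp.output_tokens_per_req * inp.reasoning_multiplier * inp.requests_per_day
    )
    # Cached input tokens are discounted; the rest are billed at the full rate.
    billable_input = daily_input_tokens * (
        1.0 - inp.cache_hit_rate * inp.cached_input_discount
    )
    token_cost = (
        billable_input / 1e6 * inp.input_usd_per_mtok
        + daily_output_tokens / 1e6 * inp.output_usd_per_mtok
    )
    retrieval_cost = inp.retrieval_usd_per_req * inp.requests_per_day
    review_cost = (
        inp.human_review_rate * inp.requests_per_day * inp.review_usd_per_item
    )
    daily_cost = token_cost + retrieval_cost + review_cost
    monthly_cost = daily_cost * inp.days_per_month
    return {
        "daily_input_tokens": daily_input_tokens,
        "daily_output_tokens": daily_output_tokens,
        "monthly_cost": monthly_cost,
        "usd_per_active_user": monthly_cost / inp.daily_active_users,
        "usd_per_task": daily_cost / inp.requests_per_day,
    }


if __name__ == "__main__":
    example = ModelCostInputs(
        input_tokens_per_req=1000, output_tokens_per_req=500,
        requests_per_day=10_000, input_usd_per_mtok=1.0, output_usd_per_mtok=2.0,
        reasoning_multiplier=2.0, cache_hit_rate=0.5, retrieval_usd_per_req=0.001,
    )
    for name, value in estimate_model_cost(example).items():
        print(f"{name}: {value:,.4f}")
```

The prices in the example are placeholders, not any provider's rates. The sketch leaves out retries, routing across a model mix and evaluation overhead, all of which the note above lists as real cost drivers.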

Strategic summary · what models mean for AI power

The Models layer is the capability layer of AI: it turns computation into reasoning, language, vision, code, action and product intelligence.

The winning products will not simply use the biggest model. They will use the right model system: routing, retrieval, memory, tools, evaluation, safety and cost control.

Sources & methodology

Drawn from official model docs and system cards (OpenAI, Anthropic, Google DeepMind, Meta Llama, Mistral, Qwen, DeepSeek, xAI), foundational papers (transformer / attention, diffusion, RLHF, RAG, DPO), MLCommons / MLPerf, Stanford AI Index, METR and the LMSYS Chatbot Arena (used carefully). Vendor-claimed performance is flagged. Public benchmarks change quickly; this section is a framework, not a leaderboard.

Applications layer · deep dive

Applications: where capability becomes useful work.

Energy gives AI power. Chips convert power into computation. Infrastructure coordinates computation. Models turn computation into capability. Applications turn capability into products, workflows, agents, decisions, revenue and real-world outcomes.

Problem → outcome loop

Apps 101 — what each primitive actually means

Ten foundational ideas. Read these once and the rest of the section makes more sense.

Common confusions

Six mix-ups that block clear thinking about AI products. Replace them with the right framing.

Application maturity model

Where is your team today? Six levels from "cool demo" to "AI-native operating system" with the giveaway tells for each.

Misconceptions to drop

The longer myth-truth list. Most failed AI applications trace back to one of these.

The AI application stack — nine layers

From user surface to feedback loop. Most failed AI products are weak in one specific layer; finding it is the work.

Demo vs product

The same idea looks identical in a demo and a product. The difference is everything about how it behaves over time.

Application patterns — thirteen canonical shapes

Most real AI applications are one of these shapes or a composition. Understanding the shape comes before picking the model.

Where AI applications create value — eight levers

AI products win when they pull on at least one of these levers strongly. Most pull on two or three.

Where AI applications land — sixteen industries

For each industry: best AI use cases, the workflow wedge, the data advantage, the trust / risk envelope, monetisation lens and what to avoid. A briefing, not a directory.

Vertical AI vs horizontal AI

A persistent strategic split. Horizontal serves everyone shallowly. Vertical owns one workflow deeply. Both win — for different reasons.

See also · Biological Intelligence Atlas — ants, bees, slime moulds, immune systems and embodied cognition as references for multi-agent design.

Ten architecture blueprints

The recurring shapes that ship. Most production AI systems are one of these or a composition. Use them as starting points, not finished designs.

The 12-step strategy playbook

Start narrow. Win a workflow. Then expand. The order matters more than the speed.

Twenty bad patterns to avoid

The recurring traps. Each one looks reasonable up close and corrosive in aggregate. Print this. Re-read before every roadmap.

Metrics that matter — five categories

The dashboard for an AI product. User, quality, operational, business and risk. If a category is missing, you cannot tell whether the system is healthy.

Risks & controls

For every risk, a corresponding control. Trustworthy AI applications make this pairing explicit, not implicit.

Diagnostics — where AI applications break

Symptom → likely cause → first thing to check → fix pattern. Sixteen rows. The on-call triage list when a deployed AI app misbehaves.

Application economics

AI app cost is not just model cost. It is model + retrieval + tools + human review + ops + errors + change management. Value is hours saved, revenue uplift, errors avoided, decisions improved. Pick a preset, edit any field — these are intuition tools, not quotes.

Application value & cost intuition

Pick a preset or edit the inputs. Numbers are illustrative; not a quote.

Inputs: users, tasks / user / mo, minutes saved / task, labour $ / hr, revenue $ / task, model $ / task, review $ / task, ops / eng $ / mo, error / rework $ / mo, price $ / user / mo.

Outputs: time saved / mo, gross value / mo, operating cost / mo, net value / mo, break-even price / user, cost per task.

These are intuition tools, not pricing. Real economics depend on workflow fit, model mix, retrieval / tool costs, retries, review rate, integration effort and adoption curve.
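
One plausible reading of the value and cost fields is sketched below. The formulas are assumptions, not the page's actual calculator logic: gross value is taken as labour time saved plus per-task revenue, and operating cost as per-task model and review cost plus the fixed monthly ops and rework lines. The `margin_per_user` output is a hypothetical extra, not a field on this page.

```python
from dataclasses import dataclass


@dataclass
class AppEconomicsInputs:
    users: float
    tasks_per_user_month: float
    minutes_saved_per_task: float
    labour_usd_per_hour: float
    revenue_usd_per_task: float
    model_usd_per_task: float
    review_usd_per_task: float
    ops_eng_usd_month: float
    error_rework_usd_month: float
    price_usd_per_user_month: float


def estimate_app_economics(inp: AppEconomicsInputs) -> dict:
    """Rough monthly value and cost figures under the stated assumptions."""
    tasks = inp.users * inp.tasks_per_user_month
    if inp.users <= 0 or tasks <= 0:
        raise ValueError("users and tasks per user must be positive")
    hours_saved = tasks * inp.minutes_saved_per_task / 60.0
    # Assumption: value is saved labour time plus any direct revenue per task.
    gross_value = (
        hours_saved * inp.labour_usd_per_hour + tasks * inp.revenue_usd_per_task
    )
    # Assumption: variable per-task costs plus fixed monthly costs.
    operating_cost = (
        tasks * (inp.model_usd_per_task + inp.review_usd_per_task)
        + inp.ops_eng_usd_month
        + inp.error_rework_usd_month
    )
    break_even_price = operating_cost / inp.users
    return {
        "hours_saved_month": hours_saved,
        "gross_value_month": gross_value,
        "operating_cost_month": operating_cost,
        "net_value_month": gross_value - operating_cost,
        "break_even_price_per_user": break_even_price,
        "cost_per_task": operating_cost / tasks,
        # Hypothetical extra: headroom between list price and break-even.
        "margin_per_user": inp.price_usd_per_user_month - break_even_price,
    }


if __name__ == "__main__":
    example = AppEconomicsInputs(
        users=100, tasks_per_user_month=20, minutes_saved_per_task=6,
        labour_usd_per_hour=50, revenue_usd_per_task=0.0,
        model_usd_per_task=0.05, review_usd_per_task=0.20,
        ops_eng_usd_month=1000, error_rework_usd_month=250,
        price_usd_per_user_month=30,
    )
    for name, value in estimate_app_economics(example).items():
        print(f"{name}: {value:,.2f}")
```

All numbers in the example are illustrative. The sketch treats adoption as complete and ignores integration effort, both of which the note above flags as things that shape real economics.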

Strategic summary · what applications mean for AI power

The Applications layer is the value layer of AI: it turns model capability into products, workflows, revenue, productivity and real-world outcomes.

The winners will not simply ship a chatbot. They will own a workflow — with the right context, the right interface, the right evaluation and the right economics.

Sources & methodology

Drawn from operator-grade product research and labelled vendor documentation: Stanford AI Index, Microsoft Work Trend Index and copilot research, Anthropic / OpenAI / Google product docs, NBER and academic productivity studies (where available), Y Combinator and a16z operator essays, and credible product playbooks. Vendor-claimed performance is flagged. Industry cards are illustrative briefings — they map territory, not specific deployments.

Full AI research & engineering taxonomy

How I think about AI

Most AI discussion stays at the top of the stack. People debate which model is best, which agent framework to use, which startup just launched. That layer matters, but it is only one-fifth of the picture.

Real leverage comes from understanding how the layers connect. A breakthrough in chip interconnects changes the economics of distributed training. A new compiler optimisation shifts what model sizes are practical to serve. Cheaper energy in a specific geography changes where AI factories get built, which changes who has access to frontier compute.

The people building the most consequential AI systems are not just prompting models. They are reasoning across the full stack: spotting bottlenecks, understanding where marginal progress in one layer unlocks disproportionate gains in the layers above it. That is the kind of thinking this page is designed to support.

For a deeper look at the foundational papers behind many of these ideas, see The AI Atlas, an interactive knowledge graph of the 50 most important AI/ML papers.

Inspired by Jensen Huang's framing of AI as a five-layer industrial stack.