Essay No. 005 · AI & software engineering · Melbourne, Australia

The Cheap Code Era.

AI will automate coding before it can own software. The cheap code era is here, but the expensive part of software is moving from syntax to systems.

Pugalenthi Magendran

February 2026 · Melbourne, Australia

Illustration: on the left, a factory conveyor belt streams identical code tiles out of a machine labelled PROMPT, under a neon sign reading CODE IS CHEAP. On the right, a man at a desk studies a wall of hand-drawn system-design diagrams under a neon sign reading JUDGMENT IS EXPENSIVE. Caption: Generated on the left. Earned on the right.

Code is becoming cheap. Trustworthy software is not.

That distinction is the whole essay.

AI can already generate components, scaffold apps, fix common bugs, write tests, refactor functions, connect APIs, and turn vague English into a working prototype. That is no longer science fiction; it is the current baseline. The real question for jobs, startups, and the shape of technical work is not whether AI can write code. It is when AI stops being a coding assistant and becomes an autonomous software engineer.

The answer cuts both ways. AI will automate a large share of coding sooner than most engineers want to admit, and it will take longer to own software engineering than most AI maximalists want to admit.

Coding is the act of producing code. Software engineering is the act of turning messy human intent into a system that works, survives, scales, remains secure, can be changed later, and does not create unacceptable damage when reality disagrees with the plan. They are not the same job.

The future is not the end of software engineering. It is the end of code as the moat.

Key idea

AI will automate coding before it can own software. The cheap code era makes code abundant, but durable systems still require judgment, security, verification, and accountability.

90% · Developers using AI in 2025 · DORA⁹

46% · Developers who distrust AI tool output · Stack Overflow⁸

19% · Slowdown for experienced developers in an RCT · METR¹⁰

7 months · Doubling time for AI task horizons · METR⁷

I. Vibe coding is the first symptom

Vibe coding is not just a meme. IBM defines it as a style of development where users express intent in plain language and AI turns that intent into executable code.¹ The Verge calls the result “personal software”: tools made for one person’s exact need, often rough, often good enough.²

The shift is that software which was never worth hiring someone to build is suddenly economic: small tools, private dashboards, internal scripts, one-off automations. Before AI, the cost of building these was higher than the value of having them. Now that relationship inverts.

AI does not only replace software labour. It creates software demand that was previously uneconomic.

The trap is that generation is not engineering. Vibe coding is excellent for disposable software. It becomes dangerous when people confuse disposable software with durable systems.

II. The three software layers

The future makes sense if software is split into three layers.

Disposable software is personal scripts, prototypes, landing pages, small dashboards, experimental apps, and one-off automations. If they break, damage is low. If they get messy, regenerate. If they are ugly, only one person may care. AI will dominate this layer.

Commodity production software is CRUD apps, admin panels, booking systems, internal workflow tools, standard SaaS features, integrations, reporting dashboards, and simple mobile apps. These follow known patterns. The challenge is not inventing new computer science; it is applying familiar patterns correctly. AI will automate much of this layer, with humans defining requirements, checking security, and owning the result.

Durable systems are banking platforms, healthcare, identity, payment infrastructure, cloud infrastructure, developer platforms, large enterprise software, regulated AI, and safety-critical software. Mistakes compound. Security matters. Reliability matters. Legal responsibility matters. Maintenance matters. AI will deeply assist this layer. It will not own it soon.

The confusion happens because people use one word, “coding,” for all three. Generating a working personal app and maintaining a secure banking system are not the same activity.

The first layer is increasingly AI-native. The second is becoming AI-assisted. The third remains human-accountable.

III. What the evidence actually says

The strongest signal is not that chatbots can write functions. It is that agents are entering the engineering workflow. GitHub’s Copilot coding agent is generally available, working asynchronously in a sandboxed environment to take an issue, plan a fix, edit files, run tests, and open a pull request.³ Anthropic’s Economic Index puts Computer and Mathematical tasks at 35% of Claude.ai conversations in March 2026, with API usage even more concentrated in automation-friendly workflows.⁴ Coding is one of the first places where AI becomes economically real.

Capability is uneven. SWE-bench Verified measures bounded GitHub issues.⁵ SWE-Bench Pro, designed for realistic enterprise work across 41 maintained repositories, kept widely used coding models below 25% Pass@1 under a unified scaffold, with GPT-5 at 23.3% at the time.⁶ METR’s task-horizon research finds AI completion horizons doubling roughly every seven months, with leading models around a 50% success rate on software tasks that take a human expert about an hour.⁷ Capability is rising fast. Reliability for long-horizon ownership is not there yet.
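The seven-month doubling figure can be turned into rough arithmetic. The sketch below is a naive extrapolation, assuming the doubling rate stays constant and starting from the roughly one-hour horizon cited above. The function name `months_until` and the 8-hour and 40-hour targets are illustrative choices, not figures from METR, and the calculation tracks the 50% success horizon only.

```python
import math

# Doubling time reported by METR; holding it constant is an assumption.
DOUBLING_MONTHS = 7.0


def months_until(target_minutes: float, current_minutes: float,
                 doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months for the task horizon to grow from current to target,
    assuming steady exponential growth at the given doubling time."""
    if target_minutes <= current_minutes:
        return 0.0
    return doubling_months * math.log2(target_minutes / current_minutes)


# Starting point: about one hour at a 50% success rate (from the text).
current = 60.0

# Hypothetical targets: a working day and a working week, in minutes.
targets = [("8-hour task", 8 * 60), ("40-hour task", 40 * 60)]

for label, minutes in targets:
    print(f"{label}: about {months_until(minutes, current):.0f} months")
```

Under those assumptions the 50% horizon would reach a working day in about 21 months and a working week in about 37. That describes what the trend implies if it holds, not a forecast, and a 50% success rate is still far from ownership.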

Adoption is racing ahead of trust. Stack Overflow’s 2025 Developer Survey reports 46% of developers distrust AI tool accuracy, 33% trust it, and only about 3% trust it highly.⁸ Google’s DORA report puts AI adoption among professional developers at 90%, with a median of two hours a day in core workflows, while also noting that faster code generation does not automatically translate into stable delivery.⁹ The hidden bottleneck is absorption, not typing.

The most uncomfortable finding is METR’s randomised controlled trial. Sixteen experienced developers working on their own mature repositories expected AI to make them faster. With AI allowed, they took 19% longer.¹⁰ AI helps when the codebase is unfamiliar, the task is bounded, and mistakes are cheap. It can slow you down when the developer already knows the system, the standards are high, and reviewing plausible-but-wrong suggestions takes longer than writing the correct change.

AI is strongest when the cost of generation is the bottleneck. AI is weaker when the cost of judgment is the bottleneck.
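A toy model makes the two regimes concrete. Every number below is hypothetical and chosen only to illustrate the shape of the trade-off; none comes from the METR study. The model assumes each AI-assisted task costs a prompt, a review that always happens, and a fix that happens only when the suggestion is wrong.

```python
def time_with_ai(prompt_min: float, review_min: float,
                 p_wrong: float, fix_min: float) -> float:
    """Expected minutes per task with AI: prompt, always review,
    and fix with probability p_wrong."""
    return prompt_min + review_min + p_wrong * fix_min


def relative_change(manual_min: float, ai_min: float) -> float:
    """Positive means AI made the task slower; negative means faster."""
    return ai_min / manual_min - 1.0


# Hypothetical scenario A: unfamiliar codebase, writing by hand is slow.
unfamiliar = relative_change(
    manual_min=120,
    ai_min=time_with_ai(prompt_min=5, review_min=20, p_wrong=0.3, fix_min=30),
)

# Hypothetical scenario B: expert in a mature repo, writing by hand is fast.
familiar = relative_change(
    manual_min=30,
    ai_min=time_with_ai(prompt_min=5, review_min=20, p_wrong=0.4, fix_min=25),
)

print(f"unfamiliar codebase: {unfamiliar:+.0%}")
print(f"familiar codebase:   {familiar:+.0%}")
```

With these made-up inputs the same tool saves most of the time in the unfamiliar case and adds time in the familiar one. The review cost is roughly fixed, so it dominates once the manual baseline is already small.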

Security sharpens the picture. The Cloud Security Alliance notes AI assistants do not inherently understand an application’s risk model or threat landscape.¹¹ Veracode, a vendor source worth treating as one data point, reports that 45% of evaluated AI-generated code samples failed security tests and introduced OWASP Top 10 vulnerabilities.¹² The deeper problem is not bad code. It is bad code that arrives with the tone of certainty.

Read together, the evidence points one way. AI is becoming good enough to produce code at scale. It is not yet trusted enough to own durable systems.

IV. The strongest case for faster automation

This thesis deserves its sharpest opponent. There is a serious case that full automation arrives faster than the essay implies.

Software is unusually friendly to autonomy because verification is built in. Tests run automatically. CI catches regressions. Sandboxes contain changes. Staging environments rehearse them. Static analysis flags suspicious patterns. Rollbacks undo failures. Observability surfaces them. None of that exists for an AI agent translating a contract, performing surgery, or operating a vehicle. In code, the loop closes.

If verification closes the loop, the bar for autonomy is lower than it looks. AI does not need to be perfect. It only needs to be wrong in ways the surrounding system can catch and revert. A 90% reliable generator wrapped in a 99.9% reliable test and rollback layer can act safely on low-risk software. The next generation of coding agents is being trained explicitly for long-horizon coherence, repository understanding, debugging, and end-to-end workflow completion. Combine those with deploy, monitor, and rollback automation, and the human role can shrink quickly.
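The 90% and 99.9% figures compose into a single escape rate. The sketch below reads "99.9% reliable" as the safety layer catching 99.9% of bad changes, and assumes generator errors and missed catches are independent. Both readings are assumptions, and the function name and the yearly change volume are illustrative.

```python
def escaped_failure_rate(generator_reliability: float,
                         catch_rate: float) -> float:
    """Share of changes that are wrong AND slip past the safety layer,
    assuming the two events are independent."""
    return (1.0 - generator_reliability) * (1.0 - catch_rate)


# Figures from the text: 90% reliable generator, 99.9% reliable safety layer.
rate = escaped_failure_rate(generator_reliability=0.90, catch_rate=0.999)

print(f"escape rate: {rate:.4%}")
print(f"roughly 1 in {round(1 / rate):,} changes")

# Hypothetical volume, to show that a small rate still produces incidents.
changes_per_year = 50_000
print(f"expected escapes per year: {rate * changes_per_year:.1f}")
```

One bad change in roughly ten thousand is tolerable when every failure can be rolled back. The independence assumption is the weak point: errors the generator tends to make may be the same ones the tests were never written to catch.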

Companies will accept “good enough” autonomy where the savings are large. Most internal tools never had a strict reliability target. Many SaaS features already ship at 99% reliability. A modest drop in quality offset by a five-fold drop in cost is a trade most product organisations will make.

This case is strong, and it is mostly right for the first two layers. Disposable and commodity software meet most of its conditions. Verification is cheap, mistakes are reversible, the work is well-bounded.

It is weaker for durable systems because tests do not catch everything that matters. They miss business logic the original author never thought to assert. Security assumptions that live inside the heads of senior engineers. Legacy constraints embedded in deployment scripts nobody has read in five years. User behaviour that only emerges at scale. Compliance obligations that change quarterly. Privacy invariants that span systems. Long-term maintainability that nobody can write a test for.

Sandboxes reduce risk, but they do not eliminate responsibility. Rollback is useful when failures are reversible. Many failures are not: a payment sent, a record exposed, a model deployed with a quiet bias, a migration that succeeds and corrupts data.

In serious systems the hard question is not “can the AI produce a change?” It is “can the organisation trust the change and own the consequences?”

That is a different question, and it does not get answered by better models alone.

V. A conditional forecast

The trouble with dated forecasts is that they imply a precision that does not exist. Instead of asking which year AI takes over coding, ask which thresholds have been crossed.

Task horizon. Can agents reliably complete tasks that take a human expert eight hours, one day, one week?

Benchmark depth. Do agents perform strongly on harder benchmarks like SWE-Bench Pro, not just on the easier issue-resolution sets?

Security reliability. Does generated code pass security review at rates comparable to human code?

Review burden. Does AI reduce total delivery time after review, not just the writing portion?

System integration. Can the agent understand the existing architecture, dependencies, permissions, deployment paths, and business rules?

Trust infrastructure. Are tests, logs, provenance, rollback, monitoring, and human-escalation paths strong enough to absorb autonomous change?

Map those onto the three layers.

Disposable software has already crossed enough of them to be AI-native. The conditions are easy: short horizons, cheap mistakes, trivial integration.

Commodity production software crosses the line as task horizons grow from hours to days, security-reviewed generation becomes routine, and the review burden falls. By the late 2020s, much commodity software may be AI-built by default. This is conditional, not promised.

Durable systems do not become AI-owned simply because models improve. They become AI-owned only when the surrounding engineering process becomes strong enough to absorb autonomous change. That requires not just better agents but better organisational tooling: provenance, audit, sandboxed execution at production scale, automated security review, and an accountability structure that holds an autonomous system to standards a court will recognise. That layer is decades behind the models.

Even if every threshold is met, capability is not permission. The legal system does not care that the model seemed confident. Customers do not care that the agent had good intentions. Boards do not care that the benchmark score was high. If the system fails, someone owns the failure. That is why full autonomy will arrive unevenly, even when it becomes technically possible.

VI. The compression at the bottom

The biggest labour-market impact is not the disappearance of senior engineers. It is the compression of the bottom layer.

AI attacks exactly the tasks that used to train juniors: boilerplate, simple bug fixes, tests, docs, small UI changes, integrations, scripts, first-pass implementations. Those tasks were not just labour. They were apprenticeship.

A Stanford Digital Economy Lab paper finds early-career workers aged 22 to 25 in AI-exposed occupations experienced a 16% relative employment decline, while experienced workers in the same occupations remained more stable.¹³ The signal is not clean. Labour markets are also moved by interest rates, overhiring, layoffs, outsourcing, and macro conditions. But it fits the pattern of compression at the bottom.

AI does not need to replace senior engineers first. It only needs to reduce the need for juniors.

The old ladder ran from junior tasks to skill to judgment to seniority. If junior tasks disappear, the path to seniority gets harder. The future junior cannot just be someone who knows syntax. That layer is gone. The future junior has to be AI-native, product-aware, able to debug, able to test, able to read systems, and able to prove a change is correct. The bar at the bottom rises while the apprenticeship that used to clear it gets removed.

That is the cruellest tension in the cheap code era. The work that survives requires judgment that used to be trained on the work that disappears.

VII. What survives

The work that survives is not “coding” in the narrow sense. It is the work around code. Understanding the problem. Choosing the architecture. Designing the data model. Defining the security boundary. Knowing the user. Deciding what not to build. Reading logs. Handling incidents. Knowing when the AI is wrong. Owning the outcome.

This is why “AI will replace programmers” is both true and false. It will replace many whose value is mostly producing standard code from clear instructions. It will not quickly replace people whose value is deciding what should exist, how it should behave, and whether it is safe to trust.

The value moves from syntax to systems, and from building the thing to owning the consequences of the thing.

For seventy years, software was scarce because writing it required specialised knowledge. Now the act of writing is becoming cheap. That does not make software worthless. It changes where the value lives.

The cheap code era will create more software than the world has ever seen. The future will not belong to the people who generate the most code. It will belong to the people who know what should be built, what should never be built, and what must be proven before anyone trusts it.

The cheap code era does not end software engineering. It ends code as the moat.

¹ IBM (2025). What is vibe coding? Defines the practice as expressing intent in plain language while AI generates the executable code.

² The Verge (2025). The personal software revolution. On the emergence of custom-built tools for individual workflows.

³ GitHub (25 September 2025). Copilot coding agent is now generally available. Describes the agent’s asynchronous workflow: assigned task, sandboxed environment, plan, branch, tests, pull request.

⁴ Anthropic (March 2026). Economic Index: March 2026 report. Computer and Mathematical tasks at 35% of Claude.ai conversations, with API usage even more concentrated in automation-friendly workflows.

⁵ Princeton NLP. SWE-bench leaderboard. The benchmark and the human-filtered SWE-bench Verified subset of 500 real GitHub issues.

⁶ Anonymous et al. (2025). SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? 1,865 problems across 41 actively maintained repos; widely used coding models under a unified scaffold remained below 25% Pass@1, with GPT-5 at 23.3%.

⁷ METR (19 March 2025). Measuring AI ability to complete long tasks. Task-completion horizons doubling roughly every seven months over six years; Claude 3.7 Sonnet at a 50% time horizon of about 50 minutes on the METR software suite at the time of study.

⁸ Stack Overflow (2025). 2025 Developer Survey: AI section. 46% of developers distrust AI tool accuracy, 33% trust it, around 3% trust it highly.

⁹ Google (2025). DORA Report 2025. AI adoption among software professionals at 90%, median of about two hours per day of AI in core workflows.

¹⁰ METR (10 July 2025). Early-2025 AI experienced open-source developer study. Randomised controlled trial with 16 experienced developers on their own mature repos: tasks took 19% longer when AI tools were allowed, despite developer expectations of speedup.

¹¹ Cloud Security Alliance (9 July 2025). Understanding security risks in AI-generated code. AI coding assistants do not inherently understand application risk models, internal standards or threat landscape.

¹² Veracode (2025). GenAI Code Security Report. 45% of evaluated AI-generated code samples failed security tests and introduced OWASP Top 10 vulnerabilities. Vendor source; one evidence point among many.

¹³ Brynjolfsson, Chandar & Chen, Stanford Digital Economy Lab (November 2025). Canaries in the Coal Mine: Six Facts About the Recent Employment Effects of Artificial Intelligence. 16% relative employment decline for workers aged 22–25 in highly AI-exposed occupations, while experienced workers in the same occupations remained more stable.

* * *

This is Essay No. 005. The topics: intelligence, AI, systems, knowledge, and the questions underneath the questions everyone else is asking. If you read this far and disagreed with any part of it, write to me. I read everything.

Pugalenthi Magendran