Software Engineering — AI Maturity

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

AI Maturity by Domain

[Chart: each dot marks the weighted maturity of practices within a domain, on an axis running from bleeding edge to established.]

⌨️ Software Engineering

AI across the development lifecycle: writing, reviewing, testing, and shipping code. Code completion is established and IDE-native; agentic coding and AI-driven CI/CD are advancing fast, but most of the domain still sits at the leading or bleeding edge. This is the widest maturity spread of any domain: a few practices are table stakes while many are still experimental.

24 practices: 2 established, 2 good practice, 13 leading edge, 7 bleeding edge
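
As a quick check on the spread, the tier counts above can be converted into shares of the domain. This is illustrative arithmetic on the published counts only; how the site computes its weighted maturity score is not described here, so the sketch does not attempt to reproduce it. The names `tier_counts` and `tier_shares` are chosen for this example.

```python
# Tier counts copied from the summary line above.
tier_counts = {
    "established": 2,
    "good practice": 2,
    "leading edge": 13,
    "bleeding edge": 7,
}

total = sum(tier_counts.values())  # should match the stated 24 practices


def tier_shares(counts):
    """Return each tier's share of all practices as a percentage (1 decimal)."""
    n = sum(counts.values())
    return {tier: round(100 * c / n, 1) for tier, c in counts.items()}


shares = tier_shares(tier_counts)
for tier, pct in shares.items():
    print(f"{tier:>14}: {pct:5.1f}%")
```

On these counts, leading-edge and bleeding-edge practices together make up 20 of the 24 tracked.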

Software Engineering — Biweekly Brief

The headline: AI coding tools are now universal infrastructure -- 92% of developers use them daily -- but the organizations deploying them are measurably slower, not faster, because human review cannot keep pace with machine-generated code.

The Picture

Most companies have licensed AI coding tools; the question is no longer adoption but whether those tools are delivering returns. GitHub Copilot has 4.7 million paid subscribers and 90% Fortune 100 penetration. Claude Code and Cursor between them represent over $3 billion in annual revenue. Yet independent studies of millions of real pull requests show that AI-generated code is accepted at barely a third of the rate of human code, carries nearly three times the security flaws, and creates review backlogs that slow delivery overall. Amazon's March outage -- 6.3 million lost orders from inadequately reviewed AI code -- is the highest-profile example of a pattern affecting roughly one in five organizations. The small group pulling ahead consists of those that have invested in review governance and specification discipline, not those that adopted the most tools.

This Fortnight

Architecture documentation crossed into broad deployment. Mintlify raised at a $500 million valuation, revealing that AI agents (software that acts on its own without being prompted) now generate more of its documentation traffic than human users do. Google deployed autonomous agents writing standardized architecture files across microservices, with automated quality gates catching critical infrastructure issues that had gone undetected for months. For organizations still treating documentation as a manual task, this signals it is becoming automated infrastructure -- and the teams that invest in machine-readable specifications now will have a structural advantage as AI agents consume documentation as their primary input.
Security vulnerability evidence hit a new threshold. Georgia Tech's tracking project found that AI-tool-attributed CVEs (publicly disclosed software vulnerabilities) tripled quarter over quarter, reaching 56 in the first three months of 2026 alone. A formal verification study found that 55.8% of AI-generated code contained vulnerabilities, and that existing static analysis tools caught only 2.2% of them. Separately, Johns Hopkins researchers demonstrated that a single malicious PR title could hijack three major coding agents to steal credentials, an issue Anthropic rated critically severe. Any organization shipping AI-generated code without dedicated security review processes is accumulating risk faster than it knows.
The review bottleneck deepened. A survey of nearly 3,000 developers found they now spend more time reviewing AI code (11.4 hours per week) than writing code (9.8 hours). AI-generated pull requests wait 4.6 times longer for review than human-written ones. The implication is clear: investing in code generation tools without expanding review capacity creates a more expensive workflow, not a cheaper one.
Autonomous coding stayed stuck. Despite enormous vendor investment -- GitHub paused Copilot signups after agentic workflows exceeded compute budgets -- fully autonomous coding and multi-agent development pipelines showed no movement toward broader deployment. Only about one in ten organizations has taken these beyond piloting, and that ratio has not changed.

Coming Up

Regulated-industry compliance walls are approaching. No cloud-based AI coding assistant is currently HIPAA-compliant, blocking deployment across healthcare, insurance, and adjacent sectors. Teams in regulated industries should evaluate local-inference alternatives (JetBrains, Tabnine on-premises) now, before compliance audits force the issue.
Review capacity will become a hiring and tooling priority. With AI-generated PRs creating measurable delivery bottlenecks, expect investment to shift toward review automation, tiered approval workflows, and dedicated code review teams. Organizations should audit their current review-to-generation ratio and set explicit policy on which code paths require human sign-off.
Specification-driven development is the emerging organizational standard. AWS Kiro, GitHub Spec Kit, and OpenSpec are in production use at enterprises writing machine-readable specifications that constrain what AI agents produce. Organizations that build this discipline now position themselves for the next generation of agent tooling; those that skip it will face the same quality problems at higher volume.
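
The review audit suggested above could start from something as small as the following sketch. It rests on assumptions the brief does not spell out: the "review-to-generation ratio" is taken here to mean reviewed AI-generated pull requests divided by all AI-generated pull requests opened, and the sign-off policy is a hypothetical list of path patterns. `PullRequest`, `HUMAN_SIGNOFF_PATTERNS`, and both functions are illustrative names, not part of any tool mentioned here.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class PullRequest:
    ai_generated: bool
    reviewed: bool
    changed_paths: list[str]


# Hypothetical policy: glob patterns for code paths that always need human sign-off.
HUMAN_SIGNOFF_PATTERNS = ["payments/*", "auth/*", "infra/*.tf"]


def review_to_generation_ratio(prs):
    """Reviewed AI-generated PRs / all AI-generated PRs (assumed definition)."""
    ai_prs = [pr for pr in prs if pr.ai_generated]
    if not ai_prs:
        return None  # nothing generated, so the ratio is undefined
    return sum(pr.reviewed for pr in ai_prs) / len(ai_prs)


def requires_human_signoff(pr, patterns=HUMAN_SIGNOFF_PATTERNS):
    """True if any changed path matches a pattern in the sign-off policy."""
    return any(
        fnmatchcase(path, pattern)
        for path in pr.changed_paths
        for pattern in patterns
    )


# Made-up sample data for illustration only.
sample = [
    PullRequest(True, True, ["docs/readme.md"]),
    PullRequest(True, False, ["payments/api/charge.py"]),
    PullRequest(True, False, ["ui/button.tsx"]),
    PullRequest(True, False, ["infra/network.tf"]),
    PullRequest(False, True, ["auth/session.py"]),
]

ratio = review_to_generation_ratio(sample)
needs_signoff = [pr for pr in sample if pr.ai_generated and requires_human_signoff(pr)]
print(f"review-to-generation ratio: {ratio:.2f}")
print(f"AI PRs touching protected paths: {len(needs_signoff)}")
```

In practice the pull request records would come from the team's own repository hosting data, and the pattern list would reflect whichever code paths the organization decides must never merge without a human approver.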

What's Hard About This

Review does not scale like generation. AI can produce code at machine speed, but every line still needs human review for security, architecture, and business logic. No tool has solved this mismatch. Organizations that procured AI coding tools expecting headcount savings are discovering they need more reviewer capacity, not less.
Security detection lags vulnerability creation. AI-generated code carries roughly 2.7 times the vulnerability rate of human code, but static analysis tools catch only a fraction of those flaws. The gap between what AI produces and what security tooling can verify is widening each quarter, creating compounding risk for organizations without dedicated AppSec programs.
Productivity metrics do not match outcomes. Developers report feeling faster while independent measurement shows them performing slower. Most enterprise ROI cases for AI coding tools rely on self-reported data or vendor metrics (lines of code, PR volume) that do not correlate with business outcomes. Only 4% of enterprises report material business impact from these tools despite 78% adoption.
