The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
Each dot marks the weighted maturity of practices within a domain — hover for a brief summary, click for more detail
AI that autonomously approves and merges code changes meeting defined quality and safety thresholds. Includes automated merge for low-risk changes like dependency bumps; distinct from suggestion-mode review, which always requires human sign-off.
Autonomous code approval, where AI reviews, approves, and merges changes without a human gate, remains organisationally stalled despite continued vendor maturity. The technical capability exists: Claude Code ships auto-merge, OpenAI deployed AI-reviewing-AI with 99.93% approval accuracy, and production case studies document bounded auto-approve (Ona, Augment, mabl). Yet every major deployment maintains human gates. The defining tension is not capability but safety: PanDev Metrics' study of 100 teams found that AI-only auto-approve escapes 46% more defects and doubles severity-1 incident rates, and independent benchmarks show review tools catch only 33% of production bugs. Auto-approve in production is consistently bounded (low-risk PRs under 1K LOC, no migrations or auth changes), with humans making the final merge decision. The practice remains bleeding-edge, sitting where vendor maturity meets hard organisational constraints: 96% of developers distrust AI code accuracy, and review has become the visible bottleneck (review time up 91% despite faster code generation). The signals on whether organisations will delegate the approve-and-merge decision to AI systems remain negative.
Vendor ecosystem maturity has reached full GA on multiple fronts. Claude Code ships auto-merge; OpenAI's Auto-review system (deployed in Codex) achieves 99.93% approval accuracy and blocks 99.3% of prompt-injection attempts, the first frontier-lab production deployment of AI-reviewing-AI with quantified safety metrics. Amazon Q Developer and GitLab Duo offer integrated review automation, and CodeRabbit has connected 2M repositories and analyzed 75M defects. Production deployments demonstrate organisationally acceptable bounded auto-approve: Ona achieved a 74% lead-time reduction (4.1h → 1.1h) with explicit low-risk scoping (<1K LOC, no migrations or auth changes); Augment's Cosmos agents achieved 3x code output with Intent Reviewer gates for high-judgment decisions; and mabl scaled to 75+ repositories under strict governance that "there is no scenario where code auto-merges without human approval."
Yet this vendor maturity runs into hard safety and organisational limits. PanDev Metrics' empirical study of 100 B2B teams (23,847 PRs) found that the AI-only auto-approve configuration escapes 46% more defects, with an 18% post-merge rework rate and a near-doubled severity-1 incident rate. An independent benchmark (Entelligence) on 67 production bugs shows CodeRabbit catching 33% and Copilot 22.6%, insufficient safety margins for autonomous approval. Developers spend 91% longer reviewing PRs despite AI acceleration, and 96% distrust AI code accuracy. Practitioner governance consistently rejects full automation: mabl, Google Cloud (after a YOLO-mode incident), and other deployments all maintain human approval gates. Security research has documented exploitation vectors: prompt injection can spoof git author identity and trigger auto-approve of malicious payloads (12,400+ public claude-code-action workflows exposed). The core pattern is unchanged: organisations adopt AI review as productivity augmentation while governance demands human final authority on merge decisions. Code-quality bottlenecks and trust deficits prevent delegation of autonomous approval.
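To make the bounded auto-approve pattern concrete, here is a minimal sketch of a risk-classification gate in Python. The thresholds, path patterns, and names (`MAX_AUTO_APPROVE_LOC`, `BLOCKED_PATTERNS`, `classify`) are illustrative assumptions modelled on the scoping cited in the case studies (under 1K changed lines, no migrations or auth code, dependency bumps and docs treated as low risk); they are not any vendor's actual policy.

```python
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatch


class MergeDecision(Enum):
    AUTO_APPROVE = "auto_approve"   # bot may approve and merge
    HUMAN_REVIEW = "human_review"   # a human makes the final call


# Illustrative path patterns that should never auto-merge.
BLOCKED_PATTERNS = (
    "*migrations/*",   # schema migrations
    "*auth*",          # authentication / authorization code
    "*secrets*",
)

# Illustrative manifests whose changes count as dependency bumps.
DEPENDENCY_MANIFESTS = ("requirements.txt", "package.json", "go.mod", "Cargo.toml")

MAX_AUTO_APPROVE_LOC = 1_000  # assumed threshold mirroring the <1K LOC scoping


@dataclass
class PullRequest:
    changed_files: list[str]
    lines_changed: int


def is_low_risk_file(path: str) -> bool:
    """Dependency manifests, lockfiles, and docs: the classes cited as low risk."""
    name = path.rsplit("/", 1)[-1]
    return name in DEPENDENCY_MANIFESTS or name.endswith((".lock", ".md", ".rst"))


def classify(pr: PullRequest) -> MergeDecision:
    """Auto-approve only when the PR stays inside the explicit low-risk bounds."""
    if pr.lines_changed >= MAX_AUTO_APPROVE_LOC:
        return MergeDecision.HUMAN_REVIEW
    if any(fnmatch(path, pat) for path in pr.changed_files for pat in BLOCKED_PATTERNS):
        return MergeDecision.HUMAN_REVIEW
    if pr.changed_files and all(is_low_risk_file(p) for p in pr.changed_files):
        return MergeDecision.AUTO_APPROVE
    # Default stance: anything not explicitly low-risk keeps the human gate.
    return MergeDecision.HUMAN_REVIEW


if __name__ == "__main__":
    bump = PullRequest(changed_files=["requirements.txt"], lines_changed=4)
    auth_fix = PullRequest(changed_files=["src/auth/session.py"], lines_changed=40)
    print(classify(bump))      # MergeDecision.AUTO_APPROVE
    print(classify(auth_fix))  # MergeDecision.HUMAN_REVIEW
```

The essential design choice is the default branch: everything falls to human review unless it is explicitly proven low-risk, matching the governance stance documented across the deployments above.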
— MSR 2026: 28.3% of AI PRs merge instantly, but many agents fail to converge under review. GitClear documents 9x higher code churn with AI. 66% of developers report AI outputs as 'almost correct' but flawed.
— Independent benchmark on 67 production bugs: CodeRabbit 33% catch rate, Copilot 22.6%. These low safety margins undermine auto-approve assumptions; combined with PanDev's 46% defect-escape figure, the margin approaches zero.
— Plandek analyzed 2,000+ teams: code review has become the visible bottleneck; bottom-quartile teams take 35+ hours to merge, top teams 21 hours. AI exposes delivery-system weakness; it does not fix it.
— PanDev Metrics tracked 100 teams over 15 months analyzing 23,847 PRs: AI-only auto-approve escapes 46% more defects, generates 18% post-merge rework rate, doubles severity-1 incidents vs baseline.
— Ona deployed bounded auto-approve (low-risk: <1K LOC, no migrations/auth). Lead time dropped 74% (4.1h → 1.1h), deploys tripled (3.1x). Human always merges; governance via objective criteria.
— Augment deployed Cosmos agents with auto-approve for low-risk PRs (docs, configs). Code output 3x, merge time halved, bug rate per output stable. Intent Reviewer routes high-judgment decisions to humans.
— Security researchers exploited auto-approve in Claude Code with git identity spoofing + malicious payload. 12,400+ public workflows use claude-code-action. Documented supply-chain attack vector against auto-approve (a defensive identity check is sketched after this list).
— LinearB + CircleCI 2026: PR review time increased 91% despite AI acceleration; 39-point perception gap between feeling fast and actual delivery. Review is now the critical constraint, not code generation.
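The identity-spoofing vector flagged above also lends itself to a small sketch: a pre-merge check that trusts only the platform-authenticated pusher, never the spoofable git author header, and falls back to a human gate on any mismatch. The data shapes, the `login_for_email` lookup, and the bot allowlist are hypothetical; this is generic defensive logic, not the configuration of claude-code-action or any specific CI product.

```python
from collections.abc import Callable
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommitMeta:
    git_author_email: str   # from the commit object itself: spoofable via prompt injection
    pushed_by_login: str    # identity the platform actually authenticated for the push


@dataclass
class PullRequestMeta:
    pr_author_login: str    # account that opened the PR
    commits: list[CommitMeta]


# Illustrative allowlist for automation accounts that legitimately push commits.
TRUSTED_BOT_LOGINS = {"dependabot[bot]"}


def may_auto_approve(
    pr: PullRequestMeta,
    login_for_email: Callable[[str], Optional[str]],
) -> bool:
    """Allow auto-approve only when every commit's claimed author matches its
    authenticated pusher; any mismatch or unknown identity keeps the human gate."""
    for commit in pr.commits:
        claimed_login = login_for_email(commit.git_author_email)  # hypothetical directory lookup
        if claimed_login is None:
            return False  # unknown author email: route to a human reviewer
        if claimed_login != commit.pushed_by_login:
            return False  # spoofed author header: route to a human reviewer
        if claimed_login != pr.pr_author_login and claimed_login not in TRUSTED_BOT_LOGINS:
            return False  # commit smuggled into someone else's PR
    return True


if __name__ == "__main__":
    directory = {"alice@example.com": "alice"}
    pr = PullRequestMeta(
        pr_author_login="alice",
        commits=[CommitMeta(git_author_email="alice@example.com", pushed_by_login="mallory")],
    )
    # The author header claims alice, but the push was authenticated as mallory.
    print(may_auto_approve(pr, directory.get))  # False: keep the human gate
```

The same principle applies to the approval signal itself: an AI reviewer's approval is advisory input, with branch protection still requiring a human approval before merge, which is the governance stance the deployments above describe.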