The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
Each dot marks the weighted maturity of practices within a domain — hover for a brief summary, click for more detail
AI that autonomously approves and merges code changes that meet defined quality and safety thresholds. Includes automated merge for low-risk changes such as dependency bumps; distinct from suggestion-mode review, which always requires human sign-off.
Autonomous code approval, where AI reviews, approves, and merges changes without a human gate, remains organisationally stalled despite continued vendor maturity. The technical capability exists: Claude Code ships auto-merge, OpenAI deployed AI-reviewing-AI with 99.93% approval accuracy, and production case studies document bounded auto-approve (Ona, Augment, mabl). Yet every major deployment maintains human gates. The defining tension is not capability but safety: PanDev Metrics' study of 100 teams documented that AI-only auto-approve escapes 46% more defects and nearly doubles severity-1 incident rates, and an independent benchmark shows review tools catching at most 33% of production bugs. Auto-approve in production is consistently bounded (low-risk PRs under 1K LOC, no migrations or auth changes), with humans making the final merge decision. The practice remains bleeding-edge, caught between vendor maturity and hard organisational constraints: 96% of developers distrust AI code accuracy, and review has become the visible bottleneck (a 91% increase in review time despite AI-accelerated coding). Signals on whether organisations will delegate the approve-and-merge decision to AI systems remain negative.
The vendor ecosystem has reached general availability on multiple fronts. Claude Code ships auto-merge, and OpenAI's Auto-review system (deployed in Codex) achieves 99.93% approval accuracy with 99.3% prompt-injection blocking, the first frontier-lab production deployment of AI-reviewing-AI with quantified safety metrics. Amazon Q Developer and GitLab Duo offer integrated review automation. CodeRabbit has connected 2M repositories and analysed 75M defects. Production deployments show what organisationally acceptable, bounded auto-approve looks like: Ona achieved a 74% lead-time reduction (4.1h → 1.1h) with explicit low-risk scoping (<1K LOC, no migrations or auth changes); Augment's Cosmos agents achieved 3x code output with Intent Reviewer gates for high-judgment decisions; mabl scaled to 75+ repositories under a strict governance rule that "there is no scenario where code auto-merges without human approval."
Yet this vendor maturity runs into hard safety and organisational limits. PanDev Metrics' empirical study of 100 B2B teams (23,847 PRs) found that an AI-only auto-approve configuration escapes 46% more defects, with an 18% post-merge rework rate and a near-doubled severity-1 incident rate. An independent benchmark (Entelligence) on 67 production bugs shows CodeRabbit catching 33% and Copilot 22.6%, safety margins too thin for autonomous approval. Developers spend 91% longer reviewing PRs despite AI acceleration, and 96% distrust AI code accuracy. Practitioner governance consistently rejects full automation: mabl, Google Cloud (after a YOLO-mode incident), and other deployments maintain human approval gates. Security research has documented exploitation vectors: prompt injection can spoof git author identity and trigger auto-approval of malicious payloads (12,400+ public claude-code-action workflows exposed). The core pattern is unchanged: organisations adopt AI review as productivity augmentation while governance demands human final authority on merge decisions. Code-quality bottlenecks and trust deficits continue to prevent delegation of the approval decision itself.
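The bounded scoping described above (under 1K LOC, no migrations or auth changes, a human making the final merge decision) can be pictured as a simple eligibility check. What follows is a minimal sketch under stated assumptions, not any vendor's implementation: the `PullRequest` type, the path patterns, and the route names are all hypothetical, and only the 1K LOC limit and the migration/auth exclusions come from the case studies.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

# Hypothetical values for illustration; only the "under 1K LOC, no migrations
# or auth changes" scope itself comes from the text.
MAX_CHANGED_LINES = 1000
SENSITIVE_PATTERNS = ("*migrations/*", "*auth/*")


@dataclass(frozen=True)
class PullRequest:
    """Hypothetical minimal view of a pull request."""
    changed_lines: int
    changed_paths: tuple[str, ...]


def auto_approve_blockers(pr: PullRequest) -> list[str]:
    """Return the reasons a PR falls outside the low-risk scope (empty if none)."""
    blockers = []
    if pr.changed_lines >= MAX_CHANGED_LINES:
        blockers.append(f"{pr.changed_lines} changed lines (must be under {MAX_CHANGED_LINES})")
    for path in pr.changed_paths:
        # fnmatchcase's "*" also matches "/" so nested directories are covered.
        if any(fnmatchcase(path, pattern) for pattern in SENSITIVE_PATTERNS):
            blockers.append(f"touches sensitive path: {path}")
    return blockers


def review_route(pr: PullRequest) -> str:
    """Pick a review route; both routes end with a human merge decision."""
    if auto_approve_blockers(pr):
        return "full-human-review"
    return "ai-preapproved-human-merge"
```

For example, `review_route(PullRequest(12, ("requirements.txt",)))` yields the AI pre-approved route, while a 40-line change under `db/migrations/` is sent to full human review. Returning the list of blockers rather than a bare boolean keeps the decision auditable, which matters when governance requires explaining why a change did or did not qualify.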
— IDEsaster security vulnerability class: 24 assigned CVEs across Cursor, GitHub Copilot, Windsurf, and Zed show auto-approved tool calls being exploited via prompt injection, enabling data exfiltration and RCE; demonstrates that auto-approve gates can be bypassed through prompts.
— Apiiro Fortune 50 analysis: AI-assisted developers commit 3-4x faster but introduce vulnerabilities at 10x rate; privilege escalation +322%, architectural flaws +153%; critical barrier constraining auto-approve adoption.
— Anthropic deployed automated Claude reviewer in production code gate analyzing every proposed change before merge; retrospective analysis found automated reviewer would catch roughly one-third of bugs behind past production outages.
— Veracode testing of 100+ LLMs: 45% of generated code samples contained vulnerabilities; developers more likely to rate insecure code as secure; Stanford study confirms developers using AI are more prone to security bugs; proposes requiring human review on AI-generated PRs.
— GitHub's official GA announcement of Agent Merge feature enabling automatic PR approval, comment addressing, CI fix, and merge execution when team-defined conditions are met; developers explicitly control which auto-approval actions agents perform.
— Microsoft Research empirical study (17 developers): identifies problematic oversight heuristics developers adopt (test passing as correctness proxy; trusting agents with unfamiliar contexts), revealing situated challenges blocking autonomous approval.
— Named company (Rewind) production case study of Diff Vader auto-approve system processing ~1,000+ PRs/month; documents risk-based grading architecture distinguishing high-risk database migrations from low-risk boilerplate, specialist reviewer councils, and deterministic verdict engines.
— UK ICO's automated decision-making guidance (effective June 2026) defines meaningful human involvement as active and informed real-time review; clicking approve without understanding does not meet the standard; applies to code review systems under EU AI Act.