The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
Each dot marks the weighted maturity of practices within a domain — hover for a brief summary, click for more detail
AI agents completing production coding tasks within supervised workflows, including PR creation and review cycles. Covers agent-generated PRs with human review gates and CI checks; distinct from fully autonomous coding, which removes the human approval step.
Agentic coding for production integration, meaning AI agents that write, test, and propose code changes through pull requests with human review gates and CI checks, is widely experimented with but remains fundamentally constrained by governance and safety barriers. Enterprise adoption has reached 90% at the pilot level, yet only 11% of agentic AI projects escape the pilot phase into production. The tier-defining tension is not capability but risk: real-world deployments consistently trigger critical failures when agents retain production access without execution boundaries. April 2026 incidents document the pattern: a Cursor-based agent deleted a production database due to a credential mismatch and unverified API permissions; Amazon's Kiro agent destroyed AWS infrastructure and caused retail outages totalling 6.3M orders; GitHub's platform itself buckles under 275M commits per week and 17M agent-generated PRs per month. Agents reliably handle boilerplate and refactoring (26% of refactoring commits are explicitly agent-authored, with measurable quality gains) but fail on 70-90% of complex tasks. Successful supervised workflows (Microsoft .NET runtime: 878 agent PRs, 67.9% merge rate over 10 months) require explicit human oversight, structured tool gateways, and immutable state management. The bottleneck has shifted decisively from capability to architecture: not "can agents write better code?" but "can we build organizational and technical controls that prevent unverified decisions from becoming destructive actions at machine speed?"
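The supervised pattern described above reduces to a simple invariant: an agent-proposed PR merges only when CI passes and a human has approved. A minimal sketch of that gate, with hypothetical types and field names (nothing here comes from any named vendor's API):

```python
from dataclasses import dataclass

@dataclass
class AgentPR:
    """A hypothetical agent-proposed change awaiting supervised merge."""
    pr_id: int
    ci_passed: bool
    human_approvals: int

def may_merge(pr: AgentPR, required_approvals: int = 1) -> bool:
    """Merge authority stays with humans: an agent PR merges only
    when CI checks pass AND enough human reviewers have approved."""
    return pr.ci_passed and pr.human_approvals >= required_approvals

# Green CI alone is not enough; the human gate blocks the merge.
print(may_merge(AgentPR(pr_id=101, ci_passed=True, human_approvals=0)))  # False
print(may_merge(AgentPR(pr_id=102, ci_passed=True, human_approvals=1)))  # True
```

Raising `required_approvals` to 2 models the two-reviewer minimum that Amazon imposed after the Kiro incident.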
GitHub dominates the supervised agentic workflow infrastructure, optimized for 50% faster startup by March 2026 and formalized with the Agents tab for centralized management. Copilot reached 4.7M paid subscribers (January 2026, +75% YoY) with 90% Fortune 100 penetration, and Claude Code achieved $2.5B ARR, confirming platform maturity. Yet April 2026 evidence reveals that production integration at scale has become genuinely hazardous. GitHub's infrastructure absorbed 275M commits per week (14x YoY growth) and 17M agent-generated PRs per month before experiencing five major outages in early April alone: search downtime, Copilot backend exhaustion, and agent session failures. Velocity has outstripped safety: Claude Code generates 2.6M commits weekly, a 25x increase from September. A May 2026 analysis of 29,585 real GitHub PR lifecycles (Chung & Hassan) confirms that across five production agents (Copilot, Devin, Cursor, Claude Code, OpenAI), merge authority remains "almost exclusively human": tooling distributes agent initiative differently, but supervision gates consistently block autonomous deployment. Platform maturity is organizational, not technical: GitHub's May changelog formalizes secrets management and cloud agent environment controls, while IBM Bob (May GA) and OpenAI Codex (May disclosure) both detail enterprise production architecture including sandboxing, approval workflows, and telemetry.
Real production failures now dominate the evidence base. April 2026 incident logs document the systematic pattern: (1) a Cursor agent deleted a production Railway database after finding an unverified API token in an unrelated file; (2) Amazon's Kiro agent destroyed entire AWS infrastructure (RDS, VPC, ECS, load balancers) after misinterpreting stale Terraform state, causing retail outages totalling 6.3M lost orders (~$6.3M impact) and triggering a 90-day code safety reset requiring a two-reviewer minimum on all AI-generated changes; (3) GitHub infrastructure strained under agent-driven load, causing cascading outages. Independent research from April 2026 validates the severity: an MSR study of 11,771 real production PRs found top models complete only 24% of complex tasks autonomously, with failure rates of 70-90% as complexity increases; Lightrun's production data shows 49% of AI-generated code fails in production despite passing QA. Security remains critical: 33,000+ agent-generated PRs show recurring vulnerabilities (regex inefficiencies, injection flaws, path traversal) that are merged despite known issues, and analysis of legacy codebases shows AI code produces 2.74x more security vulnerabilities and 1.7x more issues than human code.
May 2026 security research intensified this crisis: Microsoft's Defender team (May 7) disclosed CVE-2026-26030 and CVE-2026-25592 in Semantic Kernel (27K GitHub stars), demonstrating remote code execution (RCE) via prompt injection in production agent frameworks; six coordinated research teams (May 9) disclosed credential theft exploits across Codex, Claude Code, Copilot, and Vertex AI, revealing that 78% of enterprises lack PAM (Privileged Access Management) for agent credentials; Adversa.AI disclosed the TrustFall supply-chain attack (May 7), showing that malicious repository injection succeeds identically across all major CLI agents (Claude, Cursor, Copilot, Gemini); and NVIDIA's Red Team disclosed an AGENTS.md injection vector (April 30) specific to cloned repositories in production environments. The gap between framework-level guarantees and deployed reality is absolute: the vulnerabilities exist at the design level (tool parameter trust) rather than the implementation level, making patching insufficient without architectural redesign.
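The design-level flaw named here, trusting tool parameters that originate in model output, has a correspondingly design-level mitigation: validate every parameter against an allowlist or strict schema before execution, and reject rather than sanitize. A minimal sketch under assumed names (the repositories, the `validate_clone_args` helper, and the branch pattern are all hypothetical, not drawn from any disclosed framework):

```python
import re

# Hypothetical allowlist: the only repositories this agent may clone.
ALLOWED_REPOS = {"github.com/acme/service", "github.com/acme/libs"}
# Strict pattern for branch names; anything with shell metacharacters fails.
SAFE_BRANCH = re.compile(r"^[A-Za-z0-9._/-]{1,100}$")

def validate_clone_args(repo: str, branch: str) -> None:
    """Validate model-supplied tool parameters *before* execution.
    Prompt-injected values outside the allowlist, or carrying shell
    metacharacters, are rejected outright, not sanitized."""
    if repo not in ALLOWED_REPOS:
        raise PermissionError(f"repo not allowlisted: {repo!r}")
    if not SAFE_BRANCH.match(branch):
        raise ValueError(f"malformed branch name: {branch!r}")

validate_clone_args("github.com/acme/service", "main")  # passes silently
try:
    # An injected instruction cannot steer the clone to an attacker repo.
    validate_clone_args("github.com/evil/payload", "main; rm -rf /")
except PermissionError as e:
    print("blocked:", e)
```

The point is architectural: the check lives in the tool gateway, outside anything the model can rewrite, which is exactly what patching individual frameworks fails to provide.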
Yet supervised deployments do work at production scale when guardrails are explicit. Microsoft's .NET runtime team achieved a 67.9% PR merge rate over 10 months (878 PRs, 535 merged) with a 0.6% revert rate, equivalent to human-authored code, through explicit human oversight, tool gateways with schema validation, and immutable state management. Specialized single-agent deployments (an Iowa fintech: 6 agents, 585 sessions; Ledgerpoint: 180K LOC Java→Kotlin in 8 weeks, 94% first-pass approval) succeeded by enforcing narrow scope, verifiable output, and review gates. The structural pattern is clear: autonomous or minimally reviewed agents cause cascading failures; supervised agents within tight boundaries succeed. The governance bottleneck remains unsolved at scale: Lightrun reports developers spend 38% of the week (double the pre-AI baseline) debugging and verifying AI code, with hidden verification costs of $82-103K per month for 20-person teams. Developers spend 4.6x longer reviewing agent PRs, and the introduction of agent code correlates with a 52% increase in review time and an 18% rise in production incidents even in supervised workflows. The tier-defining constraint is not capability but architectural readiness: whether organizations can enforce cost caps, tool-call validation, immutable memory boundaries, and observability instrumentation.
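Two of the controls just listed, cost caps and observability, compose naturally into one gateway that wraps every tool call. A minimal sketch, with a hypothetical `ToolGateway` class (the cap value and log format are illustrative assumptions, not any vendor's implementation):

```python
import json
import time
from typing import Any, Callable

class ToolGateway:
    """Hypothetical wrapper enforcing a hard cost cap on agent tool
    calls and keeping an append-only audit trail for observability."""

    def __init__(self, max_calls: int = 50):
        self.max_calls = max_calls
        self.calls = 0
        self._audit: list[str] = []  # append-only; agent never mutates it

    def invoke(self, name: str, fn: Callable[..., Any], **params: Any) -> Any:
        # Hard stop at the cap: a runaway agent halts instead of
        # compounding destructive actions at machine speed.
        if self.calls >= self.max_calls:
            raise RuntimeError("cost cap reached: halting agent")
        self.calls += 1
        # Record the call before executing it, so even failed calls
        # leave an audit entry.
        self._audit.append(json.dumps(
            {"t": time.time(), "tool": name, "params": params}))
        return fn(**params)

    def audit_log(self) -> tuple[str, ...]:
        return tuple(self._audit)  # read-only view for reviewers

gw = ToolGateway(max_calls=2)
gw.invoke("search", lambda query: f"results for {query}", query="flaky test")
gw.invoke("read_file", lambda path: "...", path="src/main.py")
# A third invoke() would exceed the cap and raise RuntimeError.
print(len(gw.audit_log()))  # 2
```

Immutable memory boundaries follow the same shape: the gateway, not the agent, owns the state.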
— IBM Bob GA (April 2026): end-to-end SDLC agent with named customer deployments, specific metrics, and human-in-the-loop review architecture for production integration.
— CVE-2026-26030 and CVE-2026-25592 in Microsoft Semantic Kernel demonstrate remote code execution (RCE) risk in production agent frameworks via prompt injection, directly impacting deployed agentic systems.
— Six research teams disclosed coordinated exploits of production agents (Codex, Claude Code, Copilot, Vertex AI), revealing credential theft vectors and that 78% of enterprises lack PAM controls for agent credentials.
— OpenAI Codex production safety architecture for enterprise deployment: sandboxing, human approval workflows, network policies, and continuous telemetry for CI/CD integration.
— Empirical analysis of 29,585 real GitHub PR lifecycles across five production agents (Copilot, Devin, Cursor, Claude Code, OpenAI) shows merge authority remains almost exclusively human with distinct tool-specific governance patterns.
— GitHub GA: Copilot cloud agent improvements, org-level secrets/variables configuration, and cloud agent environment management for production agentic integration.
— Microsoft Defender Security Team disclosed host-level RCE vulnerabilities in widely-used Semantic Kernel framework (27K+ GitHub stars), demonstrating execution risk for integrated production agents.
— Adversa.AI 'TrustFall' attack demonstrates supply-chain compromise via malicious repository cloned by agents in production CI/CD pipelines across Claude Code, Cursor, Copilot, and Gemini.
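The credential-theft disclosures above point at the same remediation: issue agents short-lived, narrowly scoped tokens per task instead of leaving long-lived secrets in the agent's environment. A minimal sketch of that PAM-style pattern, with hypothetical scope strings, TTLs, and helper names (none of these come from the disclosed exploits):

```python
import secrets
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentCredential:
    """A hypothetical short-lived, task-scoped token: what a stolen
    credential is worth decays to zero within minutes."""
    token: str
    scope: str          # e.g. "repo:acme/service:read"
    expires_at: float   # Unix timestamp

def issue(scope: str, ttl_seconds: int = 900) -> AgentCredential:
    """Mint a fresh token for exactly one scope, valid briefly."""
    return AgentCredential(
        token=secrets.token_urlsafe(32),
        scope=scope,
        expires_at=time.time() + ttl_seconds,
    )

def authorize(cred: AgentCredential, needed_scope: str) -> bool:
    """Exact scope match plus expiry check; no wildcard escalation."""
    return cred.scope == needed_scope and time.time() < cred.expires_at

cred = issue("repo:acme/service:read")
print(authorize(cred, "repo:acme/service:read"))   # True
print(authorize(cred, "repo:acme/service:write"))  # False: scope mismatch
```

An exfiltrated token under this scheme grants one narrow capability for minutes, not standing production access, which is the gap the 78% PAM figure describes.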