The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
AI that generates, configures, or optimises CI/CD pipelines and infrastructure-as-code definitions for faster, safer deployments. Includes pipeline YAML generation, build optimisation, and cloud resource templating; distinct from deployment risk assessment, which evaluates changes rather than generating configurations.
AI-generated CI/CD pipelines and infrastructure-as-code have entered pragmatic production deployment with clear safety boundaries. LLMs now generate 20% of infrastructure deployments (Pulumi, May 2026), with a projected trajectory past 50% by the end of 2026, yet measurable capability limits persist: Sonnet 3.7 succeeds on only 34% of realistic incremental AWS CDK edits (Amazon Science benchmark, June 2026), signaling that autonomous generation remains immature for complex modifications. The practice's maturity hinges not on code-generation capability, which is established, but on three operational constraints: verification infrastructure (81% of enterprise leaders report production failures from AI-generated code, while 92% still express confidence), governance patterns (safety barriers such as human approval gates and infrastructure isolation raise agent success rates by 27 percentage points, from 62% to 89%), and systematic vulnerability mitigation (70–97% of AI-generated Terraform, Dockerfiles, and CI pipelines contain fixable security flaws). Where teams implement guardrails (physical separation of plan from apply, mandatory PR review, policy enforcement, and security scanning), results are strong: 87% faster development and 99.8% deployment success. Where governance is superficial, AI-assisted generation amplifies risk. The defining tension has crystallized: the bottleneck is no longer code-generation capability but organizational readiness to safely absorb what agentic systems produce.
Agentic infrastructure generation is now in production at scale across named enterprises: Pulumi reports LLMs generating 20% of infrastructure deployments as of May 2026, on a trajectory to exceed 50% by year-end. Named deployments confirm delivery. CTS deployed AI-driven (Claude Code) Terraform IaC in production across multi-cloud (GCP/AWS) with explicit safety patterns: Workload Identity Federation for credential management, a 3-layer module structure, and mandatory manual approval for terraform apply. TD Bank achieved 1,360 work-hours saved (12,850 LOC) with Copilot contributing ~15% under strict human oversight. EY deployed IBM Bob for Terraform modernization with mandatory code review. These evidence points establish that production IaC generation with a human in the loop is operational and delivering measurable ROI.
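The separation of an automatic plan from a manually approved apply can be illustrated with a small gate over Terraform's JSON plan output. This is a minimal sketch, not CTS's implementation (the source does not describe it). It assumes the plan was exported with `terraform show -json`, which lists planned changes under `resource_changes` with a `change.actions` array; the function names `summarize_plan` and `may_apply`, the sample resources, and the approval rule are all hypothetical.

```python
import json
from typing import Optional

# Sample shaped like the output of `terraform show -json <planfile>`.
# The resources are made up for illustration.
SAMPLE_PLAN = json.loads("""
{
  "resource_changes": [
    {"address": "aws_s3_bucket.logs", "change": {"actions": ["create"]}},
    {"address": "aws_db_instance.main", "change": {"actions": ["delete", "create"]}},
    {"address": "aws_iam_role.ci", "change": {"actions": ["no-op"]}}
  ]
}
""")


def summarize_plan(plan: dict) -> dict:
    """Group resource addresses by planned action, skipping no-ops."""
    summary: dict = {}
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if not actions or actions == ["no-op"]:
            continue
        # A replacement appears as two actions, e.g. ["delete", "create"].
        key = "replace" if len(actions) == 2 else actions[0]
        summary.setdefault(key, []).append(rc["address"])
    return summary


def may_apply(plan: dict, approved_by: Optional[str]) -> bool:
    """Hypothetical gate: any real change requires a named human approver."""
    if not summarize_plan(plan):
        return True  # nothing to apply
    return bool(approved_by)


if __name__ == "__main__":
    print(summarize_plan(SAMPLE_PLAN))
    print("apply allowed without approver:", may_apply(SAMPLE_PLAN, None))
```

In a pipeline, the summary would be what the reviewer sees on the pull request, and the apply job would run only once an approver is recorded. Enforcing that in the CI system's own permissions, rather than in a script the agent can edit, is the point of the physical separation.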
However, capability asymmetry is hardening as the practice scales. Amazon Science's SWE-InfraBench (June 2026) benchmarks LLMs on realistic incremental AWS CDK modifications drawn from real-world codebases: the best performer (Claude Sonnet 3.7) succeeds on only 34% of tasks, and specialized reasoning models (DeepSeek R1) achieve just 24%. This is the critical constraint: LLMs can generate syntactically valid infrastructure code from scratch, but they struggle with incremental modification, context preservation, and complex resource dependencies, which is the actual work engineers do in production. Together with survey data (81% of 200+ enterprise leaders report production failures from AI-generated code, CloudBees June 2026), this points to a maturity split: point solutions and greenfield generation work; incremental modification and complex state management remain brittle.
Quality and governance remain binding constraints. AI-generated infrastructure code shows systematically high vulnerability rates (70–97% of generated Terraform, Dockerfiles, and CI/CD pipelines contain fixable flaws, per IOActive, April 2026), with predictable failure modes: hardcoded secrets, deprecated provider patterns, hallucinated resources, and authorization logic errors. Practitioner operating experience clarifies the cost: without guardrails, agents succeed 62% of the time and cause major incidents (database corruption, IAM escalation, hallucinated limits); with infrastructure isolation, approval gates, and policy enforcement, the success rate rises to 89%, but at the price of significant operational investment. The EU AI Act's August 2026 effective date for high-risk obligations (real-time monitoring, automated escalation, audit trails) adds regulatory pressure toward governance maturity. Organizations with strong pre-AI CI/CD discipline (mandatory review, security scanning, policy gates, standardized templates) absorb AI-generated IaC safely; those with superficial governance see velocity gains coupled with accumulated risk. Supply-chain exposure from the AI tooling layer itself (agent skills, MCP servers, model configs) remains outside traditional audit frameworks: April 2026 incidents (Vercel OAuth compromise, SAP npm token exposure via AI configs) exposed CI/CD secrets at scale. The limiting factor has shifted definitively from code-generation capability to governance and operational readiness.
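Of the failure modes listed above, hardcoded secrets are the easiest to check mechanically before generated code reaches review. The following is a rough sketch of such a check, not a substitute for a dedicated scanner. The two rules are assumptions chosen for illustration: the well-known `AKIA` prefix format of AWS access key IDs, and a quoted literal assigned to a credential-like attribute name. `find_hardcoded_secrets` and the sample Terraform snippet are hypothetical.

```python
import re

# Illustrative rules only; real scanners ship far larger rule sets.
RULES = {
    # AWS access key IDs start with "AKIA" followed by 16 upper-case
    # letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # A quoted literal assigned to a credential-like attribute.
    # Excluding "$" skips interpolations such as "${var.token}".
    "literal_credential": re.compile(
        r'(?i)(password|secret|token|api_key)\w*\s*=\s*"[^"$]{4,}"'
    ),
}

# Made-up snippet; the key is the example value used in AWS documentation.
SAMPLE_TF = '''provider "aws" {
  access_key = "AKIAIOSFODNN7EXAMPLE"
}
resource "aws_db_instance" "db" {
  password = "hunter2hunter2"
  username = var.db_user
}
'''


def find_hardcoded_secrets(text: str) -> list:
    """Return (line_number, rule_name) pairs for every rule match."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((line_no, name))
    return findings


if __name__ == "__main__":
    for line_no, rule in find_hardcoded_secrets(SAMPLE_TF):
        print(f"line {line_no}: {rule}")
```

A check like this fits as a blocking step on the pull request, so that a finding stops the change before plan or apply ever runs.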
— Peer-reviewed benchmark from Amazon Science: best-performing model (Sonnet 3.7) succeeds in only 34% of realistic AWS CDK incremental edit tasks; specialized reasoning models (DeepSeek R1) achieve just 24%. Critical negative signal: significant capability gaps persist even in state-of-the-art models for production IaC modifications.
— Independent research (200+ enterprise leaders): 81% report production issues tied to AI-generated code; 92% express confidence despite quality gaps. Organizations self-assess at 83.6/100 on CARE Index but struggle to trace AI spend to business outcomes, revealing adoption at scale without commensurate governance maturity.
— Named organization (CTS) deploying Claude Code for production Terraform across multi-cloud (GCP/AWS) with Workload Identity Federation (WIF), 3-layer module structure, and physical separation of terraform plan (automatic) from apply (manual approval). Quote: 'AI writes, humans steer, and the CI/CD pipeline acts as a safety mechanism.' Operational maturity: full production across multi-cloud with documented safety patterns.
— Pulumi CEO Joe Duffy reports LLMs now generate 20% of infrastructure deployments (up from ~0% one year ago), expected to exceed 50% by end of 2026. Growth from 20% to 50%+ within 8 months represents significant acceleration in agentic IaC deployment adoption across production workloads.
— Named enterprises (TD Bank, EY, medical device company) deploying IaC generation with AI agents (Copilot, IBM Bob, Claude Code on Terraform/OpenTofu). TD Bank: 12,850 LOC, 1,360 work-hours saved with Copilot (~15% under human oversight). EY: IBM Bob on Terraform modernization with human review gates. Common finding: productivity gains with human-in-the-loop operation; autonomous generation 'isn't ready' for production.
— Security architecture patterns for agentic CI/CD; documents TrapDoor supply chain attack (May 2026) compromising 34 packages via invisible Unicode in .cursorrules. Demonstrates that prompt-based guardrails are unenforceable; only infrastructure isolation (IAM, network, ephemeral credentials) prevents agent-based attacks.
— Two-year production track record from named engineer; identifies clear safety boundaries: AI excels at code generation (EKS, runbooks) but fails systematically on security configs, state-modifying ops, networking. Demonstrates domain-specific risk tiers in IaC generation.
— Comprehensive quality metrics: AI comprises 30-70% of committed code, code churn doubled from 3.3% to 7.1%, AI-generated code turnover 1.8-2.5x higher than human code. Direct signal: velocity gains upstream create maintenance debt downstream; only 31% of AI spend attributable to business outcomes.
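The invisible-Unicode vector mentioned in the evidence above (instructions hidden in agent configuration files such as .cursorrules) can at least be detected in CI. The sketch below flags characters in the Unicode "Cf" (format) general category, which covers zero-width spaces and joiners, bidirectional overrides, and tag characters. Which characters the cited attack actually used is not stated in the source, so the choice of category is an assumption, and `find_invisible_chars` is a hypothetical name. Detection complements, and does not replace, the infrastructure isolation the evidence calls for.

```python
import unicodedata


def find_invisible_chars(text: str) -> list:
    """Return (line, column, codepoint) for each format-category character.

    Category "Cf" is used here as an assumed proxy for "invisible";
    legitimate text in some scripts also uses these characters.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits


if __name__ == "__main__":
    # Made-up rules file with a zero-width space hidden after "Prefer".
    sample = "Always use tabs.\nPrefer\u200b small modules.\n"
    for line_no, col, cp in find_invisible_chars(sample):
        print(f"line {line_no}, col {col}: {cp}")
```

Run against agent configuration files on every commit, a non-empty result would be grounds for failing the build and asking a human to inspect the file.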