Perly Consulting │ Beck Eco

The State of Play

A living index of AI adoption across industries — where established practice meets the bleeding edge
UPDATED DAILY

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

The Daily Dispatch

A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.

AI Maturity by Domain

Each dot marks the weighted maturity of practices within a domain.

[Chart: domains plotted along a maturity axis running from Bleeding Edge to Established]

CI/CD & infrastructure-as-code generation

TIER: Leading Edge

TRAJECTORY: Stalled

AI that generates, configures, or optimises CI/CD pipelines and infrastructure-as-code definitions for faster, safer deployments. Includes pipeline YAML generation, build optimisation, and cloud resource templating; distinct from deployment risk assessment which evaluates changes rather than generating configurations.

OVERVIEW

AI-generated CI/CD pipelines and infrastructure-as-code have entered pragmatic production deployment with clear safety boundaries. LLMs now generate 20% of infrastructure deployments (Pulumi, May 2026), a share Pulumi expects to exceed 50% by the end of 2026, yet measurable capability limits persist: Claude Sonnet 3.7 succeeds on only 34% of realistic incremental AWS CDK edits (Amazon Science benchmark, June 2026), signaling that autonomous generation remains immature for complex modifications. The practice's maturity hinges not on code-generation capability, which is established, but on three operational constraints: verification infrastructure (81% of enterprise leaders report production failures from AI-generated code despite 92% expressing confidence), governance patterns (safety barriers such as human approval gates and infrastructure isolation lift agent success from 62% to 89%, a 27-percentage-point improvement), and systematic vulnerability mitigation (70–97% of AI-generated Terraform, Dockerfiles, and CI pipelines contain fixable security flaws). Where teams implement guardrails (physical separation of plan from apply, mandatory PR review, policy enforcement, and security scanning), results are strong: one reported multi-cloud migration achieved 87% faster development and 99.8% deployment success. Where governance is superficial, AI-assisted generation amplifies risk. The defining tension has crystallized: the bottleneck is no longer code-generation capability but organizational readiness to safely absorb what agentic systems produce.

CURRENT LANDSCAPE

Agentic infrastructure generation is now in production at scale: Pulumi reports LLMs generating 20% of infrastructure deployments as of May 2026 and expects that share to exceed 50% by year-end. Named deployments confirm delivery. CTS runs AI-driven Terraform IaC (written with Claude Code) in production across GCP and AWS with explicit safety patterns: Workload Identity Federation for credential management, a 3-layer module structure, and mandatory manual approval for terraform apply. TD Bank saved 1,360 work-hours across 12,850 lines of code, with Copilot contributing ~15% under strict human oversight. EY deployed IBM Bob for Terraform modernization with mandatory code review. Together, these deployments show that IaC generation with a human in the loop is running in production and delivering measurable ROI.
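The plan/apply separation described above can be sketched in a few lines. This is a minimal illustration, not CTS's actual pipeline: the names `PlanSummary`, `summarise_plan`, and `may_apply` are hypothetical, and the input is assumed to follow the `resource_changes` / `change.actions` layout of Terraform's JSON plan output. The plan step runs automatically and is summarised; apply is refused unless a named human approver is recorded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlanSummary:
    creates: int
    updates: int
    deletes: int


def summarise_plan(plan: dict) -> PlanSummary:
    """Count planned actions. Assumes the plan dict has a 'resource_changes'
    list whose entries carry change.actions (e.g. ["create"], ["delete", "create"])."""
    creates = updates = deletes = 0
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if "create" in actions:
            creates += 1
        if "update" in actions:
            updates += 1
        if "delete" in actions:
            deletes += 1
    return PlanSummary(creates, updates, deletes)


def may_apply(summary: PlanSummary, approved_by: Optional[str]) -> bool:
    """Hypothetical gate: apply is never automatic, whatever the plan contains.
    The summary is passed in so the approver record can be tied to what was reviewed."""
    return bool(approved_by and approved_by.strip())


# Illustrative plan data (made up for this sketch).
EXAMPLE_PLAN = {
    "resource_changes": [
        {"address": "aws_s3_bucket.logs", "change": {"actions": ["create"]}},
        {"address": "aws_db_instance.main", "change": {"actions": ["delete", "create"]}},
        {"address": "aws_iam_role.ci", "change": {"actions": ["no-op"]}},
    ]
}

summary = summarise_plan(EXAMPLE_PLAN)
print(summary, may_apply(summary, None), may_apply(summary, "reviewer@example.com"))
```

In a real pipeline the plan dict would typically be loaded from the output of `terraform show -json` on a saved plan file, and the approver would come from the CI system's review record rather than a function argument.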

However, capability asymmetry is hardening as the practice scales. Amazon Science's SWE-InfraBench (June 2026) benchmarks LLMs on realistic incremental AWS CDK modifications drawn from real-world codebases: the best performer (Claude Sonnet 3.7) succeeds on only 34% of tasks, and specialized reasoning models (DeepSeek R1) achieve just 24%. This is the critical constraint: LLMs can generate syntactically valid infrastructure code from scratch, but they struggle with incremental modification, context preservation, and complex resource dependencies, which is the actual work engineers do in production. Read alongside survey data (81% of 200+ enterprise leaders report production failures from AI-generated code; CloudBees, June 2026), the benchmark reveals a maturity split: point solutions and greenfield generation work, while incremental modification and complex state management remain brittle.

Quality and governance remain binding constraints. AI-generated infrastructure code shows systematically high vulnerability rates (70–97% of generated Terraform, Dockerfiles, and CI/CD pipelines contain fixable flaws, per IOActive, April 2026), with predictable failure modes: hardcoded secrets, deprecated provider patterns, hallucinated resources, and authorization logic errors. Practitioner operating experience clarifies the cost: without guardrails, agents achieve 62% success and cause major incidents (database corruption, IAM escalation, hallucinated limits); with infrastructure isolation, approval gates, and policy enforcement, success improves to 89%, a 27-percentage-point gain that requires significant operational investment. The EU AI Act's August 2026 effective date for high-risk obligations (real-time monitoring, automated escalation, audit trails) adds regulatory pressure to mature governance. Organizations with strong pre-AI CI/CD discipline (mandatory review, security scanning, policy gates, standardized templates) absorb AI-generated IaC safely; those with superficial governance see velocity gains coupled with accumulated risk. Supply-chain exposure from the AI tooling layer itself (agent skills, MCP servers, model configs) remains outside traditional audit frameworks: April 2026 incidents (Vercel OAuth compromise, SAP npm token exposure via AI configs) exposed CI/CD secrets at scale. The limiting factor has shifted definitively from code-generation capability to governance and operational readiness.
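Two of the failure modes named above, hardcoded secrets and over-broad authorization, can be caught by even a crude pre-merge check. The sketch below is illustrative only: the `CHECKS` table and `scan` function are hypothetical names, the regular expressions are rough heuristics (they will produce false positives and miss many cases), and a real pipeline would rely on a dedicated IaC security scanner rather than this.

```python
import re
from typing import List, Tuple

# Heuristic patterns, assumed for illustration; not an exhaustive rule set.
CHECKS = [
    # AWS access key IDs conventionally start with "AKIA" followed by 16 characters.
    ("hardcoded AWS access key ID", re.compile(r"AKIA[0-9A-Z]{16}")),
    # A quoted literal assigned to a secret-looking name (interpolations like "${...}" are skipped).
    ("hardcoded secret literal", re.compile(r'(?i)(password|secret|token)\w*\s*=\s*"[^"$]+"')),
    # actions = ["*"] in HCL, or "Action": "*" in JSON policy documents.
    ("wildcard IAM action", re.compile(r'(?i)\bactions?"?\s*[=:]\s*\[?\s*"\*"')),
]


def scan(text: str) -> List[Tuple[int, str]]:
    """Return (line number, finding label) pairs for every pattern hit."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in CHECKS:
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


# Made-up Terraform snippet exhibiting two of the failure modes.
SAMPLE = "\n".join([
    'resource "aws_db_instance" "main" {',
    '  password = "hunter2"',
    '}',
    'data "aws_iam_policy_document" "ci" {',
    '  statement {',
    '    actions   = ["*"]',
    '    resources = ["*"]',
    '  }',
    '}',
])

for lineno, label in scan(SAMPLE):
    print(f"line {lineno}: {label}")
```

A check like this would sit behind the mandatory review and policy gates described above as one cheap signal among several, not as a replacement for them.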

TIER HISTORY

Research: Mar-2023 → Mar-2023
Bleeding Edge: Mar-2023 → Apr-2024
Leading Edge: Apr-2024 → present

EVIDENCE (111)

— Peer-reviewed benchmark from Amazon Science: best-performing model (Sonnet 3.7) succeeds in only 34% of realistic AWS CDK incremental edit tasks; specialized reasoning models (DeepSeek R1) achieve just 24%. Critical negative signal: significant capability gaps persist even in state-of-the-art models for production IaC modifications.

— Independent research (200+ enterprise leaders): 81% report production issues tied to AI-generated code; 92% express confidence despite quality gaps. Organizations self-assess at 83.6/100 on CARE Index but struggle to trace AI spend to business outcomes, revealing adoption at scale without commensurate governance maturity.

— Named organization (CTS) deploying Claude Code for production Terraform across multi-cloud (GCP/AWS) with Workload Identity Federation (WIF), 3-layer module structure, and physical separation of terraform plan (automatic) from apply (manual approval). Quote: 'AI writes, humans steer, and the CI/CD pipeline acts as a safety mechanism.' Operational maturity: full production across multi-cloud with documented safety patterns.

— Pulumi CEO Joe Duffy reports LLMs now generate 20% of infrastructure deployments (up from ~0% one year ago), expected to exceed 50% by end of 2026. Growth from 20% to 50%+ within 8 months represents significant acceleration in agentic IaC deployment adoption across production workloads.

— Named enterprises (TD Bank, EY, medical device company) deploying IaC generation with AI agents (Copilot, IBM Bob, Claude Code on Terraform/OpenTofu). TD Bank: 12,850 LOC, 1,360 work-hours saved with Copilot (~15% under human oversight). EY: IBM Bob on Terraform modernization with human review gates. Common finding: productivity gains with human-in-the-loop operation; autonomous generation 'isn't ready' for production.

— Security architecture patterns for agentic CI/CD; documents TrapDoor supply chain attack (May 2026) compromising 34 packages via invisible Unicode in .cursorrules. Demonstrates that prompt-based guardrails are unenforceable; only infrastructure isolation (IAM, network, ephemeral credentials) prevents agent-based attacks.

— Two-year production track record from named engineer; identifies clear safety boundaries: AI excels at code generation (EKS, runbooks) but fails systematically on security configs, state-modifying ops, networking. Demonstrates domain-specific risk tiers in IaC generation.

— Comprehensive quality metrics: AI comprises 30-70% of committed code, code churn doubled from 3.3% to 7.1%, AI-generated code turnover 1.8-2.5x higher than human code. Direct signal: velocity gains upstream create maintenance debt downstream; only 31% of AI spend attributable to business outcomes.
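The TrapDoor item above turns on invisible Unicode hidden in agent rules files. A minimal detection sketch follows; it assumes the hidden characters fall in Unicode's "format" category (Cf), which covers zero-width spaces, joiners, and bidirectional controls. The evidence does not say which characters the attack actually used, so treat that as an assumption, and `find_invisible` as a hypothetical helper name.

```python
import unicodedata
from typing import List, Tuple


def find_invisible(text: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, code point) for every format-category (Cf) character.
    Lines and columns are 1-based."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits


# Made-up rules-file content with a zero-width space hidden after "tests".
RULES = "Always run tests\u200b first\nNever commit secrets"

for lineno, col, codepoint in find_invisible(RULES):
    print(f"line {lineno}, col {col}: {codepoint}")
```

Consistent with the evidence item, a scan like this is only a hygiene check on files fed to agents; it does not replace infrastructure isolation (IAM, network, ephemeral credentials).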

HISTORY

  • 2023-H1: GitLab 16 launched with AI-powered code suggestions and vulnerability explanation; CARFAX case study showed productivity gains with 20% increase in production deployments. Early tools like Pulumi AI and Firefly's AIaC emerged for IaC generation from natural language. Security concerns noted: 40% of AI-generated code contained vulnerabilities, underscoring the need for human review and automated validation.
  • 2023-H2: Major vendors expanded IaC AI capabilities: AWS launched Amazon Q Developer with explicit Terraform/CloudFormation/CDK support; GitLab reported 1 billion CI/CD pipelines with enterprise adoption; HashiCorp introduced AI-generated Terraform module tests (beta). GitLab Duo Code Suggestions reached GA with claimed 7x faster cycle times. Critical research published documenting security flaws in LLM-generated code, confirming quality barriers remain unresolved despite scale of deployment.
  • 2024-Q1: All major cloud vendors shipped production IaC generation capabilities: AWS expanded Amazon Q Developer with infrastructure generator for existing resources; GitLab Duo adoption accelerated with measurable DORA metric improvements reported by customers. Academic survey (March 2024) documented the state of LLM-based IaC generation, confirming persistent quality and security gaps. Adoption pattern emerged: rapid capability deployment paired with mandatory governance controls (automated validation, security scanning, human review).
  • 2024-Q2: AWS Amazon Q Developer reached GA with confirmed adoption metrics (37-50% code acceptance rates at BT Group and NAB); deployment evidence widened beyond announcements to named case studies showing measurable ROI (Notifix: ~40% manual operations reduction, €2,400/month savings per 6-engineer team). GitLab Duo expanded root cause analysis for CI/CD failures, automating troubleshooting of pipeline errors and infrastructure deployment issues. Thoughtworks Radar issued cautionary hold on AI-generated code complacency, signaling persistent governance requirements remain essential for safe deployment at scale.
  • 2024-Q3: GitLab Duo Enterprise reached GA and was recognized as a Leader in Gartner's first Magic Quadrant for AI Code Assistants. AWS demonstrated real-world IaC generation workflows (CDK for serverless, GitLab CI/CD YAML scaffolding) and enhanced CloudFormation's IaC Generator with resource discovery. However, Veracode's Black Hat 2024 presentation documented that 40% of AI-generated code contains known vulnerabilities, with emerging threats from poisoned datasets—confirming that code quality and security governance remained unresolved barriers despite accelerating vendor adoption and enterprise deployment.
  • 2024-Q4: Ecosystem maturity accelerated with new vendors (Infra.new) and expanded features across platforms. GitLab internal testing showed 84% test coverage in 2 days with AI-assisted generation. However, two independent peer-reviewed studies published in late 2024 confirmed the critical governance barrier: systematic literature review and Georgetown CSET evaluation both documented ~50% of AI-generated code contains security vulnerabilities and bugs. Practitioner reports highlighted reliability failures with current tools. By year-end, the pattern was definitive: AI-generated CI/CD and IaC code had achieved broad adoption and clear productivity benefits, but persistent code-quality and security risks remained fundamental barriers—success depended entirely on mandatory human review, automated scanning, and strict governance controls.
  • 2025-Q1: Vendor platforms deepened agentic integration: GitLab announced Duo Workflow in private beta for automating project bootstrapping and CI/CD configuration; GitHub Copilot shipped a troubleshooting feature (GA) for GitHub Actions; Amazon Q expanded its CLI integrations, enabling practitioner teams to embed AI code review into pipelines. Practitioner blogs documented specific real-world integrations. However, adoption barriers sharpened: an Infragistics survey found 45% of tech leaders cite AI code reliability as a top challenge, with 55% viewing AI deployment as their biggest business risk despite 73% planning expansion. Snyk warned of silent security vulnerabilities in AI-generated code requiring mandatory scanning. An AWS re:Invent session acknowledged the risk of catastrophic infrastructure errors and of agents becoming detached from failures. The critical inflection point persisted: vendor platforms had matured and practitioner adoption was widening, but fundamental code-quality and governance constraints remained non-negotiable for production safety.
  • 2025-Q2: Academic and vendor research confirmed deployment maturity alongside persistent quality risks. AWS researchers published Multi-IaC-Eval benchmark showing modern LLMs achieving >95% syntactic validity in CloudFormation, Terraform, and CDK generation, but facing significant semantic alignment challenges. GitLab Duo Enterprise shipped AI impact analytics dashboard enabling measurement of IaC generation effectiveness on SDLC metrics (deployment frequency, change failure rate, cycle time). However, Apiiro's security analysis reconfirmed 40%+ of AI-generated code contains vulnerabilities, with developers committing insecure code faster than security teams can validate. Practitioner reports of GitHub Copilot failures generating 60-90% speculative content for CI/CD configuration highlighted fundamental reliability limits. Gopher Security survey found 72% of practitioners view GenAI as top IT risk. Market evolution accelerated toward contextual, AI-powered IaC security validation and secure-by-design approaches. The pattern solidified: deployment at scale was real and widening, but success remained entirely dependent on mandatory human review, automated security scanning, and strict governance guardrails around AI-generated infrastructure code.
  • 2025-Q3: Deployment adoption continued accelerating with concrete case studies and time-saving evidence. AWS published QRRA case study showing Amazon Q Developer enabling 3-day microservices modernization; documented Lambda CI/CD pipeline setup automation reducing 30-60 minute manual process to 5 minutes. Academic research (arXiv) proposed reference architecture for policy-bounded AI co-pilots in CI/CD with decision taxonomy and trust-tier frameworks. GitLab shipped GraphQL APIs for AI usage metrics across all Duo features, signaling enterprise-grade measurement infrastructure maturity. However, Veracode's Q3 security analysis confirmed persistent code-quality barriers: 45% of AI-generated code contains vulnerabilities with language-specific failure rates (Java 71%, Python 38%, C# 45%), particularly weak on CWE-80 (XSS) and CWE-117 (log injection). Practitioner consensus emerged on governance patterns: data provenance, model versioning, bias detection, and compliance monitoring became standard operational considerations. The critical pattern hardened: real-world deployment at enterprise scale was concrete and measurable, but fundamental security and reliability governance barriers remained non-negotiable for safe production adoption.
  • 2025-Q4: Real-world deployment at enterprise scale widened with multi-cloud adoption evidence and analytics infrastructure maturity. echo3D migrated to multi-cloud IaC using Amazon Q Developer, achieving 87% faster development time and 99.8% deployment success rates; AI-assisted code conversion pipelines achieved 95%+ accuracy converting CloudFormation to Terraform. GitLab shipped enhanced AI impact analytics dashboard with 6-month SDLC trend visibility, signaling vendor capability maturity for measuring AI-generated code impact at scale. However, critical negative signals persisted: Futurum survey found 60% of organizations concerned about AI-generated vulnerabilities, with 53% having discovered critical or high-severity flaws in the past twelve months. Practitioner experiments showed mixed results—autonomous GitHub Copilot CI/CD pipeline management reduced test incidents by 50% but failed on ambiguous requirements. The practice had entered a phase of pragmatic deployment: adoption was real and productivity benefits were concrete, but fundamental security, reliability, and governance barriers remained unresolved—success in production continued to depend entirely on strict validation, mandatory human review, and contextual security scanning rather than on further vendor capability improvements alone.
  • 2026-Jan: Agentic CI/CD and IaC generation reached a major inflection point with near-simultaneous platform GA announcements from major vendors. AWS Amazon Q Developer and GitLab Duo Agent Platform both reached GA with explicit agentic workflows for pipeline configuration, infrastructure generation, and troubleshooting. However, concurrent negative signals underscored the persistence of adoption barriers: a survey synthesis showed 90% developer AI adoption but 96% distrust of code accuracy, with 40-48% of AI-generated code containing security vulnerabilities. An AWS executive analysis warned that AI assistants amplify organizational bottlenecks when delivery pipelines remain manual, with 77% of organizations still deploying once daily or less. Practitioner opinion shifted toward treating AI-generated IaC as an intent-description layer with embedded standards enforcement, rather than as autonomous code generation. The critical pattern sharpened: vendor platforms had matured agentic capabilities significantly, but the fundamental tension persisted. AI could accelerate infrastructure generation at scale, yet success remained entirely dependent on organizational pipeline maturity, mandatory human review, and security-first governance rather than on further vendor feature velocity.
  • 2026-Feb: Platform capability expansion continued with GitHub Copilot enterprise usage metrics GA and Azure Boards integration with custom agents, signaling vendor maturity in agentic workflows. However, empirical evidence of adoption barriers intensified: CircleCI's analysis of 28M workflows showed main branch success rates dropped to 70.8% (5-year low) and recovery times rose 13%, revealing that AI code generation capacity exceeded pipeline integration capability. Developer trust collapsed further—Stack Overflow survey found only 29% trust AI code despite 84% adoption. Critical practitioner analysis (Signadot, byteiota) identified fundamental gaps: traditional CI/CD lacks integration testing for non-deterministic agents, and 96% of developers distrust AI accuracy while only 48% verify before committing, making verification the new bottleneck. The pattern clarified: capability expansion was real, but the limiting factor shifted from tooling maturity to organizational readiness—adoption succeeded where CI/CD practices already included strict integration testing, mandatory human review, and security scanning, and failed where governance was superficial.
  • 2026-Mar: Vendor platform GA announcements converged with high-profile governance failures. Pulumi Neo and Spacelift Intelligence (Intent) both reached GA for natural-language infrastructure provisioning; a Harness survey of 700 leaders found 69% of heavy AI users experience deployment rollbacks despite 45% faster deploy velocity. Amazon's internal Kiro AI agent caused multiple production outages (6.3M lost orders on March 5) due to agentic autonomy bypassing human approval gates; Harper Foley documented ten infrastructure-destruction incidents across six AI tools with no vendor postmortems or liability frameworks. On the generation quality side, AquilaX analysis found AI systematically misconfigures IaC security (78% of S3 buckets unprotected, 71% IAM wildcards, 69% unencrypted EBS); a LinearB analysis of 8.1M PRs showed AI-generated code waits 4.6x longer for review and achieves only 32.7% acceptance, with CI/CD review consuming 57% of cycle time. The month crystallized a dual signal: agentic infrastructure generation is reaching platform maturity, but governance and review infrastructure remain the binding constraint for safe production adoption.
  • 2026-Apr: Deployment progress widened alongside ROI and maturity concerns. New multi-agent case studies emerged: InfraSquad (LangGraph-based Terraform generation with security loops and CIDR sanitization) and Classmethod's production-adjacent CloudWatch automation using Claude Code and HashiCorp Agent Skills, demonstrating agent error recovery capabilities. Ecosystem maturity solidified with HashiCorp Agent Skills, antonbabenko's terraform-skill, and AWS agent plugins becoming de facto standards. Pulumi Neo's official agentic IaC documentation confirmed GA status — natural language to production infrastructure code operating within policy enforcement and mandatory PR review gates. However, ROI barriers intensified: Gartner's survey of 782 I&O professionals found only 28% of AI infrastructure projects achieve full ROI, with 20% failing outright. A security audit of 200+ codebases found 73% contain vulnerabilities automated scanners miss (hardcoded secrets, deprecated patterns, hallucinated functions, authorization flaws, fabricated packages), while JetBrains research confirmed a 73% CI/CD adoption gap — pipelines' requirement for deterministic, reproducible outputs creating fundamental tension with non-deterministic AI generation. Adoption metrics showed AI generating 42% of committed code with 18% faster cycle times, but with quality trade-offs: 1.7× more issues, 3× higher readability problems, 2.74× more security vulnerabilities. The phase matured toward pragmatic integration: practitioner guidance shifted to treating AI-generated IaC as junior engineer output requiring validation gates, secure-by-design patterns, and policy enforcement rather than autonomous generation.
  • 2026-May: Practitioner case studies confirmed multi-agent CI/CD architectures deliver quantified gains (93% deployment time reduction, 92% fewer failed deploys) when guardrails are in place, while a 30-day production experiment without guardrails produced only 62% success and major incidents including database corruption and unauthorized IAM escalation. Security evidence hardened: IOActive's evaluation of 27 AI models on infrastructure code found 70–97% vulnerability rates for Terraform, Dockerfiles, and CI/CD pipelines, and Wiz analysis of hundreds of thousands of cloud environments found 20% of AI-powered development organizations experienced systemic security issues from repeated generation patterns. Platform-level supply-chain exposure persisted as the defining new risk: April 2026 incidents (Vercel OAuth breach, SAP npm token exposure via AI-generated configs) confirmed the AI tooling layer itself remains outside traditional audit frameworks. A new peer-reviewed attack vector — Semantic Compliance Hijacking — achieved a 77.67% breach success rate against agentic CI/CD systems with 0% detection by static scanners, and the TrapDoor supply-chain attack (May 2026) compromised 34 packages via invisible Unicode in .cursorrules files, reinforcing that prompt-based guardrails are unenforceable and only infrastructure isolation prevents agent-based attacks. Enterprise survey data (213 technology leaders) confirmed 81% report production failures from AI-generated code, 70% now identify test maintenance as a bigger burden than writing code, and 54% have increased CI/CD spend to compensate.
  • 2026-Jun (current): Adoption at scale now confirmed with quantified trajectory: Pulumi reports 20% of infrastructure deployments AI-driven as of May 2026, expected to exceed 50% by year-end, with named production deployments (CTS multi-cloud Terraform, TD Bank Ansible automation, EY Terraform modernization) demonstrating both delivery and safety patterns. However, capability asymmetry crystallized: Amazon Science's SWE-InfraBench (June 2026) benchmarks LLM performance on realistic incremental AWS CDK modifications—best model (Sonnet 3.7) succeeds on only 34% of tasks, with specialized reasoning models (DeepSeek R1) at 24%, establishing that autonomous incremental modification remains immature. Governance readiness now clearly differentiates outcomes: organizations with strong pre-AI CI/CD discipline (mandatory review, security scanning, policy gates) deploy safely; those with superficial governance accumulate risk. CloudBees independent research (200+ enterprise leaders) confirms 81% report production failures from AI-generated code despite 92% confidence—the maturity gap between vendor platform capability and organizational governance remains the primary constraint on advancing to broader adoption.