The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
Each dot marks the weighted maturity of practices within a domain — hover for a brief summary, click for more detail
AI that generates unit, integration, or end-to-end tests from source code, requirements documents, or API specifications. Includes tools generating test suites from implementations, PRDs, and OpenAPI specs; distinct from adversarial test generation which targets fault discovery rather than coverage.
AI-assisted test generation uses machine learning to produce unit, integration, or end-to-end tests from source code, requirements, or API specifications. The practice is split in two, and the halves are moving in opposite directions. Specialized platforms like Diffblue Cover have secured real enterprise deployments (financial services firms report coverage jumps from under 20% to over 80% on legacy codebases), and Gartner now explicitly recommends AI-assisted test generation for safe legacy refactoring. Agentic testing, where autonomous agents discover flows, generate cases, and triage failures, has moved from concept to operational pilot. General-purpose AI assistants, however, remain stalled: 75% of organisations discuss AI testing but only 16% deploy it, developer trust sits at 3%, and practitioners report steep maintenance costs that erode initial productivity gains. The defining tension is whether the specialized vanguard can pull broader adoption forward, or whether the trust and governance deficits on the general-purpose side will keep most teams on the sidelines. For now, this remains a bleeding-edge practice: proven value exists, but only for organisations willing to invest in purpose-built tooling and formal guardrails.
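To make the general-purpose end of the practice concrete, the sketch below drafts a pytest suite for one module by prompting a model with its source and then executing the result before any human review. It is a minimal illustration only: the `llm_complete` stub, the prompt wording, and the `pricing.py` module name are assumptions, not a reference to any product discussed here.

```python
# Minimal sketch: prompt an LLM to draft a pytest suite for one module,
# then run the draft in a throwaway pytest invocation before human review.
# `llm_complete` is a stand-in for whichever model API a team actually uses.
import pathlib
import subprocess


def llm_complete(prompt: str) -> str:
    """Placeholder for a real model call (e.g. an HTTP request to a hosted LLM)."""
    raise NotImplementedError("wire up your model provider here")


def generate_tests(module_path: str) -> pathlib.Path:
    # Build a prompt from the module's source and write the model's reply
    # to a clearly named "generated" test file.
    source = pathlib.Path(module_path).read_text()
    prompt = (
        "Write a pytest test module covering the public functions below.\n"
        "Return only Python code, no prose.\n\n" + source
    )
    test_file = pathlib.Path("tests") / f"test_{pathlib.Path(module_path).stem}_generated.py"
    test_file.parent.mkdir(exist_ok=True)
    test_file.write_text(llm_complete(prompt))
    return test_file


def run_generated(test_file: pathlib.Path) -> bool:
    # Execute the draft suite in isolation; a non-zero exit code means the
    # generated tests (or the code under test) need human attention.
    result = subprocess.run(["pytest", str(test_file), "-q"], capture_output=True, text=True)
    print(result.stdout)
    return result.returncode == 0


if __name__ == "__main__":
    draft = generate_tests("pricing.py")  # hypothetical module under test
    print("draft suite passed:", run_generated(draft))
```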
Specialized platforms and autonomous agents are advancing toward governed production deployments. Diffblue Cover (the Java market leader) maintains product velocity: its Q1 2026 releases added Gradle 9.x and Scala support plus merge-mode test maintenance, with financial services firms reporting 26x productivity over general-purpose AI and coverage jumps from under 20% to over 80% on legacy systems. Youzan, Ctrip, and China Unicom show adoption extending beyond Western finance into e-commerce and telecoms. Autonomous agents have reached production maturity: the TestMu/KaneAI platform processed 1.5B tests across 250K users, with Boomi reporting 78% faster execution, and Gartner Challenger placement plus Forrester recognition signal analyst validation. A multi-agent architecture (planner, generator, runner, analyser) has emerged as the 2026 standard, with Gartner projecting 33% of applications running agentic AI by 2028. Domain-specific platforms (Panaya for SAP/ERP, CasePilot for Azure DevOps) built on ISTQB methodologies show product maturity in enterprise toolchains. Agentic test generation has reached operational production in niche sectors: game studios generate suites from design docs in hours, OpenObserve scaled from 380 to 700+ tests with an 85% reduction in flaky tests via Claude Code and systematic governance, and Axelerant closed zero-coverage gaps across NextJS/Strapi/Magento in 48 hours. Market sizing: the automation testing market grows from $25.4B (2026) to $69.2B (2033) and AI-enabled testing from $3.6B to $6.9B (2036), with unit testing accounting for 40% of that segment, driven by CI/CD acceleration and talent shortages.
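The planner/generator/runner/analyser split named above can be read as a simple control loop. The sketch below shows one plausible skeleton of that loop; the two model-backed roles are stubbed out, since the architecture names the roles but not any particular implementation, and the scenario and file names are illustrative assumptions.

```python
# Sketch of the four-role agentic loop (planner, generator, runner, analyser).
# The model-backed steps are stubs; only the control flow and hand-offs
# between roles are illustrated.
import subprocess
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    description: str


def planner(requirements: str) -> list[Scenario]:
    """Model-backed in practice: decompose requirements into test scenarios."""
    return [Scenario("happy_path", f"Nominal flow for: {requirements}")]


def generator(scenario: Scenario) -> str:
    """Model-backed in practice: emit a pytest module for one scenario."""
    return f"def test_{scenario.name}():\n    assert True  # TODO: real assertions\n"


def runner(test_code: str, path: str = "test_agent_generated.py") -> subprocess.CompletedProcess:
    """Deterministic: write the generated module and execute it with pytest."""
    with open(path, "w") as fh:
        fh.write(test_code)
    return subprocess.run(["pytest", path, "-q"], capture_output=True, text=True)


def analyser(result: subprocess.CompletedProcess) -> str:
    """Triage: decide whether to accept, regenerate, or escalate to a human."""
    if result.returncode == 0:
        return "accept"
    return "escalate" if "error" in result.stdout.lower() else "regenerate"


def run_pipeline(requirements: str) -> None:
    for scenario in planner(requirements):
        verdict = analyser(runner(generator(scenario)))
        print(scenario.name, "->", verdict)


if __name__ == "__main__":
    run_pipeline("discount calculation for the checkout service")
```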
General-purpose tools signal mainstream maturity, but production deployment gaps persist. GitHub Copilot Workspace (March 2026) generated test suites at 85% average coverage; 94% of testers use AI, and 45% use it specifically for test generation. Yet structural barriers remain critical: 75% of organisations discuss AI testing but only 16% deploy it, 60% lack code review processes for AI-generated tests, and empirical data reveals systematic failure modes. Quality gaps emerged sharply in April 2026: a Lightrun SRE survey (200 leaders) showed 43% of AI-generated code still fails in production after QA, with 88% of changes requiring 2–3 redeploy cycles, a 49% failure rate on deployment, and 15–18% more security vulnerabilities than human-written code. Practitioner research on SWE-bench Verified documented AI-generated tests missing 62.5% of failure classes (cascade blindness, where tests miss impacts on related functions); a TestSprite analysis of 470 GitHub PRs confirmed 1.7× more bugs, elevated security vulnerabilities (XSS 2.74× higher), and OWASP Top 10 flaws in 45% of cases. Maintenance costs materialized: $500–800/month in token spend, 60% test redundancy, and 4+ hours of debugging per generation cycle. The governance gap is real: self-healing automation masks defects rather than exposing them, and lack of business context leaves domain logic unvalidated (e.g., credit risk rules, regulatory compliance). The strategic finding is that maximum value accrues to mechanical scaffolding (structure, boilerplate) while strategic decisions (what to test, priorities, design) still require human judgment. The negative signal is strong: testRigor documented four failure categories (business context gaps, historical-data over-reliance, self-healing masking, integration complexity) that explain why adoption barriers are structural, not technical. Until governance infrastructure and quality-confidence baselines mature, most teams will continue rewriting AI-produced tests before production.
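One way to make that governance infrastructure concrete is a CI gate that refuses AI-drafted tests until they carry real assertions and a named human reviewer. The sketch below shows such a gate under assumed conventions: the "ai-generated" and "reviewed-by" marker comments are illustrative, not an existing standard or tool.

```python
# Sketch of a CI quality gate for AI-generated tests: any test file carrying
# an "ai-generated" marker must contain at least one assert and name a human
# reviewer, otherwise the build fails. The marker/reviewer convention is an
# assumption made for illustration.
import pathlib
import re
import sys

AI_MARKER = re.compile(r"#\s*ai-generated", re.IGNORECASE)
REVIEWER = re.compile(r"#\s*reviewed-by:\s*\S+", re.IGNORECASE)


def violations(test_dir: str = "tests") -> list[str]:
    problems = []
    for path in pathlib.Path(test_dir).rglob("test_*.py"):
        text = path.read_text()
        if not AI_MARKER.search(text):
            continue  # human-written tests are out of scope for this gate
        if "assert" not in text:
            problems.append(f"{path}: AI-generated test has no assertions")
        if not REVIEWER.search(text):
            problems.append(f"{path}: AI-generated test lacks a named reviewer")
    return problems


if __name__ == "__main__":
    found = violations()
    for line in found:
        print(line)
    sys.exit(1 if found else 0)
```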
— Amazon.com March 2026 outage (6.3M lost orders, 99% marketplace downtime) traced to untested AI-generated code; Lightrun survey: 43% of AI code needs production debugging; 70% of orgs have AI vulnerabilities in production.
— Meta TestGen-LLM case study: 75% test acceptance rate and 10%+ coverage gains on Instagram/Facebook; an RCT showed a 19% performance slowdown despite developer perception of a 20% speedup, revealing a critical perception-reality gap.
— Independent Next.js SaaS deployment: 47 integration tests generated in 12 minutes vs 3 months of manual effort; auto-repair of broken selectors on UI changes; identified locale-handling gaps requiring international team validation.
— Industrial evaluation of autonomous LLM-based test repair on 636 test cases: only 10% first-attempt success, 70% repair convergence at the scenario-family level, and 38% of attempts failed to produce executable artifacts; documents assertion weakening as a workaround.
— Ministry of Testing identifies a critical adoption barrier: AI test expansion creates CI/CD bottlenecks; a case study of a financial services firm with 36 microservices and 10k+ tests required days-long regression cycles, negating the speed gains.
— Market projection: $11.99B (2026) → $39.43B (2031) at 26.88% CAGR; 61% of enterprises run AI test engines at every development stage; AI contract testing reduces microservice defect rates by 40% in production studies.
— Industry coverage of the agentic testing evolution, with Gartner projecting 33% of applications running agentic AI by 2028; documents the multi-agent architecture (planner, generator, runner, analyser) emerging as standard.
— Azure DevOps extension with three-pass quality validation (Worker, Judge, Optimizer) implementing ISTQB techniques, showing product maturity in mainstream enterprise toolchains.