The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
AI that assists penetration testers by suggesting attack vectors, automating reconnaissance, and identifying exploitation paths. Includes AI-guided vulnerability exploitation and attack chain planning; distinct from vulnerability scanning which identifies weaknesses without attempting exploitation.
AI-assisted penetration testing has crossed into mainstream operational practice, moving from point-in-time engagements to continuous, agentic validation architectures. Frontier LLMs (Gemini 3 Pro, Claude Opus 4.5) achieve ~70% autonomous exploitation success on diverse targets, and peer-reviewed research isolates specific capability boundaries: exploitation reaches 90% with ground-truth reconnaissance, but autonomous reconnaissance plateaus at 50%, limiting end-to-end autonomy. Market adoption signals are unambiguous: 87% of security leaders are actively planning or piloting agentic AI pentesting, 95% expect it to displace traditional manual services, and YesWeHack's June 2026 launch shows enterprise customers (Dassault Systèmes, Sanofi, multiple CAC 40 firms) in production with same-day autonomous testing. The structural tension is not whether AI adds value but where the autonomy boundary lies and how to embed it sustainably. Full end-to-end automation without human validation remains infeasible: detection capability now outpaces organizational remediation velocity (AI findings are resolved at 38.4% versus 77.3% for traditional vulnerabilities), and reconnaissance gaps require hybrid architectures. The practice has reached a maturity inflection, but one constrained by a critical gap: security tooling has accelerated beyond organizational capacity to remediate AI-discovered findings. Production success requires human-in-the-loop orchestration, continuous revalidation, and treating deployment as a governance problem, not a tooling problem.
The vendor ecosystem has consolidated around established platforms shipping production autonomous pentesting. Pentera (938+ enterprise customers, Gartner Representative Vendor, 525-600% documented ROI) expanded in June 2026 with an MCP (Model Context Protocol) server that lets AI agent orchestration trigger pentesting directly in SecOps workflows. YesWeHack launched Agentic Pentest (June 2026 GA), with same-day autonomous testing already deployed to Dassault Systèmes, Sanofi, and multiple CAC 40 companies. RidgeBot v7.0 (available on the AWS and Azure Marketplaces) added autonomous compromise simulation for Windows Active Directory. AWS Security Agent (GA 31 March 2026, $50/task-hour) was extended in May 2026 to repository code review and in June to threat modeling. CyCognito expanded continuous AI pentesting to 60+ AI infrastructure categories (MCP servers, RAG systems, Ollama, MLflow), documenting attack chains across AI tools and physical security systems. FireCompass was deployed at a Fortune 500 technology firm with an 11x cost reduction ($5K→<$1K per app), lead time compressed from 2+ weeks to 1 day, and coverage expanded from 10% to 99%.
Frontier LLM capability has matured measurably. Peer-reviewed benchmarking shows Gemini 3 Pro and Claude Opus 4.5 achieving ~70% autonomous exploitation success on diverse 300-server environments. Decoupling reconnaissance from exploitation empirically reveals the hard constraint, however: given ground-truth vulnerability context, agents reach 90% exploitation success, but autonomous reconnaissance alone plateaus at 50% because of failures in telemetry parsing and tool-output interpretation. Stanford research documents 80% of human testers finding critical RCEs that all tested AI agents missed, illustrating capability boundaries in novel contexts. A six-layer governance framework (ownership validation, network-level scoping, isolation, validation, observability, data residency) has emerged as a production requirement rather than a guideline, as reflected in the Cloud Security Alliance's 2026 agentic pentesting best practices.
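As a rough illustration of one layer of the governance framework above, network-level scoping can be thought of as a gate that refuses any target outside an agreed address range before an agent is allowed to act. The sketch below is a minimal, hypothetical interpretation and is not taken from the CSA guidance: the names `ALLOWED_SCOPE`, `in_scope`, and `filter_targets`, and the CIDR ranges (reserved documentation addresses), are all assumptions made for the example.

```python
import ipaddress

# Hypothetical engagement scope. These are reserved documentation ranges,
# standing in for whatever CIDRs a real engagement letter would authorise.
ALLOWED_SCOPE = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def in_scope(target: str) -> bool:
    """Return True only if target is a literal IP inside an allowed network."""
    try:
        addr = ipaddress.ip_address(target)
    except ValueError:
        # Hostnames and malformed input are rejected rather than resolved,
        # so the gate fails closed.
        return False
    return any(addr in net for net in ALLOWED_SCOPE)


def filter_targets(targets):
    """Split proposed targets into (approved, rejected) lists."""
    approved = [t for t in targets if in_scope(t)]
    rejected = [t for t in targets if not in_scope(t)]
    return approved, rejected


if __name__ == "__main__":
    proposed = ["192.0.2.17", "198.51.100.200", "example.com"]
    approved, rejected = filter_targets(proposed)
    print("approved:", approved)
    print("rejected:", rejected)
```

Failing closed on anything that is not a literal in-scope address is a deliberate choice in this sketch: a human operator, not the agent, would decide whether to widen the scope.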
A structural remediation gap has emerged as the limiting factor in adoption. Cobalt's PTaaS data from thousands of engagements reveals a 2:1 remediation deficit: AI/LLM vulnerabilities are resolved at a rate of 38.4% versus 77.3% for traditional web vulnerabilities, indicating that detection at scale now outpaces organizational capacity to remediate AI-specific findings. Large-scale deployment data (6.8M findings across 1,000+ organizations) shows cloud vulnerability growth of 44x against testing coverage growth of 1.23x, creating a structural supply-demand imbalance. The OWASP Autonomous Penetration Testing Standard (APTS v0.1.0) codifies four autonomy levels with explicit human-oversight requirements, signaling industry consensus that full autonomy remains infeasible. The practice's maturation is evidenced not by capabilities (which have crossed into production effectiveness) but by the recognition that autonomous pentesting is a governance and orchestration problem, not a raw capability problem.
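The ratios behind the remediation and coverage figures above can be checked with simple arithmetic. The short sketch below uses only the numbers quoted in the text; the variable names are illustrative, and the calculation says nothing about how the underlying datasets were collected.

```python
# Figures quoted in the text above; variable names are illustrative.
ai_llm_resolution = 38.4        # % of AI/LLM findings resolved
traditional_resolution = 77.3   # % of traditional web findings resolved

# How many times more often traditional findings get resolved.
resolution_ratio = traditional_resolution / ai_llm_resolution
# Absolute gap in percentage points.
resolution_gap_points = traditional_resolution - ai_llm_resolution

vuln_growth = 44.0       # cloud vulnerability growth multiple
coverage_growth = 1.23   # testing coverage growth multiple

# How much faster vulnerabilities grew than testing coverage.
growth_ratio = vuln_growth / coverage_growth

print(f"resolution ratio: {resolution_ratio:.2f} to 1")
print(f"resolution gap: {resolution_gap_points:.1f} percentage points")
print(f"vulnerability growth vs coverage growth: {growth_ratio:.1f}x")
```

The first ratio comes out at roughly 2.01, which is where the "2:1 deficit" shorthand comes from; the second shows vulnerabilities growing about 36 times faster than testing coverage.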
— YesWeHack Agentic Pentest GA launch with named enterprise customers (Dassault Systèmes, Sanofi, multiple CAC 40 companies) delivering same-day autonomous testing across web, mobile, APIs with zero-false-positive triage option and EU/APAC region support.
— Large-scale Cobalt PTaaS remediation data reveals critical adoption barrier: AI/LLM vulnerability resolution at 38.4% versus 77.3% for APIs—a 2:1 deficit indicating detection capability outpaces organizational remediation capacity despite tool maturity.
— Empirical two-stage evaluation framework isolates exploitation success (90% with ground-truth context) from autonomous reconnaissance (50% success), identifying telemetry parsing and tool-output interpretation as critical bottlenecks limiting end-to-end autonomy.
— Practitioner analysis of three AI pentesting market segments (autonomous platforms, AI-native web testers, BAS) with critical assessment: Stanford study shows 80% of human testers found critical RCE missed by all tested AI agents, underscoring hybrid human-AI model necessity.
— Fortune 500 technology company deployment: cost reduced 11x ($5K→<$1K per app), lead time compressed from 2+ weeks to 1 day, coverage expanded 10%→99%; demonstrates quantified ROI of continuous autonomous pentesting at scale with <2% false positive rate.
— Structured vendor analysis (Simbian, XBOW, Horizon3, Pentera, Sprocket, BreachLock, NetSPI, Bishop Fox, Praetorian, Synack) evaluated on autonomy depth, surface breadth, reasoning transparency, cadence, pricing, and closed-loop defense integration—mapping market consolidation and adoption drivers.
— CSA/Synack governance framework for agentic pentesting identifying six technical requirements (ownership validation, network-level scoping, isolation, validation, observability, data residency) and organizational guardrails reflecting maturity of human-in-the-loop production deployment patterns.
— CyCognito continuous AI pentesting expansion to AI-native infrastructure (60+ model categories: MCP, RAG, Ollama, MLflow) with documented attack chains showing exposure across AI tools, security systems, and physical infrastructure—evidence of practice expanding beyond traditional network pentesting.