The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
Each dot marks the weighted maturity of practices within a domain — hover for a brief summary, click for more detail
AI that assists penetration testers by suggesting attack vectors, automating reconnaissance, and identifying exploitation paths. Includes AI-guided vulnerability exploitation and attack chain planning; distinct from vulnerability scanning, which identifies weaknesses without attempting exploitation.
AI-assisted penetration testing has crossed from experimental tooling into a proven practice with mainstream adoption: 87% of security leaders are actively planning, piloting, or using agentic AI for pentesting, and 95% expect it to displace traditional services. The category, distinct from vulnerability scanning in that it actively plans and executes exploitation paths, now supports human testers with automated reconnaissance, attack vector suggestion, and multi-step attack chain discovery across established platforms. Pentera serves 938+ enterprise customers; AWS launched Security Agent GA (31 March 2026) with context-aware, multi-step agentic attacks; and XBOW demonstrated end-to-end autonomous pentesting at production scale (1,060 real vulnerabilities on HackerOne, 48-step exploit chains). Peer-reviewed research confirms both promise and limitations: Wiz Research documented AI reliably solving 9 of 10 realistic challenges with strong multi-step reasoning, while also showing blind spots in creative enumeration and business-logic exploitation. The defining tension is no longer whether AI adds value but where the autonomy boundary lies. Full end-to-end automation remains infeasible: false-positive rates, business-logic complexity, and data sensitivity keep human-in-the-loop architectures the production standard. The question facing security teams is not adoption but orchestration: integrating AI pentesting into continuous validation workflows without overestimating what automation can deliver.
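The human-in-the-loop pattern described above can be sketched as a simple approval gate: the agent proposes steps in an attack chain, but nothing executes until an operator signs off on each one. This is a minimal illustration of the orchestration idea, not any vendor's API; every class and field name here is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ProposedStep:
    """One step an AI agent proposes in an attack chain (illustrative)."""
    target: str
    action: str
    rationale: str
    approved: bool = False

class HumanInTheLoopGate:
    """Queues AI-proposed steps; only human-approved steps may run."""

    def __init__(self) -> None:
        self.pending: list[ProposedStep] = []
        self.executed: list[ProposedStep] = []

    def propose(self, step: ProposedStep) -> None:
        # The agent can only enqueue; it cannot execute directly.
        self.pending.append(step)

    def approve(self, index: int) -> None:
        # A human operator marks a queued step as authorized.
        self.pending[index].approved = True

    def execute_approved(self) -> list[ProposedStep]:
        # Run (here: just record) approved steps; unapproved ones stay queued.
        ran = [s for s in self.pending if s.approved]
        self.pending = [s for s in self.pending if not s.approved]
        self.executed.extend(ran)
        return ran
```

Production platforms layer much more on top (audit logs, rollback, scoping checks), but the core invariant is the same: the autonomy boundary is enforced structurally, not by prompt instructions.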
The vendor ecosystem has consolidated around a handful of platforms with real production footprints. Pentera, now a Gartner Representative Vendor for Adversarial Exposure Validation, reports 938+ customers, geographic expansion into Asia-Pacific, and analyst-documented ROI of 525-600%. RidgeBot has reached GA on both the AWS and Azure Marketplaces; AWS Security Agent reached GA (31 March 2026) at $50 per task-hour with multicloud support, expanded in May 2026 with full-repository code review; and newer entrants such as Synack's Sara and Aikido Attack ship autonomous pentesting with human-expert validation or integrated AI-driven remediation. Venture funding continues: Novee emerged from stealth with $51.5M in Series B capital. Independent practitioners report production deployments with mixed signals: XBOW reached the #1 HackerOne leaderboard ranking with 1,060 fully autonomous vulnerability reports; the Shannon AI pentester was validated on production SaaS with 72% accuracy and an honest assessment of its business-logic gaps; and an Anthropic Claude-based red team discovered 11 high-severity vulnerabilities in Mozilla Firefox.
Government and enterprise adoption confirms operational maturity. The NSA's rollout of NodeZero across 200 defense contractors yielded 20,000+ pentesting hours and 50,000 identified vulnerabilities, 70% of them mitigated. Fortune 500 firms including Expedia, Mandiant, Deloitte, and KPMG have adopted platforms for continuous testing workflows. A Synack/Omdia survey of 200 U.S. security leaders found 87% actively planning or piloting agentic AI, 95% expecting it to displace traditional services, and 93% emphasizing that comprehensive guardrails are critical.
Deployment barriers remain structural despite adoption momentum. EPAM's May 2026 hands-on evaluation of six commercial AI pentesting agents revealed a critical capability gap: AWS Security Agent, the best performer, found only 35-38% of known vulnerabilities on realistic targets, while Shannon found 17-33%; practitioners consistently characterize current outputs as "expensive vulnerability scanning" rather than true pentesting. Architecture has emerged as the primary differentiator: Ken Huang's benchmark showed RidgeGen at 0% hallucination versus Shannon's 63% unconfirmed findings on identical applications, evidence that system design (belief state, validation pipelines, orchestration) matters more than underlying model capability. Practitioners have codified six safety requirements (ownership validation, network-level scoping, isolation, validation, observability, data residency) that current tools do not fully satisfy. Research into 28 LLM-based systems confirms a split between transient capability gaps and deeper complexity barriers around business logic and context that automation cannot yet overcome. The remediation gap is critical: only 21.1% of serious AI/LLM pentest findings are resolved (versus 73.5% for web), indicating that detection at scale now outpaces organizational ability to fix AI-specific vulnerabilities. Human-in-the-loop architecture is not a transitional compromise; it is the operational standard, formalized by OWASP's May 2026 Autonomous Penetration Testing Standard (APTS), which codifies four autonomy levels with explicit human-oversight and governance requirements for production deployment.
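Two of the six safety requirements above, ownership validation and network-level scoping, amount to a hard precondition: an agent must refuse any target outside an explicitly authorized set of networks. A minimal sketch using Python's standard `ipaddress` module follows; the class and method names are illustrative, and real tools would also verify asset ownership out-of-band (for example via DNS records or signed authorization letters).

```python
import ipaddress

class ScopeValidator:
    """Reject any target outside an explicitly authorized set of networks.

    A minimal sketch of network-level scoping; not taken from any
    vendor's implementation.
    """

    def __init__(self, authorized_cidrs: list[str]) -> None:
        # Fail fast on malformed CIDRs at configuration time.
        self.networks = [ipaddress.ip_network(c) for c in authorized_cidrs]

    def in_scope(self, target_ip: str) -> bool:
        addr = ipaddress.ip_address(target_ip)
        return any(addr in net for net in self.networks)

    def require_in_scope(self, target_ip: str) -> None:
        # Called before every reconnaissance or exploitation action.
        if not self.in_scope(target_ip):
            raise PermissionError(f"{target_ip} is outside authorized scope")
```

The point of the design is that the check sits in the execution path, so a hallucinated or misparsed target is blocked mechanically rather than relying on the model to respect its instructions.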
— Synack launched Sara, an autonomous red-team agent for continuous vulnerability discovery; human expert validation reduces false positives and confirms exploitability. Combines AI-driven reconnaissance with a validation layer of 1,500+ security researchers.
— AWS Security Agent added full repository code review capability, performing context-aware analysis of entire codebases and surfacing systemic vulnerabilities beyond pattern-matching scope; GA release extends autonomous pentesting to design-phase validation.
— Netragard critical assessment: AI pentesting relies on pre-existing tools and data and cannot think, adapt, or create novel attack paths; distinguishes human novelty discovery and contextualized threat intelligence from automated tool-based approaches.
— OWASP Autonomous Penetration Testing Standard (APTS) v0.1.0 published May 2026; 173 requirements across 8 governance domains with four autonomy levels; marks transition from research to operational deployment requiring formal assurance frameworks.
— Critical maturity signal: only 21.1% of serious AI/LLM pentest findings are resolved (vs 73.5% web, 75.5% API); global market $2.74B (2025)→$7.41B (2034) at 11.60% CAGR; 70%+ adoption of PTaaS. Strong adoption overall, but a persistent remediation gap for AI-specific findings.
— EPAM hands-on evaluation of six AI pentesting agents against realistic targets: AWS Security Agent found 35-38%, Shannon 17-33%, others found fewer; identifies three primary capability gaps (custom logic, multi-step exploits, real-world inconsistencies) contradicting vendor hype with concrete evidence.
— Ken Huang benchmark isolates architecture as differentiator: RidgeGen 0% hallucination rate vs Shannon 63% unconfirmed findings on identical Juice Shop target; system design (belief state, verification, orchestration) drives performance gap more than underlying model.
— Independent benchmark of five AI pentesting tools (Escape, Claude, Shannon, Strix, PentAGI) against a 20-vulnerability web app; detection rates ranged from 1 to 9 vulnerabilities, showing that tool orchestration matters more than model choice.