The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
AI that handles process exceptions by classifying the exception type, attempting resolution, or routing to the right human. Includes exception pattern recognition and automated resolution attempts; distinct from ticket routing, which classifies incoming requests rather than process failures.
AI-driven exception handling and escalation routing has proven its value at forward-leaning enterprises but remains far from mainstream adoption. The practice (using AI to detect process anomalies, classify exception types, and either resolve them automatically or route them to the right human) delivers measurable ROI in well-scoped domains like IT incident triage, accounts payable, and customer support. Leading deployments report 40-60% reductions in resolution time and significant cost savings. Yet the field has settled into a durable equilibrium rather than progressing toward full autonomy. A tiered model has emerged: routine exceptions are highly automatable, complex cases require AI-human collaboration, and high-stakes decisions remain human-led. The binding constraint is no longer technical capability but organizational readiness: governance gaps, data quality issues, and weak reliability assurance keep most organizations on the sidelines. The promise of autonomous escalation remains exactly that.
As of May 2026, 72% of Global 2000 companies operate AI agents in production with escalation routing embedded as standard. SaaS support ticket routing automation delivers 13.3x first-year ROI ($7.60 per $1 invested) with 83% misrouting elimination and 80% resolution time reduction across thousands of deployments. ServiceNow continues to dominate IT operations, with deployments at Microsoft (170,000+ employees, 3,000 daily tickets), Vodafone (40% improvement), and HSBC (80% automation); ITOM automates 65-75% of routine exceptions and cuts MTTR by 40-60%. Named customer deployments make the gains concrete: Bank of America's Erica handles 58M conversations monthly with full context transfer; Klarna cut resolution time from 11 to 2 minutes (800 FTE equivalent); accounts payable platforms resolve 95% of invoice exceptions automatically, with role-based escalation for the remainder. Fintech workflows reduce exception lookup from 90+ seconds to 5-10 seconds, saving agents 15-20 hours weekly. Platforms now treat escalation as intentional governance rather than failure: Zendesk updated its AI agent reporting (May 2026) to measure Contained resolutions (no escalation needed) separately from Verified resolutions (escalation with human confirmation), reflecting an industry-wide shift toward treating correct escalation as a design success. Appian and UiPath document exception classification and routing as standard platform capabilities; production architectures use confidence thresholds (e.g., below 0.75 = manual handling, 0.75 up to but not including 0.85 = agent suggestion, 0.85 and above = auto-reply) to route decisions deterministically.
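The confidence-threshold routing described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the function name, the route labels, and the constant names are hypothetical, and the cut-offs are simply the example values quoted in the text. The sketch assumes the middle band excludes 0.85, so a score of exactly 0.85 is auto-replied.

```python
# Hypothetical cut-offs, copied from the example bands quoted in the text.
SUGGEST_THRESHOLD = 0.75  # at or above: draft a suggestion for a human agent
AUTO_THRESHOLD = 0.85     # at or above: reply automatically


def route_by_confidence(confidence: float) -> str:
    """Map a classifier confidence score to one of three handling routes."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence >= AUTO_THRESHOLD:
        return "auto_reply"
    if confidence >= SUGGEST_THRESHOLD:
        return "agent_suggestion"
    return "manual_handling"


if __name__ == "__main__":
    for score in (0.60, 0.75, 0.80, 0.85, 0.97):
        print(f"{score:.2f} -> {route_by_confidence(score)}")
```

Checking the highest band first keeps the boundaries unambiguous: each score falls into exactly one route, which is what makes the routing deterministic and auditable.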
Yet a critical adoption wall persists. Only 14% of pilot programs advance to production, according to a DigitalApplied survey of 650 VP-level leaders; 78% have pilots, but most stall. Five failure causes dominate, together accounting for 89% of cases: integration complexity with legacy systems, output degradation on edge cases, absent production monitoring, unclear organizational ownership, and insufficient domain data. The economic wins cluster at mature organizations with clean data pipelines and governance readiness; most enterprises lack these conditions. Druid AI's production telemetry (15 months across 4 industries, Jan 2025–Mar 2026) supports the view that escalation quality is a governance discipline, not a capability gap: escalating appropriately matters far more than achieving high containment rates. The binding constraint remains organizational readiness, governance maturity, and escalation handoff design, not technical capability. Advanced implementations treat escalation rules as reusable, versioned, observable AI skills with policy thresholds; basic pilots still lack audit trails, context transfer, or fallback paths. The practice has proven its value at leading companies but has plateaued at the pilot-to-production boundary for the broader market.
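To make the contrast between advanced implementations and basic pilots concrete, here is a rough sketch of an escalation rule that carries a version, a policy threshold, an explicit owner, a fallback path, an audit trail, and context transfer. Every name in it (EscalationRule, Handoff, apply_rule, the queue names, the field layout) is hypothetical and invented for illustration; it does not reflect any specific platform's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EscalationRule:
    name: str
    version: str           # versioned, so every decision is traceable to a rule revision
    min_confidence: float  # policy threshold: below this, hand off to a human
    owner_queue: str       # explicit ownership of the escalation
    fallback_queue: str    # fallback path if the owner queue is unavailable


@dataclass(frozen=True)
class Handoff:
    rule_name: str
    rule_version: str
    queue: str
    reason: str
    context: dict          # context transferred to the human handler


def apply_rule(rule, confidence, context, available_queues, audit_log) -> Optional[Handoff]:
    """Return a Handoff when the rule escalates, or None when the case is contained."""
    entry = {"rule": rule.name, "version": rule.version, "confidence": confidence}
    if confidence >= rule.min_confidence:
        audit_log.append({**entry, "decision": "contained"})
        return None
    queue = rule.owner_queue if rule.owner_queue in available_queues else rule.fallback_queue
    reason = f"confidence {confidence:.2f} below threshold {rule.min_confidence:.2f}"
    audit_log.append({**entry, "decision": "escalated", "queue": queue})
    return Handoff(rule.name, rule.version, queue, reason, dict(context))


if __name__ == "__main__":
    rule = EscalationRule("invoice_mismatch", "1.2.0", 0.80, "ap_specialists", "ap_general")
    log = []
    handoff = apply_rule(rule, 0.62, {"invoice_id": "INV-001"}, {"ap_general"}, log)
    print(handoff)
    print(log)
```

Both outcomes write to the audit log, so contained cases are as observable as escalated ones. That choice matters if, as the telemetry above suggests, escalation quality rather than containment rate is the metric under review.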
- Industry guide defining incident response automation as threat detection, then intelligent routing, then automated response. Cites a $300K/hour outage cost and hours-long MTTR under manual processes, positioning automation with proper escalation as mandatory at modern operational scale.
- Vision Language Model false positive rates of 28-49% (precision 0.51-0.72) in safety-critical exception detection. In one real deployment, a fireplace shown on a TV was mistaken for a fire, triggering a false 911 call. Demonstrates that context-aware exception classification is the central challenge.
- Knowledge base from a major vendor (Dynatrace): autonomous operations and agentic AI enable a shift in which issues are detected, acknowledged, and recovered without a human in the loop. Discusses when escalation to humans is needed and positions agentic AI as a production capability.
- Life sciences organization reduced MTTR by 50% by using AI to automate 60% of change processes. The platform implements intelligent noise suppression, cross-system event grouping, context enrichment, and workflow automation, eliminating the first 10-15 minutes of manual incident triage.
- GA security product (June 2026) addressing AI agent failure recovery: Agent Rewind reverses unintended actions, and the SAGE engine enforces real-time governance. Provides exception recovery and unauthorized-action reversal for production code-deployment agents.
- Production patterns for exception handling in agents: a @retry decorator with configurable backoff, TimeoutPolicy, and ErrorHandler nodes. Addresses the critical operational gap between state persistence and active recovery for thousands of daily invocations.
- Empirical study of 22 production LLM agent incidents, deriving a five-class failure taxonomy; 70% of silent failures were caught by human observation (not automated tests); the defense framework includes declarative governance and monitoring-that-monitors-itself.
- SQM Group longitudinal research: AI-assisted agents improve FCR by 15-25% over unassisted agents. Every 1% FCR improvement reduces costs by 1%. IT help desk FCR at top performers is 82-88%, the tier-1 escalation rate is 26%, and the knowledge-base correlation is +8-12pp.
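A note on the Vision Language Model figures in the list above: the 28-49% range lines up exactly with one minus the quoted precision range (0.51-0.72). That suggests the "false positive rate" there is the share of raised alarms that turned out to be wrong, rather than the share of benign events that triggered an alarm. This reading is an assumption inferred from the numbers, not something the source states; the helper name below is hypothetical.

```python
def wrong_alarm_share(precision: float) -> float:
    """Share of raised alarms that were false, assuming it equals 1 - precision."""
    if not 0.0 <= precision <= 1.0:
        raise ValueError(f"precision must be in [0, 1], got {precision}")
    return 1.0 - precision


if __name__ == "__main__":
    # Precision endpoints quoted in the evidence item.
    for p in (0.72, 0.51):
        print(f"precision {p:.2f} -> {wrong_alarm_share(p):.0%} of alarms are false")
```

Under this reading, roughly one in three to one in two alarms would be spurious, which is the kind of volume that feeds alert fatigue in downstream escalation queues.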
Research from the mid-June window (2026-06-07 to 2026-06-21) surfaces critical reliability findings. Wei Wu's empirical study of 22 production LLM agent incidents reveals that 70% of silent failures (where systems deliver fluent but false narratives to users) are caught only by human observation, not automated tests, underscoring governance as the first-class control layer. SQM Group's longitudinal benchmarking shows AI-assisted agents improve first-contact resolution by 15-25% versus unassisted agents, with top-performing IT help desks achieving 82-88% FCR and 26% tier-1 escalation rates. Production incident automation (AiFA Labs) demonstrates a 50% MTTR reduction through intelligent noise suppression and cross-system event grouping that eliminates the first 10-15 minutes of manual triage. Rubrik's June 2026 GA release of Agent Cloud, a security product for production code-deployment agents, introduces specialized infrastructure for exception recovery (Agent Rewind) and unauthorized-action reversal, indicating market demand for governance tooling when agents fail. Framework research (LangGraph, CallSphere) distinguishes retry strategies by exception class: transient errors use exponential backoff, permanent errors escalate immediately, and unknown failures trigger circuit breakers. Negative signals persist: Vision Language Model safety systems show 28-49% false positive rates in emergency detection, and in distributed systems, false positives that cascade into alert fatigue erode trust and damage automation credibility.
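The retry-by-exception-class pattern mentioned above can be sketched as follows. This is an illustrative, framework-free version, not the LangGraph or CallSphere API: the exception classes, the CircuitBreaker class, and run_with_policy are all hypothetical names, and the retry count, base delay, and breaker threshold are arbitrary assumptions.

```python
import time


class TransientError(Exception):
    """Failure expected to clear on its own (e.g. a timeout)."""


class PermanentError(Exception):
    """Failure that retrying cannot fix (e.g. invalid input)."""


class EscalateToHuman(Exception):
    """Raised when the task must be handed to a person."""


class CircuitBreaker:
    """Opens after a number of unknown failures and then blocks further calls."""

    def __init__(self, max_unknown_failures: int = 3):
        self.max_unknown_failures = max_unknown_failures
        self.unknown_failures = 0

    @property
    def open(self) -> bool:
        return self.unknown_failures >= self.max_unknown_failures

    def record_unknown(self) -> None:
        self.unknown_failures += 1


def run_with_policy(task, breaker, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Run task(), choosing a recovery strategy by exception class."""
    if breaker.open:
        raise EscalateToHuman("circuit open: too many unknown failures")
    for attempt in range(max_retries + 1):
        try:
            return task()
        except TransientError as exc:
            if attempt == max_retries:
                raise EscalateToHuman("retries exhausted") from exc
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 0.5s, 1s, 2s, ...
        except PermanentError as exc:
            raise EscalateToHuman("permanent failure") from exc
        except Exception as exc:  # unknown class: count it against the breaker
            breaker.record_unknown()
            raise EscalateToHuman("unknown failure") from exc
```

Passing `sleep` as a parameter lets tests capture the backoff delays instead of waiting. Note that every non-recoverable path ends in the same EscalateToHuman signal, so the caller needs only one handoff route regardless of why automation gave up.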
The evidence reinforces the core equilibrium: exception handling has proven product-market fit at leading companies with deterministic frameworks and governance discipline, while broader adoption remains constrained by organizational readiness and the need for explicit escalation ownership. Technical capability is no longer the limiting factor. A critical tension remains: systems that deliver fluent false narratives (confident hallucinations) are worse than transparent failures, so governance infrastructure must detect when AI is confabulating rather than merely mistaken.