Perly Consulting │ Beck Eco

The State of Play

A living index of AI adoption across industries — where established practice meets the bleeding edge
UPDATED DAILY

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

The Daily Dispatch

A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.

AI Maturity by Domain

Each dot marks the weighted maturity of practices within a domain, plotted on an axis running from Bleeding Edge to Established.

Exception handling & escalation routing

LEADING EDGE

TRAJECTORY

Stalled

AI that handles process exceptions by classifying the exception type, attempting resolution, or routing to the right human. Includes exception pattern recognition and automated resolution attempts; distinct from ticket routing, which classifies incoming requests rather than process failures.

OVERVIEW

AI-driven exception handling and escalation routing has proven its value at forward-leaning enterprises but remains far from mainstream adoption. The practice (using AI to detect process anomalies, classify exception types, and either resolve them automatically or route them to the right human) delivers measurable ROI in well-scoped domains like IT incident triage, accounts payable, and customer support. Leading deployments report 40-60% reductions in resolution time and 20-25% cost reductions. Yet the field has settled into a durable equilibrium rather than progressing toward full autonomy. A tiered model has emerged: routine exceptions are highly automatable, complex cases require AI-human collaboration, and high-stakes decisions remain human-led. The binding constraint is no longer technical capability but organisational readiness: governance gaps, data quality issues, and reliability assurance keep most organisations on the sidelines. The promise of autonomous escalation remains exactly that.

CURRENT LANDSCAPE

As of May 2026, 72% of Global 2000 companies operate AI agents in production with escalation routing embedded as standard. SaaS support ticket routing automation delivers 13.3x first-year ROI ($7.60 per $1 invested) with 83% misrouting elimination and 80% resolution time reduction across thousands of deployments. ServiceNow continues to dominate IT operations, with deployments at Microsoft (170,000+ employees, 3,000 daily tickets), Vodafone (40% ticket resolution improvement), and HSBC (80% automation); ServiceNow ITOM automates 65-75% of routine exceptions and cuts MTTR by 40-60%. Named customer deployments make the gains concrete: Bank of America's Erica handles 58M conversations monthly with full context transfer; Klarna cut resolution time from 11 to 2 minutes (800 FTE equivalent); accounts payable platforms resolve 95% of invoice exceptions automatically, with role-based escalation for the remainder. Fintech workflows reduce exception lookup from 90+ seconds to 5-10 seconds, saving agents 15-20 hours weekly. Platforms now treat escalation as intentional governance rather than failure: Zendesk updated its AI agent reporting (May 2026) to measure Contained resolutions (no escalation needed) separately from Verified resolutions (escalation with human confirmation), reflecting an industry-wide shift toward treating correct escalation as a design success. Appian and UiPath document exception classification and routing as standard platform capabilities; production architectures use confidence thresholds (e.g., <0.75 = manual handling, 0.75-0.85 = agent suggestion, ≥0.85 = auto-reply) to route decisions deterministically.
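The confidence-threshold routing described above can be sketched roughly as follows. This is a minimal illustration, not any vendor's implementation: the names `Route`, `Thresholds`, and `route_by_confidence` are hypothetical, and the cut-offs simply mirror the example bands quoted in the text (below 0.75, 0.75 to 0.85, and 0.85 or above).

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    """Hypothetical routing outcomes, named after the bands in the text."""
    MANUAL_HANDLING = "manual_handling"
    AGENT_SUGGESTION = "agent_suggestion"
    AUTO_REPLY = "auto_reply"


@dataclass(frozen=True)
class Thresholds:
    # Illustrative cut-offs copied from the example bands; a real
    # deployment would tune and version these as policy.
    suggest_at: float = 0.75
    auto_at: float = 0.85


def route_by_confidence(confidence: float, t: Thresholds = Thresholds()) -> Route:
    """Map a model confidence score in [0, 1] to a routing decision."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence >= t.auto_at:
        return Route.AUTO_REPLY
    if confidence >= t.suggest_at:
        return Route.AGENT_SUGGESTION
    return Route.MANUAL_HANDLING


if __name__ == "__main__":
    for score in (0.60, 0.75, 0.80, 0.85, 0.97):
        print(score, route_by_confidence(score).value)
```

Because the same score always maps to the same route, the decision can be logged and audited; changing behaviour means changing the thresholds, not the model.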

Yet a critical adoption wall persists. In a DigitalApplied survey of 650 VP-level leaders, 78% have AI agent pilots, but only 14% have reached production scale. Five failure causes account for 89% of cases: integration complexity with legacy systems, output degradation on edge cases, absent production monitoring, unclear organizational ownership, and insufficient domain data. The economic wins cluster at mature organizations with clean data pipelines and governance readiness; most enterprises lack these conditions. Druid AI's production telemetry (15 months across 4 industries, Jan 2025–Mar 2026) supports the view that escalation quality is a governance discipline, not a capability gap: escalating appropriately matters far more than achieving high containment rates. The binding constraint remains organizational readiness, governance maturity, and escalation handoff design, not technical capability. Advanced implementations treat escalation rules as reusable, versioned, observable AI skills with policy thresholds; basic pilots still lack audit trails, context transfer, or fallback paths. The practice has proven its value at leading companies but has plateaued at the pilot-to-production boundary for the broader market.

TIER HISTORY

Research: Jan-2020 → Jan-2021
Bleeding Edge: Jan-2021 → Jan-2022
Leading Edge: Jan-2022 → present

EVIDENCE (123)

— Industry guide defining incident response automation: threat detection→intelligent routing→automated response. Cites $300K/hour outage cost and hours-long MTTR under manual processes, positioning automation with proper escalation as mandatory for modern operational scale.

— Vision Language Model false positive rates 28-49% (precision 0.51-0.72) in safety-critical exception detection. Real deployment: TV fireplace mistaken for fire triggering false 911. Demonstrates context-aware exception classification as central challenge.

— Major vendor (Dynatrace) knowledge base: autonomous operations and agentic AI facilitate shift where issues are detected, acknowledged, and recovered without human in loop. Discusses when escalation to humans is needed; positions agentic AI as production capability.

How Enterprises Cut MTTR in Half (Case Studies)

— Life sciences org reduced MTTR by 50% via AI automation of 60% of change processes. Platform implements intelligent noise suppression, cross-system event grouping, context enrichment, and workflow automation—eliminating first 10-15 min of manual incident triage.

— GA security product (June 2026) addressing AI agent failure recovery: Agent Rewind reverses unintended actions; SAGE engine enforces real-time governance; Exception recovery and unauthorized-action reversal for production code-deployment agents.

— Production patterns for exception handling in agents: @retry decorator with configurable backoff, TimeoutPolicy, ErrorHandler nodes. Addresses critical operational gap between state persistence and active recovery for thousands of daily invocations.

— Empirical study of 22 production LLM agent incidents deriving five-class failure taxonomy; 70% of silent failures caught by human observation (not automated tests); defense framework includes declarative governance and monitoring-that-monitors-itself.

— SQM Group longitudinal research: AI-assisted agents improve FCR by 15-25% over unassisted agents. Every 1% FCR improvement reduces costs 1%. IT help desk FCR at top performers 82-88%, tier-1 escalation rate 26%, knowledge-base correlation +8-12pp.

HISTORY

  • 2020: SRE teams demonstrated measurable impact from intelligent alert routing and escalation policies (67% incident reduction). Vendors (ServiceNow, IBM) began marketing AI-driven exception detection as part of broader AIOps platforms, but adoption remained research-stage outside specialized IT operations teams.
  • 2021: ServiceNow achieved a 75% reduction in incident resolution time using ML-driven exception handling and routing. RPA platforms (IBM, UiPath) invested in education and tooling maturity. However, critical gaps in governance and exception-handling strategy were identified as adoption barriers in RPA deployments. Most production use remained concentrated in IT/DevOps contexts.
  • 2022-H1: Exception handling and routing moved decisively into production across IT services, public sector, and global enterprises. ServiceNow deployments showed dramatic results: 35–50% faster case resolution in IT services, 97% faster inquiry assignment in government (8 minutes vs. 36 hours), and 80–90% reduction in escalation failures in enterprise operations. Global AI adoption reached 35% of companies, with automation and process optimization as top use cases. However, real-world evidence revealed a critical limitation: deployed AI-driven alerting systems degraded over time due to data drift, highlighting that successful exception handling requires continuous monitoring and governance.
  • 2022-H2: ServiceNow continued advancing AI capabilities for exception handling, releasing specific features for intelligent incident routing: Similar Alerts/Incidents detection to provide context, Alert Clustering to identify automation opportunities, and Automated Text-Based Grouping Rules to reduce noise. These developments reinforced the shift toward autonomous exception classification and routing while maintaining human-in-the-loop decision-making for high-consequence scenarios.
  • 2023-H1: Exception handling platforms consolidated around ServiceNow and enterprise AIOps suites, with Forrester naming ServiceNow a Leader in process-centric AIOps (Q2 2023). Voice and contact center routing systems demonstrated intelligent escalation patterns, routing complex cases to humans with full context. However, emerging evidence exposed critical limitations: LLM-based systems (ChatGPT) struggled with constraint-aware escalation, raising questions about generative AI's role in exception handling beyond code-level automation.
  • 2023-H2: Deployments of intelligent exception handling expanded further, with ServiceNow and IBM announcing enhanced capabilities for automating incident detection and routing. However, 2023 brought heightened scrutiny of AI reliability: industry surveys showed 36% of AI projects failed entirely, and high-stakes domains (legal, medical) rejected AI-generated outputs due to accuracy concerns. This underscored a core reality of exception handling — while automation could handle routine cases, complex and high-consequence exceptions remained routed to humans by necessity, not preference.
  • 2024-Q1: Vendor ecosystem continued maturing with new AI-driven tooling for exception escalation (SiteGPT GA support escalation). However, critical research emerging from NeurIPS 2023 (published Feb 2024) demonstrated that LLMs systematically escalate conflicts in simulations—all tested models (GPT-4, GPT-3.5, Claude 2, Llama-2) showed escalation tendencies, with some deploying nuclear weapons in wargame scenarios. This raised fundamental questions about LLM suitability for high-stakes exception routing decisions, emphasizing that the exception handling maturity curve remained constrained by AI model reliability limitations.
  • 2024-Q2: Critical governance and adoption barriers became visible. Legal analysis warned of chatbot liability and escalation failure accountability. Systematic evidence emerged: 95% of AI pilots fail, primarily due to organizational readiness and integration complexity rather than technical gaps. Pilot-to-production failures exposed critical design flaws—missing fallback paths, poor human handoff design, and demo-to-reality gaps in handling exceptions at scale. This window marked recognition that exception handling had plateaued at 2-tier automation (routine vs. high-stakes), with governance and organizational maturity as the primary adoption constraints going forward.
  • 2024-Q3: Vendor investment in exception handling continued (IBM Sterling OMS automation enhancements), but critical research highlighted fundamental barriers to further autonomous escalation. Ada Lovelace Institute study exposed systematic gaps in AI safety evaluation standards, revealing that current benchmarks are non-exhaustive, easily gamed, and may not predict real-world exception handling reliability. This reinforced the core constraint: exception handling remained stuck at 2-tier deployments because AI reliability assurance was insufficient to justify moving high-stakes exceptions beyond human judgment.
  • 2024-Q4: Reliability failures across AI systems became undeniable. OpenAI research confirmed generative AI systems systematically overstate knowledge; ChatGPT achieved only 49% accuracy in medical diagnoses; legal sector observed 37% reliability concerns and 43% bias in deployed AI tools. BCG data revealed 74% of companies struggle to realize AI value. PagerDuty's survey showed 16% incident increase in enterprises racing to adopt AI—paradoxically, deployment increased operational risk. These findings crystallized the maturity plateau: exception handling remained a 2-tier practice because autonomous AI cannot be trusted for high-stakes decisions. Infrastructure design advocates urged standard patterns (gateways, circuit breakers) to manage AI unreliability systematically. The limiting factor shifted from "technical capability" to "organizational readiness and reliability assurance."
  • 2025-Q1: ServiceNow strengthened market position with named customer wins: Vodafone 40% ticket resolution improvement, HSBC 80% automation, American Express and Bank of America on AI-driven routing. Technical practice proved at scale (40-60% faster resolution, 20-25% cost reduction). However, enterprise adoption hit an organizational wall: 88% of AI pilots fail to reach production; 75% of enterprises lack AI roadmap; only 15% are "AI-reinvention ready." Governance gaps and lack of internal expertise (not technical constraints) emerged as the dominant limiting factors. Exception handling demonstrated product-market fit at leading companies, but mainstream organizational adoption remained constrained by capability and governance maturity.
  • 2025-Q2: AssemblyAI achieved a 97% first-response-time reduction using AI agents with runbook-based escalation; PagerDuty data showed 51% of enterprises had deployed agentic AI, with 86% expecting operational deployment by 2027. Vendors advanced: ServiceNow GA'd structured exception approval workflows; Manhattan Associates automated order exception resolution. However, infrastructure reliability became an undeniable constraint: 73% of AI agent deployments failed to meet reliability expectations within the first year; 67% of production RAG systems degraded within 90 days. Factual accuracy problems in LLM systems (51%+ error rate) raised fundamental questions about autonomous high-stakes escalation, widening the gap between pilot wins and production sustainability.
  • 2025-Q3: Production deployments matured across domains: Medius documented 24-hour-to-2-hour AP exception resolution using agentic AI; Netguru analysis showed 50% MTTR improvement in incident response with $30.4M→$16.8M annual cost reductions. Customer support escalation strategies (Gnani.ai) demonstrated 95% routine query automation with intelligent handoff. Finance automation market reached $30.2B valuation trajectory by 2030. However, the practice consolidated around a 2-3 tier exception model (routine, complex, high-stakes) rather than advancing to full autonomy—defensive architectures (circuit breakers, context preservation, failure classification) became standard patterns for managing AI agent unreliability, indicating that systematic error-handling design remained the limiting factor rather than feature availability.
  • 2025-Q4: Enterprise adoption accelerated at scale with concurrent evidence of maturation and deployment barriers. Microsoft's internal ServiceNow deployment (170K+ employees, 3,000+ daily tickets) demonstrated predictive intelligence for incident routing at leading-company scale. ServiceNow's own internal customer success deployment (89% self-service, 37% case workflow automation) and Pedowitz Group case study (88% escalation prediction accuracy, 45% escalation reduction) showed concrete ROI from intelligent routing. However, critical data emerged on implementation gaps: Wharton survey showed 82% of enterprise leaders use Gen AI weekly with 72% formally measuring ROI, indicating mainstream adoption but also accountability focus. OpenAI Enterprise metrics revealed 8x growth in ChatGPT Enterprise usage but highlighted a "frontier gap" where some teams operationalize effectively while others lack instrumentation. MIT research starkly warned that 95% of enterprise AI projects fail to deliver measurable ROI, with 73% citing data quality as primary barrier and Zillow's $500M loss exemplifying risks of unconstrained exception handling at scale. The window reinforced the core equilibrium: exception handling had achieved production credibility and demonstrable ROI at leading organizations, but enterprise-wide adoption remained constrained by data quality, integration complexity, and organizational readiness—not by feature availability or capability limitations.
  • 2026-Feb: Production exception handling frameworks reached inflection point between capability and governance maturity. ServiceNow ITOM deployments achieved 40-60% MTTR reductions with 65-75% of routine tickets automated, confirming continued technical progress. However, critical evidence reinforced systemic limitations constraining broader adoption: Harvard research proposed new AI reliability evaluation frameworks, highlighting that current assessment methods inadequately capture operational dependability; transformer-based LLMs face mathematical barriers to complex task handling with OpenAI admitting accuracy will never reach 100%; practitioner analysis identified 19 specific AIOps implementation failure modes (event noise, correlation errors, false positives) requiring careful tuning rather than automatic magic. Enterprise adoption data showed org-wide AI use doubled to 40% but only 18% tracked ROI, revealing a widening gap between deployment momentum and accountability measurement. The window evidenced transition from "capability is the constraint" to "governance, reliability assurance, and organizational readiness are the binding constraints" — exception handling could deliver ROI at Fortune 500 scale but increasingly faced questions about risk management and escalation governance when AI systems were entrusted with decision-making authority.
  • 2026-Mar: Deployment evidence and architectural clarity advance in parallel. A fintech processing 40K monthly transactions cut exception lookup time from 90-120 seconds to 5-10 seconds using MCP workflow automation, saving agents 15-20 hours weekly; AP automation platforms now resolve 95% of invoice exceptions automatically with role-based escalation for the residual 5%; and 72% of Global 2000 companies operate AI agents in production with escalation routing embedded as a standard pattern. The architectural distinction between retrieval-based and reasoning-first escalation systems sharpens — reasoning-first designs generate audit trails required in regulated industries, while retrieval-based systems remain hallucination-prone in high-stakes routing contexts. The tiered model (routine automation, AI-human collaboration, human-led high-stakes) is consolidating as the durable production pattern rather than giving way to full autonomy.
  • 2026-Apr: ROI evidence crystallizes but adoption barrier persists. SaaS support ticket routing automation demonstrates 13.3x first-year ROI with 83% misrouting elimination and 80% MTTR reduction across real deployments. Leya AI achieves 80%+ autonomous resolution (800+ of 1,000 monthly conversations without escalation). ServiceNow's Flow Designer embeds AI agents in escalation workflows directly. Anthropic released customer-escalation skill for automated brief generation. However, DigitalApplied survey of 650 VP-level leaders reveals the adoption chasm: 78% have AI agent pilots but only 14% reached production scale. Five root causes (89% of failures): integration complexity, edge-case output degradation, missing monitoring infrastructure, unclear ownership, insufficient training data. NimbleBrain documents 85-95% pilot failure rate with escalation governance (missing audit trails, approval chain awareness) blocking production. The window confirms exception handling has achieved product-market fit at leading companies and demonstrated ROI, but remains a leading-edge practice constrained by organizational readiness and governance maturity rather than technical capability.
  • 2026-May: Operational and architectural clarity strengthens evidence base. Practitioner guidance (Sergei P., AI Business) emphasizes deliberate escalation design with pre-launch boundaries and weekly refinement; mature teams move AI to routine work and humans to judgment-intensive exceptions. Platform implementations crystallize the pattern: UiPath and Appian document production exception classification routing (business vs. application exceptions with different retry policies); fintech deployments achieve 50-80% autonomous resolution with escalation for compliance cases (Magic Eden, Step, Airwallex). Failure evidence surfaces: documented $2M production incident (Cursor AI) when escalation logic failed; 60% of resolved AI tickets reopen within 48h in standard implementations. Stanford research validates escalation-based architectures: 51-company study shows 71% median productivity gains from 80% autonomous + human exceptions vs. approval-first models (30% gains). The window underscores both the proven ROI at leading companies and the systematic operational requirements: escalation trustworthiness depends on deterministic workflows, audit trails, and complete context transfer at handoff points—not on feature availability or raw automation percentage.
  • 2026-Jun: Governance and decision-boundary frameworks advance. SmartScope and Appian publish frameworks explicitly separating AI decision levels (candidate generation vs. recommendation vs. policy execution) with exception flags routing appropriate cases to humans, moving escalation from safety mechanism to intentional governance design. An Upper Silesia accounting firm case study demonstrates 70% autonomous resolution with 30% escalation, achieving €120K annual savings (8→2 FTEs). Production data reveals exception-handling patterns: support agent architectures across 40+ deployments show 78% autonomous resolution, with confidence scores below 0.85 triggering escalation; real estate and procurement domains document variance detection with human-readable exception explanations. Cross-domain analysis identifies the persistent pattern: automation succeeds on exception-light processes (invoicing, triage) with 60-80% time reductions, but fails on exception-heavy work (contract review, complex escalations) due to confident drift and hallucination.

Mid-June window (2026-06-07 to 2026-06-21) research surfaces critical reliability findings: Wei Wu's empirical study of 22 production LLM agent incidents reveals that 70% of silent failures (where systems deliver fluent but false narratives to users) are caught only by human observation, not automated tests, underscoring governance as the first-class control layer. SQM Group longitudinal benchmarking shows AI-assisted agents improve first-contact resolution by 15-25% versus unassisted agents, with IT help desk top performers achieving 82-88% FCR and 26% tier-1 escalation rates. Production incident automation (AiFA Labs) demonstrates 50% MTTR reduction through intelligent noise suppression and cross-system event grouping that eliminates the first 10-15 minutes of manual triage. Rubrik's June 2026 GA release of Agent Cloud—a security product for production code-deployment agents—introduces specialized infrastructure for exception recovery (Agent Rewind) and unauthorized-action reversal, indicating market demand for governance tooling when agents fail. Framework research (LangGraph, CallSphere) distinguishes retry strategies by exception class: transient errors use exponential backoff, permanent errors escalate immediately, and unknown failures trigger circuit breakers. Negative signals persist: Vision Language Model safety systems show 28-49% false positive rates in emergency detection, and distributed system failures (false positives cascading into alert fatigue) erode trust and damage automation credibility.
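The retry-by-exception-class pattern mentioned above (exponential backoff for transient errors, immediate escalation for permanent errors, a circuit breaker for unknown failures) might look something like the sketch below. It is one possible reading of that pattern, not the LangGraph or CallSphere API: every class and function name here (`TransientError`, `PermanentError`, `EscalateToHuman`, `CircuitBreaker`, `run_with_policy`) is hypothetical, as are the default attempt count and delays.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical: a failure expected to clear on retry (e.g. a timeout)."""


class PermanentError(Exception):
    """Hypothetical: a failure retrying cannot fix (e.g. invalid input)."""


class EscalateToHuman(Exception):
    """Raised when the task should be handed to a person."""


class CircuitBreaker:
    """Opens after `max_unknown` consecutive unknown failures."""

    def __init__(self, max_unknown: int = 3) -> None:
        self.max_unknown = max_unknown
        self.unknown_failures = 0

    @property
    def is_open(self) -> bool:
        return self.unknown_failures >= self.max_unknown

    def record_unknown_failure(self) -> None:
        self.unknown_failures += 1

    def record_success(self) -> None:
        self.unknown_failures = 0


def run_with_policy(
    task: Callable[[], T],
    breaker: CircuitBreaker,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`, choosing the recovery strategy by exception class."""
    if breaker.is_open:
        # Too many unexplained failures: stop calling the task at all.
        raise EscalateToHuman("circuit open: too many unknown failures")
    for attempt in range(max_attempts):
        try:
            result = task()
        except TransientError:
            if attempt == max_attempts - 1:
                break  # retries exhausted
            sleep(base_delay * 2 ** attempt)  # exponential backoff
        except PermanentError as exc:
            # Retrying cannot help, so escalate immediately.
            raise EscalateToHuman(f"permanent failure: {exc}") from exc
        except Exception as exc:
            # Unknown failure: count it toward the breaker and escalate.
            breaker.record_unknown_failure()
            raise EscalateToHuman(f"unknown failure: {exc}") from exc
        else:
            breaker.record_success()
            return result
    raise EscalateToHuman("transient failure persisted after retries")
```

Passing `sleep` as a parameter keeps the backoff testable without real delays. A production breaker would normally also reset after a cool-down period, which this sketch omits.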

The evidence reinforces the core equilibrium: exception handling has proven product-market fit at leading companies with deterministic frameworks and governance discipline, while broader adoption remains constrained by organizational readiness and the need for explicit escalation ownership; technical capability is no longer the limiting factor. Critical tension: systems that deliver fluent false narratives (confident hallucinations) are worse than transparent failures, requiring governance infrastructure that detects when AI is confabulating rather than merely mistaken.