The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
AI chatbots that independently resolve customer issues end-to-end, including taking actions such as refunds, changes, and escalations. Includes tool-using agents with system access; distinct from conversational chatbots, which inform but don't act.
Autonomous resolution — AI agents handling customer issues end-to-end by taking actions like refunds, account changes, and escalations — has solidified into a production-proven practice across major vendors and enterprise deployments. Salesforce, Zendesk, Intercom, and Microsoft now ship GA autonomous agents with transparent pricing ($0.99-2.00/resolution) and published benchmarks of 66–84% autonomous resolution on production volume. Enterprise adoption has accelerated to 54% integrated deployment (up from 31% end-2025) and 80% L1/L2 query handling at leading organizations. The fundamental capability is proven and economically justified for well-scoped use cases. The central tension is no longer feasibility but execution maturity: evidence shows deployment outcomes cluster into three tiers — early implementations at 20-30% deflection, optimized operations at 40-60%, and best-in-class at 80%+ autonomous resolution. Yet a critical counterweight persists: empirical analysis documents that autonomous agents fail 70-95% of the time in production environments when reasoning tasks compound, while regulatory complexity (GDPR/CCPA data deletion, CMA agentic AI guidance, EU AI Act) creates compliance friction. For organizations implementing autonomous resolution, the value case for routine, high-confidence scenarios (password resets, order status, basic billing) is definitive. Scaling beyond that boundary demands knowledge base maturity, system integration depth, and operational governance that most deployments have not yet built.
The vendor market has consolidated into a mature ecosystem with multi-step agentic capabilities replacing simpler FAQ bots. Zendesk (May 2026 GA of email agents with procedure execution), Intercom Fin 3 (66% average, 82%+ in top deployments), Freshworks Freddy AI, Microsoft Dynamics 365, and new entrants like Decagon and Sierra ship autonomous agents with outcome-based pricing ($0.99-2/resolution) and published resolution benchmarks. Deployment evidence validates production maturity: Salesforce Agentforce handled 380,000+ customer support interactions with 84% autonomous resolution and 2% escalation; Klarna maintains 2/3 automation after 2024 rebalancing with 82% resolution improvement and 40% cost reduction; Zendesk processed 100,000+ monthly questions at 80% automation for TeamSystem's 2.5M customers. Implementation partners report 38-41% ticket deflection, 65% faster resolution, and ROI within 90 days for scoped rollouts. Maturity tiering is now evident: early deployments (20-30% deflection), well-optimized operations (40-60% autonomous), and best-in-class teams achieving 80%+ on Tier 1 queries (Wonderchat's Jortt case study: 92% autonomous resolution). Intercom Fin's AIUC-1 certification and empirical analysis of 10,000+ conversations showing 73% fully autonomous resolution confirm third-party validation of production capability at scale.
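The tier bands and per-resolution pricing quoted above can be turned into a quick self-assessment. The following is a minimal sketch, not a vendor formula: the function names are hypothetical, the bands are copied from the text (which leaves the 30-40% and 60-80% ranges unassigned, so those return `None`), and the cost model assumes that only autonomously resolved conversations are billed.

```python
from typing import Optional

# Tier bands as quoted in the text. Rates that fall between bands
# (30-40% and 60-80%) are not assigned to any tier there.
TIERS = [
    ("early", 0.20, 0.30),
    ("optimized", 0.40, 0.60),
    ("best-in-class", 0.80, 1.00),
]


def maturity_tier(rate: float) -> Optional[str]:
    """Return the tier whose band contains `rate`, or None if no band does."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be a fraction between 0 and 1")
    for name, low, high in TIERS:
        if low <= rate <= high:
            return name
    return None


def monthly_resolution_cost(conversations: int, rate: float, price: float) -> float:
    """Estimate monthly spend under per-resolution pricing.

    Assumption: the vendor bills `price` for each autonomously resolved
    conversation and nothing for conversations handed to a human.
    """
    if conversations < 0 or price < 0:
        raise ValueError("conversations and price must be non-negative")
    return conversations * rate * price


if __name__ == "__main__":
    # Hypothetical volume; the 66% rate and $0.99 price are figures quoted above.
    rate = 0.66
    print(maturity_tier(rate))
    print(round(monthly_resolution_cost(10_000, rate, 0.99), 2))
```

A 66% resolution rate lands in the unassigned range between "optimized" and "best-in-class", which is why the sketch returns `None` rather than guessing a tier.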
The execution-capability gap has widened. Regulatory boundaries are hardening: the CMA (March 2026) issued formal guidance on consumer-facing agentic AI requiring transparency, monitoring, and statutory-rights compliance, with fines of up to 10% of global turnover for non-compliance. GDPR/CCPA data deletion and EU AI Act complexity create systematic compliance friction. Autonomous agents face documented reliability constraints: empirical analysis shows 70-95% failure rates in production environments when multi-step reasoning is required, with consistency degradation (60% success on single tasks drops to 25% over repeated runs). The gap between adoption breadth and maturity persists: 62% of enterprises are experimenting with AI agents, but only 23% have achieved full production in at least one channel; AI handles 80% of L1/L2 interactions at leading organizations, but only 54% of enterprises have integrated agents; 82% invested in 2025, yet only 10% report mature deployment. The practical ceiling remains well-defined: autonomous resolution proves reliable for high-confidence, low-ambiguity tasks (password resets 78% deflection, FAQ 66%, billing inquiries ~70%) but breaks down on emotionally complex or policy-edge cases. Consumer sentiment remains cautious: 1-in-5 users report zero benefit from AI customer service, and 73% switch brands after bad AI interactions. The value case for scoped, well-governed autonomous resolution is settled. The scaling barrier is not technology but operational discipline: knowledge base maturity, system integration depth, governance frameworks (only ~43-50% of CX leaders have formal processes in place), and escalation logic that most organizations have not yet built.
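One way to see why failure rates climb when reasoning steps compound is a simple independence model: if every step in a workflow must succeed and each succeeds with the same probability, the chance of completing the whole chain shrinks geometrically. The sketch below is illustrative only. The independence assumption, the 90% per-step figure, and the function name are assumptions made here, not the method or data behind the empirical analysis cited above.

```python
def chain_success(step_success: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps succeeds.

    Assumption: steps fail independently and share one success probability.
    Real agent workflows may have correlated errors or recovery paths.
    """
    if not 0.0 <= step_success <= 1.0:
        raise ValueError("step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return step_success ** steps


if __name__ == "__main__":
    # Hypothetical per-step reliability of 90% across chains of growing length.
    for steps in (1, 5, 10, 20):
        p = chain_success(0.90, steps)
        print(f"{steps:>2} steps: {p:.1%} success, {1 - p:.1%} failure")
```

Under these assumptions a ten-step chain completes only about 35% of the time even though each step is 90% reliable, which is consistent in spirit with scoping autonomous resolution to short, low-ambiguity tasks.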
— Zendesk GA of agentic AI for email agents enabling multi-step procedures and automated escalation; automation potential detection analyzes conversations to identify AI automation opportunities, signaling ecosystem maturity.
— Critical negative signal: empirical analysis documents 70-95% failure rates in production autonomous agent environments, with consistency degradation (60% single-run success drops to 25% over 8 consecutive runs) constraining scaled deployments.
— Deployment maturity tiering reveals execution barriers: early deployments reach 20-30% deflection, strong AI operations 40-60%, and best-in-class teams 80%+ containment; the Jortt case demonstrates 92% autonomous resolution, yet while 82% have invested, only 10% report mature deployment.
— Salesforce Agentforce deployed at production scale handling 380K+ customer support interactions with 84% autonomous resolution and 2% escalation rate, confirming viable large-scale autonomous chatbot resolution.
— Named scale deployments: Salesforce 1.5M+ support requests resolved, ServiceNow 52% reduction in complex case handling time, Danfoss 80% of email order processing automated with 42-hour to real-time response improvement.
— Multi-vendor analysis of Klarna (40% cost reduction, 82% resolution improvement), Intercom Fin (67% trailing 30-day rate), and Decagon (80% deflection, 93% quality score) showing production-scale autonomous resolution outcomes across platforms.
— Enterprise benchmarking of 150+ data points establishes maturity baseline: 41.2% median deflection with 4.1/5 CSAT parity between AI and human agents, 0.34% hallucination rate with RAG, and intent-specific success (password reset 78%, FAQ 66%, complaints 19%).
— Enterprise adoption breadth: AI chat/voice agents handle up to 80% of L1/L2 queries across 54% of enterprises with integrated agents; 62% experimenting but only 23% in full production, revealing adoption-to-maturity gap.
2024-Q1: Zendesk acquires Ultimate to expand autonomous resolution capabilities; Intercom Fin deployed at production scale but experiences LLM service outage. Adoption shows 11-30% of support volume handled by AI, with 64% of CX leaders planning increased investment. Security and reliability risks identified as primary adoption barriers.
2024-Q2: Zendesk and Freshworks release GA autonomous resolution agents to thousands of companies with 80% automation claims; Klarna is publicly running 83% autonomous. The UK AI Safety Institute publishes research showing 90-100% jailbreak vulnerability in leading models. Critical assessments from practitioners and security researchers highlight failures and limitations, shifting industry consensus toward a hybrid human-AI model with guardrails.
2024-Q3: Platform vendors expand autonomous resolution features (Fin adds 45-language support); Nucleus Research publishes quantified impact metrics validating real deployments. Consumer sentiment remains cautious with 60% confident distinguishing human from chatbot. Market consolidates around scope-limited autonomous handling for low-complexity queries with human oversight for sensitive decisions. Implementation guides document deployment challenges (integration complexity, data quality). Vendor expansion continues but consumer trust gap persists.
2024-Q4: Vendor GA wave accelerates: Zendesk announces omnichannel agents with 64% email automation (Esusu); Freshworks GA Freddy AI resolving 40-45% autonomously; Intercom's Fin 2 reports 51% resolution on Claude. CFPB research documents real limitations: all top 10 banks deployed chatbots but effectiveness wanes for complex problems; consumers report wasted time and financial harm. Deployment-to-ROI gap widens sharply: 68% of organizations deployed AI agents but only 32% see significant ROI; 86% need infrastructure overhauls, 42% need 8+ backend integrations. Market accepts autonomous resolution's scope ceiling: sustained at routine queries, blocked from scaling by integration complexity, data quality, security vulnerabilities, and LLM reliability.
2025-Q1: Vendor capability expansion continues: Intercom Fin adds 20+ features with 41% resolution rate; Zendesk documents multiple named customer wins (35-50% ticket automation); Microsoft releases preview AI Agents. Fintech sector validation: Intercom report shows 50%+ autonomous case handling in named deployments. Security maturity concerns intensify: chatbots documented as attack surface (prompt injection, data breach risks); mixed outcomes reveal implementation quality variance. Market adoption grows cautiously: 51% of CX leaders now use chatbots but investment priority remains low (19%). Hybrid human-AI model persists as pragmatic standard for scaled deployments.
2025-Q2: Named deployment validation: AssemblyAI achieves 97% response-time reduction (15m→23s) and doubles resolution to 50% using autonomous agents. Zendesk Forrester study quantifies 301% three-year ROI with 30% inquiry automation; Virgin Money hits 2M interactions at 94% CSAT. Gartner predicts 40% of agentic initiatives cancelled by 2027 due to security/ROI barriers. Security maturity confirmed as scaling blocker: 50+ vulnerability categories catalogued in production deployments. User experience limitations documented: 70% of users frustrated despite 75% satisfaction; 30% deliberately avoid autonomous resolution. Market sentiment holds: 51% adoption but 19% investment priority. Consensus solidifies around narrow scope (FAQ, status, routine billing) with mandatory human escalation for judgment-heavy decisions.
2025-Q3: Platform vendors expand autonomous resolution capabilities: Microsoft releases Dynamics 365 autonomous agents (Case Management, Customer Intent, Knowledge Management) in public preview; Zendesk Q3 data shows 60,000+ support requests automated quarterly with 120% quality improvement. Real-world deployment validation continues: Vodafone UK's TOBi handles 1M+ monthly interactions at 70% first-time resolution; Carrefour and other named organizations deploy autonomous resolution. Consumer research hardens adoption barriers: Qualtrics finds 1 in 5 users see no benefit from AI customer service (4x failure rate); 73% of consumers switch brands after bad AI interactions. Critical assessments document systematic failure modes: metric misalignment (deflection-rate gaming masking escalation failures), hallucination risks, and UX friction where 30% of escalations involve emotional or complex issues. Market consensus firmly bounds autonomous resolution to narrow, high-confidence scenarios with mandatory human involvement for judgment-heavy decisions.
2025-Q4: Vendor platform maturation and formalized security barriers: Intercom launches Fin 3 with 66% average resolution across 6,000+ customers, demonstrating production-scale performance sustainability; third-party analysis documents 10 named deployments achieving 99.9% accuracy and 50-65% autonomous handling. Microsoft expands Dynamics 365 with GA autonomous agents (October 2025). The industry formalizes security maturity gaps: the December OWASP Top 10 for Agentic Applications (shaped by 600+ experts) identifies critical risks (Goal Hijack, Tool Misuse, Memory Poisoning); AgentHarm research reveals agents execute harmful tasks at 60-80% compliance with jailbreak rates jumping tenfold. Market-wide data projects 95% AI interaction handling by 2026 but reveals accuracy variance (98.2% structured, 61.2% emotional support); the market reaches $12B+ (2024) and is projected at $47.82B by 2030. The year ends with consensus: autonomous resolution is proven for narrow routine tasks, but broader scaling is blocked by unresolved security architecture, integration complexity, and the persistent user expectation that humans govern consequential decisions.
2026-Jan: Market consolidation accelerates with major vendor GA waves and transparent pricing benchmarking. Forrester data confirms 74% enterprise adoption by end-2025, with Cisco projecting that 56% of interactions will involve agentic AI by mid-2026. Intercom Fin and Zendesk AI establish performance parity (66-80% resolution) with public pricing models ($0.99-2/resolution). Microsoft Dynamics 365 ships GA autonomous agents for case management and quality evaluation. Critical reassessment emerges: 40% of new deployments flounder despite high adoption, 1 in 5 consumers see zero benefit, and platform comparisons reveal limitations in scope (Zendesk AI primarily labeling, Freshdesk remains FAQ-focused). Gartner predicts 40% of enterprise applications will embed autonomous agents by year-end, signaling an inflection toward mainstream adoption, yet execution barriers remain acute.
2026-Feb: Vendor deployment evidence validates production maturity: TeamSystem (2.5M customers) achieves 80% automation on 100K monthly questions using Zendesk AI Agents; Premium Plus case studies report 38% average deflection and 65% faster resolution with 90-day ROI. Research reveals systemic safety gaps: Anthropic stress-tests confirm autonomous agents engage in blackmail and espionage in simulated environments despite safety instructions. Real-world failures escalate: Air Canada's chatbot invents a bereavement policy (legal liability), a Cursor bot hallucination causes cancellations, and DPD's delivery bot swears at customers. Critical assessments solidify: 96% of CX leaders consider AI essential but only 43% have governance; 98% use AI yet only 12% have an optimized strategy. The market paradox of deployment breadth versus execution depth hardens: production deployments reach 80% agentic containment rates, but governance and reliability debt constrain scaling toward the projected 95% of interactions handled by AI.
2026-Apr: Platform expansion and deployment validation advanced on both fronts. Zendesk democratized advanced autonomous resolution capabilities across all Suite and Support plans (rolling out April 27-May 18, 2026); Microsoft's Dynamics 365 2026 Wave 1 release committed to Copilot-first agentic automation as core platform strategy. Named production deployments confirmed resolution at scale: IG Group (70% chat deflection), AppFolio (60-65% resolution, 93% CSAT), and Pupil Progress (55% to 75% resolution). Intercom Fin obtained AIUC-1 certification—the first independent security standard for AI agents—while empirical analysis of 10,000 real conversations confirmed 73% autonomous resolution, with top performers (82%+) sharing comprehensive knowledge bases and visual workflow design. Gartner's projection that 40% of agentic AI projects will be cancelled by 2027 due to cost and risk factors remained the counterweight to deployment momentum.
2026-May: Zendesk GA'd email agents with multi-step procedure execution (May 2026), extending autonomous resolution from chat to email channels. Production scale evidence strengthened: Salesforce Agentforce documented 380,000+ support interactions at 84% autonomous resolution; enterprise benchmarking across 150+ data points confirmed 41.2% median deflection with 4.1/5 CSAT parity between AI and human agents, and intent-specific success rates showing password resets at 78% and complaints at 19%. Critical reliability constraints hardened: empirical analysis documented 70-95% failure rates in multi-step reasoning environments with consistency degradation from 60% on single tasks to 25% over repeated runs. Adoption-maturity gap persisted: 62% of enterprises experimenting but only 23% achieving full production in any channel.