Perly Consulting │ Beck Eco

The State of Play

A living index of AI adoption across industries — where established practice meets the bleeding edge
UPDATED DAILY

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

The Daily Dispatch

A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.

AI Maturity by Domain

[Figure: AI maturity by domain. Each dot marks the weighted maturity of practices within a domain, plotted on a scale running from Bleeding Edge to Established.]

Ticket intelligence — intent, sentiment & language detection

TIER: Leading Edge

TRAJECTORY: Stalled

AI that classifies ticket intent, detects sentiment and escalation risk, identifies language, and routes tickets accordingly. Includes multi-label topic tagging and escalation prediction; distinct from rule-based ticket routing, which assigns tickets according to fixed rules rather than by understanding their content.
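To make the contrast with rule-based routing concrete, here is a minimal sketch of how the outputs of an intent/sentiment/language model might be combined into a routing decision. The `TicketSignals` type, queue names, supported languages, and every threshold are hypothetical placeholders, not any vendor's API; in practice the scores would come from whichever model a team deploys, and the thresholds from tuning on labelled tickets.

```python
from dataclasses import dataclass

# Hypothetical container for the outputs of an upstream intent/sentiment/language model.
@dataclass
class TicketSignals:
    intent_scores: dict[str, float]  # per-intent confidence, e.g. {"billing": 0.85}
    sentiment: float                 # -1.0 (very negative) to +1.0 (very positive)
    language: str                    # ISO 639-1 code from language detection
    escalation_risk: float = 0.0     # 0..1 estimate from an escalation model

# Illustrative queue map and thresholds (assumptions, not recommended values).
INTENT_QUEUES = {"billing": "billing-team", "password_reset": "self-service", "bug": "tier-2"}
SUPPORTED_LANGUAGES = {"en", "de", "fr"}
ESCALATION_THRESHOLD = 0.7
NEGATIVE_SENTIMENT = -0.5
MIN_INTENT_CONFIDENCE = 0.6

def route(signals: TicketSignals) -> str:
    """Pick a queue from what the ticket says, not from fixed keyword rules."""
    # Escalation signals are checked first, so an angry customer reaches a human
    # regardless of what the ticket is about.
    if signals.escalation_risk >= ESCALATION_THRESHOLD or signals.sentiment <= NEGATIVE_SENTIMENT:
        return "escalations"
    # Unsupported languages are a known failure mode, so hand them to people.
    if signals.language not in SUPPORTED_LANGUAGES:
        return "multilingual-human"
    intent, score = max(signals.intent_scores.items(), key=lambda kv: kv[1])
    # Low-confidence classifications fall back to manual triage instead of guessing.
    if score < MIN_INTENT_CONFIDENCE:
        return "manual-triage"
    return INTENT_QUEUES.get(intent, "manual-triage")

print(route(TicketSignals({"billing": 0.85, "bug": 0.1}, sentiment=0.1, language="en")))
```

For that calm, English billing ticket the sketch returns `billing-team`. Ordering the checks so that sentiment and escalation risk override intent reflects the idea that a sentiment score earns its keep only when it changes where a ticket goes.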

OVERVIEW

Ticket intelligence — AI that classifies support tickets by intent, sentiment, and escalation risk — is technically proven but organisationally stalled. The core capabilities work: production deployments routinely hit 80-90% accuracy on straightforward classification, and best-in-class implementations exceed 98% on escalation routing. Every major cloud platform ships GA intent and sentiment features. The problem is getting from pilot to production. Research consistently shows that most AI agent pilots never reach deployment, blocked by integration costs, data fragmentation, and legacy infrastructure. This gap between what the technology can do and what organisations actually operationalise defines ticket intelligence as a leading-edge practice — forward-leaning teams extract real value, but the majority have not moved beyond evaluation.

CURRENT LANDSCAPE

Zendesk, IBM Watson Assistant, Google Cloud, AWS Comprehend, and NICE all ship production intent detection and sentiment analysis. As of June 2026, Zendesk has renamed "Intents" to "Topics" in Intelligent Triage, added custom topic suggestions, and now saves detected sentiment and language directly to standard ticket fields, a sign of a mature product focused on refining accuracy rather than adding features. Deployments that reach production show compelling returns. Fin AI reports more than 98% accuracy on escalation routing; AssemblyAI cut first-response time from 15 minutes to 23 seconds with 50% automated resolution; Grove Collaborative reduced ticket volume by over 80% through intent-based routing; Coforge's RAG-based system achieved 30–50% faster resolution in production banking and insurance environments. The sentiment analytics market reached $5.71B in 2025, and 82% of senior leaders report investing in AI-powered customer service tools.

Getting there remains hard. RAND and Gartner data indicate that 88% of AI agent pilots never advance past proof-of-concept, with integration costs of $140K-$350K and timelines of four to six months. OpenAI's own research frames this as a "capability overhang": the technology is ready, but most organisations lack the execution frameworks to use it. Technical limitations compound the organisational ones. Single-label routing fails when tickets carry stacked intents, and automation hits hard limits at entry tractability, the customer's emotional and trust state, and the indirect language patterns that humans navigate naturally. Practitioner assessments document that current systems struggle with the long tail of emotionally complex cases: ticket intelligence works reliably for routine queries (password resets, status checks) but frequently fails with frustrated customers who need empathy, relationship repair, or nuanced policy interpretation. An Intercom survey of 2,400 support professionals found that 77% say AI meets or exceeds expectations, yet only 10% have reached mature deployment, a ratio that captures where this practice actually stands.
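The stacked-intent failure is easiest to see side by side. This sketch assumes a classifier that emits an independent probability per intent (as a multi-label model with sigmoid outputs would); the intent names, scores, and the 0.5 threshold are invented for illustration.

```python
def single_label(scores: dict[str, float]) -> list[str]:
    """Argmax routing: keeps only the top intent and silently drops the rest."""
    return [max(scores, key=scores.get)]

def multi_label(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Keep every intent whose independent probability clears the threshold, highest first."""
    hits = [intent for intent, p in scores.items() if p >= threshold]
    return sorted(hits, key=lambda intent: scores[intent], reverse=True)

# Hypothetical scores for a ticket like
# "I was double-charged and now I can't log in to my account".
scores = {"billing_dispute": 0.82, "account_access": 0.74, "cancellation": 0.41}

print(single_label(scores))  # ['billing_dispute']
print(multi_label(scores))   # ['billing_dispute', 'account_access']
```

Under single-label routing the login problem disappears, so the ticket lands in a billing queue that cannot fix it; the multi-label version keeps both intents, letting the router fan the ticket out or pick a queue that handles both.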

TIER HISTORY

Research: Jan-2018 → Jan-2018
Bleeding Edge: Jan-2018 → Jan-2020
Leading Edge: Jan-2020 → present

EVIDENCE (150)

— Zendesk details intent detection, sentiment analysis, and language identification as the core orchestration layer for AI-powered ticket routing and triage, demonstrating the vendor's continued investment in ticket intelligence.

— Multi-source enterprise adoption benchmarks: 72% of IT organisations have deployed AI-assisted classification, 60% of Global 2000 tickets are auto-classified (IDC), deflection rates run at 40-60%, and an AI-handled ticket costs $1.80-$4.50 vs. $22.50 for a human-handled one, demonstrating enterprise-scale intent/sentiment deployment.

— Independent practitioner guide with a multi-vendor comparison documents four sentiment types and a critical limitation: sentiment scores are actionable only when attached to routing or escalation decisions, not merely reported.

— Peer-reviewed testing on 20,000 call center transcripts shows that LLM-based sentiment detection with Qwen2.5 achieves an F1 of 0.91 vs. 0.82 for traditional ML, demonstrating the practical maturity of LLM-based sentiment detection for support operations.

— Vendor analysis assesses intent detection, sentiment analysis, and language identification as industry table stakes (90-95% tier-1 accuracy parity), indicating that core ticket intelligence capabilities are now commoditized and that differentiation is shifting to integration and escalation logic.

— Peer-reviewed empirical analysis of 70,450 support conversations finds that sentiment scores correlate only 0.36 with actual satisfaction, vs. 0.47 for LLM-based approaches, identifying a critical limitation: sentiment detection alone misses tolerated friction.

— Zendesk expands Intelligent Triage from premium Copilot to Professional tier (effective July 2026), including topic, sentiment (5-tier scale), ~150 languages, and entity extraction—signaling democratization of ticket intelligence from premium to mid-market tier.

— CSAT by Intent Type: 2026 Benchmarks (Adoption Metrics). Production deployment benchmarks show that intent classification effectiveness varies sharply by type: 98.2% success on structured tasks (passwords, refunds) but only 61.2% on emotional or complex intents, revealing capability boundaries and where sentiment/intent detection falls short.
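The per-ticket figures in the enterprise adoption benchmarks imply a simple blended-cost calculation. This sketch takes the quoted ranges ($1.80 to $4.50 per AI-handled ticket, $22.50 per human-handled ticket, 40 to 60% deflection) at face value and assumes deflected tickets are fully AI-handled and the rest fully human-handled, which ignores the extra cost of a failed AI attempt that is then handed off. The function name is illustrative.

```python
def blended_cost_per_ticket(deflection_rate: float, ai_cost: float, human_cost: float) -> float:
    """Average cost per ticket when a fraction `deflection_rate` is resolved by AI."""
    if not 0.0 <= deflection_rate <= 1.0:
        raise ValueError("deflection_rate must be between 0 and 1")
    return deflection_rate * ai_cost + (1.0 - deflection_rate) * human_cost

HUMAN_COST = 22.50
# Pessimistic end of the quoted ranges: 40% deflection at $4.50 per AI ticket.
worst = blended_cost_per_ticket(0.40, 4.50, HUMAN_COST)
# Optimistic end: 60% deflection at $1.80 per AI ticket.
best = blended_cost_per_ticket(0.60, 1.80, HUMAN_COST)
print(f"{worst:.2f} {best:.2f}")  # 15.30 10.08
```

Against the $22.50 all-human baseline, that is roughly a 32% to 55% cut in average cost per ticket, before any one-off integration spend is amortised.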

HISTORY

  • 2018: IBM Research demonstrated sentiment analysis on real service provider ticket data for subscription renewal prediction; production helpdesk systems achieved 90% accuracy routing 40,000+ emails/month across major providers; sentiment analysis market forecasted to grow from $123M to $3.8B; multi-label classification on real ticket data still limited to 54% accuracy.
  • 2019: Enterprise deployments accelerated—Lufthansa deployed Watson NLU across 15,000 agents, Meltwater scaled sentiment analysis to 450M documents/day for 30,000+ customers; research advanced multi-intent detection (NAACL 2019); Google released AutoML Natural Language GA; sentiment detection became production-standard, but full automation remained constrained by multi-label complexity and edge cases requiring human review.
  • 2020: IBM advanced Watson with idiom/colloquialism detection and Project Debater commercialization; intent detection benchmarking showed Watson Assistant outperforming competitors by 5-14 percentage points; Observe.ai demonstrated production sentiment analysis driving 50% conversion gains; escalation prediction emerged as validated research focus; ticket quality assessment began attracting academic attention; multi-language support and integration complexity remained primary adoption barriers.
  • 2021: Intent detection matured with peer-reviewed NAACL benchmarking validating Watson Assistant leadership; sentiment analysis expanded to 7-tone emotion detection (frustrated, satisfied, etc.); research advanced transformer-based approaches for handling imbalanced customer datasets; specialized tooling proliferated (Nyckel, etc.); methodological refinements (token-level labeling, transfer learning) incrementally improved accuracy; comprehensive multi-capability integration remained complex.
  • 2022-H1: Google Cloud and NTT DATA demonstrated production ticket intelligence deployments using cloud AutoML and semantic NLP for feedback routing and categorization; sentiment.ai and competing tools advanced multilingual accuracy through deep learning benchmarks; Info-Tech research identified ticket intelligence ROI barriers in ITSM adoption; FinTech sector began applying sentiment analysis to support for issue prioritization and escalation detection; ecosystem tooling matured with specialized support-ticket classifiers.
  • 2022-H2: Google Cloud's internal production pipeline deployed clustering and anomaly detection on support tickets at scale; intent detection advanced in financial services (Banking77 benchmarking) and real-time voice support (LSTM-based latency optimization); sentiment analysis pipelines scaled in AWS and GCP for near-real-time routing; NAACL 2022 research tackled novel intent detection under constrained annotation budgets. However, Zendesk's documented limitations on language-specific accuracy showed production failures in unsupported languages; multi-label simultaneous classification remained organizationally complex; integration with legacy ITSM systems continued to lag behind cloud-native platforms.
  • 2023-H2: Research acceleration on intent detection methods (open models, self-supervised pre-training via RSVP/EMNLP 2023) alongside continued advancement in novel intent discovery frameworks. Vendor product evolution continued with Google Cloud's new PaLM-based sentiment model in Natural Language API v2. Real-world case studies demonstrated active adoption (Qlik escalation reduction, Indonesian company 1000+ ticket pilot), validating ROI drivers. Critical adoption barriers remained: resource quota constraints on cloud platforms, unsupported language failure modes, and complexity of comprehensive multi-label classification requiring continued human review.
  • 2024-Q1: Organizational adoption accelerated with 70% of C-level support executives planning AI investment in customer service; real-world deployments demonstrated operational ROI (telecom companies reducing escalations 18% via real-time sentiment monitoring). Sentiment analysis standardized across cloud platforms with mature tooling. However, vendor platform reliability became a constraint—practitioners reported critical failures during AutoML-to-Vertex migrations, exposing inadequate migration paths and documentation gaps. Structural barriers persisted: cloud platform quota limitations, language-specific failure modes, and complexity of simultaneous multi-label classification continued to require human review in production.
  • 2024-Q2: Intent detection expanded to automotive customer systems (General Motors OnStar), confirming real-world enterprise deployment momentum. Peer-reviewed research revealed LLM limitations in complex sentiment analysis tasks—a key finding that sophisticated ticket intelligence remained beyond LLM reach. Google Cloud's Natural Language API deprecation and migration to Vertex AI created user confusion and exposed vendor documentation gaps. All major cloud platforms (IBM, Google, AWS, Azure) offered production sentiment/intent/emotion detection, but vendor platform stability and tooling reliability remained uneven constraints on broader adoption.
  • 2024-Q3: Enterprise adoption planning accelerated with 70% of C-level support execs planning AI investment per Zendesk survey, while open-source implementations demonstrated multi-label classification at 93% accuracy. However, critical headwinds emerged: peer-reviewed validation that LLMs lag on complex sentiment detection, real-world AI failures in customer service contexts, McKinsey data showing adoption plateau at 50-60% due to cost and hallucination barriers, and Google's incomplete Natural Language API migration creating vendor platform instability. Structural barriers persisted: quota constraints, language-specific failure modes, and multi-label classification requiring human review. Credibility gap between vendor marketing and production reality widened as adoption matured.
  • 2024-Q4: Market adoption metrics confirmed 1,250+ solutions globally serving 4,500+ corporate end-users, demonstrating category-level breadth. Academic research continued advancing intent detection methods (CNN-BiLSTM) while peer-reviewed evidence confirmed LLMs lag on complex sentiment. Market skepticism intensified: MIT economist warned AI infrastructure investments may underperform; industry analysis found only 44% of companies had AI strategies despite 76% feeling competitive pressure; real-world customer service AI failures documented. Vendor platform churn continued with Google's incomplete migration. Gap between investment intent and deployment execution widened; adoption plateau persisted at 50-60% due to cost, hallucination risks, and execution complexity. Practice remained in leading-edge with broad organizational consideration but significant headwinds to sustained momentum.
  • 2025-Q1: Vendor platforms matured with Google Cloud Natural Language and Freshworks shipping enhanced AI ticketing features. Real deployments showed strong ROI: Infiniticube's Sun West Mortgage case study achieved 40% faster resolution and 30% cost reduction; Glammmup improved CSAT from 62 to 78 via sentiment analysis. Research advanced multilingual emotion detection (SemEval 2025) but revealed language-specific robustness gaps. Critical failures documented in language detection production deployments exposed fundamental technical limitations. Organizational commitment remained high (70% of C-level execs planning investment) despite widening gap between investment intent and deployment feasibility.
  • 2025-Q2: Real-world deployments demonstrated substantial ROI: AssemblyAI achieved 97% reduction in first response time and 50% automated resolution; Monte dei Paschi di Siena Bank deployed BERT-based classification at 85.88% accuracy on production tickets. Practitioner guidance emerged on escalation-trigger design including sentiment and complexity detection. However, platform disruption accelerated—Google announced June 2025 cutoff for AutoML Text classification/sentiment/entity extraction, forcing deployments to migrate to Gemini-based approaches. Vendor platform instability emerged as critical adoption barrier alongside persistent multi-label classification complexity and language-detection production failures.
  • 2025-Q3: Escalation-routing deployments matured with Fin AI achieving >98% accuracy on production escalation decisions using custom models with sentiment/complexity logic. NICE and major vendors reinforced intent detection as standard GA feature. Adoption metrics confirmed ~28% resolution time improvement from AI-driven ticket triage with up to 35% ticket deflection. However, customer sentiment data revealed significant headwinds: 70% of consumers abandon brands after poor AI experiences and 88% prefer human agents, contradicting vendor deployment momentum. Platform disruption continued—Google's AutoML cutoff caused migrations; vendor documentation gaps exposed organizational friction. Credibility gap between marketing claims and production reality widened, creating tension between strong investment intent (70% of C-level execs, 78% of organizations deploying AI) and execution barriers (quota constraints, language-detection failures, multi-label complexity).
  • 2025-Q4: Independent case studies demonstrated strong intent detection ROI (Grove Collaborative's 80%+ volume reduction via intent routing, Eurail's 95% first-response improvement). Market growth accelerated with $12.06B AI customer service market in 2024, projected $47.82B by 2030 (25.8% CAGR). However, McKinsey data (September 2025) revealed 73% of AI pilots fail to reach production with integration costs of $140K–$350K and 4–6 months required. Text-based intent/sentiment analysis exposed fundamental limitations: poor tone detection, inability to probe surface issues, and weak cross-ticket pattern recognition. Platform disruption persisted with ongoing Google AutoML migrations. Practice remained firmly leading-edge with clear organizational momentum (70% of C-level executives planning investment) but widening execution-intention gap as adoption barriers (integration costs, documentation gaps, technical limitations) became more apparent.
  • 2026-Jan: Vendor platforms released coordinated capability upgrades (Google Cloud, IBM Watson, Zendesk all shipped enhanced GA features) signaling continued investment in production maturity. However, OpenAI's "capability overhang" analysis and RAND/Gartner research documented fundamental deployment barriers: 88% of AI agent pilots fail to reach production; enterprise implementation requires 4–6 months and $140K–$350K integration costs. Market data confirmed strong adoption momentum (82% of senior leaders invested, sentiment analytics market at $5.71B), but revealed persistent tension between technical readiness and organizational execution capability. The practice remained in leading-edge with clear momentum but plateauing ROI realization.
  • 2026-Feb: Vendor platforms released coordinated refinements—Zendesk shipped intent quality recommendations for personalized conflict resolution, and independent research demonstrated real-world ticket classification in public administration (ISTAT). However, Syncro deprecated AI ticket classification as adoption barriers persisted. Market adoption accelerated (77% of teams report AI meeting/exceeding expectations, 80% of routine interactions handled by AI) yet early maturity remained (only 10% at mature implementation stage). Critical analysis documented production limitations: single-label routing fails with stacked intents and accuracy metrics miss real containment/task success signals. Platform evolution and organizational execution gaps continued to define the leading-edge plateau.
  • 2026-Apr: Major vendors accelerated platform refinements—Zendesk added intent quality recommendations and entity extraction reporting; AWS announced Predictive Insights for Amazon Connect with intent and sentiment detection; Cisco Webex expanded multilingual sentiment analysis support across 7 new languages. Real-world case studies documented strong ROI: SupportLogic customers (Salesforce, Nutanix, Basware, Databricks) achieved 30-80% escalation reductions; Robylon deployments reached 93% ticket classification accuracy on 300k+ annual tickets with 83% automation. Large-scale adoption analysis (150+ enterprise deployments, 10M+ tickets) confirmed 95%+ routing accuracy and 80% autonomous resolution as baseline capabilities. Vendor product consolidation signaled market maturity—Zendesk's acquisition of Forethought positioned ticket intelligence (intent, sentiment, language detection) as foundational infrastructure rather than optional features. Market-wide sentiment analysis tools demonstrated 94%+ accuracy benchmarks. Leading-edge plateau persisted due to integration complexity and organizational execution barriers despite strong technical capability and proven ROI.
  • 2026-May: Additional production case studies confirmed strong real-world ROI: DataArt deployed real-time sentiment analysis on support calls using Whisper + Bedrock; Helpshift customers (Halfbrick, Hutch Games, Supercell) reduced resolution time from 84 to 9 hours via intent classification and sentiment scoring; eZintegrations reported 89%+ sentiment classification accuracy and 91%+ urgency detection. Market adoption data aggregating 150+ benchmarks showed an intent-based deflection asymmetry (70%+ deflection for structured intents vs. 19-34% for sentiment-heavy ones), indicating uneven maturity across intent types. Industry benchmarks reported 78%+ AI adoption in support operations, $3.50–$8.00 returned per dollar invested, and a 25.8% market CAGR through 2030. Vendor consolidation continued the shift from standalone sentiment detection toward capabilities embedded in multi-function platforms, a trend already visible when IBM deprecated its standalone Watson Tone Analyzer by February 2023. Leading-edge plateau persisted with proven technical maturity (80-95% accuracy, 28%+ resolution improvement) and strong organizational commitment (78%+ adoption), but execution barriers (integration costs, documentation gaps, multi-label complexity) continued to limit production deployments.
  • 2026-May (mid): Vendor ecosystem reinforced sentiment/intent as GA features: Microsoft shipped email sentiment classification in Dynamics 365; Sprinklr's production deployment reached 10B sentiment predictions/day (>80% accuracy, 100+ languages). Real deployments demonstrated operational maturity: Lexsis detected issue spikes (280% increase) within 72 hours vs. 10 days manually, preventing $85K losses; Weverse deployed multilingual NLP across 245 countries. Adoption metrics (Databricks 20K+ customers) showed 40% of AI workflows involve ticket classification/routing. However, vendor platform stability emerged as a critical headwind: Google deprecated the Gemini 2.0 Flash production API without public notice, forcing LLM-based ticket intelligence deployments to migrate to preview-tagged models. Multilingual accuracy benchmarks (Mihup) confirmed a production-ready sentiment trajectory (82-89% across Hindi/Tamil/Bengali/Marathi) but revealed language-specific fragmentation. Intent detection accuracy improved to 85-95% (2025), shifting the operational bottleneck from classification itself to confidence-threshold tuning. Adoption patterns showed intent routing (23% of feedback contains intent signals) to be operationally distinct from sentiment-only systems. Leading-edge status was sustained by technical maturity and market-scale deployment (Sprinklr's 10B predictions, Databricks 327% agentic growth), but platform instability risks and execution barriers continued to constrain broader adoption.
  • 2026-May (late): Recent independent research exposed deeper deployment challenges that contradict vendor momentum claims. Digital Applied's independent audit (53 verified data points) documents a vendor-vs-field deflection gap: vendor self-reports (Ada/Decagon 70-80%) far exceed the Zendesk enterprise median (41.2%, top quartile 58.7%), a 30-40 percentage-point reality gap. Metrigy's multi-year study shows inferred sentiment adoption accelerating (15% to 45%, 3× in 2 years) with measurable ROI (22% cost reduction, 31.7% efficiency gain) but also reveals an operational dependency on clean data and iterative tuning. CMSWire's critical analysis cites Gartner data showing only 14% of service issues fully resolved through self-service, with 43% of failures stemming from content gaps and 45% from system misunderstanding, contradicting adoption optimism. Critical signal: Sinch's AI Production Paradox finds 74% of enterprises rolled back AI agents post-deployment due to governance failures despite 62% already live in production, with real documented failures (Air Canada hallucinations, Chevrolet prompt-injection refund exploit, Cursor churn-inducing false policies) showing intent/sentiment detection gaps. Positive counterweight: independent sentiment taxonomy research (Arabic NLP, multilingual BERT frameworks) confirms classification methods are advancing; intent accuracy improved to 85-95%; Metrigy's 100% conversation coverage (vs. survey sampling bias) reveals a structural shift from sampling to universal sentiment scoring. Platform instability persists (Google Gemini 2.0 Flash deprecation, quota constraints), but vendor consolidation signals maturity: Sprinklr's ViralMoment acquisition expands sentiment detection to multimodal inputs (video/image/audio). Practitioner assessments (Faye Digital, WFM Labs) document a matured technical foundation with operational prerequisites: data quality, threshold tuning, fallback design. Leading-edge plateau sustained: strong technical capability (85-95% intent accuracy, 80-90% sentiment accuracy, >98% escalation routing possible) and proven ROI in mature deployments clash with the vendor-vs-field effectiveness gap, high rollback rates, and prerequisite organizational complexity, driving an adoption stall despite widespread investment intent.
  • 2026-Jun: Zendesk's June 2026 Intelligent Triage GA update renames "Intents" to "Topics," adds custom topic suggestion, and saves sentiment and language detection directly to standard ticket fields — a product-maturity signal focused on usability and accuracy over feature expansion. Production validation continues: Coforge's RAG-based system achieved 30–50% faster resolution and 40%+ knowledge reuse on 50,000+ tickets in banking/insurance; peer-reviewed BiLSTM research on 3M+ real support conversations achieves 84.45% precision on dissatisfaction detection under weak supervision. Practitioner critique sharpens the hard limits: four documented failure modes (entry tractability, indirect language, policy coherence, customer emotional state) show where intent/sentiment detection stalls on the long tail of complex cases. Gartner survey of 321 CX leaders shows 91% under pressure to deploy AI with 88% using it, but only 25% have achieved integration — the tooling layer is mature while the execution layer remains the binding constraint.
  • 2026-Jun (mid): Additional vendor and research evidence confirms both maturity and limitations. Zendesk expands Professional tier access to Intelligent Triage (topic, sentiment, ~150 languages, entity extraction), signaling democratization from premium to standard tier effective July 2026. Peer-reviewed research on 70,450 real support conversations shows a fundamental sentiment-analysis limitation: sentiment alone correlates 0.36 with actual customer satisfaction vs. 0.47 for richer LLM-based satisfaction + problem extraction. LLM-based sentiment in call centers achieves F1 0.91 (Qwen2.5) vs. 0.82 for traditional ML, but degrades to 0.76 on ASR transcripts (improving to 0.88 with agentic refinement). Production benchmarks show intent-classification effectiveness varies dramatically by request type: 98.2% success on structured tasks (passwords, order status) vs. 61.2% on emotional/complex intents (complaints, disputes). Multi-vendor adoption metrics (72% of IT orgs, 60% of Global 2000 tickets auto-classified) and market assessment confirm intent/sentiment/language detection is now table stakes (90-95% vendor parity), with competitive differentiation shifting to integration, escalation logic, and organizational execution prerequisites. Remaining barriers: sentiment scores are actionable only when attached to routing/escalation decisions (not mere reporting), complex emotions require richer annotation than tonality, and technical maturity outpaces organizational deployment capacity.
  • 2026-Jun (late): Adoption benchmarks and capability boundaries solidify from multiple independent sources. Enterprise adoption data confirms 72% of IT organisations have deployed AI-assisted classification and 60% of Global 2000 tickets are auto-classified (IDC), with AI cost per ticket at $1.80–$4.50 vs $22.50 human baseline and 40–60% deflection rates at scale. Zendesk's June 2026 detailed ticketing guide frames intent detection, sentiment analysis, and language identification as the core orchestration layer for routing and triage, a vendor positioning shift from optional capability to foundational infrastructure. Independent practitioner analysis documents the critical operational constraint: sentiment scores are actionable only when directly attached to routing or escalation decisions; teams that deploy sentiment as a reporting layer rather than a decision layer see no meaningful operational benefit. An assessment of AI customer service agents confirms that core ticket intelligence capabilities (intent, sentiment, language) have reached 90–95% tier-1 accuracy parity across vendors, with differentiation now fully dependent on integration depth, escalation logic, and organizational execution: the technology commoditisation milestone has arrived.
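Several entries above note that the bottleneck has moved from classification itself to confidence-threshold tuning. One common way to do that tuning, sketched here under the assumption that a team holds a labelled validation set of (model confidence, prediction was correct) pairs, is to pick the lowest cut-off whose auto-routed tickets still meet a target precision and send everything below it to human triage. The function names, validation data, and precision targets are hypothetical.

```python
from typing import Optional

def tune_threshold(preds: list[tuple[float, bool]], target_precision: float) -> Optional[float]:
    """Lowest confidence cut-off whose auto-routed tickets meet target_precision.

    preds holds (model_confidence, prediction_was_correct) pairs from a labelled
    validation set. Returns None if no cut-off reaches the target.
    """
    passing = []
    for t in sorted({conf for conf, _ in preds}):
        kept = [correct for conf, correct in preds if conf >= t]  # tickets auto-routed at t
        if sum(kept) / len(kept) >= target_precision:
            passing.append(t)
    # Precision is not monotonic in the threshold, so check every candidate
    # and keep the lowest one that passes (maximising automation coverage).
    return min(passing) if passing else None

def coverage(preds: list[tuple[float, bool]], threshold: float) -> float:
    """Share of tickets that would be auto-routed at this threshold."""
    return sum(conf >= threshold for conf, _ in preds) / len(preds)

# Hypothetical validation results for ten tickets.
validation = [(0.98, True), (0.95, True), (0.91, True), (0.88, False), (0.85, True),
              (0.80, True), (0.72, False), (0.65, False), (0.55, True), (0.40, False)]

for target in (0.95, 0.80):
    t = tune_threshold(validation, target)
    print(target, t, coverage(validation, t))
# 0.95 0.91 0.3
# 0.8 0.8 0.6
```

The trade-off is the operational point: demanding 95% precision automates only 30% of this sample, while accepting 80% doubles coverage, so the threshold is a business decision about error cost as much as a model property.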

TOOLS