The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
Chart: weighted maturity of practices within each domain (one dot per domain).
AI that automatically summarises support calls and generates disposition codes and structured notes for CRM entry. Includes after-call work automation and key moment extraction. Distinct from call transcription in sales, which focuses on sales conversations rather than support calls.
Call summarisation and disposition has reached mainstream platform maturity, with ecosystem-wide general availability (GA) and documented ROI from early production deployments. Every major vendor (Microsoft, AWS, Zendesk, Genesys, Talkdesk, ServiceNow, Oracle, Webex, Dialpad, Five9) ships AI-generated post-call summaries and automated disposition as core GA features, and proven deployments report 25-50% reductions in average handle time (AHT) alongside sustained agent productivity gains (40-90 seconds saved per call in field implementations). The practice replaces manual after-call work (typing summaries, selecting disposition codes, updating the CRM) with LLM-based automation that extracts the issue, resolution, action items, and classification codes during the call or immediately afterwards. The technology works at scale: Genesys, Vapi, and AWS implementations confirm sub-100-second automation cycles, and third-party platforms such as Lindy.ai ship plug-and-play solutions. However, production deployments universally require deliberate architectural choices, quality validation protocols, and acceptance of persistent limitations. Hallucination, fabricated customer statements, and diarisation failures on hybrid calls remain unsolved technical barriers that separate capability availability from mid-market adoption. Organisations willing to invest in retrieval-augmented generation (RAG) grounding, fine-tuning, and human-in-the-loop validation achieve genuine efficiency gains; those deploying out-of-the-box models continue to experience agent distrust and quality failures.
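The extraction step above can be pictured as validating a model's raw output before anything reaches the CRM. The following is a minimal sketch under stated assumptions, not any vendor's implementation: it assumes the model has already returned JSON with hypothetical keys `issue`, `resolution`, `action_items` and `disposition_code`, and that the organisation keeps its own list of allowed disposition codes (the codes below are invented). Malformed output, missing fields, or a code outside the taxonomy is flagged for agent review rather than written silently, which is one concrete form of human-in-the-loop validation.

```python
import json
from dataclasses import dataclass, field

# Hypothetical disposition taxonomy; real code sets are organisation-specific.
ALLOWED_DISPOSITIONS = {"RESOLVED", "ESCALATED", "CALLBACK_REQUIRED", "BILLING_ADJUSTMENT"}


@dataclass
class CallSummary:
    issue: str
    resolution: str
    action_items: list[str] = field(default_factory=list)
    disposition_code: str = ""
    needs_review: bool = False
    review_reasons: list[str] = field(default_factory=list)


def parse_model_output(raw: str) -> CallSummary:
    """Validate raw model JSON (hypothetical keys) and flag anything suspect for agent review."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        # Unparseable or non-object output cannot be trusted: the agent writes the notes.
        return CallSummary("", "", needs_review=True,
                           review_reasons=["unparseable model output"])

    issue = str(data.get("issue") or "").strip()
    resolution = str(data.get("resolution") or "").strip()
    items = data.get("action_items")
    action_items = [str(i).strip() for i in items] if isinstance(items, list) else []
    code = str(data.get("disposition_code") or "").strip().upper()

    reasons: list[str] = []
    if not issue:
        reasons.append("missing issue")
    if not resolution:
        reasons.append("missing resolution")
    if code not in ALLOWED_DISPOSITIONS:
        # Never write an invented code to the CRM; leave it blank for the agent to choose.
        reasons.append(f"disposition code {code!r} not in taxonomy")
        code = ""

    return CallSummary(issue, resolution, action_items, code,
                       needs_review=bool(reasons), review_reasons=reasons)


raw = ('{"issue": "Double charge on March invoice", "resolution": "Refund issued", '
       '"action_items": ["Email refund confirmation"], "disposition_code": "billing_adjustment"}')
summary = parse_model_output(raw)
print(summary.disposition_code, summary.needs_review)  # BILLING_ADJUSTMENT False
```

Constraining disposition to a closed code set is a deliberate choice in this sketch: a hallucinated free-text sentence can be caught by the reviewing agent, but an invented disposition code could otherwise flow unnoticed into downstream reporting.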
Platform ecosystem standardisation is complete: all major contact centre vendors (AWS, Microsoft, Genesys, Zendesk, ServiceNow, Oracle, Webex, Talkdesk, Dialpad, Five9, CloudTalk) offer GA call summarisation bundled into base platform pricing. Geographic expansion continues (in June 2026, AWS Contact Lens added support for Portuguese, French, Italian, German, Spanish, Chinese, Japanese, and Korean), signalling sustained investment in global production readiness. Third-party vendors (Vapi, Lindy.ai, Aircall, Nextiva) ship independent AI summarisation and disposition products, demonstrating ecosystem depth beyond the platform incumbents.
Real-world deployments confirm ROI at scale: Genesys implementations document 45-90 seconds saved per call; Lindy.ai reports 40-60 second reductions with a 40% increase in contact centre capacity; Five9 TruConnect (healthcare) achieved a 40% reduction in after-call work (ACW); and Verint baseline research (1,000 agents) finds that 54% of calls require after-call work, including summarisation, confirming the scale of demand. Field implementations validate a 25-50% AHT reduction from combined front-of-call and back-of-call AI automation, establishing a credible mid-range ROI.
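The per-call savings and AHT reductions reported above translate into agent time with simple arithmetic. The sketch below is illustrative only and does not reproduce any cited study: the annual call volume is hypothetical, and the capacity formula assumes agents are fully occupied, so that calls handled per agent-hour scale with 1/AHT. Under that assumption a 25% AHT reduction yields about 33% more capacity, and a 50% reduction doubles it.

```python
def agent_hours_saved(seconds_saved_per_call: float, calls_per_year: int) -> float:
    """Annual agent hours freed by removing a fixed number of seconds from each call."""
    return seconds_saved_per_call * calls_per_year / 3600


def capacity_uplift(aht_reduction: float) -> float:
    """Extra calls per agent-hour when AHT falls by the given fraction.

    Assumes agents are fully occupied, so throughput is proportional to 1 / AHT.
    """
    if not 0 <= aht_reduction < 1:
        raise ValueError("aht_reduction must be in [0, 1)")
    return 1 / (1 - aht_reduction) - 1


# Hypothetical centre handling 1,000,000 calls a year.
for saved in (40, 90):  # low and high ends of the per-call savings reported above
    print(f"{saved}s/call -> {agent_hours_saved(saved, 1_000_000):,.0f} agent hours/year")
for r in (0.25, 0.50):  # ends of the reported AHT reduction range
    print(f"{r:.0%} AHT reduction -> {capacity_uplift(r):.0%} more capacity")
```

Real capacity gains are usually smaller than this upper bound, because occupancy, shrinkage and queueing effects all dilute the effect of a shorter handle time.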
However, production deployments reveal persistent technical limitations that constrain mid-market adoption. Hallucination remains endemic: independent comparative testing of five AI medical scribes documents systematic fabrication of medications and dosages, phantom exam findings, and confabulated patient statements, error types that map directly to call summarisation risks (invented customer statements, misrepresented agreements, fabricated action items). The AI Evals production framework sets threshold requirements (faithfulness >95%, coverage >85%) that most raw summaries fail; organisations deploying out-of-the-box models see 63-89% raw accuracy, rising to 94-96% only with structured human-in-the-loop validation. Speaker diarisation accuracy drops by about 30 percentage points on hybrid calls; blindness to domain jargon requires custom vocabulary tuning; and reconstructing context on escalations costs $200-500 per incident. Genesys implementations explicitly document mandatory human review of AI outputs before finalisation (agents cannot skip the validation step), and UJET research finds that 93% of agents feel the need to double-check AI outputs pre-deployment, even though they credit summarisation with reducing ACW. The practice has therefore stabilised at the good-practice tier: mainstream feature availability coexists with explicit technical barriers (hallucination, quality validation costs, tuning complexity) that separate capability from confident autonomous deployment.
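The validation requirement above can be read as a routing gate. The sketch below assumes faithfulness and coverage scores in [0, 1] are already produced by some evaluator (computing them is out of scope here), uses the quoted thresholds as strict lower bounds, and invents the route names for illustration. In keeping with the mandatory-review pattern reported for Genesys, no route submits a summary without an agent; the gate only decides how much checking the agent does.

```python
from dataclasses import dataclass

# Thresholds quoted above: faithfulness > 0.95, coverage > 0.85.
FAITHFULNESS_MIN = 0.95
COVERAGE_MIN = 0.85


@dataclass(frozen=True)
class SummaryScores:
    faithfulness: float  # share of summary content supported by the transcript
    coverage: float      # share of key call facts the summary includes


def review_route(scores: SummaryScores) -> str:
    """Return a hypothetical review route; every route still ends with an agent sign-off."""
    for name, value in (("faithfulness", scores.faithfulness), ("coverage", scores.coverage)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    if scores.faithfulness > FAITHFULNESS_MIN and scores.coverage > COVERAGE_MIN:
        return "confirm"      # agent skims and confirms a pre-filled summary
    if scores.faithfulness <= FAITHFULNESS_MIN:
        return "full_review"  # possible fabrication: agent checks every claim
    return "add_missing"      # faithful but incomplete: agent fills the gaps


print(review_route(SummaryScores(faithfulness=0.97, coverage=0.90)))  # confirm
print(review_route(SummaryScores(faithfulness=0.99, coverage=0.80)))  # add_missing
```

Separating fabrication from omission is an assumption of this sketch: an invented customer statement is treated as more serious than a missing detail, so low faithfulness takes precedence when both thresholds fail.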
— Peer-reviewed empirical study of LLM-based call summarization (JMIR); rated summaries for accuracy, thoroughness, and freedom from hallucination; documents both competence (useful 4.8/5, consistent 4.9/5) and gaps (hallucination-free 4.4/5), supporting careful quality assessment as required practice.
— Directly studies hallucination in LLM-based summaries across major models (ChatGPT 0.62, GPT-4 0.84, Claude 2 1.55 hallucinations per summary) and demonstrates a 35% hallucination reduction via factored verification; hallucination is a critical constraint on disposition reliability.
— GA call summarization platform with a reasoning-first architecture that generates structured post-call summaries (intent, resolution, sentiment, next steps) flowing into CRM and helpdesk systems; the vendor's claimed 98% accuracy and 48-hour deployment timeline indicate market maturity.
— Verint Wrap Up Bot uses generative AI to automate after-call summarization; a named case study (Utilita Energy) documents a 35-second reduction in summary time and a 10% increase in agent capacity from automated disposition.
— Benchmarks hallucination risk on 2,075 real production contact center calls (Feb-May 2026); finds that GPT-5.5 achieves an 84.8% non-hallucination rate in production, meaning that even a frontier model still hallucinates on 15.2% of calls, which bears directly on disposition quality and reliability.
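A single benchmark rate carries sampling uncertainty. As a rough illustration, the sketch below computes a 95% Wilson score interval for the reported 84.8% non-hallucination rate. It assumes the 2,075 calls are independent and that the rate corresponds to about 1,760 hallucination-free calls (84.8% of 2,075 rounded to a whole call, since the benchmark's raw counts are not quoted here). The result, roughly 83% to 86%, implies a production hallucination rate of roughly 14% to 17% for that model.

```python
import math
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


n_calls = 2075
clean_calls = round(0.848 * n_calls)  # assumed count; the benchmark's raw tally is not quoted
low, high = wilson_interval(clean_calls, n_calls)
print(f"non-hallucination rate, 95% CI: {low:.1%} to {high:.1%}")
print(f"implied hallucination rate: {1 - high:.1%} to {1 - low:.1%}")
```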
— A named fintech deployed Claude models via Amazon Bedrock for automated call summarization in production, achieving 250,000+ hours saved annually, an 18-second reduction in handling time, a 5-point NPS lift, and a $700k annual efficiency gain.
— Peer-reviewed JMIR study introduces a multi-dimensional evaluation framework (fabrication, accuracy, comprehensiveness, usefulness) for LLM-generated summaries; demonstrates a systematic quality assessment methodology applicable to validating call summarization.
— Production evaluation framework that rejects ROUGE metrics in favor of faithfulness >0.95 (one fabrication per 20 summaries) and coverage >0.85; recommends FActScore atomic-claim decomposition and RAGAS hallucination scoring for continuous monitoring, establishing a quality baseline for contact center deployments.
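FActScore-style scoring breaks a summary into atomic claims and checks each one against the source. Faithfulness is then the share of claims that are supported, and coverage is the share of reference key facts that the summary captures. The sketch below assumes claim decomposition and support verdicts have already been produced upstream (in practice by an LLM or a human annotator, both out of scope here), so it shows only the aggregation and the threshold check. All names are hypothetical; this is not the FActScore or RAGAS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    text: str
    supported: bool  # verdict from an upstream verifier (assumed, not implemented here)


def faithfulness(claims: list[Claim]) -> float:
    """Share of atomic claims in the summary that the transcript supports."""
    if not claims:
        return 1.0  # an empty summary asserts nothing false; coverage penalises it instead
    return sum(c.supported for c in claims) / len(claims)


def coverage(key_facts: set[str], facts_in_summary: set[str]) -> float:
    """Share of reference key facts that the summary mentions."""
    if not key_facts:
        return 1.0
    return len(key_facts & facts_in_summary) / len(key_facts)


claims = [
    Claim("Customer reported a double charge", True),
    Claim("Agent issued a refund", True),
    Claim("Customer agreed to a plan upgrade", False),  # fabricated statement
]
faith = faithfulness(claims)
cov = coverage({"double charge", "refund", "confirmation email"}, {"double charge", "refund"})
# Baseline from the framework above: faithfulness > 0.95 and coverage > 0.85.
print(f"faithfulness={faith:.2f} coverage={cov:.2f} passes={faith > 0.95 and cov > 0.85}")
# faithfulness=0.67 coverage=0.67 passes=False
```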