Perly Consulting │ Beck Eco

The State of Play

A living index of AI adoption across industries — where established practice meets the bleeding edge
UPDATED DAILY

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

The Daily Dispatch

A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.

AI Maturity by Domain

Each dot marks the weighted maturity of practices within a domain — hover for a brief summary, click for more detail

DOMAIN
BLEEDING EDGE ←→ ESTABLISHED

Call summarisation & disposition

GOOD PRACTICE

TRAJECTORY

Advancing

AI that automatically summarises support calls and generates disposition codes and structured notes for CRM entry. Includes after-call work automation and key moment extraction; distinct from call transcription in sales, which focuses on sales conversations rather than support calls.

OVERVIEW

Call summarisation and disposition has graduated from experimental feature to proven capability. Every major contact centre platform now ships AI-generated post-call summaries and disposition codes as GA functionality, and early adopters report 25-40% reductions in handle time once they invest in tuning. The practice replaces the manual after-call work agents perform on every interaction — typing summary notes, selecting disposition codes, updating CRM records — with models that extract the customer's issue, resolution, action items, and classification codes automatically. The question facing most organisations is no longer whether the technology works but how much customisation it demands. Out-of-the-box accuracy sits well below production-grade thresholds, and bridging that gap requires structured validation workflows, domain-specific fine-tuning, and ongoing human review. Organisations that make that investment see real returns; those expecting plug-and-play results do not.
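The extraction step described above (issue, resolution, action items, disposition code) can be sketched as a structured-output prompt plus a validation gate before anything is written to the CRM. This is a minimal illustration, not any vendor's API: the disposition taxonomy, prompt wording, and field names are all hypothetical, and a real deployment would use the CRM's own code list.

```python
import json

# Hypothetical disposition taxonomy; real deployments use the CRM's own codes.
DISPOSITION_CODES = {
    "BILLING_DISPUTE", "TECH_ISSUE_RESOLVED",
    "TECH_ISSUE_ESCALATED", "ACCOUNT_CHANGE", "OTHER",
}

PROMPT_TEMPLATE = """Summarise the support call below. Respond with JSON only:
{{"issue": "...", "resolution": "...", "action_items": ["..."], "disposition": "<one of {codes}>"}}

Transcript:
{transcript}"""

def build_summary_prompt(transcript: str) -> str:
    """Build the extraction prompt sent to the summarisation model."""
    return PROMPT_TEMPLATE.format(
        codes="|".join(sorted(DISPOSITION_CODES)),
        transcript=transcript,
    )

def parse_summary(raw: str) -> dict:
    """Parse and validate the model's JSON reply before CRM entry.

    Missing fields raise; an unknown disposition code is coerced to OTHER
    rather than being written through to the record unchecked.
    """
    summary = json.loads(raw)
    for required in ("issue", "resolution", "action_items", "disposition"):
        if required not in summary:
            raise ValueError(f"missing field: {required}")
    if summary["disposition"] not in DISPOSITION_CODES:
        summary["disposition"] = "OTHER"
    return summary
```

The coerce-to-OTHER step is one small example of why out-of-the-box output needs a validation layer: models occasionally emit codes outside the allowed taxonomy, and silently accepting them corrupts downstream reporting.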

CURRENT LANDSCAPE

The vendor ecosystem has reached full feature parity. Microsoft, AWS, Zendesk, ServiceNow, Oracle, Talkdesk, Webex, CloudTalk, and Dialpad all offer GA summarisation capabilities, often bundled into platform pricing rather than sold as add-ons. Recent releases reflect refinement rather than novelty: Microsoft extended Copilot with row-level summarisation in Dynamics 365 Customer Service, Cisco Webex added mid-call transfer summaries with API access, and CloudTalk shipped AI tagging with direct CRM auto-entry. Industry estimates put contact centre adoption above 60%, with documented cost reductions of 35% at deployed sites.

Those headline numbers obscure a persistent gap between feature availability and production-grade accuracy. Microsoft's own Azure AI documentation now explicitly flags dialectal variance, abstractive hallucination, and degraded performance on under-represented conversation types as known limitations. Independent testing tells a similar story: raw AI summaries achieve 63-89% accuracy, while deployments with structured human-review workflows reach 94-96%. Speaker diarisation accuracy drops nearly 30 percentage points on hybrid calls, and domain jargon remains a blind spot without custom vocabulary tuning. Context reconstruction on escalated tickets still costs an estimated $200-500 per incident. The pattern is clear: organisations willing to invest in validation protocols and fine-tuning unlock genuine efficiency gains, but the out-of-the-box experience remains insufficient for unsupervised use.
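The structured human-review workflows behind those 94-96% figures typically take the shape of confidence-gated routing: high-confidence summaries flow straight to the CRM, the rest queue for an agent to check. A minimal sketch, assuming a model-reported confidence score and an illustrative threshold (real teams tune the cut-off against audit samples):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ReviewQueue:
    """Route each summary to the CRM or to a human review queue.

    The 0.9 threshold is illustrative only; in practice it is calibrated
    against a periodically re-audited sample of reviewed calls.
    """
    threshold: float = 0.9
    auto_approved: List[str] = field(default_factory=list)
    needs_review: List[str] = field(default_factory=list)

    def route(self, summary: str, confidence: float) -> str:
        if confidence >= self.threshold:
            self.auto_approved.append(summary)
            return "crm"
        self.needs_review.append(summary)
        return "human_review"
```

Lowering the threshold trades review labour for risk: every point of threshold reduction shrinks the human queue but lets more marginal summaries into the record unreviewed, which is exactly the tuning investment the paragraph above describes.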

TIER HISTORY

Research        Jan-2021 → Jan-2021
Bleeding Edge   Jan-2021 → Jan-2023
Leading Edge    Jan-2023 → Apr-2025
Good Practice   Apr-2025 → present

EVIDENCE (100)

— Mid-size European bank case study: 47,000 calls/quarter with only 26% captured in CRM summaries; the 35,000 unanalyzed calls contained 2,800 upsell signals, 1,400 churn warnings, and 340 compliance gaps, revealing a material gap between adoption and implementation.

— Independent third-party testing of 10 call summary platforms across 400+ real test calls; demonstrates ecosystem maturity with broad vendor feature parity and adoption breadth across major contact center platforms.

— AWS Transcribe Call Analytics product page: tier-1 platform GA'd generative AI call summarization combined with call categorization/disposition; confirms market-leading vendor capability maturity and feature consolidation.

— Technical analysis of AI meeting summarization pipeline with specific error rates: ASR 3-35% WER depending on conditions, diarization 11-13% error, LLM hallucination measurable; directly applicable error modes and failure patterns to call summarization deployments.

— Deepgram releases domain-specific language model for contact center call summarization, fine-tuned on 200K conversations with quantified wrap-up time reduction use case demonstrating vendor-specific optimization for summarization practice.

— Cisco Webex Contact Center official feature documentation: GA'd AI conversation summarization across multiple scenarios (dropped calls, AI transfers, consults); confirms tier-1 platform capability with CSAT improvement outcomes.

— Real-world call center automation guide including Telefónica Germany deployment case with specific ACW reduction and operational efficiency metrics, positioning summarization as core automation lever in modern contact center stacks.

— NIST AI 600-1 governance framework requiring pre-deployment TEVV for confabulation testing in regulated domains; directly applicable to call summarization quality assurance requirements in financial services, healthcare, and compliance-sensitive operations.

HISTORY

  • 2021: IBM Research releases TWEETSUMM dataset for customer service dialog summarization; AWS Contact Lens launches production machine learning call summarization; Zendesk and competitors begin early access programs; standalone vendors like Noota claim commercial traction.
  • 2022-H1: AWS expands with Transcribe Call Analytics GA (March); Microsoft releases Context IQ AI-generated summaries in Dynamics 365 (April); ASAPP launches AutoSummary with 10%+ handle time reduction claims (May); agent surveys show 41% prioritize call summarization automation as top workflow improvement (June); academic research surfaces LLM position bias and gaps in responsible AI consideration in summarization systems.
  • 2023-H1: Technology shifts to LLM-based approaches across all major platforms; Zendesk and Google expand geographic rollout of generative AI summarization (March); research demonstrates fine-tuning techniques for smaller LLMs with controlled summary length (April); open-source community continues active development (CallSum, June); technical focus moves from capability maturation toward safe, fair, and responsible deployment patterns.
  • 2023-H2: Secondary vendors (CallMiner, Talkdesk) launch generative AI summarization capabilities (July-September); AWS publishes production deployment patterns for LLM-based summarization (November); Balto AI survey documents actual adoption momentum and ROI realities in contact centers (October); practice approaches commodification with cost and tuning barriers replacing capability barriers.
  • 2024-Q1: Microsoft automatically enables Copilot summarization for all Dynamics 365 Enterprise customers (January), signaling mainstream production-ready status; AWS enhances Contact Lens with generative AI post-contact summaries (March); ClickUp ships native call summarization in workflow platform; Qualtrics survey shows only 20% agent AI adoption despite platform availability; research and vendor analysis document persistent challenges: hallucination risks, accuracy issues across languages, 120-300 seconds still spent per call on dispositioning, less than 25% of notes meeting quality standards. Practice moves into mainstream with availability but deployment success requires significant customer-specific tuning.
  • 2024-Q2: Microsoft announces Dynamics 365 Contact Center as Copilot-first CCaaS platform with general availability July 1, 2024, establishing call summarization as core capability; AWS markets generative AI summarization in Transcribe Call Analytics for post-call efficiency; ServiceNow releases post-call summarization in Q2 2024 CSM update; Microsoft Wave 1 enhancements expand Copilot across omnichannel. Vendor consensus confirms market readiness, but Deloitte survey finds only innovator segment (minority) actively deploying, indicating broad platform availability without proportional adoption uptake. Persistent accuracy and tuning barriers remain despite universal vendor support.
  • 2024-Q3: Platform standardization completes: Microsoft, AWS, and ServiceNow all deliver production summarization capabilities with general availability. Microsoft Dynamics 365 Contact Center launches July 1, 2024 (Copilot-first CCaaS); AWS extends summaries to agents in Contact Lens (July); ServiceNow formalizes Now Assist call summarization (Aug). Microsoft's September guide emphasizes pilot-first rollout with measurement criteria for tuning success. However, technical barriers persist: a peer-reviewed empirical study documents fine-tuned BART models achieving 71% recall but >50% degradation in zero-shot scenarios; an Australian government evaluation shows LLMs produce verbose, hallucinated summaries inferior to human effort. Practice commodified but constrained by deployment tuning and accuracy barriers.
  • 2024-Q4: Early production deployments validate ROI at scale. Lenovo case study (December) documents 15% productivity gains and 20% handle time reduction with Copilot summarization. Amazon reports tens of thousands of Connect customers (10M daily interactions) with named adopters across retail, logistics, education, and travel. AWS releases fresh generative AI analytics (December) and detailed secure-summarization technical tutorial (October); research advances fine-tuning of smaller cost-efficient LLMs with length control (October). Gartner recognizes Microsoft as CRM leader, validating strategic Copilot-first architecture. Broadscale adoption remains constrained by accuracy, tuning, and business case barriers despite universal feature availability and proven Fortune 500 deployments.
  • 2025-Q1: Full platform standardization and feature expansion signal vendor confidence. AWS restructures Connect pricing to bundle post-contact summaries (March 2025), Microsoft extends summaries into quality management and compliance workflows (February 2025), and Zendesk achieves GA for agent workspace summaries (March 2025). Gartner reports 60%+ adoption across contact centers with 87% projected by year-end. Specialized vendors optimize for domain: AssemblyAI releases Conversational summarization models targeting support calls (February 2025). Practice moves from platform availability to deployment friction: organizational adoption, summary quality tuning, and ROI validation for mid-market remain the binding constraint on broadscale growth.
  • 2025-Q2: Production deployments validate ROI at scale; technical limitations become explicit in vendor transparency. Wisconsin DOR achieves 66% cost reduction and 60% hold time improvement across 500 agents with Amazon Connect Contact Lens (May 2025); Metrigy research documents 35% call time savings (May 2025). Simultaneously, Microsoft's official Azure AI documentation (June 2025) acknowledges quality degradation across language dialects and hallucination risks in production systems, and industry analysis (Dialpad, May 2025) details ASR errors and lack of reliable evaluation metrics for factual consistency. Practice reaches inflection point: mainstream platform adoption and early adopter ROI validation coexist with explicit documentation of technical barriers for broader deployment.
  • 2025-Q3: Platform standardization completes with quality focus shift. Empower (financial services) scales Amazon Connect Contact Lens + Bedrock for QA automation with 5,000 daily transcriptions and 20x QA efficiency (August 2025); global company deploys Dynamics 365 Contact Center with Copilot post-call summaries across regions (July 2025). ServiceNow updates Now Assist documentation (July 2025); Zendesk expands GA internationally (September 2025). Observe.AI research documents critical quality limitation: all 20 major LLMs (OpenAI, Claude, Llama, Nova) exhibit measurable operational bias on real call transcripts—shifting narrative from deployment speed to bias mitigation (August 2025). Practitioner ROI claims remain (25-40% handle time reduction) but adoption constraints shift from availability to accuracy and organizational change management. Practice consolidates in good-practice tier as early adopters prove ROI while broader mid-market adoption waits for bias and fine-tuning solutions.
  • 2025-Q4: Vendor feature parity reaches completion with enterprise-context capabilities. AWS launches AI-powered case summaries supporting multi-interaction and cross-team context (November 2025); Oracle ships automatic summarization with agent review workflows (October 2025); Zendesk enhances ticket summary capture with expanded word limits and improved context inclusion (October 2025). Platform commodification stabilizes with all major vendors offering GA features; remaining barriers are implementation friction (fine-tuning for dialect/vocabulary), organizational adoption (agent retraining), and quality limitations (persistent LLM bias from Q3 research). Early-adopter ROI documented (25-40% handle time reduction, 30% productivity gains) is primarily driven by customer-specific tuning, not platform feature quality alone. Practice remains at good-practice tier—proven ROI for innovators, but platform availability has decoupled from mid-market adoption; success now depends on solving implementation and quality barriers rather than feature development.
  • 2026-Jan: Platform vendor consolidation continues with Microsoft, Talkdesk, and major CCaaS providers confirming GA summarization capabilities (January-end). Practitioner analysis shifts focus from capability availability to implementation economics and accuracy validation: documented evidence shows successful deployments require structured validation workflows (93-96% accuracy vs 63-89% raw AI), economic analysis reveals $200-500 per ticket context reconstruction costs with targeted solutions achieving 20-40% improvement, and technical failure modes (diarization drops 29 points on hybrid calls, jargon blindness, conditional logic omission) remain unresolved in out-of-box deployments. Early-adopter case studies continue to report 25-40% handle time gains, but analysis reveals these depend on customer-specific tuning rather than platform maturity. Practice tier stable at good-practice; mid-market adoption blocked by economic validation requirements and accuracy-tuning friction rather than feature gaps.
  • 2026-Feb: Vendor feature standardization and transparency reach new maturity. Microsoft extends Copilot with row summarization capability in Customer Service (Feb 25, 2026); Webex adds AI-enhanced post-call and mid-call summaries with 24-hour API access (Feb 17); CloudTalk updates product with AI tagging and CRM auto-entry (Feb 27); Dialpad maintains GA for AI Call Summary with sentiment and category support. Critically, Microsoft publishes official Azure AI documentation (Feb 28, 2026) explicitly detailing summarization quality limitations: dialectal variance causing degradation, abstractive hallucination risks, poor performance on under-represented conversation types, marking a shift from marketing claims to vendor acknowledgment of deployment barriers. Industry metrics from Thunai (Feb 12, 2026) document 60%+ contact center adoption of AI summarization tools with 35% operational cost reduction claims, confirming ecosystem momentum. However, adoption metrics reflect feature deployment rather than ROI realization; the documented validation and tuning requirements from Q1 2026 remain binding constraints. Practice consolidates at good-practice: universal platform GA status coexists with explicit vendor documentation of reliability limitations and persistent deployment friction that separate capability availability from organizational adoption at scale.
  • 2026-Q1 (Mar-Apr): Deployment evidence validates scale and ROI with vendor feature consolidation. AWS Contact Lens (Mar 31) confirms conversational analytics GA with three named customer deployments: Neo Financial (90-second ACW savings per call, 40 hours/month leadership efficiency); Fujitsu (60% QA automation efficiency); Frontdoor (50x sampling increase). Microsoft Dynamics 365 Contact Center (Mar 18) confirms 2026 Wave 1 release with one-click case summaries across chat, email, and notes. Amazon Connect Health (Mar 5) case study: UC San Diego Health deployment with quantified clinical note summarization benefits. Simultaneously, critical quality limitation research documents systematic hallucination and bias risks: SupportLogic production framework reveals 94% to 73% quality variance across models; Suprmind's Vectara HHEM leaderboard benchmarks all 20 major LLMs showing measurable hallucination rates. Practitioner evidence (InflectionCX operator assessment) documents Contact Lens implementation reality: 2-5s latency, manual vocabulary configuration, pattern-matching limitations, and a steep implementation barrier. Real deployment case studies (Utilita: 35-second ACW reduction with Verint; UC San Diego Health) confirm ROI is achievable but requires structured validation workflows and customer-specific tuning. Ecosystem pattern: feature parity complete across AWS, Microsoft, ServiceNow, and secondary vendors; adoption friction has shifted definitively from whether the technology works to whether organizations can cost-justify the tuning and validation burden—a question that remains unsolved for the mid-market. Practice tier stable at good-practice; near-term growth blocked by implementation economics and quality validation requirements rather than capability gaps.
  • 2026-May: Vendor differentiation intensifies with domain-specific optimization and governance frameworks. Deepgram releases domain-specific language model for contact center summarization, fine-tuned on 200K conversations, signaling specialized vendor optimization (Apr 27). Cisco Webex adds multi-scenario AI summarization (dropped calls, transfers, consults) as GA feature (Apr 27). Independent third-party testing (Brilo, Apr 29) evaluates 10 platforms across 400+ real test calls, confirming broad vendor ecosystem maturity but revealing quality variance across implementations. Adoption gap persists as documented barrier: European bank case study (Apr 30) shows only 26% of 47,000 quarterly calls captured in CRM summaries, with 35,000 unanalyzed calls containing 2,800 upsell signals and 340 compliance gaps; UK contact center analysis (Apr 20) documents 67% record 100% of calls but 90% lack time/capability to analyze—revealing analysis bottleneck as adoption constraint. Agent trust barriers documented: UJET survey (Apr 22) shows 93% of agents feel need to double-check AI outputs before customer use despite ~70% crediting AI with ACW reduction, indicating quality reliability concerns persist. Governance frameworks emerge as binding requirement: NIST AI 600-1 (Apr 24) establishes pre-deployment testing and compliance requirements directly applicable to regulated summarization deployments. Economic analysis (Apr 20) documents 86/100 viability score for summarization + CRM automation with 3.5 FTE capacity recovery and £4,200 implementation cost. Practice consolidates at good-practice tier: vendors delivering full feature parity, deployments proving ROI, but adoption remains constrained by implementation economics, quality validation requirements, and organizational change barriers rather than technology capability.

TOOLS