Perly Consulting │ Beck Eco

The State of Play

A living index of AI adoption across industries — where established practice meets the bleeding edge
UPDATED DAILY

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

AI Maturity by Domain

[Chart: each dot marks the weighted maturity of practices within a domain, plotted on an axis from Bleeding Edge to Established.]

Voice AI — real-time translation in support calls

BLEEDING EDGE

TRAJECTORY

Advancing

AI that provides real-time spoken translation during customer support calls, enabling cross-language support. Includes live interpreter replacement and bidirectional voice translation; distinct from content localisation which translates pre-written materials rather than live speech.

OVERVIEW

Real-time voice translation in support calls promises to decouple agent hiring from language requirements, letting monolingual contact centers serve global customers. The premise is compelling: translate speech live, eliminate interpreter costs, and staff for skill rather than fluency. Vendor activity accelerated through June 2026, with simultaneous GA launches from Google (Gemini 3.5 Live Translate, June 9) and Krisp (Voice Translation API), plus competing offerings from Gradium, DeepL, OpenAI, and Microsoft, establishing real-time translation as a standard enterprise platform feature. Yet the practice remains firmly experimental. Independent benchmarks consistently show AI translation accuracy of 60-85% against a 95%+ human baseline, with hallucination rates of 33-60% and cultural mistranslation rates around 40%. These gaps confine deployments to lower-stakes, cost-sensitive scenarios. Production deployment barriers have hardened: code-switching fails in traditional ASR pipelines (70% of Indian contact center users naturally code-switch to Hinglish); acoustic artifacts from poor microphones and overlapping speech cause 63% more failures than noise alone; and entity accuracy gaps (16.7-25.5% miss rates) expose risks in regulated industries. Latency compounds the problem and varies by architecture: cascaded pipelines (STT→translate→TTS) run to 2-6 seconds, while new end-to-end models come in at 2.9-3.6 seconds, narrowing but not eliminating the gap. The defining tension is not whether the technology works in demos (it does) but whether it can sustain the accuracy, compliance, and responsiveness that live customer conversations demand. 
Production deployments at scale (Alorica hospitality and healthcare, Krisp healthcare 90% resolution without interpreters, OpenAI early customers including Zillow 69%→95% call success, Google Grab pilots with 10M calls/month) demonstrate real-world viability and ROI, but adoption remains minority-level (17% of enterprises) with 83% still relying on manual or traditional workflows. For most contact center operators, this is a space to watch and pilot, not to bet operations on.

CURRENT LANDSCAPE

The vendor ecosystem reached critical mass with simultaneous enterprise GA launches in June 2026. Google released Gemini 3.5 Live Translate (June 9), whose audio-to-audio architecture avoids the failure points of cascaded STT-LLM-TTS pipelines; it supports 70+ languages, includes SynthID watermarking for EU AI Act compliance, and is slated for enterprise rollout via Google Meet in H2 2026. Krisp's Voice Translation API reached GA with 61 languages, 1M+ minutes of production translation, 96% accuracy in healthcare contexts, full SOC 2/GDPR/HIPAA/PCI-DSS compliance, and zero patient safety incidents across 8 languages. Gradium entered the market (June 2026) with new stt-translate and s2s-translate models that achieve 3.0s latency with competitive accuracy benchmarks across 5 languages, collapsing the traditional three-stage pipeline into a two-pass architecture. OpenAI's gpt-realtime-translate scored 4.53/5 on fidelity (120 utterances), with Deutsche Telekom and Vimeo in production. Established vendors, including Microsoft (Dynamics 365 Contact Center, May GA), AWS, DeepL (Voice API GA Feb 2026), and TTEC, have made real-time translation a standard platform feature. Independent benchmarking confirms measurable quality gaps: Slator (March 2026) found DeepL Voice at 79% fully-correct translation segments versus 42% for competitors. Latency comparisons are less consistent across sources: one implementation guide puts cascaded systems at 800ms-2s and end-to-end S2S below 700ms, whereas benchmarks of the newest end-to-end models report 2.9-3.6s. The ecosystem now includes 21+ competing platforms with diverse cost structures ($0.04–$1.25/minute) and deployment models, though vendor claims on language coverage ("50+ languages") frequently exceed production-quality support: headline language counts mask mid-conversation switching failures and quality degradation outside English.
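
To make the latency comparison concrete, the sketch below sets the end-to-end latencies reported in this entry (2.9s, 3.0s, 3.6s) against the sub-800ms bar this entry cites for conversational naturalness. The function names are illustrative only, and the figures are simply the ones quoted here, not fresh measurements.

```python
# Threshold for conversational naturalness cited in this entry (seconds).
NATURALNESS_THRESHOLD_S = 0.8

# End-to-end latencies as reported in the benchmarks cited in this entry (seconds).
REPORTED_LATENCY_S = {
    "Gemini": 2.9,
    "Gradium": 3.0,
    "OpenAI": 3.6,
}


def within_threshold(latency_s, threshold_s=NATURALNESS_THRESHOLD_S):
    """Return True if the round-trip latency is below the threshold."""
    return latency_s < threshold_s


def overshoot_factor(latency_s, threshold_s=NATURALNESS_THRESHOLD_S):
    """Return how many times larger the latency is than the threshold."""
    return latency_s / threshold_s


if __name__ == "__main__":
    for name, latency in sorted(REPORTED_LATENCY_S.items(), key=lambda kv: kv[1]):
        status = "within" if within_threshold(latency) else "over"
        print(f"{name}: {latency:.1f}s, {status} threshold, "
              f"{overshoot_factor(latency):.1f}x the 800ms bar")
```

On these figures, every reported system sits several multiples above the threshold, which is why the latency gap is described as narrowed rather than closed.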

Production-scale deployments demonstrate viability in specific use cases. Krisp's named healthcare provider handles 90% of multilingual calls end-to-end without interpreter intervention and with zero patient safety incidents (June 2026). OpenAI's early customers show measurable improvements: a Vienna boutique hotel (direct-booking conversion +38%), a DACH e-commerce business (time-to-market compressed from 9 months to 3 weeks), and Zillow (call success rate up 26 points, 69%→95%). Google is piloting Gemini Live Translate with Grab (10M voice calls monthly) in high-noise Southeast Asian settings (Thai, Vietnamese, Bahasa Indonesia, Tagalog), testing production viability in genuinely challenging acoustic environments. Alorica ReVoLT serves Fortune 25 healthcare organizations with 75+ languages (Everest Group validation). eesel.ai documented three production deployments, all fully automated: German jewelry (1K tickets/month), Spanish insurance (564 conversations in 48 hours), and German lending (100K+ tickets/month). Yet adoption remains constrained: 17% of enterprises had deployed next-generation AI translation as of March 2026, with customer support the primary driver (23%), indicating that the market remains early-stage despite vendor product maturity and pricing compression. Mordor Intelligence sizes the market at $2.74B (2026), growing to $5.58B (2030) at a 19.5% CAGR.
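
As a sanity check on the market sizing above, the quoted growth rate can be recomputed from the two endpoints. This is a minimal sketch using the standard compound annual growth rate formula; the helper names are illustrative, and the inputs are only the figures quoted in this entry.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value after compounding `rate` annually for `years` years."""
    return start_value * (1 + rate) ** years


if __name__ == "__main__":
    years = 2030 - 2026
    rate = implied_cagr(2.74, 5.58, years)   # USD billions, as quoted above
    print(f"Implied CAGR: {rate:.1%}")
    print(f"2.74B compounded at 19.5% for {years} years: "
          f"{project(2.74, 0.195, years):.2f}B")
```

The implied rate comes out at roughly 19.5%, so the three quoted numbers are internally consistent.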

Critical deployment barriers remain despite vendor architectural progress. Code-switching is now quantified as a systemic blocker: ServiceNow's June 2026 SWER benchmark documents hallucination and phonetic errors across frontier ASR models on Hinglish and Spanglish, with 30-50% WER degradation on mixed-language utterances (70% of Indian contact center users naturally code-switch). India-specific ASR reaches 94-96% accuracy on Indian English and 88-92% on Hinglish, a 15-20 point advantage over global models that determines deployment viability in South Asian markets. Acoustic failures in real-world support calls create cascading accuracy loss: background voices cause 63% more failures than noise alone, and elderly care deployments show a 78% failure rate from TV audio interference. Entity accuracy gaps remain substantial (16.7-25.5% miss rates per AssemblyAI and Deepgram), which is particularly problematic in healthcare and financial support, where terminology precision is non-negotiable. Enterprise compliance remains an adoption barrier: 84% of organizations failed pre-deployment AI compliance audits, citing gaps in SOC 2, GDPR, and BAA certification. Platform-specific failures are documented: Azure Speech Service systematically filters code-switched English from Cantonese transcription, and Retell AI voice agents fail mid-call language switching with documented revenue impact. Infrastructure barriers persist: AudioCodes reports that most enterprise voice AI projects stall not from AI failure but from telephony integration complexity (SIP/VoIP inconsistencies), scale bottlenecks at hundreds of concurrent sessions, and vendor lock-in risk, with transcription accuracy rather than translation quality as the binding constraint. Latency trade-offs remain: cascaded pipelines (STT→translate→TTS) show 2-6 second round-trip latency; end-to-end architectures (Gemini 3.5, Gradium S2S) come in at 2.9-3.6 seconds but still exceed the threshold for conversational naturalness (<800ms). 
These constraints mean that while the technology works in bounded use cases (monolingual to single-target-language high-volume support in non-critical contexts), real-time translation remains confined to cost-sensitive, non-regulated support routing until code-switching, compliance, latency, and acoustic robustness barriers narrow substantially.
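
For readers unfamiliar with the WER figures above, the sketch below computes word error rate as word-level edit distance divided by reference length, which is the standard definition. The sample utterances are hypothetical, and the entry does not say whether the 30-50% degradation is relative or absolute; the helper here assumes relative.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_degradation(baseline_wer, mixed_wer):
    """Relative WER increase on mixed-language speech (assumed interpretation)."""
    return (mixed_wer - baseline_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical code-switched utterance and a hypothetical ASR transcript.
    reference = "mera order abhi tak deliver nahi hua"
    hypothesis = "mera order a be talk deliver nahi hua"
    print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

Under the relative reading, a model with 10% WER on monolingual speech and 15% WER on mixed-language speech would show a 50% degradation.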

TIER HISTORY

Research: Jan-2024 → Jan-2024
Bleeding Edge: Jan-2024 → present

EVIDENCE (109)

— Production patterns for contact centers: 270ms real-time transcription latency, ASR accuracy (29% WER improvement in accented speech) as binding constraint, BPO cost-benefit analysis proving offshore staffing ROI only with correct audio infrastructure.

— Technical deep-dive on production-ready real-time S2S translation: Gradium 3.0s, OpenAI 3.6s, Gemini 2.9s latency benchmarks; documents architecture shift from cascaded to two-pass reducing latency to conversational range for Spanish support.

— New vendor real-time S2S models (3.0s latency, competitive BLEU/MetricX accuracy across 5 langs) demonstrating collapsing three-stage pipelines to two-pass architecture; explicit support-call use case with WebSocket delivery.

— Production deployment guide for India with code-switching handling (Hinglish): 94-96% ASR accuracy on Indian English, 88-92% on Hinglish, 15-20 point delta vs global models; 10 languages with <800ms latency requirement in regulated sectors.

— Implementation guide comparing cascaded (800ms-2s) vs end-to-end S2S (<700ms) architectures; real support examples (Spanish/English real estate, Vietnamese healthcare LEP triaging); WebRTC architecture with deployment across 6 verticals.

— Real-time translation for human agent assist in travel support: bidirectional AI translation enabling monolingual agents to cover dozens of languages, solving multilingual staffing problem with sub-second latency constraint requirement.

— Three production deployments: German jewelry (1K tickets/month), Spanish insurance (564 calls/48hrs), German lending (100K+ tickets/month), all fully automated with multilingual support proving production viability at scale.

— Platforms compared on auto mid-conversation language switching (most vendors fail), code-switching handling, quality parity across languages: identifies language-count marketing gap and switching as vendor-breaking technical barrier in production.

HISTORY

  • 2024-Q1: Cloud platforms AWS and Microsoft release real-time translation features for contact centers; major BPO Alorica launches ReVoLT platform with 75-language support and pilot deployments. Technical barriers remain around latency, dialect handling, and translation accuracy for customer support contexts.
  • 2024-Q2: Alorica reports production scale (1M+ minutes translated, 97% fluency, 50% claimed cost reduction). AWS publishes caching optimization guides for real-time systems. NTT advances voice conversion research. Cloud reliability issues surface (Azure TTS latency spikes to 24s). Practitioner guides specify <500ms latency targets and interrupt handling as critical.
  • 2024-Q4: Vendor ecosystem consolidates around real-time translation as core contact center capability. DeepL launches Voice API (Nov), Krisp launches AI Live Interpreter (Dec), AWS releases V2V samples (Dec). Technavio forecasts $217.2M market by 2028 at 8.9% CAGR with contact centers as major drivers. Independent testing and industry analysis document persistent accuracy and latency barriers; customer preference for native-speaking agents remains unresolved.
  • 2025-Q1: AWS and DXC prototype hybrid V2V architecture for Amazon Connect, documenting latency/naturalness trade-offs. Alorica reports production hospitality deployment with 97% accuracy, 117% conversion uplift, and 34% revenue-per-call improvement. Research papers and critical analyses highlight terminological chaos in real-time speech translation research and persistent limitations (context understanding, mistranslation failure modes). Market analysts forecast $1.09B market by 2034 at 9.5% CAGR. Translation quality and latency remain key barriers despite vendor consolidation and production ROI evidence.
  • 2025-Q2: Vendor ecosystem expands with competitive offerings—Alorica confirms ReVoLT scaling (75 languages, 200 dialects), TTEC launches Addi tool (30+ languages, sub-second latency, >80% interpreter cost reduction). Multiple vendors now claim measurable cost reduction at scale. Architectural patterns and reference deployments (AWS-DXC prototype) mature. Yet research gaps persist: February 2025 analysis of 110 papers reveals simplifying assumptions in speech translation research; practitioner assessments document context/emotion understanding failures and continued need for hybrid AI-human workflows. Latency and customer preference for native speakers remain adoption blockers despite vendor progress.
  • 2025-Q3: Enterprise platforms integrate native real-time translation: Microsoft Dynamics 365 Contact Center ships enhanced real-time translation (GA October 2025) with language profiles and agent controls. Amazon Connect reference architectures continue maturing. However, deployment barriers solidify: peer-reviewed research finds AI translation consistently inferior for Chinese, Vietnamese, and Somali; Azure Speech Services production latency is reported at 12s round-trip; independent quality analysis documents 60-85% AI accuracy vs. 95%+ human baseline and 40% cultural mistranslation rates. Vendor cost-reduction claims persist (TTEC Addi, Alorica metrics) but lack independent validation. Language-specific accuracy gaps and cloud service latency constrain deployment to non-critical support contexts.
  • 2025-Q4: Vendor ecosystem expands with specialized offerings—Deepgram announces low-latency STT/TTS for Amazon Connect (December); Google reveals S2S translation architecture achieving ~2-second latency in production (November). Enterprise platforms solidify real-time translation as standard feature across Microsoft, AWS, DeepL, and TTEC. However, adoption barriers deepen: MIT study documents 95% of enterprise AI deployments deliver no measurable ROI (December 2025), indicating systemic integration challenges. Technical barriers persist (cloud latency, language-specific accuracy gaps, cultural mistranslation). Market forecasts project $1.09B market by 2034, yet practitioner guidance emphasizes persistent fragmentation and gaps between vendor claims and real-world deployment consistency. Real-time translation has matured from bleeding-edge to standardized platform feature, but deployment remains constrained to cost-sensitive multilingual support rather than enabling mainstream adoption.
  • 2026-Jan: DeepL launches Voice API with real-time transcription and full V2V translation (GA February 2026), intensifying vendor competition in contact center segment. Market intelligence projects $762M market size (2026) growing to $1.25B by 2031 at sustained 10%+ CAGR; language service integrators report 41% evaluating and 30% actively using AI interpreting. Translation quality variance documented: benchmark testing shows DeepL leading European languages (BLEU 62.8 Spanish) while LLMs competitive for Asian languages (54.1 Chinese), highlighting ongoing language-specific accuracy gaps. Critical deployment barriers remain: user reports document Azure OpenAI production latency issues, and practitioner analysis warns of high-stakes liability risks in legal/healthcare contexts due to specialized terminology and cultural mistranslation failures. Real-time translation ecosystem matured with new vendor offerings and sustained market growth, yet fundamental ROI validation and language-specific quality gaps continue constraining deployment beyond cost-sensitive multilingual operations.
  • 2026-Feb: Krisp releases Voice Translation SDK (February 2026) with 60+ language support, custom vocabulary, and domain-specific dictionaries, validated in six months of production CX deployments; demonstrates maturation of developer-accessible translation APIs. Enterprise platforms (Microsoft, AWS, DeepL, TTEC) solidify real-time translation as standard feature. Market and LSI adoption signals sustained. Infrastructure and language-specific quality barriers continue constraining deployment to cost-sensitive contexts.
  • 2026-Apr: Independent Slator benchmark confirms DeepL Voice leading competitors on quality (79% fully-correct segments vs. 42%) and T-Mobile deploys real-time translation at network level across 50+ languages, marking telecommunications-grade infrastructure adoption. Despite vendor quality improvements, enterprise adoption remains at only 17% for next-generation AI translation tools; Slator documents persistent hallucination rates (33-60%) and a comprehensive tool survey quantifies support-call-specific barriers — 40% accuracy reduction from background noise and 2-6 second pipeline latency. Machine translation market projected at $2.74B (2026) growing to $5.58B (2030) at 19.5% CAGR.
  • 2026-May: Major vendor consolidation: OpenAI launches gpt-realtime-translate (May 7) with 70+ language native audio support, GPT-5-class reasoning, and 96.6% audio benchmark performance; DeepL and Microsoft solidify platform integration with May GA releases; Fora Soft's independent comparison of four leading systems treats latency below 800ms as the UX threshold and finds costs spanning $0.04–$1.25/minute. Everest Group's 2026 analyst report validates Alorica ReVoLT at Fortune 25 healthcare scale. Production barriers have hardened across three dimensions: code-switching failures (30-50% WER degradation in mixed-language utterances; 70% of Indian contact center users naturally code-switch to Hinglish), acoustic failure modes (background voices cause 63% more failures than noise alone, with a 78% failure rate in elderly care from TV audio interference), and entity accuracy gaps (AssemblyAI documents a 16.7% miss rate; Deepgram 25.5%). Enterprise compliance audit data shows 84% pre-deployment failure rates; a regulated-industry failure case (undisclosed AI translation at a hospital) reveals governance gaps that surfaced only after 47 documents over 4 months. The Deepgram-AWS partnership expands STT with a 30% WER improvement in noisy/accented speech. Fora Soft's 21-vendor analysis sizes the market at $3.8B, growing at a 28% CAGR through 2030, with build-kit approaches winning enterprise deals at 40% cost reduction.
  • 2026-Jun: Simultaneous GA launches from Google (Gemini 3.5 Live Translate, June 9: 70+ language audio-to-audio model with SynthID watermarking, enterprise rollout via Google Meet H2 2026) and Krisp (Voice Translation API: 61 languages, 1M+ minutes in production, 96% accuracy in healthcare, full SOC 2/GDPR/HIPAA/PCI-DSS compliance) signaled maturation of end-to-end audio architectures bypassing cascaded pipeline failures. Gradium launched stt-translate and s2s-translate (June 24) achieving 3.0s latency versus OpenAI 3.6s and Gemini 2.9s in independent benchmarks, confirming architectural shift from three-stage to two-pass pipelines. Production deployments documented at scale: eesel.ai (German jewelry 1K/month, Spanish insurance 564 calls/48hrs, German lending 100K+ tickets/month), Parloa (travel: monolingual agents covering dozens of languages via bidirectional AI), and Caller Digital's India guide confirming 94-96% ASR on Indian English, 88-92% on Hinglish with sub-800ms latency requirements in regulated sectors. OpenAI gpt-realtime-translate confirmed at 4.53/5 fidelity (120-utterance benchmark) with Deutsche Telekom and Vimeo in named production. Gladia documented 270ms real-time transcription latency and 29% WER improvement in accented speech as binding constraint for BPO cost-benefit cases. Persistent structural barriers: ServiceNow's SWER benchmark quantified code-switching as a systemic frontier-model failure (hallucination and phonetic errors in Hinglish and Spanglish); most platforms fail mid-conversation language switching (Lorikeet); 84% pre-deployment compliance audit failure rate persists; AudioCodes confirmed enterprise projects stall primarily from SIP integration complexity and scale bottlenecks rather than translation quality.

TOOLS