The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
AI that provides real-time spoken translation during customer support calls, enabling cross-language support. Includes live interpreter replacement and bidirectional voice translation; distinct from content localisation, which translates pre-written materials rather than live speech.
Real-time voice translation in support calls promises to decouple agent hiring from language requirements, letting monolingual contact centers serve global customers. The premise is compelling: translate speech live, eliminate interpreter costs, and staff for skill rather than fluency. Vendor consolidation accelerated through June 2026, with simultaneous GA launches from Google (Gemini 3.5 Live Translate, June 9) and Krisp (Voice Translation API), plus competitive offerings from Gradium, DeepL, OpenAI, and Microsoft, solidifying real-time translation as a standard enterprise platform feature. Yet the practice remains firmly experimental. Independent benchmarks consistently show AI translation accuracy between 60% and 85% against a human baseline of 95% or higher, with hallucination rates of 33-60% and cultural mistranslation rates around 40%. These gaps confine deployments to lower-stakes, cost-sensitive scenarios. Production deployment barriers have hardened: code-switching (70% of Indian contact centers naturally code-switch to Hinglish) fails in traditional ASR pipelines; acoustic artifacts from poor microphones and overlapping speech cause 63% more failures than noise alone; and entity accuracy gaps (16.7-25.5% miss rates) expose risks in regulated industries. Cloud infrastructure compounds the problem, because latency varies by architecture: cascaded pipelines (STT→translate→TTS) spike to 2-6 seconds, while new end-to-end models come in at 2.9-3.6 seconds, narrowing but not eliminating the gap to a natural conversational pace. The defining tension is not whether the technology works in demos (it does) but whether it can sustain the accuracy, compliance, and responsiveness that live customer conversations demand.
Production deployments at scale demonstrate real-world viability and ROI: Alorica in hospitality and healthcare, Krisp in healthcare (90% resolution without interpreters), OpenAI early customers including Zillow (call success up from 69% to 95%), and Google's pilot with Grab (10M calls/month). But adoption remains minority-level: 17% of enterprises have deployed, while 83% still rely on manual or traditional workflows. For most contact center operators, this is a space to watch and pilot, not to bet operations on.
The vendor ecosystem reached critical mass with simultaneous enterprise GA launches in June 2026. Google released Gemini 3.5 Live Translate (June 9) with an audio-to-audio architecture that removes cascaded STT-LLM-TTS pipeline failures, supporting 70+ languages, SynthID watermarking for EU AI Act compliance, and enterprise rollout via Google Meet in H2 2026. Krisp's Voice Translation API reached GA with 61 languages, 1M+ minutes of production translation, 96% accuracy in healthcare contexts, full SOC 2/GDPR/HIPAA/PCI-DSS compliance, and zero patient safety incidents across 8 languages. Gradium entered the market (June 2026) with new stt-translate and s2s-translate models achieving 3.0s latency with competitive accuracy benchmarks across 5 languages, collapsing the traditional three-stage pipeline into a two-pass architecture. OpenAI's gpt-realtime-translate was rated at 4.53/5 fidelity (120 utterances), with Deutsche Telekom and Vimeo in production. Established vendors Microsoft (Dynamics 365 Contact Center, GA in May), AWS, DeepL (Voice API, GA in February 2026), and TTEC solidified real-time translation as a standard platform feature. Independent benchmarking confirms measurable quality gaps: Slator's March 2026 evaluation showed DeepL Voice at 79% fully correct translation segments (vs 42% for competitors), while one latency architecture comparison puts cascaded systems at 800ms-2s and emerging end-to-end models at 2.9-3.6s. The vendor ecosystem now includes 21+ competing platforms with diverse cost structures ($0.04–$1.25/minute) and deployment models. However, vendor claims on language coverage ("50+ languages") frequently exceed production-quality support: the marketed language count masks real-world failures in mid-conversation language switching and quality degradation outside English.
Production-scale deployments demonstrate viability in specific use cases. Krisp's named healthcare provider achieves 90% end-to-end multilingual call handling without interpreter intervention, with zero patient safety incidents (June 2026). OpenAI's early customers show measurable improvements: a Vienna boutique hotel (direct-booking conversion +38%), DACH e-commerce (time-to-market compressed from 9 months to 3 weeks), and Zillow (a 26-point improvement in call success rate, 69%→95%). Google is piloting Gemini Live Translate with Grab (10M voice calls monthly) in high-noise Southeast Asian settings (Thai, Vietnamese, Bahasa Indonesia, Tagalog), testing production viability in genuinely challenging acoustic environments. Alorica ReVoLT serves Fortune 25 healthcare organizations with 75+ languages (validated by Everest Group). eesel.ai documented three production deployments, all fully automated at scale: German jewelry (1K tickets/month), Spanish insurance (564 conversations in 48 hours), and German lending (100K+ tickets/month). Yet ecosystem adoption remains constrained: 17% of enterprises had deployed next-generation AI translation as of March 2026, with customer support the primary driver for 23%, indicating that the market remains early-stage despite vendor product maturity and competitive pricing compression. Mordor Intelligence sizes the market at $2.74B (2026), growing to $5.58B (2030) at a 19.5% CAGR.
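As a quick sanity check on the market-size figures, the growth rate implied by the two endpoints can be recomputed. This is a minimal sketch that assumes four compounding years between 2026 and 2030; the function name `implied_cagr` is illustrative and not taken from any cited source.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1 / years) - 1


# Endpoints quoted above, in billions of US dollars.
# Assumption: 2026 -> 2030 counts as four compounding years.
cagr = implied_cagr(2.74, 5.58, 2030 - 2026)
print(f"Implied CAGR: {cagr:.1%}")
```

Under that assumption the implied rate rounds to the quoted 19.5%, so the two endpoints and the stated CAGR are consistent with each other.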
Critical deployment barriers remain hardened despite vendor architectural progress. Code-switching is quantified as a systemic blocker: ServiceNow's June 2026 SWER benchmark documents hallucination and phonetic errors across frontier ASR models on Hinglish and Spanglish, with 30-50% WER degradation on mixed-language utterances (70% of Indian contact center users naturally code-switch). India-specific ASR shows 94-96% accuracy on Indian English and 88-92% on Hinglish, a 15-20 point delta versus global models that determines deployment viability in South Asian markets. Acoustic failures in real-world support calls create cascading accuracy loss: background voices cause 63% more failures than noise alone, and elderly care deployments show 78% failure from TV audio interference. Entity accuracy gaps remain substantial (16.7-25.5% miss rates per AssemblyAI and Deepgram), which is particularly problematic in healthcare and financial support, where terminology precision is non-negotiable. Enterprise compliance remains an adoption barrier: 84% of organizations failed pre-deployment AI compliance audits, citing SOC 2, GDPR, and BAA certification gaps. Platform-specific failures are documented: Azure Speech Service systematically filters code-switched English from Cantonese transcription, and Retell AI voice agents fail at mid-call language switching, with documented revenue impact. Infrastructure barriers persist: AudioCodes reports that most enterprise voice AI projects stall not from AI failure but from telephony integration complexity (SIP/VoIP inconsistencies), scale bottlenecks at hundreds of concurrent sessions, and vendor lock-in risk, with transcription accuracy rather than translation quality as the binding constraint. Latency trade-offs remain: cascaded pipelines (STT→translate→TTS) achieve 2-6 second round-trip latency; end-to-end architectures (Gemini 3.5, Gradium S2S) improve to 2.9-3.6 seconds but still exceed conversational naturalness thresholds (<800ms).
These constraints mean that while the technology works in bounded use cases (monolingual to single-target-language high-volume support in non-critical contexts), real-time translation remains confined to cost-sensitive, non-regulated support routing until code-switching, compliance, latency, and acoustic robustness barriers narrow substantially.
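To make the latency gap concrete, the reported end-to-end figures can be compared against the 800ms conversational threshold. The sketch below uses only numbers quoted in this section (Gemini 2.9s, Gradium 3.0s, OpenAI 3.6s); the names `latency_gap` and `REPORTED_LATENCY_MS` are illustrative, and the figures are reported values rather than independent measurements.

```python
# End-to-end speech-to-speech latencies quoted in this section, in milliseconds.
REPORTED_LATENCY_MS = {"Gemini": 2900, "Gradium": 3000, "OpenAI": 3600}

# Conversational naturalness threshold quoted in this section (<800ms).
CONVERSATIONAL_THRESHOLD_MS = 800


def latency_gap(latency_ms: int, threshold_ms: int = CONVERSATIONAL_THRESHOLD_MS) -> dict:
    """Describe how far a round-trip latency sits from the threshold."""
    return {
        "overshoot_ms": max(0, latency_ms - threshold_ms),
        "multiple_of_threshold": latency_ms / threshold_ms,
        "conversational": latency_ms < threshold_ms,
    }


gaps = {name: latency_gap(ms) for name, ms in REPORTED_LATENCY_MS.items()}
for name, gap in gaps.items():
    print(f"{name}: +{gap['overshoot_ms']}ms over threshold "
          f"({gap['multiple_of_threshold']:.2f}x), conversational={gap['conversational']}")
```

On these figures, every reported end-to-end model runs at more than three times the threshold, which is why the architectural improvement narrows the gap without closing it.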
— Production patterns for contact centers: 270ms real-time transcription latency, ASR accuracy (29% WER improvement in accented speech) as binding constraint, BPO cost-benefit analysis proving offshore staffing ROI only with correct audio infrastructure.
— Technical deep-dive on production-ready real-time S2S translation: Gradium 3.0s, OpenAI 3.6s, Gemini 2.9s latency benchmarks; documents architecture shift from cascaded to two-pass reducing latency to conversational range for Spanish support.
— New vendor real-time S2S models (3.0s latency, competitive BLEU/MetricX accuracy across 5 langs) demonstrating collapsing three-stage pipelines to two-pass architecture; explicit support-call use case with WebSocket delivery.
— Production deployment guide for India with code-switching handling (Hinglish): 94-96% ASR accuracy on Indian English, 88-92% on Hinglish, 15-20 point delta vs global models; 10 languages with <800ms latency requirement in regulated sectors.
— Implementation guide comparing cascaded (800ms-2s) vs end-to-end S2S (<700ms) architectures; real support examples (Spanish/English real estate, Vietnamese healthcare LEP triaging); WebRTC architecture with deployment across 6 verticals.
— Real-time translation for human agent assist in travel support: bidirectional AI translation enabling monolingual agents to cover dozens of languages, solving multilingual staffing problem with sub-second latency constraint requirement.
— Three production deployments: German jewelry (1K tickets/month), Spanish insurance (564 calls/48hrs), German lending (100K+ tickets/month), all fully automated with multilingual support proving production viability at scale.
— Platforms compared on auto mid-conversation language switching (most vendors fail), code-switching handling, quality parity across languages: identifies language-count marketing gap and switching as vendor-breaking technical barrier in production.