The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
Each dot marks the weighted maturity of practices within a domain — hover for a brief summary, click for more detail
AI that provides real-time spoken translation during customer support calls, enabling cross-language support. Includes live interpreter replacement and bidirectional voice translation; distinct from content localisation which translates pre-written materials rather than live speech.
Real-time voice translation in support calls promises to decouple agent hiring from language requirements, letting monolingual contact centers serve global customers. The premise is compelling: translate speech live, eliminate interpreter costs, and staff for skill rather than fluency. A handful of BPOs and enterprise platforms have pushed this into production, and vendor tooling now spans the major cloud providers. Yet the practice remains firmly experimental. Independent benchmarks consistently show AI translation accuracy between 60% and 85% against a 95%+ human baseline, with hallucination rates of 33-60% and cultural mistranslation rates around 40% — gaps that confine deployments to lower-stakes, cost-sensitive scenarios. Cloud infrastructure compounds the problem: production latency spikes measured in seconds, not milliseconds, undercut the "real-time" proposition. The defining tension is not whether the technology works in demos — it does — but whether it can sustain the accuracy and responsiveness that live customer conversations demand. Production deployments at scale (Alorica hospitality, T-Mobile network-level) demonstrate real-world viability and ROI, but adoption remains minority-level (17% of enterprises), with 83% still relying on manual or traditional workflows. For most contact center operators, this is a space to watch and pilot, not to bet operations on.
Vendor activity is accelerating faster than deployment maturity. DeepL, Krisp, Microsoft, AWS, and TTEC now offer production-grade real-time translation APIs for contact centers. March 2026 independent benchmark testing (Slator) shows DeepL Voice achieving 79% fully-correct translated segments versus 42% for competitors, with 96% linguist preference and 88.6/100 stability scores — demonstrating that vendor-promoted quality parity is now measurable in real-world production settings. Major infrastructure providers are investing: T-Mobile deployed real-time translation at the network level across 50+ languages, and Amazon Connect S2S is now GA in Seoul with Korean support — signals of infrastructure-level investment in adoption. The Fora Soft 2026 vendor landscape identifies 21 competing platforms across five technology layers, with managed "build kit" approaches (wrapping open-source + commercial services) winning enterprise contracts at 40% lower cost than proprietary SaaS while delivering 70% of SaaS speed.
Yet adoption lags vendor proliferation. A March 2026 DeepL survey found only 17% of enterprises have deployed next-generation AI translation tools, meaning 83% still rely on manual workflows or traditional systems despite increased investment. Customer support is the primary adoption driver at 23%, but this remains a niche segment. Mordor Intelligence sizes the machine translation market at $2.74B (2026) growing to $5.58B (2030) at 19.5% CAGR, with real-time speech translation explicitly identified as a major trend. These numbers are substantial but indicate a category still in early-stage growth.
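The Mordor figures hang together arithmetically; a quick back-of-envelope check (the cited numbers are hard-coded, everything else is just compound-growth algebra):

```python
# Sanity-check the cited market sizing: $2.74B (2026) growing to $5.58B (2030).
start_usd_b = 2.74   # 2026 market size, $B (cited)
end_usd_b = 5.58     # 2030 market size, $B (cited)
years = 4            # 2026 -> 2030

# Implied compound annual growth rate from the two endpoints.
cagr = (end_usd_b / start_usd_b) ** (1 / years) - 1
print(f"implied CAGR: {cagr:.1%}")  # matches the cited 19.5%

# Forward projection from the cited 19.5% CAGR lands within rounding
# of the cited 2030 figure.
projected = start_usd_b * (1 + 0.195) ** years
print(f"projected 2030 market: ${projected:.2f}B")
```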
The barriers persist and are now better understood. Analyst assessment (Slator, March 2026) documents that AI translation fundamentally struggles with hallucination (33-60% rates depending on model), language confusion, idioms, and cultural nuance. Peer-reviewed synthesis (50+ studies) confirms AI achieves only 85-90% of human translator quality for high-resource language pairs (English-French) and 70-80% for distant pairs (English-Chinese). Technical barriers remain acute: background noise reduces accuracy by 40% in support-call environments; cloud infrastructure introduces 2-6 second latency in cascaded STT-translation-TTS pipelines; and language-specific accuracy degradation (Mandarin transcription 5x worse than English per Sierra's 2026 benchmark; Vietnamese and Somali deployment barriers documented in peer-reviewed studies) constrains multilingual deployment.

Production failures are now documented: Microsoft Azure Speech Service systematically filters code-switched English terms from Cantonese, and Retell AI voice agents fail language switching in Indian-language scenarios despite explicit user requests — with financial impact reported by production users. These failures demonstrate that despite feature availability, current commercial tooling has not reached production-grade reliability for real-world multilingual translation at scale. Production-scale deployments (Alorica hospitality reporting 97% accuracy, 117% conversion growth) demonstrate viability in specific cost-sensitive contexts, but lack independent validation.

Pocketalk's critical assessment (April 2026) notes that consumer and early-stage translation tools are insufficient for high-stakes support due to latency, accuracy, and compliance gaps, and liability exposure in regulated industries (healthcare, legal) remains unresolved. Until accuracy, latency, and code-switching gaps narrow and production reliability improves, deployments will remain confined to low-stakes, cost-sensitive multilingual routing.
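The multi-second latency above is cascade arithmetic: in an STT → translation → TTS pipeline the stages run sequentially, so per-stage latencies (plus a network hop per hosted service) simply add. A minimal sketch of that budget; every stage name and millisecond figure here is an illustrative assumption, not a vendor measurement:

```python
# Illustrative per-stage latency budget for a cascaded speech-to-speech
# pipeline. All numbers are hypothetical; real deployments also vary
# per-region and per-utterance.
STAGE_LATENCY_MS = {
    "stt_final_transcript": 300,   # end of utterance -> finalized transcript
    "machine_translation": 150,    # text-to-text translation of the segment
    "tts_first_audio": 250,        # request -> first synthesized audio chunk
    "network_overhead": 120,       # three service hops, ~40 ms each
}

END_TO_END_BUDGET_MS = 900  # a commonly cited production target


def end_to_end_latency_ms(stages: dict[str, int]) -> int:
    """Cascaded stages run sequentially, so their latencies add."""
    return sum(stages.values())


def within_budget(stages: dict[str, int], budget_ms: int) -> bool:
    return end_to_end_latency_ms(stages) <= budget_ms


total = end_to_end_latency_ms(STAGE_LATENCY_MS)
print(f"end-to-end: {total} ms; "
      f"within {END_TO_END_BUDGET_MS} ms budget: "
      f"{within_budget(STAGE_LATENCY_MS, END_TO_END_BUDGET_MS)}")
```

The sketch also makes the failure mode concrete: a single slow stage (say, a cold TTS endpoint adding two seconds) blows the entire budget, since nothing downstream can claw the time back.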
— Production failure: Retell AI voice agent fails language switching, exhibits accent errors (e.g. 'Splash' → 'supplash'), and outputs non-selected languages despite configuration; user reports direct revenue impact, exposing maturity gaps.
— Critical assessment: consumer translation tech (Apple AirPods, Meta smart glasses, T-Mobile) insufficient for high-stakes support — lacks the latency, accuracy, compliance, and centralized control required for regulated industries and support escalations.
— Foundational architecture guide for cascaded S2S pipelines covers latency budgeting, streaming setup, language pair trade-offs, and production scaling; demonstrates TTS optimization reducing latency from a 1.04 real-time factor (RTF) to sub-500 ms.
— Deepgram-AWS partnership integrates advanced STT (30% WER improvement in noisy/accented speech) with Amazon Connect for real-time transcription; demonstrates ecosystem maturation and vendor specialization in contact center translation.
— Comprehensive market analysis shows $3.8B market at 28% CAGR through 2030; production standard: sub-900ms latency, <12% WER, $0.05–$0.20/min; 21-vendor comparison reveals 'build kit' approach winning enterprise deals at 40% lower cost.
— Systematic evaluation of streaming TTS text normalization (dates, numbers, currencies) on 1000+ sentences reveals Async Flash v1.0 achieves 81.2% sentence-level accuracy while competitors drop to 67.8%, exposing support-call quality gaps.
— Amazon Connect S2S real-time translation now GA in Seoul with Korean; demonstrates vendor platform expansion into new geographic/language markets and production-grade deployment in regional contact centers.
— Open-source multilingual benchmark on 250 real customer service calls across 5 languages and 5 ASR providers shows no vendor wins everywhere; Mandarin accuracy 5x worse than English, highlighting deployment barriers.
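The WER figures quoted across these benchmarks (<12% WER targets, 30% WER improvements) all reduce to the same metric: word-level edit distance divided by reference length. A minimal reference implementation; the test strings are illustrative, not drawn from any cited benchmark:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)


# One substitution ("reset" -> "rest") and one deletion ("account")
# against a 5-word reference: WER = 2/5 = 0.40.
wer = word_error_rate("please reset my account password",
                      "please rest my password")
print(f"WER: {wer:.2f}")
```

Note that WER counts surface-form word mismatches, which is why the text-normalization gaps flagged above (dates, numbers, currencies rendered differently by the TTS front end) inflate measured error even when the translation itself is semantically right.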