Text-to-speech — voice cloning & custom voices — Creative & Generative Media

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

AI Maturity by Domain (chart: each domain's weighted maturity, plotted on a scale from Bleeding Edge to Established)

Text-to-speech — voice cloning & custom voices

LEADING EDGE

TRAJECTORY: ↑ Advancing

AI that clones specific voices or creates custom synthetic voices for branded content and personalisation. Includes few-shot voice cloning and brand voice creation; distinct from natural TTS, which uses standard rather than replicated voices.

OVERVIEW

Voice cloning has solved the technology problem and run straight into a trust problem. Forward-leaning organisations, from Disney+ productions to ALS clinical programmes, are deploying cloned voices in production at scale. Technical maturity is proven: synthesis quality has plateaued at near-human indistinguishability (98.2% perceptual match), Synthesia's production infrastructure handles consumer-grade input, and ElevenLabs achieved 67% Fortune 500 penetration with 340% year-over-year growth in deployments. Yet broader adoption stalls on interlocking barriers that technology cannot fix: users prefer human voices in high-stakes contexts (68%), emotional authenticity lags acoustic fidelity, and a critical asymmetry has emerged in which clones reach 97% fidelity but humans detect only 37.5% of them, enabling fraud at scale. By April 2026 the enterprise ecosystem had matured further, with Google Cloud moving custom voice synthesis to general availability under a voice actor consent framework; at the same time, fraud escalated sharply, with 60% of US companies reporting voice cloning attacks and documented CFO impersonation losses of $25M+. Regulatory activity is spreading internationally: Japan's Ministry of Justice established an expert panel on voice property protection, signaling that regulation is maturing beyond US state laws such as Tennessee's ELVIS Act. The defining tension at this tier is that commercial viability for vanguard adopters is proven and expanding, while the consent, fraud detection, and trust infrastructure needed for mainstream rollout does not yet exist, and the technology is now mature enough to be operationalized for unauthorized uses. Most mainstream organisations have not deployed; those that are building now face production-stage reliability constraints, fraud risk, and regulatory liability rather than limits on technical capability.

CURRENT LANDSCAPE

ElevenLabs dominates the vendor landscape with roughly $330M ARR (April 2026) and 250,000 conversational agents in production, achieving 67% Fortune 500 adoption with 340% year-over-year deployment growth. Market expansion now extends beyond North America: ElevenLabs' India business reports hundreds of enterprise customers and tens of millions in revenue, with Meesho alone running 60,000 calls/day. Resemble AI and WellSaid Labs trail but serve regulated industries, while Respeecher has carved out an entertainment niche (Disney+, National Geographic). The global voice AI market reached $22.5B in 2026, with the TTS sub-segment valued at $4.25B in 2025 and growing at a 15.9% CAGR. These numbers now reflect genuine production deployments rather than investor projections.
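The market figures above are quoted as compound annual growth rates (CAGR). As a purely arithmetic illustration, assuming hypothetically that the 15.9% rate applied to the $4.25B 2025 TTS figure held for five years, the standard compound-growth relation gives the value below, where V_0 is the starting value, r the annual rate, and t the number of years. This is not a forecast from the cited source; the inverse form is useful for checking any start value, end value, and horizon that are quoted together.

```latex
\begin{aligned}
V_t &= V_0\,(1 + r)^t, \qquad r = \left(\frac{V_t}{V_0}\right)^{1/t} - 1 \\
V_{2030} &= 4.25 \times 1.159^{5} \approx 4.25 \times 2.091 \approx 8.89 \quad \text{(USD billions)}
\end{aligned}
```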

Enterprise platform integration has accelerated as the ecosystem matures. IBM integrated ElevenLabs TTS/STT into watsonx Orchestrate (March 2026) with a 10,000+ voice library, compliance features (HIPAA, PCI), and support for 70+ languages, signaling enterprise-scale rollout. Google Cloud moved Custom Voice synthesis to general availability (April 2026) with a governance framework that requires voice actor consent verification, a sign of vendor ecosystem maturity and normalized adoption. Production deployments cluster in high-ROI sectors: Deliveroo's voice agents achieve 80% reach and 75% call success in rider onboarding; DataForest documented a real-time cold-calling voice agent deployment with measurable accuracy and cost metrics; and the ElevenLabs Impact Program moved from pilot to standard clinical care for ALS/MND voice restoration, with hardware partnerships (Lenovo, Tobii Dynavox). Healthcare voice AI deployments have freed 30 million clinician minutes, and nine of the top ten Norwegian banks have deployed voice AI.

But adoption expansion now faces critical barriers. The gap between what the technology can do and what users will accept defines the ceiling: A/B testing shows human narration outperforming AI clones by 4.1x in saves and 2.7x in comments on social platforms, and 68% of users prefer human voices in financial or advisory contexts. By April 2026 fraud risk had escalated sharply: 60% of US companies reported voice cloning fraud attacks, including a documented Hong Kong case in which a $25M transfer was authorized after a CFO impersonation using a cloned voice. Because humans detect only 37.5% of clones produced at 97% fidelity, defenders face an asymmetric vulnerability. Trend Micro and other security researchers describe voice-clone fraud evolving from a novelty into an industrial-scale operation with its own production infrastructure. Voice actor displacement has become measurable: the Korea Times reports income declines and consent gaps among voice actors as unauthorized cloning scales. Three-quarters of 455+ voice agent builders struggle with production reliability, citing latency, accent bias, and codec failures in telephony. Regulatory barriers continue to grow. The Lehrman v. Lovo precedent (SDNY, July 2025) establishes state-law liability for unauthorized voice cloning; Japan's Ministry of Justice established an expert panel on voice likeness protection (April 2026); the SAG-AFTRA dispute (160K actors) made unlicensed cloning a material legal liability; and the Tennessee ELVIS Act, FCC TCPA rules, the proposed NO FAKES Act, and the EU AI Act together form a fragmented mix of enacted and proposed requirements that slows enterprise adoption beyond early movers.
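One concrete source of the telephony codec failures mentioned above is a format mismatch: TTS engines commonly return wideband linear PCM, while many phone networks carry 8 kHz G.711 audio (μ-law in North America and Japan, A-law in most other regions). The sketch below assumes, hypothetically, that the TTS engine emits 16-bit PCM at 24 kHz, and shows the two conversion steps a telephony bridge has to get right: resampling to 8 kHz and μ-law companding. The function names are illustrative, not any vendor's API, and the 3-sample averaging is a deliberately crude stand-in for a proper resampler.

```python
from typing import List

# G.711 mu-law constants as used in the widely published reference encoder.
ULAW_BIAS = 0x84   # 132: added before segment lookup
ULAW_CLIP = 32635  # largest magnitude that still fits once the bias is added


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as one 8-bit G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), ULAW_CLIP) + ULAW_BIAS
    # Segment (exponent) = index of the highest set bit minus 7, i.e. 0..7.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are transmitted bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def downsample_by_3(samples: List[int]) -> List[int]:
    """Crude 24 kHz -> 8 kHz conversion: average each group of 3 samples.

    A 3-tap box filter is only a weak anti-aliasing filter; a production
    pipeline would use a proper resampler. It is here to show the step.
    """
    usable = len(samples) - len(samples) % 3  # drop a trailing partial group
    return [round(sum(samples[i:i + 3]) / 3) for i in range(0, usable, 3)]


def to_telephony_ulaw(pcm_24k: List[int]) -> bytes:
    """Convert hypothetical 24 kHz 16-bit TTS output to 8 kHz mu-law bytes."""
    return bytes(linear_to_ulaw(s) for s in downsample_by_3(pcm_24k))


if __name__ == "__main__":
    # One second of silence at 24 kHz becomes 8,000 mu-law bytes.
    frame = to_telephony_ulaw([0] * 24_000)
    print(len(frame), frame[:2].hex())
```

Skipping either step, or sending the wideband PCM straight to an endpoint that expects 8 kHz μ-law, typically yields distorted or noise-like audio, which is one way a voice that sounds fine in a demo can fail on a real call.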

TIER HISTORY

Research: Jan-2023 → Jan-2023

Bleeding Edge: Jan-2023 → Apr-2024

Leading Edge: Apr-2024 → present

EVIDENCE (97)

ElevenLabs hit with fresh lawsuit over use of voices by Pulitzer and Emmy-winning journalists (Case Studies, 2026-05-13)

— Seven journalists sued ElevenLabs, alleging their voices were used for training without consent; the case demonstrates production-stage liability exposure and the consent/licensing burden constraining enterprise adoption in regulated sectors.

ElevenLabs | Tools Directory | Create With (Case Studies, 2026-05-11)

— Mahindra deployed ElevenLabs voice agents for an automotive launch, achieving an 8% conversion uplift over traditional call centers: a quantified, revenue-impacting production deployment beyond pilots.

HC protects MP Shashi Tharoor's personality rights, directs removal of deepfake video (News Coverage, 2026-05-10)

— The Delhi High Court recognized voice and oratorical manner as constitutionally protectable personality attributes and ordered removal of AI-generated deepfakes made with voice cloning, an enforcement precedent outside the US.

I Burned 2.4M Credits Testing the ElevenLabs API in 2026 (Case Studies, 2026-05-09)

— An independent developer documented production use of ElevenLabs voice cloning across 8 use cases (agents, dubbing, audiobook automation) with a quantified cost structure (2.4M credits over 4 months), consistent with production-grade maturity.

AI-Generated Voice, Synthetic Speech, and Voice Cloning: Scoping Review (Research Papers, 2026-05-09)

— A scoping review of 226 studies mapped voice cloning deployment across education, healthcare, accessibility, and commerce, and identified a critical asymmetry: humans detect only 37.5% of clones produced at 97% fidelity, whereas automated detectors exceed 99% accuracy.

ElevenLabs Hit $500M ARR Before Anyone Was Watching (Adoption Metrics, 2026-05-07)

— ElevenLabs reached $500M ARR (43% quarterly growth), backed by institutional investors (BlackRock, Wellington, NVIDIA) and with named enterprise customers (NVIDIA, Salesforce, Deutsche Telekom) in production.

10 Voice AI Trends Transforming Call Centers in 2026 (With ROI Data) (Industry Reports, 2026-05-05)

— A call center trend analysis found that 75% of voice agent builders struggle with production reliability (latency, accent bias, codec failures), revealing deployment-stage technical barriers unrelated to synthesis quality, even where enterprise ROI is documented.

xAI Voice Cloning API: Custom Voices Tutorial + Pricing (2026) (Product Launches, 2026-05-03)

— xAI launched a Custom Voices API with consent-enforcing verification (passphrase plus speaker-embedding check), already deployed at production scale in Grok, Tesla, and Starlink at 14-28x lower cost than ElevenLabs.
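The xAI entry above describes consent enforcement as a passphrase plus a speaker-embedding check. xAI's actual implementation is not described here, so the following is only a minimal sketch of how such a gate could work, assuming a speech-to-text transcript of the consent recording and fixed-length speaker embeddings from some verification model are already available. The names `verify_consent`, `ConsentCheck`, and `SIMILARITY_THRESHOLD`, and the 0.75 threshold, are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence

# Hypothetical acceptance threshold; a real system would tune it on labelled
# same-speaker / different-speaker pairs for its particular embedding model.
SIMILARITY_THRESHOLD = 0.75


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker-embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def normalise(text: str) -> str:
    """Lower-case, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


@dataclass
class ConsentCheck:
    passphrase_ok: bool
    similarity: float
    approved: bool


def verify_consent(
    expected_passphrase: str,
    transcript: str,
    consent_embedding: Sequence[float],
    sample_embedding: Sequence[float],
    threshold: float = SIMILARITY_THRESHOLD,
) -> ConsentCheck:
    """Approve cloning only if the passphrase was read correctly AND the
    consent recording and the cloning sample appear to be the same speaker."""
    passphrase_ok = normalise(transcript) == normalise(expected_passphrase)
    similarity = cosine_similarity(consent_embedding, sample_embedding)
    return ConsentCheck(passphrase_ok, similarity,
                        passphrase_ok and similarity >= threshold)
```

Requiring both checks means that a correct passphrase read by a different speaker, or a matching voice that never spoke the passphrase, is rejected. Issuing a fresh passphrase per request would also make it harder to submit an old public recording of the target as "consent".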

HISTORY

2023-H1: Voice cloning platforms ElevenLabs and Resemble AI achieved commercial general availability with millions of users and enterprise integrations. Market matured rapidly: ElevenLabs raised $19M Series A with 1M+ users and partnerships; WellSaid Labs differentiated on quality and ethics; Resemble AI integrated custom voices into LivePerson IVR. Parallel rise of misuse: celebrity voice clones on social media, FTC enforcement guidance issued (March), and regulatory uncertainty became a key adoption blocker. Safety features like neural watermarking launched but lagged generation quality.
2023-H2: Voice cloning moved from research/beta into enterprise production infrastructure. ElevenLabs exited beta (August) with Multilingual v2 supporting 30+ languages and deployed on Google Cloud; Resemble AI secured $8M Series A with 200+ business clients, validating enterprise demand. Developer adoption grew (hackathon projects combining voice cloning with speech-to-text and LLMs for telephony IVR). Market forecasts projected $7.9B by 2030 at 25%+ CAGR. Misuse persisted but regulatory frameworks remained unclear—detection tools lagged generation capability.
2024-Q1: ElevenLabs achieved unicorn status ($80M Series B) with 41% Fortune 500 penetration; new products (Dubbing Studio, Voice Library) extended deployment scenarios beyond text generation. Real-world ROI case studies emerged (custom voices raised call completion from 64% to 87%; savings of ₹21-23 lakh annually). Regulatory pressure intensified: the FTC launched its Voice Cloning Challenge for detection research, and the FCC ruled that AI voices fall under the TCPA, requiring consent and disclosures. Misuse escalated with political deepfakes (Biden audio), ISIS propaganda, and influencer scams. Detection research showed promise (81% accuracy) but generalized poorly (79% on unseen data), widening the gap between generation sophistication and detection capability, the core adoption blocker.
2024-Q2: Commercial deployment proved ROI at scale: Waymark cut voice production costs by 74% and increased video generation by 387%, and Vyond customers upgraded to enterprise plans based on voice quality improvements. Regulatory frameworks accelerated: the Tennessee ELVIS Act provided state-level voice property protection (effective July 1), and winners of the FTC's Voice Cloning Challenge produced four detection technologies (pattern recognition, liveness detection, watermarking, authentication). OpenAI withheld Voice Engine from broad release due to misuse risks, signaling industry caution. Voice fraud metrics were alarming: a 350% increase in fraud incidents since 2013, plus a documented $243K CEO impersonation case. Technology maturity and clear ROI contrasted sharply with regulatory tightening and escalating misuse.
2024-Q3: Voice cloning reached sustained product maturity and measurable accessibility impact. ElevenLabs launched its Impact Program (August), providing free voice cloning to ALS/MND patients worldwide, with named beneficiaries achieving assistive communication outcomes; the Reader app expanded globally (32 languages, celebrity voice licensing). WellSaid Labs established tiered enterprise pricing and feature sets ($49-$199/month). Industry recognition continued: Speech Technology Magazine named ElevenLabs a speech translation leader despite acknowledged platform misuse concerns and the safeguards they required. Academic research documented voice cloning's dual nature, with legitimate applications alongside security risks in finance and elections, and offered policy recommendations. Regulatory compliance burden persisted as the primary adoption blocker for mainstream use cases.
2024-Q4: Technical capability and human-perceived realism reached a new ceiling while the detection gap widened. Research found that humans perceive cloned voices as genuine ~80% of the time but correctly identify clones only 60% of the time, establishing an asymmetric realism-detection dynamic. The ElevenLabs Impact Program closed the year with 50 non-profit partnerships (46 US states, 15+ countries) and high-profile beneficiaries (Jennifer Wexton, ALS/MND patients). WellSaid Labs expanded with documented enterprise ROI (Frameworks agency: turnaround cut from 1 week to 1 day). Developer integration failures emerged in Q4, revealing operational reliability challenges despite production scale. The regulatory environment consolidated around Tennessee ELVIS Act enforcement and FCC telephony requirements, while OpenAI continued to withhold Voice Engine despite its technical maturity, signaling sustained industry caution. Voice fraud incidents remained elevated. The leading-edge positioning remained stable: the technology was proven commercially viable at Fortune 500 scale and its accessibility impact documented, but deployment expansion was blocked by detection-generation asymmetry, regulatory fragmentation, and liability exposure rather than technical capability.
2025-Q1: Voice cloning technology demonstrated advanced realism (peer-reviewed research confirmed that humans perceive AI clones as genuine ~80% of the time) while ecosystem fragmentation and safeguard gaps became critical concerns. ElevenLabs raised a $180M Series C at a $3.3B valuation, reporting 60% Fortune 500 adoption, 1,000 years of audio generated, and 250k conversational AI agents deployed. Parallel developments highlighted adoption barriers: Consumer Reports found that 4 of 6 major tools lacked safeguards; an LA Times investigation documented non-consensual voice cloning displacing voice actors; and research revealed accent bias and digital exclusion in synthesis quality. Twilio's integration of ElevenLabs voices expanded the enterprise ecosystem; Resemble AI launched voice clone 2.0, claiming technical superiority. Leading-edge positioning remained stable, but deployment expansion was constrained by detection-generation asymmetry, the lack of industry-standard safeguards, and ethical concerns rather than technical capability.
2025-Q2: Voice cloning matured into production-scale deployment with documented ROI and real-world impact. ElevenLabs maintained market leadership following its Series C at a $3.3B valuation (January 2025), with 60% Fortune 500 adoption and 250k conversational agents in production. Product development accelerated: the Eleven v3 research preview (June) added expressiveness, and a new business tool reported 95% accuracy and a 40% reduction in production cycle time. Real-world production milestones included Disney+ voice recreation, Cadbury India's award-winning personalized ad campaign (Clio Gold), and creator growth (250-500% audience growth via the Sayes platform). Market sizing: a current value of $1.45B projected to reach $10B by 2030 (26% CAGR). Regulatory burden increased with the proposed NO FAKES Act and fragmented compliance requirements across jurisdictions. Ethical risks escalated with advancing synthesis quality (98.1% naturalness, 89% emotional congruence), enabling new exploitation vectors (emotional contagion hacking, non-consensual voice replication). Mainstream expansion remained blocked by liability, detection gaps (~60% human accuracy), and regulatory uncertainty despite technical readiness.
2025-Q3: Voice cloning solidified its enterprise infrastructure maturity with product GA and a sharply higher market valuation. ElevenLabs launched ElevenAgents for enterprise conversational AI (August) with named customers (Revolut, Meesho, Deliveroo, Cisco, Deutsche Telekom) reporting concrete ROI: up to 66% lower cost per call, 35% higher first-visit conversions, and 25% gains in customer satisfaction. Aggregated case studies revealed 25+ deployments across industries (healthcare: 62% clinical documentation improvement; recruitment: 95% call completion rates; marketing: 2x faster content pipelines; accessibility: museum voice recreation). Valuation doubled to $6.6B (September tender offer), reflecting investor confidence. Peer-reviewed research (PLOS ONE, September) confirmed that voice clones are perceived as just as realistic as human voices but found a realism ceiling with no hyperrealism effect, confirming the quality plateau and the persistent perception-detection gap. Production deployment barriers surfaced: contact center deployments were hampered by latency requirements, accent recognition failures, compliance gaps (HIPAA), and performance that degraded between demo and production despite technical maturity. Regulatory maturation continued: a New York court (July) recognized voice property protections under right-of-publicity law. Leading-edge positioning remained stable: the technology was proven commercially viable at scale with demonstrated ROI, but expansion into regulated sectors (financial services, contact centers) remained constrained by production-readiness gaps, liability exposure, and fragmented compliance burden rather than technical capability.
2025-Q4: Voice cloning technology demonstrated continued commercial maturity alongside expanded enterprise deployments and escalating fraud risk. ElevenLabs scaled to ~$300M ARR (October 2025), with Fortune 500 adoption reported at 41%; new enterprise deployments were documented (Deliveroo rider onboarding and restaurant verification achieving 80% reach and 75% call success; the ALS/MND Impact Program moving from pilot to standard clinical care with hardware partnerships). Technical research confirmed the realism plateau: a peer-reviewed study found voice clones perceived as just as realistic as human voices, with no hyperrealism effect. A critical adoption barrier emerged: trust-deficit research showed 68% of users prefer human voiceovers in high-stakes contexts (financial, advisory, credentials). Fraud risk accelerated sharply: online deepfakes grew 16x since 2023 (500K to 8M), major retailers reported 1,000+ AI voice scam calls daily, and expert analysis concluded the technology had crossed the "indistinguishable threshold." Regulatory fragmentation persisted as the NO FAKES Act advanced. Leading-edge positioning remained stable with demonstrated commercial viability and real-world impact, but mainstream expansion into regulated sectors remained constrained by fraud infrastructure gaps, trust deficits, and liability exposure rather than technical capability. With synthesis quality at a plateau (98.1% naturalness), the differentiating factor is now deployment-stage risk mitigation and fraud prevention, not incremental quality improvements.
2026-Jan: Voice cloning technology sustained commercial production maturity, with niche ethical deployments (hospice care, ALS patient voice restoration) expanding real-world impact. User preference gaps emerged as the primary adoption constraint: brand marketing on TikTok showed 40% adoption of AI audio assets, but human narration consistently outperformed AI clones (4.1x more saves, 2.7x more comments). A voice agent builder survey (455+ companies, including Amazon and Microsoft) found 87.5% actively building but 75% struggling with technical reliability; 55% cited user repetition frustration as the top issue. Healthcare and banking adoption expanded: 9 of 10 Norwegian banks deployed voice AI, and healthcare deployments returned 30M clinician minutes with 21x ROI. Fraud risk escalated, with a 162% year-over-year surge and 1,000+ AI voice scam calls reported daily; unauthorized voice cloning cases were documented (a BBC presenter's voice used without consent). Regulatory burden persisted (Tennessee ELVIS Act, FCC TCPA, proposed NO FAKES Act, EU AI Act). Leading-edge positioning was stable; the adoption ceiling is now defined by user preference for human voices (68% in high-stakes contexts), emotional authenticity gaps (98.2% indistinguishability yet 22% lower emotional truthfulness ratings), and fraud prevention infrastructure rather than synthesis quality.
2026-Feb: Voice cloning technology sustained commercial production maturity with broader entertainment deployments (Respeecher: Disney+ Mandalorian voice synthesis, National Geographic documentaries, international advertising production). Clinical deployment in healthcare accelerated: the ElevenLabs Impact Program became standard clinical care with expanded hardware partnerships (Lenovo, Tobii Dynavox), cloning voices from under 10 minutes of audio and synthesizing in real time at 75-150 ms latency. Ken Research put the market value at $610M. Critical adoption barriers intensified: legal licensing risk escalated in the context of the SAG-AFTRA strike (160K actors disputing AI voice rights), making unlicensed cloning a material liability, and production reliability challenges were documented (40% of voice agent failures attributed to latency, budget exhaustion, and codec mismatches in telephony environments). Leading-edge positioning was sustained, but expansion into contact centers and regulated sectors remained blocked by licensing/consent friction and production reliability gaps rather than synthesis quality.
2026-Mar: Voice cloning technology demonstrated sustained enterprise-scale deployment with geographic and platform expansion. Market metrics accelerated: 67% Fortune 500 adoption with 340% year-over-year deployment growth; the global voice AI market reached $22.5B (2026); and the TTS market was valued at $4.25B, growing at a 15.9% CAGR. Geographic expansion: ElevenLabs India reported hundreds of enterprise customers and tens of millions in revenue, with Meesho deploying 60,000 calls/day, signaling scale-up beyond North America. Platform integration: IBM integrated ElevenLabs TTS/STT into watsonx Orchestrate (March 2026) with compliance features (HIPAA, PCI) and a 10,000+ voice library. Production case study: Synthesia documented how audio preprocessing solved voice cloning quality problems, showing infrastructure maturity despite variable consumer-grade input. A critical adoption barrier escalated: the security/fraud asymmetry, in which voice clones achieve 97% fidelity but humans detect only 37.5% of them. Documented fraud case: a Hong Kong finance director authorized a $25M transfer after a CFO voice-clone impersonation; 81% of firms report AI fraud but only 26% feel prepared, indicating that fraud is now operationalized at production scale. Leading-edge positioning remained stable: deployment expansion into enterprise platforms and new geographic markets was confirmed, but the fraud-detection gap and regulatory liability define the adoption ceiling for regulated sectors (financial services, government).
2026-Apr: The voice cloning ecosystem matured while the fraud emergency accelerated. Enterprise platform maturity advanced: Google Cloud moved Custom Voice synthesis to general availability with a governance framework requiring voice actor consent verification, signaling vendor ecosystem normalization. Auto-dubbing deployments confirmed production quality from 30-second voice samples, enabling creator-scale multilingual audience expansion despite regulatory headwinds. Production deployments demonstrated concrete ROI: DataForest documented a real-time voice agent for cold calling with measurable accuracy and cost metrics. Fraud escalation became documented reality: 60% of US companies reported voice cloning fraud attacks; CybelAngel threat intelligence confirmed the $25M Hong Kong CFO impersonation case, in which a financial transfer was authorized via voice clone; and humans detect only 37.5% of clones despite 97% fidelity, an asymmetric vulnerability now operationalized at industrial scale. The Korea Times reported measurable income decline among voice actors from unauthorized cloning. Regulation spread internationally: Japan's Ministry of Justice established an expert panel for voice/likeness protection. Leading-edge positioning remained stable with documented enterprise ecosystem maturity, but fraud-detection asymmetry and regulatory fragmentation define the adoption ceiling: mainstream expansion is blocked not by technical capability but by fraud prevention infrastructure, consent/licensing burden, and regulatory liability in regulated sectors.
2026-May: The voice cloning ecosystem bifurcated: vanguard deployments expanded while liability and production constraints intensified. Market expansion: ElevenLabs reached $500M ARR (43% quarterly growth) with a Series D at an $11B valuation and institutional backing (BlackRock, Wellington, NVIDIA); competition emerged with the xAI Custom Voices API (14-28x cheaper, consent-verified deployment), launched May 2, and Yellow.ai Nexus Vox, which ships brand voice cloning in 500+ languages with sub-second deployment. Technical maturity confirmed: a peer-reviewed synthesis of 226 studies (May 2026) mapped voice cloning across education, healthcare, accessibility, and commerce, and identified an asymmetry: humans detect 37.5% of clones despite 97% fidelity, while automated detectors exceed 99% accuracy. Revenue-impacting deployments were quantified: Mahindra achieved an 8% conversion uplift with voice agents during an automotive launch, and an independent developer case study documented production maturity across 8 voice cloning use cases with quantified cost metrics (2.4M credits, Jan-Apr 2026). Critical barriers hardened: the Delhi High Court (May 10) recognized voice and oratorical manner as constitutionally protected personality attributes; seven Pulitzer- and Emmy-winning journalists sued ElevenLabs, claiming their voices were used for training without consent, adding named-professional class-action exposure to the existing liability landscape; and a call center deployment survey showed 75% of voice agent builders struggling with latency, accent bias, and codec reliability despite documented ROI elsewhere, such as Mahindra's 8% conversion uplift. Regulatory fragmentation accelerated: a Washington state law (June 10 deadline) explicitly prohibits commercial voice cloning without written consent, naming ElevenLabs and competitors, and a German court (Aug 2025) awarded €4K+ damages for synthetic voice imitation. Leading-edge positioning: commercial viability for niche/vanguard use cases is proven at $500M+ scale with revenue-impacting ROI, but adoption expansion is now constrained by production reliability requirements (75% of builders struggling), consent/licensing liability (journalist lawsuits, court precedents), and regulatory fragmentation rather than synthesis capability.

TOOLS

ElevenLabs, Resemble AI, WellSaid Labs