Perly Consulting │ Beck Eco

The State of Play

A living index of AI adoption across industries — where established practice meets the bleeding edge
UPDATED DAILY

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

The Daily Dispatch

A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.

AI Maturity by Domain

Each dot marks the weighted maturity of practices within a domain, plotted on a scale from bleeding edge to established.

Agent assist — auto-draft with human review

GOOD PRACTICE

TRAJECTORY: Stalled

AI that automatically drafts full responses for agents to review, edit, and send during customer interactions. Includes tone-matched response generation and policy-aware drafting; distinct from response suggestion which offers options rather than complete drafts.

OVERVIEW

Auto-draft with human review has become the proven pattern for AI in customer support. The approach (AI generates a full, tone-matched response draft; the agent edits and sends it) is now a GA feature across tier-1 platforms, with documented ROI at enterprise scale. The question for most organisations is how to roll it out effectively, not whether it works.

What makes auto-draft durable is what it chose not to automate. Fully autonomous AI agents face high failure rates and mounting governance concerns; auto-draft sidesteps these by keeping the human in the approval loop. That architectural choice, once seen as a concession, has proven to be the practice's competitive advantage. Deployments that preserve agent judgment deliver measurable gains in handle time, resolution rate, and satisfaction. Those that skip the review gate face spiralling incident rates and stalled scaling.
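The review gate described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the names `Draft`, `generate_draft`, `approve`, `send`, and `ReviewRequired` are all hypothetical, and the draft text is a placeholder where a real system would call a model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DraftStatus(Enum):
    PENDING_REVIEW = "pending_review"
    APPROVED = "approved"


@dataclass
class Draft:
    ticket_id: str
    text: str
    status: DraftStatus = DraftStatus.PENDING_REVIEW
    reviewer: Optional[str] = None


class ReviewRequired(Exception):
    """Raised when a draft is sent before a human has approved it."""


def generate_draft(ticket_id: str, customer_message: str) -> Draft:
    # Placeholder for a model call (assumption): a real system would
    # generate a tone-matched, policy-aware reply here.
    text = f"Thanks for reaching out about: {customer_message}"
    return Draft(ticket_id=ticket_id, text=text)


def approve(draft: Draft, reviewer: str, edited_text: Optional[str] = None) -> Draft:
    # The agent may edit before approving; an edit replaces the model text.
    if edited_text is not None:
        draft.text = edited_text
    draft.status = DraftStatus.APPROVED
    draft.reviewer = reviewer
    return draft


def send(draft: Draft) -> str:
    # The gate itself: nothing leaves without an explicit human approval.
    if draft.status is not DraftStatus.APPROVED:
        raise ReviewRequired(f"draft for {draft.ticket_id} has not been approved")
    return draft.text
```

The design point is that `send` refuses unapproved drafts rather than trusting callers to check, so removing the human step takes a deliberate code change instead of a skipped call.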

CURRENT LANDSCAPE

Zendesk and Intercom ship auto-draft as standard platform infrastructure, not add-on pilots. Zendesk's May 2026 Copilot updates introduce parallel composer workflows, confidence-gating of suggestions, and AI-generated procedures—features that deepen human control rather than expand autonomy. Intercom publishes a formal automation-rate KPI for Fin, treating draft-and-review throughput as a production metric. These vendor iterations signal maturity: the focus has shifted from "can we generate drafts?" to "how do we help agents review them faster?"

Deployment is mainstream, with adoption estimates ranging from 55% to 66%. Market data from May–June 2026 shows 66% of service organizations use AI agents (1.7× YoY growth), with Zendesk enterprise customers achieving a median 41.2% deflection across all channels. New case studies document specific auto-draft wins: Intercom Fin across 180 customers (34% AHT reduction, 52% resolution, 78% CSAT) and large-scale research across 17,170 businesses (38.7% resolution time improvement, 42.4% CSAT lift). Hybrid 3-layer models (autonomous for routine, agent-assist for complex triage, escalation for edge cases) outperform single-mode deployments. AI-assisted human interactions achieve 84% CSAT, in line with the 82–86% for fully human service and far above the 68–74% for chatbot-only. On satisfaction, humans with AI tools match humans alone and clearly beat machines alone.
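As a rough illustration of the hybrid 3-layer model, the sketch below routes a ticket to one of the three layers using an intent label and a model confidence score. The source does not describe how any vendor implements this; the thresholds, the intent names, and the `route_ticket` function are all assumptions made for the example.

```python
from enum import Enum


class Route(Enum):
    AUTONOMOUS = "autonomous"      # routine: AI replies on its own
    AGENT_ASSIST = "agent_assist"  # complex: AI drafts, agent reviews and sends
    ESCALATE = "escalate"          # edge case: straight to a human, no draft


# Hypothetical thresholds and intent list; tune against your own data.
AUTO_THRESHOLD = 0.90
ASSIST_THRESHOLD = 0.50
ROUTINE_INTENTS = {"order_status", "password_reset", "opening_hours"}


def route_ticket(intent: str, confidence: float) -> Route:
    """Pick a layer from the intent and the model's confidence (0 to 1)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Only routine intents with high confidence skip human review.
    if intent in ROUTINE_INTENTS and confidence >= AUTO_THRESHOLD:
        return Route.AUTONOMOUS
    # Everything else with a usable draft goes to an agent for review.
    if confidence >= ASSIST_THRESHOLD:
        return Route.AGENT_ASSIST
    # Low confidence: hand off without a draft.
    return Route.ESCALATE
```

For example, `route_ticket("refund_dispute", 0.95)` returns `Route.AGENT_ASSIST`: a non-routine intent never goes out unreviewed, however confident the model is.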

However, the deployment-to-production gap is stark. Only 12% of AI agent pilots reach production scale. Successful deployments maintain human-in-the-loop workflows for 60–90 days while building observability and governance; failed deployments remove human oversight in under 2 weeks under ROI pressure. Gartner's May 2026 forecast quantifies the ROI reality: $206.5B in 2026 spending, yet only 23% report significant value; 80% of pilots cut workforce on expectations, not measured results. A separate Seekr analysis finds that only 5.5% of enterprises see meaningful gains. The gap is not capability; it is execution discipline. Intercom's survey of 2,400 service professionals shows 82% invested but only 10% mature; 87% of mature teams report quality gains versus 43% of explorers. The difference: systematized governance, change management, careful phased scaling, and human review gates kept intact.

Governance and visibility gaps are structural barriers to broader deployment. An Economist Enterprise survey of 804 decision-makers finds that 98% experienced disruptive agent incidents, two-thirds cannot observe agent actions in real time, and only 30% have tested rollback capability. Organizations deploying AI faster than security can govern face 74% rollback rates (Sinch survey, 2,527 leaders); the rate is highest among mature governance teams (81%), indicating that structured oversight surfaces failures earlier and triggers disciplined recovery. Hallucination remains endemic: 30–33% on major models, and 88% of organizations lack full security approval. The human-review gate is not a temporary constraint but the practice's permanent competitive advantage. Where auto-draft workflows preserve agent judgment, escalation clarity, and approval gates, they deliver sustained gains. Where organizations attempt to remove the human layer, incident rates spike and projects stall. Air Canada's 2024 chatbot liability ruling, which held the airline responsible for what its chatbot told a customer, made human review a governance necessity rather than an option.

TIER HISTORY

Research: Jun-2023 → Jul-2023
Bleeding Edge: Jul-2023 → Oct-2024
Leading Edge: Oct-2024 → Jan-2025
Good Practice: Jan-2025 → present

EVIDENCE (94)

— Gartner governance framework describes 'Advise' autonomy level (AI generates drafts/recommendations, humans review all outputs). Predicts 40% enterprise demotions by 2027 due to governance gaps discovered post-deployment.

— Cresta positions agent assist as augmentation layer in 3-tier system. Reports 78% of customer conversations handled by humans+AI together; advocates governance-first approach before automation deployment.

— Economist Enterprise survey (804 decision-makers): 98% experienced disruptive agent incidents; 2/3 cannot observe what agents did; only 30% have robust rollback capability. Structural visibility gap validates human review necessity.

— Multiple named organizations (Air Canada, Klarna, Zillow, Morgan Stanley, NHS) with documented autonomous agent failures. Provides critical negative evidence validating necessity of human review gates in production.

— SoftwareSeni analysis: only 12% of pilots reach production; successful deployments kept humans in loop 60-90 days; failed ones averaged <2 weeks. Human-in-loop timing is single largest success factor.

— Documents Air Canada chatbot legal liability precedent (2024 court ruling); argues approval/review layers prevent cascading errors. Validates human review as both governance necessity and competitive advantage.

— Call It Dev reliability guide: 70-80% of interactions handled cleanly, 10-20% ambiguous/risky, small remainder hard. Human-in-the-loop on hard 20% and staged rollout are core reliability practices.

— Resolx survey (17,170 businesses, 37M conversations): 38.7% resolution time improvement, 42.4% CSAT lift with AI writing assistants. Explicitly defines auto-draft as agent-workflow integration tool, not autoresponder.

HISTORY

  • 2023-H2: SupportLogic, Maven, and Macha released production auto-draft features with agent review workflows. Evidence shows response generation with tone control and editable draft modes gaining traction in vendor roadmaps; concurrent critical coverage highlights risks of AI implementation without proper human involvement.
  • 2024-Q1: Auto-draft moved into mainstream platform adoption. Zendesk and Microsoft shipped GA auto-draft tools; Gartner reported 94% of customer service leaders exploring GenAI copilots for agent assist. Vendor implementations converge on draft-review workflows, but satisfaction gaps remain (80% see value, 41% satisfied); risks around hallucination and prompt injection documented.
  • 2024-Q2: Zendesk positioned Agent copilot as core platform feature with proactive guidance; Intercom launched Fin copilot for conversational response generation. Auto-draft consolidates as mainstream capability in tier-1 platforms, moving from experimental to standard agent-assist offering in major contact center stacks.
  • 2024-Q3: Zendesk and Intercom released GA auto-draft products with confirmed human review workflows; independent roundtable coverage confirms agent assist as a central investment area across the ecosystem (Avaya, AWS, Genesys, NICE, Talkdesk, Zoom). Telecom deployments show 25-90% improvements in agent productivity and troubleshooting. Adoption surveys show 80% positive sentiment on AI's impact, though critical assessments highlight persistent ROI verification challenges and satisfaction gaps in real deployments.
  • 2024-Q4: Production auto-draft deployments accelerated across tier-1 platforms. Zendesk and Intercom reported specific customer outcomes: email automation (64% volume, 10-point CSAT lift), agent productivity multipliers (3x ticket throughput), and resolution rate benchmarks (51-65% autonomous resolution). Broad enterprise adoption (68%) contrasted with ROI realization challenges (32% see significant ROI), revealing adoption-execution gap. Ecosystem consolidation evident: Genesys deprecated earlier Agent Assist in favor of Agent Copilot. Critical assessments documented persistent implementation barriers: hallucinations, over-automation risks, need for specialized training, and human supervision requirements highlighted as prerequisites for safe deployment.
  • 2025-Q1: Auto-draft entered production maturity phase across enterprise tier-1 platforms. KPMG analyst validation confirmed enterprise-wide shift from experimentation to large-scale production deployment. Named customer case (Freedom Furniture) demonstrated 92% faster resolution and 17% CSAT improvement from agent copilot workflows. However, critical assessments reinforced that agent-assist technology delivers immediate ROI while autonomous systems remain premature; Zendesk production incident revealed reliability challenges in large-scale deployment, highlighting need for careful implementation and monitoring.
  • 2025-Q2: No new independent deployment evidence identified. Vendor announcements and agentic AI discussions dominated the window; no named customer case studies or adoption metrics specific to auto-draft in customer service operations during this period.
  • 2025-Q3: Auto-draft consolidates as the proven pattern within agentic AI. Broader agentic AI adoption faces friction: EY survey shows only 34% implementation despite 55% intent for customer support; CMU/Gartner research predicts 70% failure rate and 40% project cancellations by 2027 for autonomous agents. Google Agent Assist deployment guide documents 10-15% AHT improvement with proper implementation. Q3 evidence shows sharp bifurcation: human-in-the-loop auto-draft advancing into maturity and sustained ROI, while fully autonomous systems face mounting skepticism. Auto-draft's success hinges on maintaining human review gates and agent agency—validation that augmentation strategies outperform replacement automation.
  • 2026-Jan: GA feature launches demonstrate maturity: Zendesk releases AI-generated procedure drafts (3 per week), Intercom publishes automation rate KPI with production tracking. However, adoption-execution gap widens: Intercom survey shows 82% invested but only 10% mature (87% of mature teams see quality improvements vs 43% of explorers). Agentic AI failure rates spike: RAND/Gartner research documents 88% project failure, 11% production deployment, 40% cancellations forecast by 2027. Technical reliability concerns deepen: drift, inconsistency, and engineering cost barriers highlighted across industry analysis. Bifurcation sharpens: human-in-the-loop auto-draft advancing to enterprise scale with sustained ROI, while autonomous systems face escalating cancellations and skepticism.
  • 2026-Feb: Auto-draft accessibility expands: Zendesk rolls out capped AI writing tools (tone control, expand/simplify) to Professional+ plans in Feb 2026, broadening feature access from enterprise to mid-tier. Governance challenges surface: Gravitee's survey of 900+ executives finds 81% deployed AI agents but only 14.4% have security approval and 88% report incidents—validating mandatory human review as critical architectural control rather than limitation. Production deployments mature: Named customers (Telus 40 min/interaction, Suzano 95% query time reduction, Danfoss 80% automation) demonstrate enterprise-scale ROI, reinforcing that AI agents deliver value when properly implemented with human gates.
  • 2026-Apr: Governance tooling matures further: Zendesk's March 2026 release adds auto-assist event logging for full audit trails and pre-approved action workflows for low-risk tasks, confirming governance layers are now native to the product. Deployment breadth consolidates around 55% adoption (Metrigy, 656 companies), with Nucleus Research documenting measurable resolution and effort gains across 30+ production customers. Consumer risk evidence sharpens the case for human review: Qualtrics' survey of 20,000+ consumers in 14 countries documents AI customer service failing at 4x the rate of other AI applications, while Stanford-CMU research shows hybrid human-AI teams outperform autonomous systems by 68.7%—reinforcing the human-review gate as both a governance necessity and a performance advantage.
  • 2026-May (early): Quantified ROI consolidates across multiple benchmarks: Digital Applied documents hybrid escalation model at 4.25/5 CSAT with $0.62 per-resolution cost (vs $7.40 human-only); Balto ROI analysis documents 20% AHT reduction as a standard deployment outcome; enterprise AI benchmarking shows 620% average ROI within 18 months where tier-1 and tier-2 queries are handled with 78% autonomous resolution before escalation. Agent assist productivity metrics firm up: 9x cost reduction per task, 8.7 hours saved weekly, 4.2x productivity multiplier, 4.1-month payback period across deployments. Governance remains essential: Hiver survey of 700+ leaders finds 90% uncomfortable with AI representing brand directly to customers, confirming that the human-review gate is as much a trust requirement as a performance control.
  • 2026-May (mid): Ecosystem maturity and market validation accelerate. AWS publishes agent assist as a GA product with three named customers (Orbit, Wolters Kluwer, Traeger) achieving 10–20% AHT and productivity improvements. Liveops' large-scale survey (815 enterprise executives) confirms 73% prefer hybrid AI-human models for CX, with only 6% choosing AI-only: a direct market rejection of autonomous-first approaches. Microsoft releases Agent 365 (May 1, 2026) with mandatory supervisor sign-off on drafted communications, demonstrating the auto-draft-with-human-review pattern in regulated industries (HIPAA, FINRA, FedRAMP). Talkdesk positions live agent assistance with AI-driven recommendations as a core GA automation capability. Counterbalancing: a Sinch survey of 2,527 leaders reveals a 74% rollback rate for deployed autonomous customer agents, climbing to 81% among mature governance teams, a powerful negative signal for fully autonomous approaches. Swept.ai analysis sets out a legal liability framework (Moffatt v. Air Canada precedent) in which human review prevents customer-facing hallucinations and policy fabrications. The May evidence uniformly validates the human-in-the-loop pattern while documenting autonomous system failures.
  • 2026-May (late): Late-month evidence consolidates ROI reality and governance necessity. Gartner's May 2026 forecast reveals a structural adoption-ROI gap: $206.5B AI agent spending but only 23% report significant ROI; 80% of pilot cuts were made on expectations, not measured results (a critical negative signal balancing prior optimism). Market data shows 66% of service orgs use AI agents (1.7× YoY), with Zendesk enterprise median 41.2% deflection and a 30–40 point gap vs vendor claims; hybrid 3-layer models (autonomous + assist + escalation) outperform single-mode. The Stealth Agents benchmark establishes a CSAT signature: AI-assisted humans (84%) nearly match fully human (82–86%) while far exceeding chatbot-only (68–74%), validating the practice's value proposition. Zendesk's May 26 Copilot update introduces parallel composer and confidence-gating, signaling feature maturity shifting from draft generation to human-experience optimization. UC Santa Cruz + MIT research (AgentAtlas) introduces CONFIRM as a measurable control decision in its agent taxonomy, providing academic validation for the human-review gate. Hallucination analysis from Seekr documents 33% rates on major models and only 5.5% of enterprises achieving value from agents, reinforcing that human review is not a temporary constraint but a permanent architectural necessity. The May evidence shows market-wide adoption acceleration (66%) running headlong into ROI reality (5.5% success rate), with governance and human-oversight practices as decisive differentiators between mature teams (87% improvement rates) and explorers (43%).
  • 2026-Jun: Hallucination risk and the trust architecture of auto-draft are now the central focus. Production CX analysis (Inbenta) documents hallucination rates from 22–94% across models, shifting the debate from whether to use auto-draft to which structural controls (source linkage, deterministic retrieval) make human review tractable. A Sinch survey of 2,527 enterprise decision-makers finds 74% of AI customer service deployments were shut down or rolled back post-launch, with the rate reaching 81% even among mature governance teams, which is strong external validation for the human-review gate as a durable advantage. Real deployment signals reinforce this: Unity's shadow-mode auto-draft workflow automated 8,000 tickets in January 2026 with $1.3M operational savings while preserving human sign-off. Algolia analysis explicitly validates agent-assisted drafting as a distinct productive-middle-path layer, and Aspect think-tank work identifies trust as the foundational adoption lever: one wrong answer empties the trust bucket, and coaching-led adoption measurably outperforms compliance-led rollout. Late-June evidence adds scale and governance depth: a Resolx survey of 17,170 businesses (37M conversations) documents 38.7% resolution time improvement and 42.4% CSAT lift with AI writing assistants that keep agents in control; Intercom Fin (180 customers, 14 months) adds 34% AHT reduction, 52% resolution, and 78% CSAT as a named-customer benchmark. Gartner's 'Advise' autonomy level (AI drafts, humans review all outputs) is now the formal governance category, with Gartner predicting 40% of enterprises will be demoted from higher autonomy tiers by 2027 due to governance gaps found post-deployment. Cresta reports 78% of customer conversations are now handled by human and AI working together, confirming the auto-draft pattern as the dominant production architecture.