The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
AI that automatically drafts full responses for agents to review, edit, and send during customer interactions. Includes tone-matched response generation and policy-aware drafting; distinct from response suggestion, which offers options rather than complete drafts.
Auto-draft with human review has become the proven pattern for AI in customer support. The approach -- AI generates a full, tone-matched response draft; the agent edits and sends it -- is now a GA feature across tier-1 platforms, with documented ROI at enterprise scale. The question for most organisations is how to roll it out effectively, not whether it works.
What makes auto-draft durable is what it chose not to automate. Fully autonomous AI agents face high failure rates and mounting governance concerns; auto-draft sidesteps these by keeping the human in the approval loop. That architectural choice, once seen as a concession, has proven to be the practice's competitive advantage. Deployments that preserve agent judgment deliver measurable gains in handle time, resolution rate, and satisfaction. Those that skip the review gate face spiralling incident rates and stalled scaling.
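The review gate described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any vendor's implementation: `Draft`, `generate_draft`, `review`, and `send` are hypothetical names, and the draft generator is a stub standing in for a model call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Draft:
    """A full response draft awaiting agent review (hypothetical type)."""
    ticket_id: str
    text: str
    approved: bool = False
    sent: bool = False


def generate_draft(ticket_id: str, customer_message: str) -> Draft:
    # Hypothetical stub: a real system would call a language model here.
    return Draft(ticket_id, f"Thanks for reaching out about: {customer_message}")


def review(draft: Draft, edited_text: Optional[str] = None) -> Draft:
    # The agent may rewrite the text; approval is always an explicit human step.
    if edited_text is not None:
        draft.text = edited_text
    draft.approved = True
    return draft


def send(draft: Draft) -> str:
    # The gate itself: nothing reaches the customer without agent approval.
    if not draft.approved:
        raise PermissionError("draft has not been approved by an agent")
    draft.sent = True
    return draft.text
```

The design point is that `send` refuses unapproved drafts outright, so removing the human step requires a deliberate code change rather than a configuration drift.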
Zendesk and Intercom ship auto-draft as standard platform infrastructure, not add-on pilots. Zendesk's May 2026 Copilot updates introduce parallel composer workflows, confidence-gating of suggestions, and AI-generated procedures—features that deepen human control rather than expand autonomy. Intercom publishes a formal automation-rate KPI for Fin, treating draft-and-review throughput as a production metric. These vendor iterations signal maturity: the focus has shifted from "can we generate drafts?" to "how do we help agents review them faster?"
Deployment is mainstream at 55–66% adoption. Market data from May–June 2026 shows 66% of service organizations use AI agents (1.7× YoY growth), with Zendesk enterprise customers achieving a median 41.2% deflection rate across all channels. New case studies document specific auto-draft wins: Intercom Fin across 180 customers (34% AHT reduction, 52% resolution, 78% CSAT) and large-scale research across 17,170 businesses (38.7% resolution time improvement, 42.4% CSAT lift). Hybrid 3-layer models (autonomous for routine, agent-assist for complex triage, escalation for edge cases) outperform single-mode deployments. AI-assisted human interactions achieve 84% CSAT, within the 82–86% range for fully human interactions and far above the 68–74% for chatbot-only. Humans with AI tools outperform humans alone or machines alone.
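One way to picture the hybrid 3-layer model is as a routing function. The sketch below is illustrative only: the `Route` names, the input flags, and the 0.9 confidence threshold are assumptions made for the example, not figures from any of the cited deployments.

```python
from enum import Enum


class Route(Enum):
    AUTONOMOUS = "autonomous"      # routine requests handled end to end
    AGENT_ASSIST = "agent_assist"  # AI drafts, a human agent reviews and sends
    ESCALATE = "escalate"          # edge cases go straight to a human


def route_ticket(confidence: float, is_routine: bool, is_edge_case: bool,
                 auto_threshold: float = 0.9) -> Route:
    # Edge cases bypass automation entirely.
    if is_edge_case:
        return Route.ESCALATE
    # Only routine, high-confidence tickets are handled autonomously
    # (the threshold is a hypothetical value).
    if is_routine and confidence >= auto_threshold:
        return Route.AUTONOMOUS
    # Everything else falls back to draft-and-review.
    return Route.AGENT_ASSIST
```

Defaulting to `AGENT_ASSIST` means that any ticket the classifier is unsure about lands in the human-reviewed path rather than the autonomous one.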
However, the deployment-to-production gap is stark. Only 12% of AI agent pilots reach production scale. Successful deployments maintain human-in-the-loop workflows for 60–90 days while building observability and governance; failed deployments remove human oversight in under 2 weeks under ROI pressure. Gartner's May 2026 forecast quantifies the ROI reality: $206.5B in 2026 spending, yet only 23% report significant value; 80% of pilots cut workforce on expectations, not measured results. Only 5.5% of enterprises see meaningful gains. The gap is not capability but execution discipline. Intercom's survey of 2,400 service professionals shows 82% have invested but only 10% are mature; 87% of mature teams report quality gains versus 43% of explorers. The difference: systematized governance, change management, careful phased scaling, and human review gates kept intact.
Governance and visibility gaps are structural barriers blocking broader deployment. An Economist Enterprise survey of 804 decision-makers reveals that 98% experienced disruptive agent incidents, two-thirds cannot observe agent actions in real time, and only 30% have tested rollback capability. Organizations deploying AI faster than security can govern face 74% rollback rates (Sinch survey, 2,527 leaders); rollback is highest among mature governance teams (81%), indicating that structured oversight surfaces failures earlier and triggers disciplined recovery. Hallucination remains endemic, at 30–33% on major models, and 88% of organizations lack full security approval. The human-review gate is not a temporary constraint but the practice's permanent competitive advantage. Where auto-draft workflows preserve agent judgment, escalation clarity, and approval gates, they deliver sustained gains. Where organizations attempt to remove the human layer, incident rates spike and projects stall. Air Canada's 2024 chatbot liability ruling established legal precedent that human review is a governance necessity rather than an optional safeguard.
— Gartner governance framework describes 'Advise' autonomy level (AI generates drafts/recommendations, humans review all outputs). Predicts 40% enterprise demotions by 2027 due to governance gaps discovered post-deployment.
— Cresta positions agent assist as augmentation layer in 3-tier system. Reports 78% of customer conversations handled by humans+AI together; advocates governance-first approach before automation deployment.
— Economist Enterprise survey (804 decision-makers): 98% experienced disruptive agent incidents; 2/3 cannot observe what agents did; only 30% have robust rollback capability. Structural visibility gap validates human review necessity.
— Multiple named organizations (Air Canada, Klarna, Zillow, Morgan Stanley, NHS) with documented autonomous agent failures. Provides critical negative evidence validating necessity of human review gates in production.
— SoftwareSeni analysis: only 12% of pilots reach production; successful deployments kept humans in loop 60-90 days; failed ones averaged <2 weeks. Human-in-loop timing is single largest success factor.
— Documents Air Canada chatbot legal liability precedent (2024 court ruling); argues approval/review layers prevent cascading errors. Validates human review as both governance necessity and competitive advantage.
— Call It Dev reliability guide: 70-80% of interactions handled cleanly, 10-20% ambiguous/risky, small remainder hard. Human-in-the-loop on hard 20% and staged rollout are core reliability practices.
— Resolx survey (17,170 businesses, 37M conversations): 38.7% resolution time improvement, 42.4% CSAT lift with AI writing assistants. Explicitly defines auto-draft as agent-workflow integration tool, not autoresponder.