Perly Consulting │ Beck Eco

The State of Play

A living index of AI adoption across industries — where established practice meets the bleeding edge
UPDATED DAILY

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

AI Maturity by Domain

[Chart: each dot marks the weighted maturity of practices within a domain, on an axis from Bleeding Edge to Established.]

Agent assist — autonomous send

BLEEDING EDGE

TRAJECTORY

Stalled

AI that sends responses to customers automatically, with human agents involved only for escalations and edge cases. Includes confidence-gated auto-send and human escalation routing; distinct from autonomous chatbots, which handle the full interaction rather than augmenting an agent workflow.

OVERVIEW

Autonomous send (AI that fires customer responses without waiting for a human to press "send") remains firmly experimental even though major vendors have shipped it as GA. The concept is narrower than a fully autonomous chatbot: it augments existing agent workflows by removing the manual approval step for high-confidence replies, escalating only edge cases to humans. Confidence-gated execution architectures (85-92% confidence threshold for send, 65-80% for draft, <65% for escalation) are now standard in production systems. Yet independent May 2026 research (Ada/NewtonX) reveals the core tension: while vendors report 70-84% autonomous resolution across 9,000+ customers (HubSpot) and 35,000+ deployments globally (Text), only 24% of consumers in production environments actually experienced full resolution without human intervention. The binding constraint remains reliability and trust. Critical failures continue: Klarna rehired humans after a CSAT collapse, Commonwealth Bank reversed layoffs following a tribunal challenge, DPD disabled its system after it swore at a customer, and Air Canada faced legal liability for autonomous policy fabrications. Practitioner consensus (MoClaw 2026) emphasizes mandatory human gating: "Customer-facing send without approval... Always gate." Once an autonomous message is sent, it cannot be recalled. The gap between claimed capability (70%+ vendor metrics) and actual reliability (24% consumer experience) signals that the practice remains at an early stage of deployment despite product maturity.

CURRENT LANDSCAPE

Vendor adoption is demonstrable but consumer reality lags claims. May 2026 evidence shows HubSpot Customer Agent autonomously resolving 70% of conversations across 9,000+ customers (up from 20% in 12 months), Text AI deployed at 35,000+ companies with 74% autonomous resolution, and Stratco Australia doubling previous human support volumes by achieving 80% autonomous query resolution. Go Autonomous documents autonomous order confirmation sending in production across European manufacturers with 43% capacity release. These represent genuine scale deployments with confidence-gated execution (85-92% auto-send thresholds, 65-80% draft, <65% escalation).
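The confidence-gated execution pattern described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the names `GateConfig`, `Action`, and `route_reply` are hypothetical, the cut-offs are simply the lower bounds of the reported bands (0.85 for send, 0.65 for draft), and the always-escalate topic list is an assumption modelled on the refunds/compliance escalation rule in the Gridwise evidence.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SEND = "send"          # reply goes to the customer with no human approval
    DRAFT = "draft"        # reply is queued for an agent to review and send
    ESCALATE = "escalate"  # conversation is handed to a human agent


@dataclass(frozen=True)
class GateConfig:
    # Assumed cut-offs: lower bounds of the reported 85-92% and 65-80% bands.
    send_threshold: float = 0.85
    draft_threshold: float = 0.65
    # Hypothetical list of topics that never auto-send, whatever the score.
    always_escalate: frozenset = frozenset({"refund", "compliance"})


def route_reply(confidence: float, topic: str,
                config: GateConfig = GateConfig()) -> Action:
    """Pick an action for one AI-generated reply from its confidence score."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Sensitive topics bypass the score entirely and go to a human.
    if topic in config.always_escalate:
        return Action.ESCALATE
    if confidence >= config.send_threshold:
        return Action.SEND
    if confidence >= config.draft_threshold:
        return Action.DRAFT
    return Action.ESCALATE


if __name__ == "__main__":
    for score, topic in [(0.91, "order_status"), (0.72, "order_status"),
                         (0.40, "order_status"), (0.95, "refund")]:
        print(f"{topic} @ {score:.2f} -> {route_reply(score, topic).value}")
```

The reported bands leave 80-85% unassigned; this sketch treats anything from 0.65 up to (but not including) 0.85 as a draft, which is the more conservative reading. In practice the thresholds would be tuned per deployment rather than fixed.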

Yet independent research reveals the maturity gap. Ada/NewtonX's May 2026 survey of actual consumer experiences found only 24% reported full autonomous resolution without human intervention—a critical reality check against vendor claims of 70-80% autonomous send rates. Practitioner consensus emphasizes mandatory human review: MoClaw's May 2026 assessment states unambiguously that "customer-facing send without human approval" is a failure pattern and "always gate" is the safe model for customer communication. The trust gap persists: only 29% of enterprises allow unsupervised agent actions despite 88% planning increased budgets (ace8 mid-2026 assessment). Market adoption is wide (35,000+ Text deployments, 9,000+ HubSpot customers) but production readiness is narrow—success depends on deployment discipline (infrastructure validation, confidence thresholds, escalation governance) rather than vendor choice. Regulated markets show stronger hesitation: AI workflows outnumber autonomous agents 5:1, with 78% citing EU AI Act compliance as the primary barrier.

The metric-inflation problem is now explicitly recognized: Fini Labs' May 2026 research found 71% of support leaders cite "inflated automation metrics" as their top blocker to trusting AI vendors. Vendor self-report bias is real—Decagon claims 80% deflection while Zendesk's enterprise-wide median is 41.2%. Governance failures are widespread: Sinch's May 2026 survey of 2,500+ customer service leaders found 62% have autonomous AI agents in production, but 74% reported rolling back or disabling them due to governance failures (31% cited customer data exposure, 22% hallucinations, 16% lack of auditability). Staged rollout approaches show promise (Salesforce survey: 70% report measurable value within 60 days), but scaling remains difficult: realistic ROI assumes 3-month payback with 20-35% year-one cost reduction—far below vendor claims of 60-80%.

TIER HISTORY

Research: Jan-2025 → Apr-2025
Bleeding Edge: Apr-2025 → present

EVIDENCE (88)

— Microsoft Dynamics 365 released production-ready Autonomous Email Resolution, which performs intent identification, response generation, autonomous sending, and case creation without agent review, confirming that autonomous send is moving onto mainstream enterprise platforms.

— Sinch report: 74% of autonomous AI agent deployments reversed after go-live due to governance failures; demonstrates that autonomous agents including autonomous send face real barriers and failures in production.

— Independent review of Decagon platform serving Hertz, Notion, Rippling, Duolingo, Faire, ClassPass, Noom, Substack, Curology with 80% deflection rates and 90%+ autonomous resolution, documenting production-scale autonomous send adoption.

— U.S. CAN-SPAM compliance framework: penalties of up to $53,088 per individual email (2026 adjusted rate); autonomous sending must meet the header-accuracy and unsubscribe requirements and honor opt-out requests within 10 business days.

— Newo.ai reports 99.6% Lead Success Score across 100,000 analyzed calls, confirming autonomous agents reliably execute core business tasks without revenue loss; deployment across 22 industries, 30 countries, 90 languages.

— Fini Labs' comparative analysis of guardrails, covering the Air Canada tribunal case in which an AI chatbot invented a policy; 'when an AI agent answers with confident wrong response, the business owns that answer including refund and compliance exposure.'

— eesel expert guide with Gridwise case study: 73% tier-1 resolution in first month with confidence-based routing (auto-send for routine, escalate for refunds/compliance); directly documents agent-assist autonomous send in production.

— Azeon synthesis: 74% rollback rate due to accuracy (hallucinations) and privacy/security; shift from chatbots to agentic agents documented but governance spending exceeds AI development (75-76% trust/security vs 63% technology investment).

HISTORY

  • 2025-Q1: Market focus on autonomous chatbots and agent suggestion tools; autonomous send (confidence-gated auto-send with escalation) not yet prominently demonstrated in public case studies or vendor positioning.
  • 2025-Q2: Enterprise adoption accelerates (78% using autonomous agents in support) but reliability gaps emerge (73% failure rate). Vendors ship adaptive reasoning for multi-step automation; governance requirements harden around ethics reviews and human-in-the-loop controls. Academic benchmarks show 30-35% task success rates, signalling maturity ceiling and adoption risk.
  • 2025-Q3: Zendesk reports 60k+ autonomous requests per quarter in production with 120% increase in generative response quality; Klarna demonstrates 2/3 of service chats automated with 80% AHT reduction. Market growth accelerates (expected $47.8B by 2030) but trust collapse continues—only 27% of organizations trust fully autonomous agents (down from 43%), and Gartner predicts 40% of agentic AI projects will be cancelled by end of 2027 due to cost, ROI clarity, and risk control gaps.
  • 2025-Q4: Zendesk and Microsoft (Dynamics 365) release autonomous send GAs, enabling agents as default responders and autonomous case resolution with email sends. However, adoption hesitancy intensifies: only 15% of IT leaders actively consider fully autonomous agents; real-world deployments show 50% failure rates in token-limited environments. Gartner notes quality rollbacks at Klarna and Duolingo. HBR assessment concludes autonomous agents are not production-ready for consumer-facing customer support, signalling category-wide execution gaps despite product availability.
  • 2026-Jan: Named customer support deployments demonstrate material ROI—Salesforce Agentforce achieved 70% autonomous resolution in peak seasonal load; Klarna's 2.3M-conversation milestone and sub-2-minute resolution times establish scale case study. Contact center analysts report 50% cost-per-call reductions in production. Yet enterprise adoption plateau persists: only 11% in production as of January, with 30% exploring and 38% piloting. Analyst consensus predicts 40% project cancellations by 2027 due to governance gaps, cost surprises, and scaling barriers.
  • 2026-Feb: Zendesk GA ships auto-assist custom action execution without approval (Feb 27), advancing product maturity. Adoption accelerates: 65% of enterprises using AI agents with 81% scaling beyond pilots and 39% realizing customer support impact. However, reliability concerns intensify: Zendesk outage (Feb 26) prevents agent reply sends for 5.5 hours; research synthesis finds 18 months of capability gains yield zero reliability improvements; practitioner analysis documents systematic failure patterns (reward hacking 30%, phantom verification, shortcut spirals). Enterprise scaling barriers persist: only 24% successfully move pilots to production; 40% project cancellations predicted by 2027.
  • 2026-Apr: Zendesk GA'd pre-approved autonomous action execution in March 2026 (refunds, status updates, replies without per-interaction approval), the clearest platform signal yet that autonomous send is moving toward the mainstream. But the failure evidence dominates: Klarna rehired humans after CSAT dropped under its autonomous deployment, Commonwealth Bank reversed AI-driven layoffs after a tribunal challenge, DPD disabled its system after a profanity incident, and Air Canada faced legal liability for autonomous policy fabrications. Temporal.io research quantifies the infrastructure gap (85% per-step reliability yields only about 20% end-to-end success on 10-step tasks, since 0.85^10 ≈ 0.197 if steps fail independently), while InflectionCX's operator analysis finds that 42% of AI initiatives are abandoned and 95% of enterprise pilots deliver no measurable P&L impact. Market pressure (79% of consumers still preferring human contact) and compound failure dynamics keep the practice experimental despite product GA.
  • 2026-May (mid-month update): Vendor scale confirmed but consumer reality reveals maturity gap. HubSpot Customer Agent hits 70% autonomous resolution across 9,000+ customers; Text AI at 35,000+ companies with 74% autonomy, Stratco Australia achieving 80% autonomous resolution at 11,000+ chats. Go Autonomous documents autonomous order confirmation sending in production (43% capacity release). Confidence-gated execution standard (85-92% auto-send, 65-80% draft, <65% escalation). However, Ada/NewtonX independent research (May 2026) finds only 24% of consumers in production experienced full autonomous resolution—revealing significant gap between vendor metrics and actual maturity. Practitioner consensus strengthens: MoClaw (May 2026) documents "customer-facing send without approval" as failure pattern and mandates human gating. Trust barrier persists: only 29% of enterprises allow unsupervised actions despite 88% planning budgets (ace8). Selective production deployments demonstrate genuine scale: Salesforce Agentforce resolved 84% of cases autonomously across 380,000+ support interactions in Q1 2026; Lucidya documents end-to-end autonomous resolution completing full workflows (identity validation, policy checking, refunds, system updates) in 4 minutes vs 48 minutes manually at 85% autonomous closure rate. Tier-1 economics solidify: $8,800–$14,300 monthly savings per 1,000 tickets, and IDC/Microsoft data shows 171% first-year ROI in top-quartile deployments. However, the bimodal ROI distribution hardens as the defining signal: analysis of 600+ deployments shows only 12% clear 300%+ ROI while 88% operate at or below break-even; in regulated European markets, AI workflows outnumber autonomous agents 5:1 with 78% citing EU AI Act compliance as the primary barrier. Deployment discipline—not vendor choice—determines which side of the distribution an organisation lands on.
  • 2026-May (late): Metric inflation and rollback evidence dominate late-month signal. Fini Labs documents 71% of support leaders cite "inflated automation metrics" as their top barrier to trusting AI vendors—Decagon claims 80% deflection while Zendesk's enterprise median is 41.2%, a 30-40 point self-report gap. Sinch's survey of 2,500+ leaders finds 74% rolled back or disabled autonomous agents due to governance failures (31% customer data exposure, 22% hallucinations, 16% lack of auditability). Klarna's trajectory reinforces the warning: deployed autonomous agents for 2/3 of chats, reversed after quality degradation, and Gartner now predicts 50% of companies that cut customer service staff for AI will rehire by 2027. Zendesk introduces a billing distinction between "Contained" (AI-only, unverified) and "Verified" (AI with confirmation signals) autonomous resolutions, signaling product maturation through separate tracking of quality tiers. A madewithlove production case study validates a staged autonomy model (shadow mode → internal notes → auto-send) where agent edit signals feed a learning loop—the clearest public documentation of a safe progressive rollout pattern.
  • 2026-Jun: Product GA accelerates across platforms while enterprise rollbacks and governance barriers dominate. Microsoft Dynamics 365 releases Autonomous Email Resolution GA (June 25): intent identification, autonomous generation and sending, and case creation without agent review. Decagon's platform documentation shows production deployments at Hertz, Notion, Rippling, Duolingo, Faire, and ClassPass with 70-90%+ autonomous resolution rates (vendors reaching a scale of 8,000+ enterprise customers). eesel documents the Gridwise case: 73% tier-1 autonomous resolution in the first month. A SoundHound/CCW Digital survey finds that 96% of production deployments met or exceeded ROI expectations, with 28% resolving complex issues end-to-end without a human. Newo.ai reports a 99.6% Lead Success Score across 100,000 calls, with autonomous voice agents reliably executing business tasks across 22 industries and 30 countries. However, the rollback signal strengthens sharply: the Azeon synthesis finds 74% of deployments rolled back or disabled due to accuracy problems (hallucinations), privacy/security issues, and customer backlash, and governance spending now exceeds AI development spending. The regulatory landscape tightens: CAN-SPAM penalties reach $53,088 per email (adjusted 2026 rate), and EU AI Act enforcement accelerates on autonomous decision-making (GDPR Article 22, AI Act Article 13). Thread Transfer's analysis of 1,200+ agent projects reveals a brutal funnel: 100% demoed, 38% in internal pilots, 11% in production, and 4% with positive ROI at 6 months. The surviving patterns are narrow and domain-focused, with tight guardrails. A SumatoSoft executive survey (72 respondents) finds that 96% maintain human-in-the-loop for customer-facing work and that no respondent reported fully autonomous customer-facing AI, which undercuts claims of autonomous send maturity despite product GA. The June landscape shows capability availability (mainstream GA, enterprise customers, production deployments at scale) decoupling sharply from deployment success (74% reversals, 4% survival rate, near-universal human-in-the-loop requirement). CCW Vegas 2026 reporting and a CCW industry synthesis now corroborate the Sinch 74% rollback figure, further cementing the bimodal outcome: narrow-scope, guardrail-heavy deployments survive; broad autonomous send without explicit inhibition logic does not.
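
The compound-failure arithmetic cited in the 2026-Apr entry (85% per-step reliability giving roughly 20% end-to-end success over 10 steps) can be reproduced with a short calculation. This sketch assumes every step succeeds or fails independently, which is a simplification; the function names are hypothetical and not taken from the Temporal.io research.

```python
def end_to_end_success(per_step_reliability: float, steps: int) -> float:
    """Probability that all steps succeed, assuming independent steps."""
    if not 0.0 <= per_step_reliability <= 1.0:
        raise ValueError("per_step_reliability must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_reliability ** steps


def required_per_step(target_success: float, steps: int) -> float:
    """Per-step reliability needed to hit a target end-to-end success rate."""
    if not 0.0 < target_success <= 1.0:
        raise ValueError("target_success must be in (0, 1]")
    if steps <= 0:
        raise ValueError("steps must be positive")
    return target_success ** (1.0 / steps)


if __name__ == "__main__":
    # The figure quoted in the history: 85% per step over a 10-step task.
    print(f"{end_to_end_success(0.85, 10):.1%}")  # prints 19.7%
    # Inverse question: what each step must achieve for 90% overall.
    print(f"{required_per_step(0.90, 10):.2%}")
```

Under the independence assumption, reliability decays geometrically with task length, which is one way to read why the narrow-scope deployments with few steps are the ones that survive.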