Perly Consulting │ Beck Eco

The State of Play

A living index of AI adoption across industries — where established practice meets the bleeding edge
UPDATED DAILY

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

The Daily Dispatch

A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.

AI Maturity by Domain

[Chart: each dot marks the weighted maturity of practices within a domain, plotted by domain on a scale from Bleeding Edge to Established.]

Agent assist — autonomous send

BLEEDING EDGE

TRAJECTORY

Stalled

AI that sends responses to customers automatically, with human agents involved only for escalations and edge cases. Includes confidence-gated auto-send and human escalation routing; distinct from autonomous chatbots, which handle the full interaction rather than augmenting an agent workflow.
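Confidence-gated auto-send with escalation routing can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not any vendor's implementation: the `Draft` type, the 0.90 threshold and the intent names are all hypothetical. It auto-sends a reply only when its intent falls inside a pre-approved set (the approval boundary) and its confidence clears the threshold; everything else routes to a human agent.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_SEND = "auto_send"  # reply goes to the customer with no human review
    ESCALATE = "escalate"    # reply is routed to a human agent


@dataclass(frozen=True)
class Draft:
    text: str
    intent: str        # classifier label for the customer's request
    confidence: float  # model-reported score in [0, 1]


# Hypothetical policy values, for illustration only.
AUTO_SEND_THRESHOLD = 0.90
PRE_APPROVED_INTENTS = frozenset({"order_status", "password_reset"})


def route(
    draft: Draft,
    threshold: float = AUTO_SEND_THRESHOLD,
    allowed: frozenset = PRE_APPROVED_INTENTS,
) -> Route:
    """Auto-send only pre-approved intents whose confidence clears the threshold."""
    if not 0.0 <= draft.confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if draft.intent in allowed and draft.confidence >= threshold:
        return Route.AUTO_SEND
    # Outside the approval boundary or below threshold: treat as an edge case.
    return Route.ESCALATE


if __name__ == "__main__":
    shipped = Draft("Your order shipped today.", "order_status", 0.97)
    waiver = Draft("We will waive that fee.", "fee_waiver", 0.99)
    print(route(shipped))  # Route.AUTO_SEND
    print(route(waiver))   # Route.ESCALATE
```

Checking the intent allow-list as well as the score is the design choice that matters here: a model can be highly confident about a reply that exceeds policy, as in the Air Canada case, so the fee-waiver draft escalates despite its 0.99 score.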

OVERVIEW

Autonomous send (AI that dispatches customer responses without waiting for a human to press "send") remains firmly experimental despite general availability (GA) from major vendors. The concept is narrower than a fully autonomous chatbot: it augments existing agent workflows by removing the manual approval step for high-confidence replies and escalates only edge cases to humans. Zendesk and Salesforce have shipped autonomous send, and some deployments have reached scale (Klarna: 2.3M conversations; 1-800-Accountant on Salesforce Agentforce: 70% autonomous resolution during peak season). Yet critical failures dominate recent evidence: Klarna rehired humans after CSAT collapsed, Commonwealth Bank reversed layoffs following a tribunal challenge, DPD disabled its system after an embarrassing profanity incident, and Air Canada faced legal liability for a policy its AI invented. The binding constraint remains reliability. One financial institution has paired LLM-based autonomous agents with infrastructure investment in validation and monitoring, yet research from the same period finds that model improvements produced no gains in agent dependability, and that 85% per-step reliability yields only about 20% end-to-end success on 10-step workflows. Once an autonomous message is sent, it cannot be recalled. Governance gaps and systematic failure patterns keep this practice experimental.

CURRENT LANDSCAPE

Market adoption is wide but deployment is narrow. April 2026 data puts the market at $15.12B, with 91% of customer service leaders under pressure to implement, yet 79% of consumers still prefer human contact. In March 2026 Zendesk made autonomous action execution generally available: pre-approved refunds, status updates, and replies now execute without per-interaction approval. Salesforce Agentforce and Klarna have demonstrated significant autonomous deployment. Klarna processed 2.3M conversations (two-thirds of all support chats), with average handling time (AHT) dropping from 11 minutes to under two, and 1-800-Accountant achieved 70% autonomous resolution during peak tax season using Salesforce Agentforce. One financial institution has deployed LLM-based autonomous customer service agents backed by infrastructure investment in validation, monitoring, and continuous evaluation. These deployments show that the capability itself is mature.

Yet reliability barriers block adoption. April 2026 research shows that 95% of enterprise AI pilots deliver no measurable P&L impact and 42% of AI initiatives are abandoned before production. Most critically, autonomous send has failed visibly at scale. Klarna rehired human agents after its autonomous deployment drove customer satisfaction down; its CEO admitted, "We went too far." Commonwealth Bank of Australia reversed AI-driven layoffs after a tribunal challenge and public backlash. DPD's autonomous system was disabled within hours after it swore at a customer and called the company "the worst delivery firm in the world" (1.3M people saw the screenshots). Air Canada faced legal liability when its autonomous agent promised a bereavement fare policy that didn't exist. The infrastructure gaps are fundamental: because failures compound, 85% per-step reliability yields only 20% end-to-end success on 10-step tasks. Once an autonomous message is sent, it cannot be unsent. Governance gaps around escalation design, approval boundaries, and policy encoding remain the core adoption barrier.
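The compound-failure arithmetic is worth making explicit. This minimal Python sketch assumes each step succeeds or fails independently (an assumption: real workflow failures are often correlated, so treat the results as approximate). It reproduces the roughly 20% figure for 10 steps at 85% per step, then inverts the relationship to show how reliable each step must be for a 10-step workflow to succeed 90% of the time. The function names are illustrative.

```python
import math


def end_to_end_success(per_step: float, steps: int) -> float:
    """Chance that every step succeeds, assuming independent steps."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability in [0, 1]")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


def required_per_step(target: float, steps: int) -> float:
    """Per-step reliability needed to reach a target end-to-end success rate."""
    if not 0.0 <= target <= 1.0 or steps <= 0:
        raise ValueError("target must be in [0, 1] and steps must be positive")
    return target ** (1.0 / steps)


def max_steps(per_step: float, target: float) -> int:
    """Longest workflow whose end-to-end success still meets the target."""
    if not 0.0 < per_step < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("per_step and target must be in (0, 1)")
    # Both logs are negative, so the ratio is positive.
    return math.floor(math.log(target) / math.log(per_step))


if __name__ == "__main__":
    print(f"{end_to_end_success(0.85, 10):.1%}")  # 19.7%
    print(f"{required_per_step(0.90, 10):.2%}")   # 98.95%
    print(max_steps(0.85, 0.50))                  # 4
```

Under these assumptions, an agent that is right 85% of the time per step stays above even odds for only four steps, and a 10-step workflow needs close to 99% per-step reliability to finish successfully nine times in ten.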

TIER HISTORY

Research: Jan-2025 → Apr-2025
Bleeding Edge: Apr-2025 → present

EVIDENCE (51)

— Salesforce Agentforce resolved 84% of cases autonomously across 380,000+ support interactions in Q1 2026, demonstrating production-scale autonomous agent maturity in customer service.

— End-to-end autonomous resolution executes full workflows, including identity validation, policy checking, refunds, and system updates, without human handoff; one production deployment shows 4-minute resolution (vs 48 minutes manually), 98% SLA compliance, and an 85% autonomous closure rate.

— Critical barrier evidence: AI workflows outnumber autonomous agents 5:1 in regulated markets; 78% of European enterprises cite EU AI Act compliance as the primary barrier to autonomous agent adoption; workflows deliver 3.4x faster time-to-value and 47% lower implementation costs.

The State of Enterprise Agentic AI 2026 (Industry Reports)

— Analysis of 600+ deployments shows a bimodal ROI distribution: 12% of enterprise agentic AI deployments clear 300%+ ROI, while 88% operate at or below break-even on a fully loaded cost basis; deployment discipline, rather than vendor choice, determines outcomes.

— Multiple autonomous service AI deployments demonstrate production maturity: Sprout Social resolves 80% of new-hire tickets autonomously; a European energy company cut L1-L2 escalations by 35%; Domino's achieved a 75% risk reduction with a unified system of work that enables autonomous intervention.

— Salesforce customers using Agentforce report automating 70% of tier-1 customer support queries end-to-end. The primary failure mode identified is organizational (poor data, unclear accountability) rather than technical.

— Production case: tier-1 auto-resolution agents autonomously resolve 40-65% of support tickets, with documented savings of $8,800-$14,300 per month for every 1,000 tickets handled monthly.

— Autonomous send in customer support achieves 40-70% tier-1 ticket resolution without human involvement; IDC × Microsoft 2026 study shows 171% average first-year ROI, with top-quartile deployments exceeding 300%.
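The tier-1 economics reduce to a linear model. Both ends of the reported range work out to the same figure: $8,800 across 400 resolved tickets (40% of 1,000) and $14,300 across 650 (65% of 1,000) are each $22 per autonomously resolved ticket. That per-ticket figure is an inference from the numbers above, not something the sources report. The Python sketch below uses it, and adds the conventional ROI definition, assuming that is how the cited 171%, 300%+ and break-even figures are computed; the $100,000 cost and $271,000 benefit are hypothetical.

```python
def monthly_savings(
    tickets_per_month: int,
    auto_resolution_rate: float,
    saving_per_resolved_ticket: float,
) -> float:
    """Monthly saving from tickets resolved without a human (linear model)."""
    if not 0.0 <= auto_resolution_rate <= 1.0:
        raise ValueError("auto_resolution_rate must be in [0, 1]")
    return tickets_per_month * auto_resolution_rate * saving_per_resolved_ticket


def roi(total_benefit: float, total_cost: float) -> float:
    """ROI as a fraction: 0.0 is break-even, 1.71 is 171%, 3.0 is 300%."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


# Inferred, not reported: $8,800 / 400 and $14,300 / 650 both equal $22.
SAVING_PER_RESOLVED_TICKET = 22.0

if __name__ == "__main__":
    for rate in (0.40, 0.65):
        saving = monthly_savings(1_000, rate, SAVING_PER_RESOLVED_TICKET)
        print(f"{rate:.0%}: ${saving:,.0f}")  # 40%: $8,800 then 65%: $14,300
    # Hypothetical deployment: $100,000 cost returning $271,000 in benefit.
    print(f"{roi(271_000, 100_000):.0%}")  # 171%
```

Under this definition, 171% ROI means a deployment returned $2.71 for every $1 spent, and break-even is 0%.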

HISTORY

  • 2025-Q1: Market focus on autonomous chatbots and agent suggestion tools; autonomous send (confidence-gated auto-send with escalation) not yet prominently demonstrated in public case studies or vendor positioning.
  • 2025-Q2: Enterprise adoption accelerates (78% using autonomous agents in support), but reliability gaps emerge (73% failure rate). Vendors ship adaptive reasoning for multi-step automation; governance requirements harden around ethics reviews and human-in-the-loop controls. Academic benchmarks show 30-35% task success rates, signalling a maturity ceiling and adoption risk.
  • 2025-Q3: Zendesk reports 60k+ autonomous requests per quarter in production, with a 120% increase in generative response quality; Klarna demonstrates 2/3 of service chats automated with an 80% AHT reduction. Market growth accelerates (expected $47.8B by 2030), but the collapse in trust continues: only 27% of organizations trust fully autonomous agents (down from 43%), and Gartner predicts 40% of agentic AI projects will be cancelled by end of 2027 due to costs, unclear ROI, and gaps in risk controls.
  • 2025-Q4: Zendesk and Microsoft (Dynamics 365) make autonomous send generally available, enabling AI agents as default responders and autonomous case resolution with email sends. However, adoption hesitancy intensifies: only 15% of IT leaders actively consider fully autonomous agents, and real-world deployments show 50% failure rates in token-limited environments. Gartner notes quality rollbacks at Klarna and Duolingo. An HBR assessment concludes that autonomous agents are not production-ready for consumer-facing customer support, signalling category-wide execution gaps despite product availability.
  • 2026-Jan: Named customer support deployments demonstrate material ROI: 1-800-Accountant achieved 70% autonomous resolution on Salesforce Agentforce under peak seasonal load, and Klarna's 2.3M-conversation milestone and sub-2-minute resolution times establish a scale case study. Contact center analysts report 50% cost-per-call reductions in production. Yet the enterprise adoption plateau persists: only 11% are in production as of January, with 30% exploring and 38% piloting. Analyst consensus predicts 40% project cancellations by 2027 due to governance gaps, cost surprises, and scaling barriers.
  • 2026-Feb: Zendesk makes auto-assist custom action execution without approval generally available (Feb 27), advancing product maturity. Adoption accelerates: 65% of enterprises are using AI agents, with 81% scaling beyond pilots and 39% realizing customer support impact. However, reliability concerns intensify: a Zendesk outage (Feb 26) prevents agents from sending replies for 5.5 hours; a research synthesis finds that 18 months of capability gains yielded zero reliability improvements; practitioner analysis documents systematic failure patterns (reward hacking at 30%, phantom verification, shortcut spirals). Enterprise scaling barriers persist: only 24% successfully move pilots to production, and 40% project cancellations are predicted by 2027.
  • 2026-Apr: Zendesk made pre-approved autonomous action execution generally available in March 2026 (refunds, status updates, and replies without per-interaction approval), the clearest platform signal yet that autonomous send is moving toward the mainstream. But the failure evidence dominates: Klarna rehired humans after its autonomous deployment drove CSAT down, Commonwealth Bank reversed AI-driven layoffs after a tribunal challenge, DPD disabled its system after a profanity incident, and Air Canada faced legal liability for policies its AI fabricated. Temporal.io research quantifies the infrastructure gap (85% per-step reliability yields only 20% end-to-end success on 10-step tasks), while InflectionCX's operator analysis finds that 42% of AI initiatives are abandoned and 95% of enterprise pilots deliver no measurable P&L impact. Consumer resistance (79% still prefer human contact) and compound failure dynamics keep the practice experimental despite product GA.
  • 2026-May: Selective production deployments demonstrate genuine scale: Salesforce Agentforce resolved 84% of cases autonomously across 380,000+ support interactions in Q1 2026; Lucidya documents end-to-end autonomous resolution completing full workflows (identity validation, policy checking, refunds, system updates) in 4 minutes vs 48 minutes manually, at an 85% autonomous closure rate. Tier-1 economics solidify: $8,800–$14,300 in monthly savings per 1,000 monthly tickets, and IDC/Microsoft data shows 171% average first-year ROI, with top-quartile deployments exceeding 300%. However, the bimodal ROI distribution hardens as the defining signal: analysis of 600+ deployments shows that only 12% clear 300%+ ROI while 88% operate at or below break-even; in regulated European markets, AI workflows outnumber autonomous agents 5:1, with 78% of enterprises citing EU AI Act compliance as the primary barrier. Deployment discipline, not vendor choice, determines which side of the distribution an organisation lands on.