Perly Consulting │ Beck Eco

The State of Play

A living index of AI adoption across industries — where established practice meets the bleeding edge
UPDATED DAILY

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

AI Maturity by Domain

[Chart: each dot marks the weighted maturity of practices within a domain, on an axis running from Bleeding Edge to Established.]

Self-service content & community management

BLEEDING EDGE

TRAJECTORY

Stalled

AI that generates self-service help content, guided troubleshooting flows, and moderates community forums with suggested responses. Includes FAQ generation and community response drafting; distinct from knowledge base management which maintains structured knowledge rather than user-facing self-service experiences.

OVERVIEW

This practice splits cleanly into two stories with very different maturity profiles. AI-generated self-service content (FAQ drafting, guided troubleshooting, tier-1 ticket resolution) has reached production scale at forward-leaning organisations, with automation rates of 60-80% and measurable CSAT gains. Knowledge base platforms now ship with 98% accuracy benchmarks (Fini), SOC 2/GDPR/HIPAA/PCI-DSS compliance, and 48-hour deployment timelines; the technology layer works at scale. Community moderation, the other half, remains experimental and frequently damaging: accuracy sits around 62% for nuanced harm-distinction tasks (rising to 95-98% for high-signal categories like spam and scams, but degrading sharply for misinformation and culturally dependent content), and high-profile failures continue to erode user trust. Recent evidence reveals critical systematic limitations: AI moderation shows partisan bias in content judgment, fails catastrophically on non-English content (roughly 98% of 2,000+ African languages are invisible to these systems), and produces enforcement failures at billion-user scale (X/Twitter's documented child safety failures).

The vendor tooling from Zendesk, Intercom, and community platforms like Discourse is genuinely capable, with GA features shipping steadily since late 2024 and accelerating in June 2026 (Forethought autonomous agents, unified measurement standards, Discourse privacy-first suite). But organisational readiness has not kept pace. Only 25% of organisations have successfully operationalized AI customer service; 75% own tools but haven't integrated them. A paradoxical finding emerged in mid-2026: 74% of enterprises rolled back deployed AI agents after launch, with rollback rates climbing to 81% among organisations with mature governance infrastructure. Better monitoring and evaluation did not prevent failures; it surfaced failures that other organisations missed. Nearly 40% of new deployments fail due to governance gaps.

Self-service adoption also faces a supply gap: 69% of consumers attempt self-service first, yet fewer than one-third of companies offer self-service options, and where they do, only 14% of issues resolve via self-service alone. Consumer acceptance remains mixed: Gartner data shows 64% of customers prefer companies not use AI for service and 53% would switch brands over poor implementation. The primary constraint is knowledge-base quality: RAG-based systems achieve 70-85% accuracy on realistic customer queries, while industry benchmarks show 64% of users abandon self-service assistants within two interactions when answers are wrong or vague. Technology maturity therefore does not translate to adoption without upstream content work.

Median tier-1 deflection rates sit at 41.2% across enterprise deployments, with top-quartile performers reaching 58.7% and achieving 7.3x cost advantages ($1.84 vs. $13.50 per resolution). Ecommerce-focused deployments outperform tech industry baselines, achieving 50-70% deflection when carefully scoped.

Human-in-the-loop approaches show promise in constrained settings, yet fully autonomous moderation at scale remains bounded by accuracy, fairness, and cultural-context limitations. The practice is bleeding-edge: real value exists for carefully scoped self-service use cases, but broader adoption carries material risks that most organisations are not yet equipped to manage.

CURRENT LANDSCAPE

On the self-service side, named deployments continue to deliver results at scale. Klarna's AI assistant resolved 2.3 million conversations in its first month, handling the workload of 700 full-time agents, cutting resolution time from 11 minutes to under 2 minutes, and reducing repeat inquiries by 25%. Bank of America's Erica surpassed 3 billion customer interactions with 98% resolution without human escalation. TeamSystem automated 80% of repetitive inquiries across 100,000+ monthly questions using Zendesk AI Agents, with 99% email automation and improved CSAT. Intercom's Fin handles 15,000+ conversations per month at 60% resolution for Hospitable, and mature deployments surveyed across 2,400 professionals report 70-95% CSAT.

Vendor platforms continue to mature. Fini AI Knowledge Bases reports 98% accuracy with zero hallucinations across 2M+ processed queries, along with SOC 2 Type II, ISO 27001, GDPR, PCI-DSS Level 1, and HIPAA certifications and 48-hour deployment with 20+ native integrations. Zendesk expanded AI agent capabilities to all customers (effective May 11, 2026), removed the Essential/Advanced tier distinction, and released AI-generated procedure drafts to GA, signaling that autonomous AI infrastructure is transitioning from premium to standard across the support industry.

Yet the market shows a critical adoption gap: 69% of consumers attempt self-service first, but fewer than one-third of companies offer self-service tools; where tools exist, only 14% of issues resolve fully via self-service. Industry benchmarks reveal the core constraint: 64% of users abandon AI self-service assistants within the first two interactions when answers are wrong or vague, indicating that knowledge-base quality, not technology, is the primary adoption blocker. Median tier-1 deflection across enterprise deployments sits at 41.2%, with top-quartile performers reaching 58.7% and 7.3x cost advantages. Ecommerce-focused deployments significantly outperform tech industry baselines, achieving 50-70% deflection rates when carefully scoped. Customer acceptance remains a constraint: a Gartner survey of 5,728 customers shows 64% prefer companies not use AI for customer service, and 53% would switch brands over poor implementation.
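As a rough illustration of how the deflection and cost figures above combine, the sketch below computes a blended cost per ticket. It assumes that every ticket not deflected is resolved by a human at the $13.50 baseline and that deflected tickets never re-open; both are simplifying assumptions, and the function name `blended_cost` is illustrative rather than drawn from any benchmark source.

```python
# Per-resolution costs quoted in the text above (USD).
AI_COST = 1.84
HUMAN_COST = 13.50


def blended_cost(deflection_rate: float) -> float:
    """Average cost per ticket for a given tier-1 deflection rate.

    Assumption: non-deflected tickets cost the full human baseline,
    and deflected tickets incur only the AI cost.
    """
    if not 0.0 <= deflection_rate <= 1.0:
        raise ValueError("deflection_rate must be between 0 and 1")
    return deflection_rate * AI_COST + (1.0 - deflection_rate) * HUMAN_COST


cost_ratio = HUMAN_COST / AI_COST      # the quoted "7.3x" advantage
median = blended_cost(0.412)           # median deployment
top_quartile = blended_cost(0.587)     # top-quartile deployment

print(f"cost ratio: {cost_ratio:.1f}x")
print(f"median blended cost: ${median:.2f}")
print(f"top-quartile blended cost: ${top_quartile:.2f}")
```

Under these assumptions the median deployment lands near $8.70 per ticket and the top quartile near $6.66, so the per-resolution advantage of 7.3x shrinks considerably once the tickets that still reach a human are counted.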

Community moderation tells a different story. On March 19, 2026, Meta deployed in-house AI to replace human contractors, handling scams, terrorism, CSAM, and impersonation at billion-user scale with a claimed 60% error reduction, and simultaneously launched a sub-5-second AI support assistant covering 98% of the global population by language. Yet operational accuracy remains constrained: spam and scam detection achieves 95-98%, but hate speech falls to 85-92%, misinformation degrades to 70-80%, and self-harm detection reaches only 82-88% accuracy.

Recent failures expose deeper systemic problems. Peer-reviewed research from the University of Queensland documents that LLMs exhibit partisan bias in content moderation: larger models internalize ideological framings, causing them to judge criticism of their in-group as more harmful than attacks on opponents. X/Twitter's AI-heavy moderation after Elon Musk's acquisition shows enforcement collapse: 8.9M child safety reports yielded only 14,571 removals, and hate speech suspensions plummeted from 104,565 to 2,361. Language coverage represents another critical barrier: only 42 of 2,000+ African languages appear meaningfully in AI systems, rendering roughly 98% of them invisible to moderation; evidence from TikTok in Kenya shows 450,000+ videos removed across Q1-Q2 2025 with no semantic understanding of local content. Character.AI's mass bot deletion in February 2026 caused collateral damage to legitimate content, prompting user backlash. Research on Stack Exchange's 2023 moderation strike documented how AI-generated content flooded review queues and drove moderator attrition. An empirical analysis of 2.3 million moderation decisions across 14 enterprise clients found only 62% accuracy in distinguishing harm advocacy from prevention, with over-moderation costs exceeding $340,000 annually per organisation.

Production-scale hybrid architectures (deterministic filters, specialized models, LLM fallback) achieve 93.1% accuracy with 63% cost reduction, confirming that human-in-the-loop design rather than pure automation is the practitioner-validated path. Human-in-the-loop architectures allocate roughly 60-70% of violations to automated action, 20-30% to AI-assisted human review, and 10-15% to human judgment alone. Depop's case study (55M users) demonstrates a shift from reactive enforcement to proactive education, with AI surfacing patterns for human judgment rather than making autonomous decisions. Utopia AI's deployments for Otavamedia at kaksplus.fi and Anna in Finland show that well-configured hybrid moderation improves moderator wellbeing while maintaining 24/7 safety. theAsianparent's Southeast Asian deployment achieved a 95-98% reduction in manual moderation work across 13 countries and 11 languages, deployed in two weeks. New research (arXiv 2026-06-11) on culturally grounded moderation with minority communities in Bangladesh shows that RAG-enhanced LLM responses improve accuracy and context awareness across ethnic and linguistic lines, suggesting a path forward for previously invisible language populations.
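The three-layer hybrid pattern described above can be sketched as a simple routing function. This is a minimal illustration, not any vendor's implementation: the blocklist patterns, the `classifier_score` and `llm_judge` stand-ins, and the 0.9/0.1 thresholds are all hypothetical placeholders chosen for the example.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Layer 1: deterministic filters (hypothetical patterns for illustration).
BLOCKLIST = [
    re.compile(r"free\s+crypto", re.IGNORECASE),
    re.compile(r"click\s+here\s+to\s+win", re.IGNORECASE),
]


@dataclass
class Decision:
    action: str  # "remove", "allow", or "human_review"
    layer: str   # layer that produced the decision


def classifier_score(text: str) -> float:
    """Stand-in for a specialised model: returns a toy violation score."""
    hits = sum(word in text.lower() for word in ("scam", "hate", "kill"))
    return min(1.0, 0.45 * hits)


def llm_judge(text: str) -> Optional[str]:
    """Stand-in for the LLM fallback: None means the LLM is unsure."""
    return None


def moderate(
    text: str,
    score_fn: Callable[[str], float] = classifier_score,
    llm_fn: Callable[[str], Optional[str]] = llm_judge,
    high: float = 0.9,
    low: float = 0.1,
) -> Decision:
    # Layer 1: cheap, auditable rules catch high-signal content.
    if any(pattern.search(text) for pattern in BLOCKLIST):
        return Decision("remove", "deterministic")
    # Layer 2: the specialised model acts only when it is confident.
    score = score_fn(text)
    if score >= high:
        return Decision("remove", "model")
    if score <= low:
        return Decision("allow", "model")
    # Layer 3: ambiguous cases go to the LLM; anything unresolved goes to a human.
    verdict = llm_fn(text)
    if verdict in ("remove", "allow"):
        return Decision(verdict, "llm")
    return Decision("human_review", "human")


for sample in ("Click here to win FREE crypto", "lovely photo", "this looks like a scam"):
    print(sample, "->", moderate(sample))
```

The design choice worth noting is the final branch: when neither the model nor the LLM is confident, the item is queued for human judgment rather than actioned automatically, which mirrors the allocation split described above.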

Adoption barriers are widening. Only 25% of organisations have successfully operationalized AI customer service automation; 75% own tools but haven't integrated them into workflows. A critical mid-2026 signal emerged: 74% of enterprises rolled back deployed AI agents after launch, with rollback rates climbing to 81% among organisations with mature governance infrastructure. The rate is higher for better-governed organisations because they detect failures that less mature organisations miss. Forrester predicts that approximately 1 in 3 organisations deploying AI self-service will fail, primarily due to upstream issues: fragmented knowledge sources, stale content, missing governance processes, and misaligned success metrics (optimizing for containment rather than resolution accuracy).

Consumer sentiment is cooling: 1 in 5 consumers report zero benefit from AI customer service (a failure rate 4x higher than general AI), and 70% would switch brands after a single frustrating interaction. A strong adoption barrier emerged in June 2026: brand and community managers explicitly reject full automation for customer-facing moderation, with nearly every customer requesting human oversight for reputation-critical decisions. Regulatory compliance is increasingly a differentiating requirement: the DSA, UK Online Safety Act, GDPR Article 22, California AB 587, and EU AI Act now mandate transparency reports, error-rate disclosure, and human-review requirements, making governance infrastructure a regulated requirement that differs by jurisdiction.

Economics remain uneven. One Intercom Fin user abandoned the platform after $12,000 in spend, citing unsustainable cost-to-resolution ratios in a low-margin business. Across the broader market, AI-driven customer support deployments fail at four times the rate of other AI applications, primarily from governance gaps.

A Canadian health-care community pilot showed that human-in-the-loop triage can improve newcomer retention, but the approach depends on careful scoping and does not scale without it. Community managers report that AI-generated answers often lack accuracy for complex products and must respect gating and confidentiality constraints that pure AI systems cannot enforce. Over-control in community moderation erodes trust and participation: platforms that shift toward heavy automation risk a "museum effect" where communities feel managed rather than peer-driven.

TIER HISTORY

Research: Jan-2023 → Apr-2024
Bleeding Edge: Apr-2024 → present

EVIDENCE (133)

— Technical tutorial on practical Zendesk-Notion integration deployed same-day (2026-06-27), supporting unified knowledge access across Help Center search and AI agents; demonstrates real-world implementation of multi-source content consolidation.

— Benchmarking synthesis: median tier-1 deflection 41.2% (top-quartile 58.7%); $1.84 per AI-handled resolution vs $13.50 human baseline; $47.82B market valuation (2030 projection at 25.8% CAGR)—baseline metrics for self-service ROI at enterprise scale.

— Meta deployed LLMs replacing ~50% of human content/advertising review; claims 13% fewer enforcement errors vs humans and 10% more active violations caught; scales language coverage to 98% global population but Oversight Board warns of dual enforcement flaws—major platform automation with acknowledged fairness risks.

— Zendesk GA infrastructure upgrade (July-September 2026 rollout) enables external knowledge source integration (Notion, Confluence, etc.), locale-scoping, and permission-aware help center connections—directly demonstrates ecosystem maturity in knowledge consolidation for self-service.

— 2026 chatbot implementation guide: 68% average deflection rate (Tidio 2024), 60%+ achievable in B2B SaaS within 90 days; distinguishes rule-based, AI-powered, and agentic approaches; details eight use cases from FAQ automation to escalation routing.

— Seven documented deployments across industries (fashion, SaaS, trades, healthcare, recruitment, hospitality): 55-72% automation rates within 60-90 days; common pattern shows speed is universal adoption win independent of industry or business model.

— UK fashion retailer (85k customers) deployed self-service chatbot for order tracking and returns, achieving 61% automation, 4-hour to 28-second first response, and 5-month payback; secondary finding: reduced support-driven refund requests 18%.

— Critical analysis identifies four self-service failure modes (coverage gaps, findability gaps, clarity gaps, trust gaps); reveals deflection metric systematically overstates success by treating abandoned attempts identically to genuine resolutions—negative signal on adoption metrics reliability.
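
The deflection critique above can be made concrete with a small calculation. The session outcomes below are invented for illustration; the point is only that a deflection metric counting every non-escalated session as a success will report a higher figure than one counting confirmed resolutions.

```python
# Hypothetical self-service session outcomes (illustrative data only).
sessions = [
    "resolved", "abandoned", "escalated", "resolved", "abandoned",
    "abandoned", "escalated", "resolved", "abandoned", "resolved",
]


def deflection_rate(outcomes):
    """Naive metric: any session that never reached a human counts as deflected."""
    if not outcomes:
        raise ValueError("no sessions to score")
    return sum(o != "escalated" for o in outcomes) / len(outcomes)


def resolution_rate(outcomes):
    """Stricter metric: only sessions with a confirmed resolution count."""
    if not outcomes:
        raise ValueError("no sessions to score")
    return sum(o == "resolved" for o in outcomes) / len(outcomes)


naive = deflection_rate(sessions)
strict = resolution_rate(sessions)

print(f"deflection rate: {naive:.0%}")
print(f"confirmed resolution rate: {strict:.0%}")
```

In this invented sample the naive metric reports 80% while only 40% of sessions ended in a confirmed resolution; the gap is made up entirely of abandoned attempts.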

HISTORY

  • 2023-H1: Generative AI entered the self-service and community moderation space with strong consumer demand (67% predict AI will transform service) but significant deployment challenges emerged—content moderation showed only 21% effectiveness in preventing harmful engagement, and algorithmic bias against Global South users highlighted fairness concerns; Stack Overflow community strike revealed resistance to automated moderation policies.
  • 2023-H2: Production-scale deployments accelerated with major platforms running millions of moderation decisions daily under regulatory pressure; AWS released vendor tooling for AI moderation; independent research on Reddit and critical practitioner commentary revealed that enforcement remains labor-intensive and unreliable, with communities developing their own rules against AI-generated content, confirming that autonomous moderation at scale remains unproven.
  • 2024-Q1: Enterprise adoption signals strengthened (70% reimagining journeys, 83% claiming ROI from CX AI), but technical evidence revealed accuracy ceilings—LLM-based moderation achieves only 64% accuracy on rule-based tasks with unpredictable variance; academic research demonstrated community-driven approaches superior to centralized automation for nuanced decisions; real deployments surfaced failures (chatbot hallucinations, false positives in creative writing); industry lacks standardized success metrics, requiring baselines against self rather than benchmarks. Gap between business optimism and technical/practitioner reality remained wide.
  • 2024-Q2: Vendor tooling matured and platform-scale deployments expanded: Meta launched AI-generated content labeling across Facebook, Instagram, and Threads; Zendesk announced AI agents and copilot with production availability; independent case study showed 23% automation gains with 20% time reduction per ticket. However, critical limitations persisted: survey data revealed dismal financial payoff from AI projects; MENA research documented platform moderation failures (77% incorrect Arabic content deletion); field experiments confirmed human-in-the-loop practices remain essential for quality content review. Practitioners reported technology too new to fully understand, with 50%+ resolution rates considered strong progress, suggesting immature operational readiness.
  • 2024-Q3: Vendor expansion continued with Intercom's Fin multilingual support reaching GA across 45 languages with 7-point resolution improvement; real deployments achieved significant self-service gains (35% → 75-80% in single month). Critical practitioner analysis emphasized sustained challenges: AI struggles with cultural context and sentiment nuance in community management, data privacy concerns persist, and adoption barriers remain high. Tension between measurable automation gains and unresolved limitations in handling edge cases and cultural sensitivity continued.
  • 2024-Q4: Vendor maturity accelerated with Zendesk GA of omnichannel AI agents (Esusu: 64% email automation) and Intercom Fin reaching #1 on G2 with 65% end-to-end resolution; B2B adoption surged (Forrester: 89% of buyers adopted GenAI as primary self-service source). Yet fundamental scalability constraints emerged: BCG found 74% of companies struggle to achieve and scale AI value; Meta admitted high moderation error rates and over-enforcement. Core tension unresolved—vendor success stories contrasted with broad enterprise struggles and persistent accuracy/fairness limitations in autonomous systems.
  • 2025-Q1: Vendor tooling solidified with Zendesk AI Agents GA (January) delivering 35-50% ticket reduction and documented savings; Intercom's Fin achieved 94% resolution at Frends with full autonomous handling. Self-service portal adoption accelerated (42% → 73% year-over-year, 10+ hours weekly time savings per CS team). However, systemic constraints persisted: Meta's January policy shift from aggressive automation to Community Notes achieved 50% fewer mistakes but signaled acknowledgment of prior accuracy problems. Oversight Board independent analysis (February) documented automation limitations: bias amplification, context blindness, and disproportionate harm to marginalized groups. Enterprise skepticism deepened: IBM analysis (March) found 99% of developers exploring agents but noted unproven ROI and an immature operational foundation. Pattern continued: vendor wins and self-service adoption metrics masked broader challenges in accuracy, fairness, and financial return on AI moderation and automation.
  • 2025-Q2: Self-service adoption continued with documented case studies (Zendesk, Intercom), but organizational deployment proved challenging. Slalom survey (May) of C-suite executives found 69% reported AI adoption slowdown at organizational level, indicating widespread difficulty scaling proofs-of-concept. Community moderation research (April-May) documented systematic failures: Hertie School and Weizenbaum Institut studies revealed commercial moderation APIs amplify bias and fail on linguistic/cultural content. Glue Up analysis reported 62% of companies lost revenue due to biased AI decisions, 61% lost customers (primarily marginalized users). Core tension unresolved: vendor maturity and self-service metrics remained positive, but real-world deployments required human-in-the-loop practices, and autonomous moderation remained constrained by accuracy and fairness limits.
  • 2025-Q3: Vendor self-service automation advanced with Intercom optimizing Fin to 75%+ resolution in production and research on feedback classification (ModernBERT), while adoption surveys showed 90% of CX leaders reported AI tool ROI. Community moderation failures multiplied at scale: Meta's August ban wave affected thousands with false-positive account deletions and mass group removals, signalling accuracy failures in billion-user production systems. Regulatory pressures accelerated (EU DSA, UK Online Safety Act), forcing platforms toward AI-human hybrid systems; Meta shifted from aggressive automation to Community Notes. Practitioner and academic research (Stimson Center, Q3) reinforced that community-driven deployment with cultural sensitivity and local context is a prerequisite for adoption. Deployment bifurcation was evident: vendor platforms with clear scoping (FAQ, tier-1 routing) succeeded; broader organizational rollout remained constrained by integration friction and cultural adoption barriers. Autonomous moderation without oversight remained limited to low-context, high-volume cases; nuanced decisions continued to require human judgment.
  • 2025-Q4: Enterprise adoption accelerated with 82% of leaders using Gen AI weekly and 75% reporting positive ROI (Wharton); self-service platforms reached stable maturity with Hospitable case showing Fin handling 90% of conversations (15,000/month) at 60% resolution. Community moderation problems widened: Cornell research documented "triple threat" of AI-generated content (quality degradation, social disruption, governance challenges), while Daily Hacker News moderation crisis exposed systemic failures (300% error increase, trust dropped to 45%). Trust erosion emerged as core limiting signal alongside accuracy constraints, indicating that vendor maturity in self-service contrasts with continued unsolved challenges in autonomous community moderation at scale.
  • 2026-Jan: Vendor GA features accelerated (Zendesk AI-generated procedures, Microsoft configurable moderation) signaling continued platform maturity, while organizational adoption barriers sharpened. Intercom research on 166 support teams showed 95% workflow transformation and 28% Tier 1 headcount reduction. Yet deployment economics stalled—user abandoned Fin after $12k spend; empirical analysis of 2.3M moderation decisions documented only 62% accuracy with $340k+ annual over-moderation costs; nearly 40% of new deployments failed due to governance gaps. Bifurcation widened: vendor tooling mature with clear case studies, but organizational success increasingly requires human-in-the-loop governance, careful scoping, and business model fit assessment. Moderation accuracy and cost constraints remain fundamental blockers to broader autonomous deployment.
  • 2026-Feb: Vendor GA capabilities continued with benchmarking and optimization focus: Intercom published Transformation Report showing 2,400+ professionals with 70-95% CSAT in mature deployments; Zendesk case study documented TeamSystem automation of 80% of repetitive inquiries at scale. Community governance remained critical blocker: Stack Exchange research on 2023 strike showed AI-generated content flooding moderation pipelines and triggering community exit; Character.AI mass deletion wave (Feb 2026) exposed overzealous automation and collateral damage, signaling trust erosion. Architectural constraints emerged: Fin AI limited to single-agent design and cloud-only deployment, reducing enterprise flexibility. Human-in-the-loop approaches showed promise in specific contexts (Canadian health-care community pilot improved retention). Core tension unresolved: self-service content/FAQ automation achieved 80% ROI-positive deployments, yet autonomous moderation remained constrained by accuracy (62%), cost barriers, and governance gaps affecting 40% of new deployments.
  • 2026-Apr: Deployment failure signals intensify: Forrester predicts ~1 in 3 organisations deploying AI self-service will fail, with root causes upstream in fragmented knowledge, stale content, and governance gaps rather than technology. Qualtrics data shows 1 in 5 consumers saw zero benefit from AI customer service — a failure rate 4x higher than general AI — with analysts warning that AI amplifies cost-cutting without improving experience. Counterweight remains strong at the top: Klarna AI resolved 2.3M conversations in its first month (700 FTE equivalent, resolution time 11 min to under 2 min), and Bank of America Erica surpassed 3B interactions at 98% resolution without escalation. Adoption gap persists: 69% of consumers seek self-service first yet fewer than one-third of companies offer it, and only 14% of issues resolve fully via self-service.
  • 2026-May: Community moderation accuracy failures accumulate from multiple directions. Peer-reviewed University of Queensland research confirms LLMs exhibit partisan ideological bias in moderation decisions; X/Twitter data documents enforcement collapse (8.9M child safety reports yielding only 14,571 removals); and TikTok Kenya evidence shows 98% of 2,000+ African languages are effectively invisible to AI moderation systems. FTC enforcement of the Take It Down Act (effective May 19, 2026) adds regulatory pressure—12 AI platforms received warning letters requiring 48-hour removal of nonconsensual imagery, making compliance infrastructure a legal requirement. On the positive side, AWS Rekognition's GA high-volume image moderation validates production deployments at scale (CoStar 150k images/day, Dream11 100M users, SmugMug 100M+ members), and AI knowledge base benchmarks confirm 98% accuracy with zero hallucinations across 2M+ queries with full SOC 2/GDPR/HIPAA/PCI-DSS compliance, deploying in 48 hours. Consumer rejection of AI self-service deepens: Gartner survey of 5,728 customers finds 64% prefer companies avoid AI for service, and UC Berkeley Haas research identifies five frustration sources that cause 70% of consumers to switch brands after a poor interaction. On self-service economics, B2B SaaS benchmarks show top-quartile performers achieving 58.7% tier-1 deflection with a 7.3x cost advantage ($1.84 vs $13.50 per resolution), but median deployments achieve only 41.2% deflection—indicating wide variance by implementation maturity. Production-scale hybrid moderation architectures (3-layer: deterministic filters, specialised models, LLM fallback) achieve 93.1% accuracy with 63% cost reduction, confirming that human-in-the-loop design rather than pure automation is the practitioner-validated path. 
    The bifurcation between large-scale successes (Klarna, Bank of America Erica) and the structural limitations of autonomous moderation and self-service at the mass-market level continues to widen.
  • 2026-Jun: Vendor platform consolidation accelerates: Zendesk releases Forethought autonomous agents to GA (June 4) for email/voice/chat, available to non-Zendesk customers; extends expanded AI agent capabilities to all plans (May 11); and standardises unified conversation statuses across channels, signalling that autonomous self-service infrastructure is shifting from premium to commodity. Zendesk also brings external knowledge source integration to GA (Notion, Confluence, locale-scoped help centers), enabling permission-aware multi-source knowledge consolidation, with practical implementation documented same-day via a Zendesk-Notion connector. Discourse's GA privacy-first AI suite (moderation, semantic search, spam detection, sentiment, auto-triage), with no training on customer data, adds a community-platform option. Practitioner evidence on moderation reinforces human-in-the-loop as the only viable path at scale: the Utopia AI/Otavamedia deployment improves moderator wellbeing while maintaining 24/7 coverage, and theAsianparent achieved a 95-98% reduction in manual moderation across 13 Southeast Asian countries in two weeks, but a 20-year moderation industry veteran documents that nearly every brand customer explicitly rejects full automation for reputation-critical decisions. Meta's LLM deployment replacing ~50% of human content/advertising review (13% fewer errors, 10% more violations caught, 98% global language coverage) stands as the most significant platform-scale moderation milestone, but the Oversight Board warns of dual enforcement flaws. Self-service benchmarks firm up: industry synthesis shows median tier-1 deflection at 41.2% (top-quartile 58.7%) and $1.84 per AI-handled resolution vs $13.50 human, with seven named multi-industry deployments achieving 55-72% automation within 60-90 days. Critical constraint reconfirmed: KB quality rather than AI model choice drives resolution variance (30% of enterprise help center articles unreviewed for 12+ months; one case improved Zendesk AI resolution from 25% to 79% purely via KB restructuring). New RAG research on culturally grounded moderation with Bangladeshi minority communities shows improved accuracy for previously invisible language populations, while the $13B moderation market (14% CAGR) confirms sustained investment despite persistent accuracy and governance constraints.