Perly Consulting │ Beck Eco

The State of Play

A living index of AI adoption across industries — where established practice meets the bleeding edge
UPDATED DAILY

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

The Daily Dispatch

A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.

AI Maturity by Domain

Each dot marks the weighted maturity of practices within a domain — hover for a brief summary, click for more detail

DOMAIN
BLEEDING EDGE ←→ ESTABLISHED
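The "weighted maturity" dot described above can be sketched as a simple weighted average. This is a hypothetical illustration only: the numeric tier scores and per-practice weights below are assumptions, not the index's actual methodology.

```python
# Hypothetical sketch of a domain-level "weighted maturity" score.
# Tier values and weights are illustrative assumptions.

TIER_SCORE = {"research": 0.0, "bleeding edge": 0.33,
              "emerging": 0.66, "established": 1.0}

def domain_maturity(practices):
    """practices: list of (tier, weight) pairs; returns a score in [0, 1]."""
    total_weight = sum(w for _, w in practices)
    return sum(TIER_SCORE[t] * w for t, w in practices) / total_weight

# A domain with one heavily weighted established practice and two
# bleeding-edge ones lands between the two tiers on the axis:
score = domain_maturity([("established", 2.0),
                         ("bleeding edge", 1.0),
                         ("bleeding edge", 1.0)])
```

Hovering a dot would then surface the practices and weights behind the aggregate, which is why a single dot can mask a bimodal domain like the one profiled below.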

Self-service content & community management

BLEEDING EDGE

TRAJECTORY

Stalled

AI that generates self-service help content, guided troubleshooting flows, and moderates community forums with suggested responses. Includes FAQ generation and community response drafting; distinct from knowledge base management which maintains structured knowledge rather than user-facing self-service experiences.

OVERVIEW

This practice splits cleanly into two stories with very different maturity profiles. AI-generated self-service content -- FAQ drafting, guided troubleshooting, tier-1 ticket resolution -- has reached production scale at forward-leaning organisations, with automation rates of 60-80% and measurable CSAT gains. That half works. Community moderation, the other half, remains experimental and frequently damaging: accuracy sits around 62% for nuanced harm-distinction tasks (rising to 85-92% for high-signal categories like spam, but degrading sharply for misinformation and cultural context), and high-profile failures continue to erode user trust. Recent evidence reveals critical systematic limitations: AI moderation shows partisan bias in content judgment, fails catastrophically on non-English content (98% of 2,000+ African languages invisible to systems), and produces enforcement failures at billion-user deployment scale, as documented in X/Twitter's child-safety enforcement collapse.

The vendor tooling from Zendesk and Intercom is genuinely capable, with GA features shipping steadily since late 2024. But organisational readiness has not kept pace. Only 25% of organisations have successfully operationalized AI customer service; 75% own tools but haven't integrated them. Nearly 40% of new deployments fail due to governance gaps. Additionally, self-service adoption itself faces a supply gap: 69% of consumers attempt self-service first, yet fewer than one-third of companies actually offer self-service options. When present, only 14% of issues resolve via self-service alone. Consumer acceptance remains mixed: Gartner data shows 64% of customers prefer that companies not use AI for service and 53% would switch brands over poor implementation. Economics do not generalise -- low-margin businesses have abandoned tools after significant spend, and consumer satisfaction remains mixed (1 in 5 saw zero benefit from AI customer service). Human-in-the-loop approaches show promise in constrained settings, yet fully autonomous moderation at scale remains bounded by accuracy, fairness, and cultural-context limitations. The practice is bleeding-edge: real value exists for carefully scoped self-service use cases, but broader adoption carries material risks that most organisations are not yet equipped to manage.

CURRENT LANDSCAPE

On the self-service side, named deployments continue to deliver results at scale. Klarna's AI assistant resolved 2.3 million conversations in its first month, handling the workload of 700 full-time agents, cutting resolution time from 11 minutes to under 2 minutes, and reducing repeat inquiries by 25%. Bank of America's Erica surpassed 3 billion customer interactions with 98% resolution without human escalation. TeamSystem automated 80% of repetitive inquiries across 100,000+ monthly questions using Zendesk AI Agents, with 99% email automation and improved CSAT. Intercom's Fin handles 15,000+ conversations per month at 60% resolution for Hospitable, and a survey of 2,400+ professionals reports 70-95% CSAT across mature deployments. Yet the market shows a critical adoption gap: 69% of consumers attempt self-service first while fewer than one-third of companies actually offer self-service tools; where tools exist, only 14% of issues resolve fully via self-service. Zendesk now ships AI-generated procedure drafts from ticket data, and Microsoft released configurable moderation for Copilot Studio -- both GA. However, customer acceptance remains a constraint: a Gartner survey of 5,728 customers shows 64% prefer that companies not use AI for customer service, and 53% would switch brands over poor implementation.

Community moderation tells a different story. On March 19, 2026, Meta deployed an in-house AI system replacing human contractor moderators, handling scams, terrorism, CSAM, and impersonation at billion-user scale with a claimed 60% error reduction, alongside the simultaneous launch of a sub-5-second AI support assistant covering 98% of the global population by language. Yet operational accuracy remains constrained: spam and scam detection achieves 95-98%, but hate speech falls to 85-92%, misinformation degrades to 70-80%, and self-harm detection reaches only 82-88% accuracy.

Recent failures expose deeper systemic problems. Peer-reviewed research from the University of Queensland documents that LLMs exhibit partisan bias in content moderation: larger models internalize ideological framings, causing them to judge criticism of their in-group as more harmful than attacks on opponents. X/Twitter's AI-heavy moderation after Elon Musk's acquisition shows enforcement collapse: 8.9M child-safety reports yielded only 14,571 removals, and hate speech suspensions plummeted from 104,565 to 2,361. Language coverage represents another critical barrier: only 42 of 2,000+ African languages appear meaningfully in AI systems, rendering 98% of those languages invisible to moderation; TikTok's Kenya operations removed 450,000+ videos in Q1-Q2 2025 with no semantic understanding of local content. Character.AI's mass bot deletion in February 2026 caused collateral damage to legitimate content, prompting user backlash. Research on Stack Exchange's 2023 moderation strike documented how AI-generated content flooded review queues and drove moderator attrition. An empirical analysis of 2.3 million moderation decisions across 14 enterprise clients found only 62% accuracy in distinguishing harm advocacy from prevention, with over-moderation costs exceeding $340,000 annually per organisation.
Human-in-the-loop architectures remain essential, with best practice allocating roughly 60-70% of violations to automated action, 20-30% to AI-assisted human review, and 10-15% to human judgment alone.
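The three-tier split described above amounts to confidence-based routing. A minimal sketch follows, assuming hypothetical confidence thresholds and category names; real systems tune both per policy area, and the percentages in comments are the targets cited above, not guarantees of this logic.

```python
# Illustrative sketch of three-tier human-in-the-loop routing.
# Thresholds and the high-signal category set are assumptions.

AUTO_ACTION = 0.95      # model confidence above this: act automatically
ASSISTED_REVIEW = 0.70  # between thresholds: AI-assisted human review

def route_violation(category: str, confidence: float) -> str:
    """Route a flagged item to one of the three handling tiers."""
    # High-signal categories (e.g. spam) tolerate more automation than
    # context-heavy ones (e.g. misinformation), per the accuracy spread above.
    if category in {"spam", "scam"} and confidence >= AUTO_ACTION:
        return "automated_action"    # target: roughly 60-70% of violations
    if confidence >= ASSISTED_REVIEW:
        return "ai_assisted_review"  # target: roughly 20-30%
    return "human_judgment"          # target: roughly 10-15%

print(route_violation("spam", 0.99))            # automated_action
print(route_violation("misinformation", 0.99))  # ai_assisted_review
print(route_violation("hate_speech", 0.50))     # human_judgment
```

Note how a high-confidence misinformation flag still routes to assisted review: the category gate, not just the score, enforces the accuracy ceilings documented above.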

Adoption barriers are widening. Only 25% of organisations have successfully operationalized AI customer service automation; 75% own tools but haven't integrated them into workflows. Approximately 1 in 3 organisations deploying AI self-service fail, primarily due to upstream issues: fragmented knowledge sources, stale content, missing governance processes, and misaligned success metrics (optimizing for containment rather than resolution accuracy). Consumer sentiment is cooling: 1 in 5 consumers report zero benefit from AI customer service (a failure rate 4x higher than general AI), and 70% would switch brands after a single frustrating interaction. Economics remain uneven. One Intercom Fin user abandoned the platform after $12,000 in spend, citing unsustainable cost-to-resolution ratios in a low-margin business. Across the broader market, AI-driven customer support deployments fail at four times the rate of other AI applications, primarily from governance gaps. A Canadian health-care community pilot showed that human-in-the-loop triage can improve newcomer retention, but that approach requires careful scoping and does not scale without it. Community managers report that AI-generated answers often lack accuracy for complex products and must respect gating and confidentiality constraints that pure AI systems cannot enforce.
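The containment-versus-resolution misalignment flagged above is easy to see in miniature. This is a hypothetical sketch with illustrative ticket fields, not any vendor's actual metric definitions:

```python
# Why optimizing "containment" (tickets the AI never escalates) can
# diverge from "resolution accuracy" (tickets actually fixed).
# Field names are illustrative assumptions.

def containment_rate(tickets):
    """Fraction of tickets the AI handled without escalating."""
    contained = [t for t in tickets if not t["escalated"]]
    return len(contained) / len(tickets)

def resolution_accuracy(tickets):
    """Of the contained tickets, fraction actually resolved correctly."""
    contained = [t for t in tickets if not t["escalated"]]
    resolved = [t for t in contained if t["resolved_correctly"]]
    return len(resolved) / len(contained) if contained else 0.0

tickets = [
    {"escalated": False, "resolved_correctly": True},
    {"escalated": False, "resolved_correctly": False},  # contained, unfixed
    {"escalated": False, "resolved_correctly": False},  # contained, unfixed
    {"escalated": True,  "resolved_correctly": True},
]
# A 75% containment rate masks a 33% resolution accuracy:
print(containment_rate(tickets), resolution_accuracy(tickets))
```

A deployment graded only on containment would call this sample a success; graded on resolution, it is the kind of failure the 1-in-3 figure above describes.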

TIER HISTORY

Research: Jan-2023 → Apr-2024
Bleeding Edge: Apr-2024 → present

EVIDENCE (87)

— Market data compilation: 78% of organizations use AI in some function, 62% experimenting with agents, 75% will use LLMs for CX by 2026; only 24% report AI fully resolved issues, 76% needed escalation/partial resolution.

— Educational guide on systematic limitations: context understanding (sarcasm, humor, cultural references), bias/fairness, language nuances, false positives/negatives, cultural sensitivities—documents core barriers to autonomous community moderation at scale.

— Analysis of X/Twitter moderation failures post-acquisition: 8.9M posts reported as endangering minors, only 14,571 removed; hate speech suspensions collapsed from 104,565 to 2,361. Documents AI moderation accuracy failures at billion-user deployment scale.

— UC Berkeley Haas study identifies five frustration sources with AI chatbots; Gartner survey of 5,728 customers shows 64% prefer companies not use AI for customer service, 53% would switch brands—direct evidence of self-service adoption barriers.

— CX strategist commentary on chatbot failures: Qualtrics data shows nearly 1 in 5 consumers saw no benefit from AI customer service (~4x higher failure rate than general AI); 70% would switch brands after one frustrating AI experience.

— Best practices framework for AI content moderation: diverse training datasets, robust feedback loops, transparency in decision-making, regular audits for bias—addresses implementation requirements for community safety infrastructure.

— Peer-reviewed study in ACM Transactions on Intelligent Systems and Technology finds ideological personas alter LLM precision/recall in moderation; larger models exhibit partisan bias, prioritizing protection of in-group while downplaying harm to opposing groups.

— Real TikTok Kenya deployment evidence: Q1-Q2 2025 removed 450,000+ videos and banned 43,000+ accounts; only 42 of 2,000+ African languages meaningfully represented in LLMs, leaving 98% languages invisible to moderation systems.

HISTORY

  • 2023-H1: Generative AI entered the self-service and community moderation space with strong consumer demand (67% predict AI will transform service) but significant deployment challenges emerged—content moderation showed only 21% effectiveness in preventing harmful engagement, and algorithmic bias against Global South users highlighted fairness concerns; Stack Overflow community strike revealed resistance to automated moderation policies.
  • 2023-H2: Production-scale deployments accelerated with major platforms running millions of moderation decisions daily under regulatory pressure; AWS released vendor tooling for AI moderation; independent research on Reddit and critical practitioner commentary revealed that enforcement remains labor-intensive and unreliable, with communities developing their own rules against AI-generated content, confirming that autonomous moderation at scale remains unproven.
  • 2024-Q1: Enterprise adoption signals strengthened (70% reimagining journeys, 83% claiming ROI from CX AI), but technical evidence revealed accuracy ceilings—LLM-based moderation achieves only 64% accuracy on rule-based tasks with unpredictable variance; academic research demonstrated community-driven approaches superior to centralized automation for nuanced decisions; real deployments surfaced failures (chatbot hallucinations, false positives in creative writing); industry lacks standardized success metrics, requiring baselines against self rather than benchmarks. Gap between business optimism and technical/practitioner reality remained wide.
  • 2024-Q2: Vendor tooling matured and platform-scale deployments expanded: Meta launched AI-generated content labeling across Facebook, Instagram, and Threads; Zendesk announced AI agents and copilot with production availability; independent case study showed 23% automation gains with 20% time reduction per ticket. However, critical limitations persisted: survey data revealed dismal financial payoff from AI projects; MENA research documented platform moderation failures (77% incorrect Arabic content deletion); field experiments confirmed human-in-the-loop practices remain essential for quality content review. Practitioners reported technology too new to fully understand, with 50%+ resolution rates considered strong progress, suggesting immature operational readiness.
  • 2024-Q3: Vendor expansion continued with Intercom's Fin multilingual support reaching GA across 45 languages with 7-point resolution improvement; real deployments achieved significant self-service gains (35% → 75-80% in single month). Critical practitioner analysis emphasized sustained challenges: AI struggles with cultural context and sentiment nuance in community management, data privacy concerns persist, and adoption barriers remain high. Tension between measurable automation gains and unresolved limitations in handling edge cases and cultural sensitivity continued.
  • 2024-Q4: Vendor maturity accelerated with Zendesk GA of omnichannel AI agents (Esusu: 64% email automation) and Intercom Fin reaching #1 on G2 with 65% end-to-end resolution; B2B adoption surged (Forrester: 89% of buyers adopted GenAI as primary self-service source). Yet fundamental scalability constraints emerged: BCG found 74% of companies struggle to achieve and scale AI value; Meta admitted high moderation error rates and over-enforcement. Core tension unresolved—vendor success stories contrasted with broad enterprise struggles and persistent accuracy/fairness limitations in autonomous systems.
  • 2025-Q1: Vendor tooling solidified with Zendesk AI Agents GA (January) delivering 35-50% ticket reduction and documented savings; Intercom's Fin achieved 94% resolution at Frends with full autonomous handling. Self-service portal adoption accelerated (42% → 73% year-over-year, 10+ hours weekly time savings per CS team). However, systemic constraints persisted: Meta's January policy shift from aggressive automation to Community Notes, achieving 50% fewer mistakes but signaling acknowledgment of prior accuracy problems. Oversight Board independent analysis (February) documented automation limitations—bias amplification, context blindness, disproportionate harm to marginalized groups. Enterprise skepticism deepened: IBM analysis (March) found 99% of developers exploring agents but noted unproven ROI and immature operational foundation. Pattern continued: vendor wins and self-service adoption metrics masked broader challenges in accuracy, fairness, and financial return on AI moderation and automation.
  • 2025-Q2: Self-service adoption continued with documented case studies (Zendesk, Intercom), but organizational deployment proved challenging. Slalom survey (May) of C-suite executives found 69% reported AI adoption slowdown at organizational level, indicating widespread difficulty scaling proofs-of-concept. Community moderation research (April-May) documented systematic failures: Hertie School and Weizenbaum Institut studies revealed commercial moderation APIs amplify bias and fail on linguistic/cultural content. Glue Up analysis reported 62% of companies lost revenue due to biased AI decisions, 61% lost customers (primarily marginalized users). Core tension unresolved: vendor maturity and self-service metrics remained positive, but real-world deployments required human-in-the-loop practices, and autonomous moderation remained constrained by accuracy and fairness limits.
  • 2025-Q3: Vendor self-service automation advanced with Intercom optimizing Fin to 75%+ resolution in production and research on feedback classification (ModernBERT), while adoption surveys showed 90% of CX leaders reported AI tool ROI. Community moderation failures multiplied at scale: Meta's August ban wave affected thousands with false-positive account deletions and mass group removals, signaling accuracy failures in billion-user production systems. Regulatory pressures accelerated (EU DSA, UK Online Safety Act), forcing platforms toward AI-human hybrid systems; Meta shifted from aggressive automation to Community Notes. Practitioner and academic research (Stimson Center, Q3) reinforced that community-driven deployment with cultural sensitivity and local context is a prerequisite for adoption. Deployment bifurcation evident: vendor platforms with clear scoping (FAQ, tier-1 routing) succeeded; broader organizational rollout remained constrained by integration friction and cultural adoption barriers. Autonomous moderation without oversight remained limited to low-context, high-volume cases; nuanced decisions continued to require human judgment.
  • 2025-Q4: Enterprise adoption accelerated with 82% of leaders using Gen AI weekly and 75% reporting positive ROI (Wharton); self-service platforms reached stable maturity with Hospitable case showing Fin handling 90% of conversations (15,000/month) at 60% resolution. Community moderation problems widened: Cornell research documented "triple threat" of AI-generated content (quality degradation, social disruption, governance challenges), while Daily Hacker News moderation crisis exposed systemic failures (300% error increase, trust dropped to 45%). Trust erosion emerged as core limiting signal alongside accuracy constraints, indicating that vendor maturity in self-service contrasts with continued unsolved challenges in autonomous community moderation at scale.
  • 2026-Jan: Vendor GA features accelerated (Zendesk AI-generated procedures, Microsoft configurable moderation) signaling continued platform maturity, while organizational adoption barriers sharpened. Intercom research on 166 support teams showed 95% workflow transformation and 28% Tier 1 headcount reduction. Yet deployment economics stalled—user abandoned Fin after $12k spend; empirical analysis of 2.3M moderation decisions documented only 62% accuracy with $340k+ annual over-moderation costs; nearly 40% of new deployments failed due to governance gaps. Bifurcation widened: vendor tooling mature with clear case studies, but organizational success increasingly requires human-in-the-loop governance, careful scoping, and business model fit assessment. Moderation accuracy and cost constraints remain fundamental blockers to broader autonomous deployment.
  • 2026-Feb: Vendor GA capabilities continued with benchmarking and optimization focus: Intercom published Transformation Report showing 2,400+ professionals with 70-95% CSAT in mature deployments; Zendesk case study documented TeamSystem automation of 80% of repetitive inquiries at scale. Community governance remained critical blocker: Stack Exchange research on 2023 strike showed AI-generated content flooding moderation pipelines and triggering community exit; Character.AI mass deletion wave (Feb 2026) exposed overzealous automation and collateral damage, signaling trust erosion. Architectural constraints emerged: Fin AI limited to single-agent design and cloud-only deployment, reducing enterprise flexibility. Human-in-the-loop approaches showed promise in specific contexts (Canadian health-care community pilot improved retention). Core tension unresolved: self-service content/FAQ automation achieved 80% ROI-positive deployments, yet autonomous moderation remained constrained by accuracy (62%), cost barriers, and governance gaps affecting 40% of new deployments.
  • 2026-Apr: Deployment failure signals intensify: Forrester predicts ~1 in 3 organisations deploying AI self-service will fail, with root causes upstream in fragmented knowledge, stale content, and governance gaps rather than technology. Qualtrics data shows 1 in 5 consumers saw zero benefit from AI customer service — a failure rate 4x higher than general AI — with analysts warning that AI amplifies cost-cutting without improving experience. Counterweight remains strong at the top: Klarna AI resolved 2.3M conversations in its first month (700 FTE equivalent, resolution time 11 min to under 2 min), and Bank of America Erica surpassed 3B interactions at 98% resolution without escalation. Adoption gap persists: 69% of consumers seek self-service first yet fewer than one-third of companies offer it, and only 14% of issues resolve fully via self-service.
  • 2026-May: Community moderation accuracy failures accumulate from multiple directions. Peer-reviewed University of Queensland research confirms LLMs exhibit partisan ideological bias in moderation decisions; X/Twitter data documents enforcement collapse (8.9M child safety reports yielding only 14,571 removals); and TikTok Kenya evidence shows 98% of 2,000+ African languages are effectively invisible to AI moderation systems. Consumer rejection of AI self-service deepens: Gartner survey of 5,728 customers finds 64% prefer companies avoid AI for service, and UC Berkeley Haas research identifies five frustration sources that cause 70% of consumers to switch brands after a poor interaction. The bifurcation between large-scale successes (Klarna, Bank of America Erica) and the structural limitations of autonomous moderation and self-service at the mass-market level continues to widen.