The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in one or two domains, delivered to your inbox while the index updates in the background.
Each dot marks the weighted maturity of practices within a domain — hover for a brief summary, click for more detail
AI that generates self-service help content and guided troubleshooting flows, and moderates community forums with suggested responses. Includes FAQ generation and community response drafting; distinct from knowledge base management, which maintains structured internal knowledge rather than powering user-facing self-service experiences.
This practice splits cleanly into two stories with very different maturity profiles. AI-generated self-service content -- FAQ drafting, guided troubleshooting, tier-1 ticket resolution -- has reached production scale at forward-leaning organisations, with automation rates of 60-80% and measurable CSAT gains. That half works. Community moderation, the other half, remains experimental and frequently damaging: accuracy sits around 62% for nuanced harm-distinction tasks, rising to 95-98% for high-signal categories like spam but degrading sharply for misinformation and cultural context, and high-profile failures continue to erode user trust. Recent evidence reveals critical systematic limitations: AI moderation shows partisan bias in content judgment, fails catastrophically on non-English content (98% of the 2,000+ African languages are invisible to these systems), and produces enforcement collapses at billion-user scale, as X/Twitter's documented child-safety failures show.
The vendor tooling from Zendesk and Intercom is genuinely capable, with GA features shipping steadily since late 2024. But organisational readiness has not kept pace. Only 25% of organisations have successfully operationalized AI customer service; 75% own tools but haven't integrated them. Nearly 40% of new deployments fail due to governance gaps. Self-service also faces a supply gap: 69% of consumers attempt self-service first, yet less than one-third of companies actually offer it, and where it exists only 14% of issues resolve via self-service alone. Consumer acceptance is another constraint: Gartner data shows 64% of customers prefer that companies not use AI for service, 53% would switch brands over a poor implementation, and 1 in 5 consumers saw zero benefit from AI customer service. Economics do not generalise -- low-margin businesses have abandoned tools after significant spend. Human-in-the-loop approaches show promise in constrained settings, yet fully autonomous moderation at scale remains bounded by accuracy, fairness, and cultural-context limitations. The practice is bleeding-edge: real value exists for carefully scoped self-service use cases, but broader adoption carries material risks that most organisations are not yet equipped to manage.
On the self-service side, named deployments continue to deliver results at scale. Klarna's AI assistant resolved 2.3 million conversations in its first month, handling the workload of 700 full-time agents, cutting resolution time from 11 minutes to under 2 minutes, and reducing repeat inquiries by 25%. Bank of America's Erica surpassed 3 billion customer interactions with 98% resolution without human escalation. TeamSystem automated 80% of repetitive inquiries across 100,000+ monthly questions using Zendesk AI Agents, with 99% email automation and improved CSAT. Intercom's Fin handles 15,000+ conversations per month at 60% resolution for Hospitable, and a survey of 2,400 professionals finds mature deployments reporting 70-95% CSAT. Yet the market shows a critical adoption gap: 69% of consumers attempt self-service first, while less than one-third of companies actually offer self-service tools; where tools exist, only 14% of issues resolve fully via self-service. Zendesk now ships AI-generated procedure drafts from ticket data, and Microsoft released configurable moderation for Copilot Studio -- both GA. Customer acceptance remains a constraint, however: a Gartner survey of 5,728 customers shows 64% prefer that companies not use AI for customer service, and 53% would switch brands over a poor implementation.
Community moderation tells a different story. Meta deployed an in-house AI replacement for human contractors on March 19, 2026, handling scams, terrorism, CSAM, and impersonation at billion-user scale, with a claimed 60% error reduction and the simultaneous launch of a sub-5-second AI support assistant covering 98% of the global population by language. Yet operational accuracy remains constrained: spam and scam detection achieves 95-98%, but hate speech falls to 85-92%, misinformation degrades to 70-80%, and self-harm detection reaches only 82-88%. Recent failures expose deeper systemic problems. Peer-reviewed research from the University of Queensland documents that LLMs exhibit partisan bias in content moderation: larger models internalize ideological framings, causing them to judge criticism of their in-group as more harmful than attacks on opponents. X/Twitter's AI-heavy moderation after Elon Musk's acquisition shows enforcement collapse: of 8.9 million posts reported as endangering minors, only 14,571 were removed, and hate-speech suspensions plummeted from 104,565 to 2,361. Language coverage is another critical barrier: only 42 of the 2,000+ African languages appear meaningfully in AI systems, rendering 98% of them invisible to moderation; in Kenya, TikTok removed 450,000+ videos in Q1-Q2 2025 with no semantic understanding of local content. Character.AI's mass bot deletion in February 2026 caused collateral damage to legitimate content, prompting user backlash. Research on Stack Exchange's 2023 moderation strike documented how AI-generated content flooded review queues and drove moderator attrition. An empirical analysis of 2.3 million moderation decisions across 14 enterprise clients found only 62% accuracy in distinguishing harm advocacy from prevention, with over-moderation costs exceeding $340,000 annually per organisation. Human-in-the-loop architectures remain essential, with best practice allocating roughly 60-70% of violations to automated action, 20-30% to AI-assisted human review, and 10-15% to human judgment alone.
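That tiering maps naturally onto a confidence-threshold router. The sketch below is illustrative rather than drawn from any named deployment: the `ModerationVerdict` type and the threshold values are assumptions, and real systems tune thresholds per harm category, since accuracy varies widely across spam, hate speech, and misinformation.

```python
from dataclasses import dataclass

# Illustrative thresholds only -- production systems tune these per
# category, since accuracy ranges from ~95-98% (spam) down to
# ~70-80% (misinformation).
AUTO_ACTION_THRESHOLD = 0.95       # act automatically above this confidence
ASSISTED_REVIEW_THRESHOLD = 0.70   # below this, escalate to humans outright

@dataclass
class ModerationVerdict:
    category: str      # e.g. "spam", "hate_speech", "misinformation"
    confidence: float  # classifier confidence in [0, 1]

def route(verdict: ModerationVerdict) -> str:
    """Assign a flagged item to one of the three handling tiers."""
    if verdict.confidence >= AUTO_ACTION_THRESHOLD:
        return "automated_action"          # target share: ~60-70% of volume
    if verdict.confidence >= ASSISTED_REVIEW_THRESHOLD:
        return "ai_assisted_human_review"  # target share: ~20-30% of volume
    return "human_judgment"                # target share: ~10-15% of volume

# A high-confidence spam verdict is actioned automatically, while a
# borderline misinformation verdict goes to assisted review.
print(route(ModerationVerdict("spam", 0.99)))            # automated_action
print(route(ModerationVerdict("misinformation", 0.75)))  # ai_assisted_human_review
```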
The adoption gap is widening. Only 25% of organisations have successfully operationalized AI customer service automation; 75% own tools but haven't integrated them into workflows. Approximately 1 in 3 organisations deploying AI self-service fail, primarily due to upstream issues: fragmented knowledge sources, stale content, missing governance processes, and misaligned success metrics that optimize for containment rather than resolution accuracy. Consumer sentiment is cooling: 1 in 5 consumers report zero benefit from AI customer service, a failure rate roughly four times that of general AI applications, and 70% would switch brands after a single frustrating interaction. Economics remain uneven: one Intercom Fin user abandoned the platform after $12,000 in spend, citing unsustainable cost-to-resolution ratios in a low-margin business, and across the broader market governance gaps remain the primary driver of failed deployments. A Canadian health-care community pilot showed that human-in-the-loop triage can improve newcomer retention, but the approach requires careful scoping -- it does not scale without it. Community managers report that AI-generated answers often lack accuracy for complex products and must respect gating and confidentiality constraints that pure AI systems cannot enforce.
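The containment-versus-resolution misalignment above is easy to make concrete. In this sketch the `Ticket` schema is hypothetical: containment rate rewards a bot for keeping humans out of the loop, while resolution accuracy asks whether contained tickets were actually fixed, and a deployment can score well on the first while failing the second.

```python
from dataclasses import dataclass

@dataclass
class Ticket:
    escalated_to_human: bool  # did the bot hand the ticket off?
    issue_resolved: bool      # was the customer's problem actually fixed?

def containment_rate(tickets: list[Ticket]) -> float:
    """Share of tickets the bot kept away from human agents."""
    return sum(not t.escalated_to_human for t in tickets) / len(tickets)

def resolution_accuracy(tickets: list[Ticket]) -> float:
    """Share of bot-contained tickets that were genuinely resolved."""
    contained = [t for t in tickets if not t.escalated_to_human]
    return sum(t.issue_resolved for t in contained) / len(contained)

# A bot that deflects every ticket scores perfect containment even when
# most customers leave unhelped -- the metric misalignment described above.
tickets = [Ticket(False, False)] * 8 + [Ticket(False, True)] * 2
print(containment_rate(tickets))     # 1.0 -- looks excellent
print(resolution_accuracy(tickets))  # 0.2 -- the real story
```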
— Market data compilation: 78% of organizations use AI in some function, 62% experimenting with agents, 75% will use LLMs for CX by 2026; only 24% report AI fully resolved issues, 76% needed escalation/partial resolution.
— Educational guide on systematic limitations: context understanding (sarcasm, humor, cultural references), bias/fairness, language nuances, false positives/negatives, cultural sensitivities—documents core barriers to autonomous community moderation at scale.
— Analysis of X/Twitter moderation failures post-acquisition: 8.9M posts reported as endangering minors, only 14,571 removed; hate speech suspensions collapsed from 104,565 to 2,361. Documents AI moderation accuracy failures at billion-user deployment scale.
— UC Berkeley Haas study identifies five frustration sources with AI chatbots; Gartner survey of 5,728 customers shows 64% prefer companies not use AI for customer service, 53% would switch brands—direct evidence of self-service adoption barriers.
— CX strategist commentary on chatbot failures: Qualtrics data shows nearly 1 in 5 consumers saw no benefit from AI customer service (~4x higher failure rate than general AI); 70% would switch brands after one frustrating AI experience.
— Best practices framework for AI content moderation: diverse training datasets, robust feedback loops, transparency in decision-making, regular audits for bias—addresses implementation requirements for community safety infrastructure.
— Peer-reviewed study in ACM Transactions on Intelligent Systems and Technology finds ideological personas alter LLM precision/recall in moderation; larger models exhibit partisan bias, prioritizing protection of in-group while downplaying harm to opposing groups.
— Real TikTok Kenya deployment evidence: Q1-Q2 2025 removed 450,000+ videos and banned 43,000+ accounts; only 42 of 2,000+ African languages meaningfully represented in LLMs, leaving 98% of languages invisible to moderation systems.