The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
AI that generates self-service help content and guided troubleshooting flows, and that moderates community forums with suggested responses. Includes FAQ generation and community response drafting; distinct from knowledge base management, which maintains structured knowledge rather than user-facing self-service experiences.
This practice splits cleanly into two stories with very different maturity profiles. AI-generated self-service content (FAQ drafting, guided troubleshooting, tier-1 ticket resolution) has reached production scale at forward-leaning organisations, with automation rates of 60-80% and measurable CSAT gains. Knowledge base platforms now ship with vendor-reported 98% accuracy benchmarks (Fini), SOC 2/GDPR/HIPAA/PCI-DSS compliance, and 48-hour deployment timelines; the technology layer works at scale. Community moderation, the other half, remains experimental and frequently damaging: accuracy sits around 62% for nuanced harm-distinction tasks (rising to 85-92% for hate speech and 95-98% for high-signal categories like spam, but degrading sharply for misinformation and cultural context), and high-profile failures continue to erode user trust. Recent evidence reveals critical systematic limitations: AI moderation shows partisan bias in content judgment, fails catastrophically on non-English content (98% of 2,000+ African languages are invisible to these systems), and produces enforcement failures at the scale of major platforms (X/Twitter's documented child safety failures).
The vendor tooling from Zendesk, Intercom, and community platforms like Discourse is genuinely capable, with GA features shipping steadily since late 2024 and accelerating in June 2026 (Forethought autonomous agents, unified measurement standards, Discourse privacy-first suite). But organisational readiness has not kept pace. Only 25% of organisations have successfully operationalized AI customer service; 75% own tools but haven't integrated them. A paradoxical finding emerged in mid-2026: 74% of enterprises rolled back deployed AI agents after launch, with rollback rates climbing to 81% among organisations with mature governance infrastructure. Better monitoring and evaluation, in other words, surfaced failures that other organisations missed rather than preventing them. Nearly 40% of new deployments fail due to governance gaps. Self-service also faces a supply gap: 69% of consumers attempt self-service first, yet fewer than one-third of companies offer self-service options, and where they do, only 14% of issues resolve via self-service alone. Consumer acceptance remains mixed: Gartner data shows 64% of customers prefer that companies not use AI for service, and 53% would switch brands over poor implementation. The primary constraint is knowledge-base quality: RAG-based systems achieve 70-85% accuracy on realistic customer queries, while industry benchmarks show 64% of users abandon self-service assistants within two interactions when answers are wrong or vague, so technology maturity does not translate into adoption without upstream content work. Median tier-1 deflection rates sit at 41.2% across enterprise deployments, with top-quartile performers reaching 58.7% and achieving 7.3x cost advantages ($1.84 vs. $13.50 per resolution). Carefully scoped ecommerce deployments outperform tech industry baselines, achieving 50-70% deflection.
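The cost figures above can be checked with a few lines of arithmetic. The sketch below reproduces the 7.3x ratio from the two per-resolution costs and then estimates a blended cost per ticket at a given deflection rate. The blending model is an assumption made for illustration, not something the benchmark states: it treats every deflected ticket as costing the AI rate and every other ticket as costing the human rate, and it ignores platform fees and failed AI attempts that later reach an agent. The function names are hypothetical.

```python
AI_COST = 1.84      # USD per AI-handled resolution (figure quoted in the text)
HUMAN_COST = 13.50  # USD per human-handled resolution (figure quoted in the text)


def cost_advantage(ai_cost: float = AI_COST, human_cost: float = HUMAN_COST) -> float:
    """Ratio of human cost to AI cost per resolution."""
    return human_cost / ai_cost


def blended_cost(deflection_rate: float,
                 ai_cost: float = AI_COST,
                 human_cost: float = HUMAN_COST) -> float:
    """Average cost per ticket under the simplifying assumption that
    deflected tickets cost ai_cost and all others cost human_cost."""
    if not 0.0 <= deflection_rate <= 1.0:
        raise ValueError("deflection_rate must be between 0 and 1")
    return deflection_rate * ai_cost + (1.0 - deflection_rate) * human_cost


if __name__ == "__main__":
    print(f"cost advantage: {cost_advantage():.1f}x")
    for label, rate in [("median", 0.412), ("top quartile", 0.587)]:
        print(f"{label} ({rate:.1%} deflection): ${blended_cost(rate):.2f} per ticket")
```

Under that assumption, the median deployment lands at roughly $8.70 per ticket and a top-quartile deployment at roughly $6.66, against $13.50 with no deflection. Real deployments would likely see a smaller gap once abandoned and re-opened conversations are counted.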
Human-in-the-loop approaches show promise in constrained settings, yet fully autonomous moderation at scale remains bounded by accuracy, fairness, and cultural-context limitations. The practice is bleeding-edge: real value exists for carefully scoped self-service use cases, but broader adoption carries material risks that most organisations are not yet equipped to manage.
On the self-service side, named deployments continue to deliver results at scale. Klarna's AI assistant resolved 2.3 million conversations in its first month, handling the workload of 700 full-time agents, cutting resolution time from 11 minutes to under 2 minutes, and reducing repeat inquiries by 25%. Bank of America's Erica surpassed 3 billion customer interactions with 98% resolution without human escalation. TeamSystem automated 80% of repetitive inquiries across 100,000+ monthly questions using Zendesk AI Agents, with 99% email automation and improved CSAT. Intercom's Fin handles 15,000+ conversations per month at 60% resolution for Hospitable, and mature deployments surveyed across 2,400 professionals report 70-95% CSAT. Vendor platforms continue to mature: Fini AI Knowledge Bases reports 98% accuracy with zero hallucinations across 2M+ processed queries, with SOC 2 Type II, ISO 27001, GDPR, PCI-DSS Level 1, and HIPAA certifications, deploying in 48 hours with 20+ native integrations. Zendesk expanded AI agent capabilities to all customers (effective May 11, 2026), removed the Essential/Advanced tier distinction, and released AI-generated procedure drafts to GA, signaling that autonomous AI infrastructure is transitioning from premium to standard across the support industry. Yet the market shows a critical adoption gap: 69% of consumers attempt self-service first, but fewer than one-third of companies offer self-service tools, and where tools exist, only 14% of issues resolve fully via self-service. Industry benchmarks reveal the core constraint: 64% of users abandon AI self-service assistants within the first two interactions when answers are wrong or vague, indicating that knowledge-base quality, not technology, is the primary adoption blocker. Median tier-1 deflection across enterprise deployments sits at 41.2%, with top-quartile performers reaching 58.7% and 7.3x cost advantages.
Ecommerce-focused deployments significantly outperform tech industry baselines, achieving 50-70% deflection rates when carefully scoped. Customer acceptance remains a constraint: a Gartner survey of 5,728 customers shows that 64% prefer that companies not use AI for customer service, and 53% would switch brands over poor implementation.
Community moderation tells a different story. On March 19, 2026, Meta deployed an in-house AI replacement for human contractors, handling scams, terrorism, CSAM, and impersonation at billion-user scale with a claimed 60% error reduction, and simultaneously launched a sub-5-second AI support assistant covering 98% of the global population by language. Yet operational accuracy remains constrained: spam and scam detection achieves 95-98%, but hate speech falls to 85-92%, misinformation degrades to 70-80%, and self-harm detection reaches only 82-88%. Recent failures expose deeper systemic problems. Peer-reviewed research from the University of Queensland documents that LLMs exhibit partisan bias in content moderation: larger models internalize ideological framings, causing them to judge criticism of their in-group as more harmful than attacks on opponents. X/Twitter's AI-heavy moderation after Elon Musk's acquisition shows enforcement collapse: child safety reports dropped from 8.9M to only 14,571 removals; hate speech suspensions plummeted from 104,565 to 2,361. Language coverage is another critical barrier: only 42 of 2,000+ African languages appear meaningfully in AI systems, rendering 98% of those languages invisible to moderation. Evidence from TikTok Kenya shows that 450,000+ videos were removed in Q1-Q2 2025 with no semantic understanding of local content. Character.AI's mass bot deletion in February 2026 caused collateral damage to legitimate content, prompting user backlash. Research on Stack Exchange's 2023 moderation strike documented how AI-generated content flooded review queues and drove moderator attrition. An empirical analysis of 2.3 million moderation decisions across 14 enterprise clients found only 62% accuracy in distinguishing harm advocacy from prevention, with over-moderation costs exceeding $340,000 annually per organisation.
Production-scale hybrid architectures (deterministic filters, specialized models, LLM fallback) achieve 93.1% accuracy with 63% cost reduction, supporting hybrid, human-in-the-loop design rather than pure automation as the practitioner-validated path. Human-in-the-loop architectures allocate roughly 60-70% of violations to automated action, 20-30% to AI-assisted human review, and 10-15% to human judgment alone. Depop's case study (55M users) demonstrates a shift from reactive enforcement to proactive education, with AI surfacing patterns for human judgment rather than making autonomous decisions. Utopamedia's deployments at kaksplus.fi and Anna in Finland show that well-configured hybrid moderation improves moderator wellbeing while maintaining 24/7 safety. theAsianparent's Southeast Asian deployment, rolled out in two weeks, achieved a 95-98% reduction in manual moderation work across 13 countries and 11 languages. New research (arXiv 2026-06-11) on culturally grounded moderation with minority communities in Bangladesh shows that RAG-enhanced LLM responses improve accuracy and context awareness across ethnic and linguistic lines, suggesting a path forward for previously invisible language populations.
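The tiered routing described above can be illustrated with a minimal sketch. Everything named here is hypothetical: the regex blocklist, the `classifier` and `llm` callables (stand-ins that return an estimated violation probability), and the 0.95/0.05 confidence thresholds are assumptions for illustration, not the configuration of any deployment cited in the text. The point is only the control flow: cheap deterministic checks first, a specialised model next, an LLM for ambiguous cases, and a human queue for whatever remains uncertain.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# A scorer returns an estimated probability in [0, 1] that the text is a violation.
Scorer = Callable[[str], float]

# Hypothetical deterministic blocklist for high-signal spam phrases.
SPAM_PATTERN = re.compile(r"free money|click here now", re.IGNORECASE)


@dataclass
class Decision:
    action: str                    # "remove", "allow", or "human_review"
    tier: str                      # "filter", "classifier", "llm", or "human"
    score: Optional[float] = None  # score from the last model consulted, if any


def moderate(text: str, classifier: Scorer, llm: Scorer,
             high: float = 0.95, low: float = 0.05) -> Decision:
    """Route one piece of content through the tiers, acting automatically
    only when a tier is confident (thresholds are illustrative)."""
    # Tier 1: deterministic filter handles unambiguous, high-signal cases.
    if SPAM_PATTERN.search(text):
        return Decision("remove", "filter")

    # Tier 2: specialised model decides only when confident either way.
    score = classifier(text)
    if score >= high:
        return Decision("remove", "classifier", score)
    if score <= low:
        return Decision("allow", "classifier", score)

    # Tier 3: LLM fallback for content the classifier found ambiguous.
    score = llm(text)
    if score >= high:
        return Decision("remove", "llm", score)
    if score <= low:
        return Decision("allow", "llm", score)

    # Still uncertain: queue for a human moderator with the AI score attached.
    return Decision("human_review", "human", score)


if __name__ == "__main__":
    # Stub scorers stand in for real models.
    print(moderate("Click here now!", lambda t: 0.0, lambda t: 0.0))
    print(moderate("borderline post", lambda t: 0.5, lambda t: 0.5))
```

Widening the gap between `low` and `high` sends more content to human review. That is the lever a team would adjust to move between the automated, AI-assisted, and human-only shares described above.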
Adoption barriers are widening. Only 25% of organisations have successfully operationalized AI customer service automation; 75% own tools but haven't integrated them into workflows. A critical mid-2026 signal emerged: 74% of enterprises rolled back deployed AI agents after launch, with rollback rates climbing to 81% among organisations with mature governance infrastructure. The rate is higher for better-governed organisations because they detect failures that less mature organisations miss. Approximately 1 in 3 organisations deploying AI self-service fail, primarily due to upstream issues: fragmented knowledge sources, stale content, missing governance processes, and misaligned success metrics (optimizing for containment rather than resolution accuracy). Consumer sentiment is cooling: 1 in 5 consumers report zero benefit from AI customer service (a failure rate 4x higher than general AI), and 70% would switch brands after a single frustrating interaction. A strong adoption barrier emerged in June 2026: brand and community managers explicitly reject full automation for customer-facing moderation, with nearly every customer requesting human oversight for reputation-critical decisions. Regulatory compliance is increasingly a differentiating requirement: the DSA, UK Online Safety Act, GDPR Article 22, California AB 587, and EU AI Act now mandate transparency reports, error-rate disclosure, and human-review requirements, making governance infrastructure a regulated differentiator that varies by jurisdiction. Economics remain uneven. One Intercom Fin user abandoned the platform after $12,000 in spend, citing unsustainable cost-to-resolution ratios in a low-margin business. Across the broader market, AI-driven customer support deployments fail at four times the rate of other AI applications, primarily from governance gaps.
A Canadian health-care community pilot showed that human-in-the-loop triage can improve newcomer retention, but the approach depends on careful scoping and does not scale without it. Community managers report that AI-generated answers often lack accuracy for complex products, and that answers must respect gating and confidentiality constraints that pure AI systems cannot enforce. Over-control in community moderation erodes trust and participation: platforms that shift toward heavy automation risk a "museum effect" where communities feel managed rather than peer-driven.
— Technical tutorial on practical Zendesk-Notion integration deployed same-day (2026-06-27), supporting unified knowledge access across Help Center search and AI agents; demonstrates real-world implementation of multi-source content consolidation.
— Benchmarking synthesis: median tier-1 deflection 41.2% (top-quartile 58.7%); $1.84 per AI-handled resolution vs $13.50 human baseline; $47.82B market valuation (2030 projection at 25.8% CAGR)—baseline metrics for self-service ROI at enterprise scale.
— Meta deployed LLMs replacing ~50% of human content/advertising review; claims 13% fewer enforcement errors vs humans and 10% more active violations caught; scales language coverage to 98% global population but Oversight Board warns of dual enforcement flaws—major platform automation with acknowledged fairness risks.
— Zendesk GA infrastructure upgrade (July-September 2026 rollout) enables external knowledge source integration (Notion, Confluence, etc.), locale-scoping, and permission-aware help center connections—directly demonstrates ecosystem maturity in knowledge consolidation for self-service.
— 2026 chatbot implementation guide: 68% average deflection rate (Tidio 2024), 60%+ achievable in B2B SaaS within 90 days; distinguishes rule-based, AI-powered, and agentic approaches; details eight use cases from FAQ automation to escalation routing.
— Seven documented deployments across industries (fashion, SaaS, trades, healthcare, recruitment, hospitality): 55-72% automation rates within 60-90 days; common pattern shows speed is universal adoption win independent of industry or business model.
— UK fashion retailer (85k customers) deployed self-service chatbot for order tracking and returns, achieving 61% automation, 4-hour to 28-second first response, and 5-month payback; secondary finding: reduced support-driven refund requests 18%.
— Critical analysis identifies four self-service failure modes (coverage gaps, findability gaps, clarity gaps, trust gaps); reveals deflection metric systematically overstates success by treating abandoned attempts identically to genuine resolutions—negative signal on adoption metrics reliability.
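The gap between deflection and genuine resolution noted in the last item can be made concrete with a short sketch. The outcome labels (`resolved`, `abandoned`, `escalated`) and the sample numbers are hypothetical; the critical analysis does not publish a formula. The code assumes each self-service session ends in exactly one of those three outcomes and compares a naive deflection rate, which counts every session that never reached an agent, with a resolution rate that counts only confirmed resolutions.

```python
from collections import Counter
from typing import Dict, Iterable

VALID_OUTCOMES = {"resolved", "abandoned", "escalated"}  # hypothetical labels


def self_service_rates(outcomes: Iterable[str]) -> Dict[str, float]:
    """Compare naive deflection with verified resolution for a set of sessions."""
    counts = Counter(outcomes)
    unknown = set(counts) - VALID_OUTCOMES
    if unknown:
        raise ValueError(f"unknown outcomes: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sessions supplied")
    return {
        # Naive deflection: anything that did not reach a human agent.
        "deflection": (counts["resolved"] + counts["abandoned"]) / total,
        # Verified resolution: only sessions where the issue was actually solved.
        "resolution": counts["resolved"] / total,
        # Share of sessions that deflection counts as success but are not.
        "overstatement": counts["abandoned"] / total,
    }


if __name__ == "__main__":
    # Illustrative data only: 40 resolved, 30 abandoned, 30 escalated sessions.
    sample = ["resolved"] * 40 + ["abandoned"] * 30 + ["escalated"] * 30
    for name, value in self_service_rates(sample).items():
        print(f"{name}: {value:.0%}")
```

With the illustrative sample, deflection reads 70% while verified resolution is only 40%. Reporting both numbers side by side is one way to keep abandoned attempts from inflating the headline metric.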