The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
AI that generates event materials, sales collateral, technical documentation, and product content for specific professional contexts. Includes white paper drafting and event programme generation; distinct from general long-form or short-form content which targets broader audiences.
AI-generated specialist content (white papers, technical documentation, event materials, and sales collateral) has reached mainstream production adoption at significant scale: 76% of documentation professionals use AI regularly, and 74.2% of new web pages contain AI-generated content. Deployment outcomes, however, are bifurcated. Bounded, infrastructure-intensive deployments deliver measurable ROI: MongoDB and Virtual Coffee optimized documentation for AI consumption with empirical improvements in weeks; Graebel deployed Copilot agents for service request processing with faster turnaround and consistent data quality; Tandem Health's ambient AI scribes in European healthcare (MDR Class IIa certified) reduced administrative burden in live clinical use; pharmaceutical firms automated MLR-compliant content generation. These successes share a common structure: narrow scope, mandatory human review, and validation pipelines. Broad-scale adoption, by contrast, reveals systemic failures. EY, Sullivan & Cromwell, and Deloitte all deployed AI for high-stakes specialist content (reports, legal filings, consulting studies) and produced fabricated citations; Ontario's audit found that 9 of 20 approved healthcare AI systems fabricated information; Columbia's analysis shows 146,932 hallucinated references entered the scientific record in 2025 alone. Hallucination remains fundamental, at 3–10% on the best models and escalating to 33–51% on complex reasoning. The paradox: 95% of GenAI pilots deliver zero P&L impact even as companies cut costs (Snowflake, Amazon, and Cloudflare eliminated thousands of documentation roles while buyers still review docs pre-purchase). The practice has hardened: AI is accepted as a high-ROI drafting and editing assistant in bounded contexts, with strict human validation required, and rejected as an automation-grade solution for critical-path content. Event marketing leads adoption; technical documentation and regulated domains proceed with intensified institutional scepticism about accuracy risks.
Adoption metrics confirm mainstream scale with persistent governance gaps. The State of Docs 2026 survey (1,100+ professionals) shows 76% use AI regularly—a 16-point YoY increase—with 56% shifting from drafting to editing and validation; 70% now factor AI into information architecture decisions. Ahrefs: 74.2% of new web pages contain AI-generated content; Siege Media: 97% of content marketers plan AI use in 2026; the document generator market reached $5.6B in 2025. Concrete deployments span event marketing (Captured Celebrations generating 400+ branded posts per event across 500+ corporate events—Adidas, Four Seasons, Sony Music—with 15K–25K impressions within 72 hours), manufacturing (STADLER and ENEOS deployed ChatGPT Enterprise for technical specifications with 30–40% and 80% time savings respectively), and technical platforms (Wonderchat supporting ESAB, Jortt at 92% autonomous resolution, Keytrade Bank; API platforms Apidog and Treblle achieving 75% error reduction and 90% accuracy improvements on grounded tasks). However, quality failures escalate in parallel. EY Canada withdrew a 44-page cybersecurity report after 16 of 27 citations were fabricated; Sullivan & Cromwell admitted 42 hallucinations in a court motion; Ontario's audit found 9 of 20 approved clinical AI systems fabricated patient information. Columbia's study documents 146,932 hallucinated references across 2.5M papers in 2025 alone—a rate accelerating monthly. Only 10% of organisations are fully prepared for AI (OpenText); 72% struggle with daily integration; 80% report no measurable P&L impact. Compliance violations mount: CFPB fines, FDA 200+ enforcement letters in 2025, product recalls due to AI-generated instructions. The expertise paradox emerges: fluent hallucinations bypass expert review—an NRC editor embedded 15 fabricated quotes despite years of warnings about hallucinations. 
Tech writer role elimination accelerates (Snowflake 47–70, Amazon 16,000+, Cloudflare 1,100 roles), yet 80% of buyers review documentation pre-purchase, indicating that cost-cutting is disconnected from customer quality expectations. Governance infrastructure remains the bottleneck: eight production patterns (among them RAG, schema validation, hard guardrails, citation verification, and human-in-the-loop review) reduce hallucination when combined but require intensive engineering. Practitioner consensus: AI serves as a time-saving editing and verification tool with mandatory expert oversight on bounded tasks, and is rejected as an automation-grade solution for critical-path content. Event marketing and event logistics lead adoption; technical, healthcare, and regulated documentation proceed with intensified scepticism.
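To make two of the patterns named above concrete, here is a minimal Python sketch of citation verification feeding a human-in-the-loop gate. It is an illustration under stated assumptions, not any vendor's implementation: the `CitationCheck` class, the `check_citations` function, the identifier format, and the trusted index are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CitationCheck:
    verified: list = field(default_factory=list)
    unverified: list = field(default_factory=list)

    @property
    def status(self) -> str:
        # Hard guardrail: one unmatched citation blocks the draft outright.
        # A clean check never means "publish"; it only queues the draft
        # for the mandatory human review step.
        return "blocked" if self.unverified else "ready_for_human_review"


def check_citations(cited_ids, trusted_index) -> CitationCheck:
    """Split cited identifiers into those found in a trusted index and those not."""
    result = CitationCheck()
    for cid in cited_ids:
        bucket = result.verified if cid in trusted_index else result.unverified
        bucket.append(cid)
    return result


draft_citations = ["ref-001", "ref-002", "ref-999"]  # hypothetical identifiers
trusted = {"ref-001", "ref-002"}                     # hypothetical source index
print(check_citations(draft_citations, trusted).status)  # blocked
```

The sketch deliberately has no "auto-publish" outcome: the best a draft can achieve is a place in the human review queue, which mirrors the practitioner consensus described above.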
— AMA 2026 survey: 81% of medical providers using AI (vs 38% in 2023); malpractice analysis shows clinician remains solely liable for AI-generated clinical notes despite accuracy gaps—demonstrates rapid mainstreaming with unresolved governance.
— Product GA: GitBook AI workflow for specialist API documentation (OpenAPI spec generation, supporting docs drafting, developer Q&A, MCP integration); demonstrates ecosystem maturity for structured technical documentation automation.
— Production clinical-documentation deployment metrics: 31% hallucination rate in AI-generated notes vs 20% in physician-written notes (PDQI-9); four failure modes (hallucinations, critical omissions, misattribution, contextual misinterpretation) documented from blinded study.
— MHRA reclassification: AI medical scribes now Class IIa medical device (from Class I), requiring conformity assessments and post-market surveillance; signals specialist healthcare documentation moving into regulated territory.
— Production case studies show structured documentation enables measurable improvements: MongoDB 62% error reduction, Google Cloud 45% accuracy gain, e-commerce 37% escalation reduction on AI-assisted support systems via RAG and semantic architecture.
— Production technical documentation workflow using constraint framework (Rules, Skills, Harnesses); documents team transition from ad-hoc prompting to structured generation with validation pipelines and deterministic refresh cycles.
— 90-engineer SaaS case study mapping success/failure boundaries: AI 100% effective on API refs/runbooks (auto-regenerate on code change), AI outline-only on design docs; onboarding improved 3 weeks to 5 days via boundary-respecting deployment.
— Microsoft Research study of 19 LLMs in 52 professional domains found 25-50% content degradation across banking, healthcare, legal; agentic tool access showed no improvement—documents fundamental model limitations in multi-step specialist document workflows.
2023-H2: ChatGPT white paper tests showed B-minus quality requiring major revision. Hallucination research documented systematic accuracy failures across healthcare and legal domains. No evidence of production adoption at scale; practice remains experimental.
2024-Q1: Foundational research (Northwestern, NUS, Stanford) proved hallucinations are inevitable and quantified failure rates at 30-88% across document, legal, and QA tasks. Early commercial products emerged (Storydoc at 2,500+ customers) but MIT data showed 95% of enterprise GenAI pilots delivered zero ROI. Adoption severely constrained by reliability barriers.
2024-Q2: Hallucination detection advances (Oxford Nature study on semantic entropy) offered mitigation hope, but deployment challenges persisted: JMIR medical study confirmed 28.6% hallucination rates for GPT-4 on systematic reviews, recommending against primary use. Survey data revealed 61% of enterprises experienced accuracy issues with in-house solutions (only 17% rated excellent). Early commercial traction in sales collateral (Highspot showing 2.3x view lift), but practice remains research-stage with high human oversight requirements.
2024-Q3: Broad workplace adoption accelerated—39% of U.S. adults and 24% of workers using GenAI weekly (Federal Reserve survey). Yet specialist content barriers sharpened: JMIR study found ChatGPT 3.5 and Bing at critical hallucination levels for medical documentation; Northwestern research reaffirmed hallucinations are fundamental LLM features, not fixable by architecture; practitioners at law firms adopted tools cautiously, noting they remained "not necessarily that great at legal research." Microsoft's Correction tool faced expert skepticism. Technical writers pragmatically integrated AI for time-saving assistant tasks but rejected it as primary content generator. Practice shows simultaneous adoption momentum and intractable reliability barriers.
2024-Q4: Specialist content adoption shifted into pragmatic integration phase. GitHub Copilot Metrics API reached general availability (October 2024), enabling production tracking of AI-assisted documentation. Real-world deployments confirmed: Rehab for JAPAN measured GitHub Copilot acceptance rates in technical documentation at 30% overall (40% for Ruby, 14% for Java), showing language-dependent effectiveness. Professional Copilot adoption jumped to 89% weekly use by December 2024, with 50,000+ organizations having adopted it globally. Event marketing emerged as adoption leader: 57% of event marketers expect AI to fundamentally reshape planning and execution, with live case studies of AI-generated event content (Coca-Cola Spiced Shop). Hallucination mitigation matured: AWS shipped production-ready RAG + human-in-the-loop tutorial patterns in November 2024. Critical counterpoint: academic analysis emphasized that AI-generated writing remains generic and unsuitable for creative, discovery-oriented specialist content. Practice consolidates around collaboration model: AI as time-saving assistant and personalization engine, human as final validator.
2025-Q1: Specialized reliability research deepened. FailSafeQA benchmark found LLMs hallucinate in 41% of finance-related documentation queries, with open-source models at 22% error rate vs. 5% for commercial—stratifying tool reliability by capability and input sensitivity. Harvard Data Science Review published theoretical framework on AI errors, situating hallucinations as inevitable consequences of design and training data structure, not fixable by model-only approaches. Ecosystem maturity continued: Microsoft's Copilot Usage Dashboard shipped in February 2025, enabling enterprise teams to track acceptance rates and ROI. Security analysis documented code hallucination rates of 48% in AI-generated code with emerging 'slopsquatting' vulnerabilities. Practice remains in pragmatic collaboration phase with increased instrumentation and validation rigor.
2025-Q3: Practitioner adoption accelerated despite persistent reliability barriers. Peer-reviewed study of 83 technical writers confirmed AI time-saving benefits for routine tasks but documented systemic accuracy limitations and ethical concerns—revealing adoption bounded by domain-specific verification requirements. Hallucination benchmarks continued to worsen: industry research found 17-45% hallucination rates across general-purpose LLMs with case studies of critical failures (e.g., financial services AI fabricating brand themes). Conference practitioners and technical writers reinforced consensus: AI works best as time-saving assistant with strict human oversight, not as primary content generator. Survey of 400 B2B marketing executives found adoption lagging—only 11% optimizing content for AI discovery, indicating unreadiness among specialist content creators. Event marketing remained adoption leader with AI applications in personalization and measurement. Practice consolidates around collaboration model with increased validation rigor and measured skepticism of vendor claims about accuracy improvement.
2025-Q4: Adoption reaches maturity with hardened expectations. Wharton survey (November 2025) of 800 senior leaders confirms 46% daily GenAI use (up 17 points), with 75% reporting positive ROI, but paradox sharpens: advanced models (o3, o4-mini) hallucinate at 33-79% on benchmarks. Legal domain hallucinations persist at 17-33% despite RAG, with 508 cases documented globally. MIT analysis: 95% of AI pilots fail to deliver value. Technical writing community crystallizes durable framework distinguishing "writing with AI" (time-saving assistant) from "writing for AI" (content for AI consumption); emphasizes structured CCMS and human-required validation. Practice transitions from experimental to mainstream augmentation constrained by acceptance of technical barriers: AI is integrated where scope is narrow and oversight is systematic, and rejected where accuracy is critical-path.
2026-Jan: Healthcare and technical documentation domains show pragmatic adoption amid persistent hallucination concerns. HTA scoping review of hospital documentation systems documents variable accuracy in AI scribes and the need for human oversight of AI-generated clinical documentation. API documentation platforms (Apidog, Treblle) achieve production deployments with measured gains (75% error reduction, 90% accuracy improvements), but data analysis reveals that hallucination rates improve only on grounded tasks (0.7-1.5%) and surge on complex reasoning (33-51% for o3). Event content management shifts from creation toward AI-assisted logistics and compliance workflows. GitHub Copilot usage dashboard advances as ecosystem instrumentation. Specialists affirm that hallucinations remain a fundamental LLM limitation unsolvable by model scaling, reinforcing the adoption boundary that separates "writing with AI" (efficiency) from automation-grade solutions.
2026-Feb: Ecosystem instrumentation accelerates as GitHub expands Copilot metrics to org level and CLI telemetry, enabling enterprises to track adoption patterns. Yet integration barriers persist: Gartner data shows 72% of organizations struggle with daily tool integration, only 6% achieve enterprise-wide rollout; OpenText analysis finds only 10% of organizations fully prepared for AI with 40% of agentic projects likely to be canceled by 2027. Healthcare adoption documented by American Hospital Association emphasizing clinical oversight and hallucination risks in ambient documentation tools. Critical counterevidence: Deloitte refunded AU$440,000 for government report with AI-fabricated references (published with corrections Feb 3), demonstrating real-world specialist content failures. Enterprise adoption data: 15M paid M365 Copilot seats (160% YoY), 4.7M GitHub Copilot subscribers (75% YoY), but quality integration remains bottleneck not scale.
2026-Q2: Manufacturing sector shows pragmatic adoption of AI for specialist technical content. STADLER (230-year-old European manufacturer) deployed ChatGPT Enterprise across 650 employees with 125+ custom GPTs for technical documentation and engineering specifications, achieving 30-40% time savings. ENEOS Materials (Japanese producer) created 1,000+ custom GPTs for plant design and multilingual documentation translation with 80% workflow improvements, demonstrating sustainable patterns in conservative sectors. Event industry benchmark (Tree-Fan Events) documents 71% of workflows AI-capable but only 22% deployed (49-point implementation gap), with 39-43% creating content using AI; identifies critical infrastructure requirements (review workflows, guardrails, role-based permissions) separating experiments from operational systems. Expertise paradox emerges: NRC media editor embedded 15 fabricated quotes despite years of warnings about hallucinations, revealing that polished outputs and fluency trust lead experts to skip verification, a structural vulnerability in specialist content workflows. World Bank evaluation synthesis case study documents complete failure (all evidence fabricated) and a remediation pathway: a corrected methodology (summarize components, validate each, synthesize, mandate citations, allow unknown, require line-by-line human review) achieved 100% faithfulness in 2024-2025 follow-up. Regulatory frameworks mature: FDA, EMA, MHRA, ISPE publish guidance on AI-assisted laboratory documentation (protocol drafting, regulatory submissions) with risk-based validation and human-centric design. Named tech firms (PostHog, Airbyte, dbt Labs, Booking.com) confirmed AI agents generating first-draft documentation PRs with 60% completion on first try via context engineering and agentic QA loops. 
Hallucination risks escalate in high-stakes contexts: ICLR 2026 exposed 50+ peer-reviewed papers with AI-hallucinated citations and datasets passing peer review; JMIR study documented five clinically relevant error categories in AI-transformed psychiatric notes despite stylistic improvement. MIT data from 300 deployments shows 95% of GenAI projects deliver zero P&L impact, attributing failure to organisational readiness gaps rather than model capability. Practice consolidates around infrastructure-intensive reliability models—domain expertise remains non-delegable, but structured workflows increasingly enable safe deployment in bounded contexts.
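The corrected World Bank methodology described in this entry (summarize components, validate each, synthesize, mandate citations, allow unknown, require line-by-line human review) can be sketched in Python as follows. This is a rough illustration only: `build_synthesis`, `ready_to_publish`, and the `summarize` and `is_supported` callables are hypothetical stand-ins for whatever model call and validation step a team actually uses, and nothing here reproduces the World Bank's own tooling.

```python
from typing import Callable, Optional

UNKNOWN = "unknown"  # explicit "we do not know" instead of a guessed claim


def build_synthesis(
    components: dict,
    summarize: Callable[[str], Optional[str]],
    is_supported: Callable[[str, str], bool],
) -> list:
    """Summarize each component, validate it against its own source text,
    and emit one cited row per component for line-by-line human review."""
    rows = []
    for source_id, text in components.items():
        summary = summarize(text)
        # Validation step: an unsupported or missing summary becomes "unknown".
        if summary is None or not is_supported(summary, text):
            summary = UNKNOWN
        # Mandated citation: every row carries the id of the source it came from.
        rows.append({"claim": summary, "citation": source_id, "human_approved": False})
    return rows


def ready_to_publish(rows: list) -> bool:
    # Line-by-line review: every single row needs explicit human sign-off.
    return bool(rows) and all(row["human_approved"] for row in rows)
```

The design choice worth noting is that "unknown" is a valid output: the pipeline prefers an admitted gap over a fluent but unsupported claim.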
2026-May: Mainstream production adoption confirmed at scale but governance failures accumulate in parallel. The State of Docs 2026 survey (1,131 professionals) reports a 60-to-76% YoY jump in AI adoption, with writers shifting from drafting to validation and context system building; agentic documentation workflows at Skyflow, Adyen, and dbt Labs auto-detect code changes and update docs, while 41% of organizations without formal documentation teams shipped zero AI features — signaling that information architecture has become a competitive differentiator. Bounded deployments deliver measurable ROI: Graebel deployed Copilot agents for service-request interpretation with faster turnaround and consistent data quality; Tandem Health's ambient AI scribes (MDR Class IIa certified) reduced clinical administrative burden in European NHS pilots; a major pharma firm deployed end-to-end MLR-compliant content pipelines with multimodal LLM and separate validation stages. However, hallucination failures at tier-one firms escalate: Sullivan & Cromwell admitted 42 AI hallucinations in a bankruptcy court motion (internal review failed; global database now tracks 1,334+ legal AI hallucination cases); EY Canada withdrew a 44-page cybersecurity report after 16 of 27 citations were fabricated; Ontario's audit of 20 approved AI note-taking systems found 9 fabricated information, 12 inserted incorrect medications, and 17 omitted mental health findings — a systematic failure rate that persists despite vendor approval processes. Event content deployment reaches corporate scale (500+ events, 400+ branded posts per event, 15K-25K impressions within 72 hours) while tech writer role elimination accelerates (Snowflake, Amazon 16,000+, Cloudflare 1,100 roles) even as 80% of buyers review documentation pre-purchase — cost-cutting disconnected from customer quality expectations.
2026-June: Reliable deployment patterns crystallize amid fresh evidence of structural limitations. Recent peer-reviewed research (IEEE RE 2026) confirms guideline-driven LLM-assisted technical documentation tools deliver 24.4% faster formulation with significantly higher user satisfaction, validating collaboration models where AI handles structural scaffolding. However, Microsoft Research's evaluation of 19 LLMs across 52 professional domains (banking, healthcare, legal) documented 25-50% content degradation in multi-step specialist document workflows, with agentic tool access providing no measurable improvement—a structural model-limitation, not a product-design issue. FDA regulatory enforcement matures: Warning Letter 320-26-58 (April 2026) to Purolea Cosmetics Lab establishes precedent that AI-assisted drafting is acceptable only with mandatory qualified human review and quality-unit approval—rejecting automation-grade deployment. Healthcare providers deploying AI clinical documentation adopt lightweight pre-approval workflows where clinicians maintain authorship before record entry, reducing administrative burden while preserving safety oversight (Tandem Health case study). Enterprise hallucination mitigation shows progress: multi-model verification architecture (480M verified outputs across legal, financial, healthcare) reduces hallucination from 8.3% baseline to 3.2% through cross-model agreement. Product teams standardize 70-30 split workflows (AI structural work, humans judge strategy), delivering PRDs and release notes in 60-90 minutes vs. 4-6 hours. Critical-path specialist content (legal filings, clinical documentation, regulatory submissions) remains governance-intensive despite improved tooling; calibration gaps (models confident but wrong) and fluency-driven expert bypass remain permanent risk factors.
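The cross-model agreement idea mentioned in this entry can be sketched as a simple majority vote. This is a hedged illustration, assuming each model's answer arrives as a plain string; the `agreed_answer` function and its whitespace and case normalization are hypothetical and do not describe the architecture behind the reported figures.

```python
from collections import Counter
from typing import Optional


def agreed_answer(answers: list) -> Optional[str]:
    """Return the answer a strict majority of models agree on, else None.

    None means "no agreement": the output should be escalated to a human
    reviewer rather than released.
    """
    # Normalize case and whitespace so trivial differences do not split votes.
    normalized = [" ".join(answer.lower().split()) for answer in answers]
    if not normalized:
        return None
    top, votes = Counter(normalized).most_common(1)[0]
    # Strict majority, so ties can never be accepted.
    return top if votes * 2 > len(normalized) else None


print(agreed_answer(["Class IIa", "class  iia", "Class I"]))  # class iia
print(agreed_answer(["Class IIa", "Class I", "Class IIb"]))   # None
```

For scale, the reported drop from an 8.3% baseline to 3.2% is a relative reduction of (8.3 − 3.2) / 8.3, or about 61%. Agreement between models lowers the rate but does not remove the need for human review on critical-path content, since several models can share the same wrong answer.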
2026-Jun: Governance frameworks intensify as adoption reaches mainstream scale across healthcare and technical domains. MHRA regulatory reclassification moves AI medical scribes from Class I to Class IIa medical device, requiring conformity assessments and post-market surveillance—signaling specialist healthcare documentation entering regulated territory with higher compliance costs. AMA survey data shows 81% of medical providers now using AI (vs 38% in 2023), yet CM&F liability analysis documents clinicians remain solely liable for AI-generated notes despite accuracy gaps—demonstrating rapid adoption outpacing governance resolution. Washington State University study confirms fundamental accuracy barriers: ChatGPT achieves only 60% above-chance accuracy on business hypotheses, with only 16.4% accuracy identifying false statements—critical limitation for technical/scientific content. Production case studies validate boundary-respecting deployment: Flowing Docs documents engineering workflow moving from ad-hoc prompting to Rules/Skills/Harnesses framework; a 90-engineer SaaS successfully deployed AI on API refs and runbooks (auto-regenerate on code change) while restricting AI to outline-only on design docs, improving onboarding from 3 weeks to 5 days. Structured documentation yields measurable improvements in downstream AI-assisted systems: MongoDB 62% error reduction, Google Cloud 45% accuracy improvement, e-commerce 37% escalation reduction on support automation. GitBook product GA adds OpenAPI spec generation and developer Q&A to API documentation workflow. Polygraf production metrics document persistent healthcare challenge: 31% hallucination in AI-generated clinical notes versus 20% in physician-written notes (PDQI-9 framework). 
Practice consolidates around infrastructure-intensive boundary models: AI excels as drafting/editing assistant on mechanical documentation (API refs, runbooks) with mandatory human judgment preserved on strategy/design decisions; rejected on critical-path content (legal filings, clinical documentation) without multi-step governance pipelines.
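The boundary model above can be expressed as an explicit routing policy. The Python sketch below is an assumption-laden illustration: the `AIMode` names, the document-type keys, and the choice to send unlisted document types to the most restrictive mode are all hypothetical, chosen to mirror the boundaries described in the 90-engineer SaaS case study.

```python
from enum import Enum


class AIMode(Enum):
    AUTO_REGENERATE = "auto_regenerate"      # AI regenerates on code change
    OUTLINE_ONLY = "outline_only"            # AI drafts structure, humans write
    GOVERNED_PIPELINE = "governed_pipeline"  # multi-step governance required


# Hypothetical policy table mapping document types to permitted AI involvement.
POLICY = {
    "api_reference": AIMode.AUTO_REGENERATE,
    "runbook": AIMode.AUTO_REGENERATE,
    "design_doc": AIMode.OUTLINE_ONLY,
    "legal_filing": AIMode.GOVERNED_PIPELINE,
    "clinical_note": AIMode.GOVERNED_PIPELINE,
}


def ai_mode_for(doc_type: str) -> AIMode:
    # Assumption: anything not explicitly listed gets the strictest treatment.
    return POLICY.get(doc_type, AIMode.GOVERNED_PIPELINE)


print(ai_mode_for("runbook").value)     # auto_regenerate
print(ai_mode_for("design_doc").value)  # outline_only
```

Keeping the policy in a table rather than in prompts makes the boundary auditable: a reviewer can see at a glance which content classes are ever allowed to regenerate without a human author.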