Perly Consulting │ Beck Eco

The State of Play

A living index of AI adoption across industries — where established practice meets the bleeding edge
UPDATED DAILY

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

The Daily Dispatch

A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.

AI Maturity by Domain

[Chart: each dot marks the weighted maturity of practices within a domain, plotted by domain on an axis running from Bleeding Edge to Established.]

Narrative generation from data

TIER: Leading Edge
TRAJECTORY: Stalled

AI that generates written narratives and explanations from data, turning charts and tables into human-readable stories. Includes automated insight commentary and report narrative sections; distinct from dashboard generation, which presents visual rather than written output.

OVERVIEW

Narrative generation from data -- AI systems that convert raw data, charts, and tables into written explanations -- has reached the point where every major BI platform ships the capability, yet most organisations still treat its output as a draft requiring human review. That gap between feature availability and trusted autonomy defines the practice's leading-edge status. Microsoft, Tableau, Oracle, and Google all offer GA narrative features, and forward-leaning deployments in financial services, pharmaceutical trials, and newsroom automation demonstrate real value. But hallucination remains an architectural constraint, not an engineering bug to be patched: OpenAI research has confirmed that LLM confabulation is mathematically inevitable, and industry surveys report hallucination rates as high as 79% in uncontrolled settings. The result is a practice where the tooling is mature and demand is strong, but production use concentrates in structured, compliance-adjacent domains with mandatory human validation. The question facing adopters is not whether narrative generation works -- it does -- but whether their governance and review processes can keep pace with what the models produce.

CURRENT LANDSCAPE

The vendor landscape has consolidated around embedding narrative generation directly into enterprise platforms, with Microsoft actively pushing adoption through architectural defaults. In April 2026, Microsoft updated Power BI's Narrative Visual to default to Copilot mode when users hold a license, raised the prompt character limit to 10,000 to enable richer prompts, and expanded in-report Copilot to mobile with voice dictation, signaling a platform-level shift from manual to AI-assisted narratives. Microsoft is discontinuing Power BI Q&A in December 2026, funnelling users to Copilot-based narrative summarisation that generates textual insights from report metadata. Tableau completed a similar consolidation in January 2025, retiring Data Stories in favor of Tableau Pulse. Oracle EPM Cloud offers GA narrative summaries for financial reporting. AWS launched HealthScribe in March 2026, a HIPAA-eligible clinical documentation service that generates notes from patient conversations. The tooling has reached baseline availability across major platforms, with pricing established ($20/user/month for Power BI Premium Per User licensing) and regional availability documented.

Deployment evidence shows narrative generation working at scale in regulated, structured-data domains. Tandem Health produced 375,000 AI-generated clinical notes across a European health system, demonstrating enterprise scale in healthcare. Narrativa reports production use in pharmaceutical clinical trials with knowledge graph grounding. FactSet (portfolio commentary) and the Associated Press (earnings narratives) continue to run narrative generation in financial services. Gartner projects 75% of analytics content will be AI-contextualized by 2027, signaling a mainstream adoption trajectory. Where data governance is solved, adoption accelerates sharply: one enterprise case study documented adoption of Power BI narrative features rising from 12% to 84% within 12 days of implementing data preparation standards, with a 40% cycle-time reduction. Major consulting firms (RSM US, Q-Advise) now position narrative generation as foundational to decision intelligence and as a core expectation in BI modernization and digital transformation initiatives, with consulting analysis documenting $0.84M average annual ROI and 11.4-week time-to-production.

Yet the adoption ceiling remains organizational, not technical. Consulting analysis of 450 million Copilot-licensed users found that 40% of deployments stall or fail within 6 months and only 3% report meaningful ROI. The root causes cited are data governance concerns (52% cite hallucinations as the primary blocker), cost-benefit uncertainty in the absence of clear financial frameworks, and change management gaps. Hallucination remains a measurable constraint: industry benchmarking shows rates of 0.7%-0.8% on summarization tasks but 15.6%-18.7% in medical and legal domains, with no model immune. Real-world testing of finance narratives finds Copilot competent at basic narration but unreliable for causal analysis: it consistently misattributes the cause of data movements, so analyst review is required. Production deployments increasingly document a five-layer mitigation stack -- RAG for grounding, guardrails for policy enforcement, automated evaluations, human-in-the-loop review, and observability for auditing -- to move narrative generation from pilot to trusted production. Practitioners also report that the bottleneck has shifted: where narrative generation is deployed, the constraint moves from producing analysis to reviewing and acting on it, and capturing the value requires organizational maturity. The question facing the field is no longer whether narrative generation works, but whether governance, cost, and organizational change can unlock adoption beyond early-adopter pockets.
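The five-layer mitigation stack described above can be illustrated with a minimal Python sketch. It is not any vendor's implementation: the function name review_narrative, the ReviewResult fields, the causal-phrase list, and the min_score threshold are all hypothetical, and the grounding layer is reduced to a simple check that every number cited in a narrative appears in the source data, standing in for a real retrieval (RAG) step.

```python
import re
from dataclasses import dataclass, field

# Matches plain or thousands-separated numbers; skips digits glued to letters (e.g. "Q3").
NUMBER = re.compile(r"(?<![\w.])\d+(?:,\d{3})*(?:\.\d+)?")
# Illustrative policy list: causal wording is flagged because causal
# attribution is the reported weak spot of generated finance narratives.
CAUSAL_PHRASES = ("because", "due to", "driven by", "caused by")


@dataclass
class ReviewResult:
    unsupported_numbers: list
    policy_flags: list
    support_score: float
    needs_human_review: bool
    audit_log: list = field(default_factory=list)


def review_narrative(narrative, source_values, min_score=1.0):
    """Run a generated narrative through simplified versions of the five layers."""
    log = []

    # Layer 1, grounding (stand-in for RAG): every cited number must exist in the source data.
    cited = [float(m.replace(",", "")) for m in NUMBER.findall(narrative)]
    allowed = {float(v) for v in source_values}
    unsupported = [n for n in cited if n not in allowed]
    log.append(("grounding", {"cited": cited, "unsupported": unsupported}))

    # Layer 2, guardrails: flag causal claims for policy review.
    lowered = narrative.lower()
    flags = [p for p in CAUSAL_PHRASES if p in lowered]
    log.append(("guardrails", flags))

    # Layer 3, automated evaluation: share of cited numbers that are supported.
    score = 1.0 if not cited else (len(cited) - len(unsupported)) / len(cited)
    log.append(("evaluation", score))

    # Layer 4, human-in-the-loop: route to a reviewer if anything looks off.
    needs_review = bool(unsupported or flags or score < min_score)
    log.append(("routing", needs_review))

    # Layer 5, observability: the log of every decision is returned for auditing.
    return ReviewResult(unsupported, flags, score, needs_review, log)


result = review_narrative(
    "Revenue rose to 1,200 in Q3, up 12.5% because of pricing.",
    source_values=[1200, 12.5],
)
print(result.needs_human_review, result.policy_flags)
```

In this example the figures are all supported by the source values, but the word "because" still routes the narrative to a human reviewer. In regulated settings a deployment might set needs_human_review unconditionally and use the other layers only to prioritise the review queue.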

TIER HISTORY

Research: Jan-2019 → Jan-2019
Bleeding Edge: Jan-2019 → Jul-2024
Leading Edge: Jul-2024 → present

EVIDENCE (127)

  • CFO-focused risk assessment documenting non-determinism, data fabrication, RLS bypass, and audit verification gaps in narrative generation; cites $67.4B annual AI hallucination cost and 47% of executives making major decisions on hallucinated content.
  • Practical adoption guide documenting narrative summary capabilities and prompt engineering strategies; signals organizational demand for narrative generation training, with caveats on output validation and governance.
  • Comprehensive hallucination benchmarking across 40+ models: 0.7%-0.8% on summarization, 15.6%-18.7% on medical/legal domains; no model is immune to hallucinations, which establishes a measurable reliability constraint for narrative generation systems.
  • Multi-vendor comparison of narrative and NLP capabilities showing competitive ecosystem maturity; Power BI Copilot default enablement (Sept 2025), grounded references feature (Jan 2026), widespread enterprise rollout.
  • Realistic finance testing by an analytics consulting firm shows Copilot competent for narration and exploration but unreliable for causal analysis, consistently misattributing the cause of data movements; a critical governance constraint on autonomous narrative deployment.
  • Tech analyst coverage of Power BI Desktop v2.153.910.0 narrative visual enhancements: 10,000-character limit for richer prompts, forced Copilot default for licensed users, mobile expansion; signals active user experience tuning toward AI-assisted narratives.
  • Forensic audit of ChatGPT narrative generation on market analysis showing fabricated quantitative data, persistent negative bias, and misattribution of causality; credibility rating C (5.2/10), direct evidence of narrative system limitations on data analysis.
  • Power BI April 2026 Feature Summary (Product Launches): Official Microsoft release documenting Narrative Visual default-to-Copilot mode when the user holds a license and in-report Copilot mobile expansion with citations, confirming narrative generation as a core GA feature.

HISTORY

  • 2019: Automated Insights and MicroStrategy partnered to integrate Wordsmith narrative generation into dashboards; early vendor focus on empowering data analysts with NLG capabilities in enterprise BI.
  • 2020: Microsoft Power BI released Smart Narratives preview (Sep), advancing mainstream adoption; Narrator closed $6.2M Series A to commercialize narrative generation for data modeling; academic research achieved breakthroughs in neural reliability (EMNLP) but identified unresolved challenges in content selection and contextual reasoning.
  • 2021: Platform consolidation accelerated with Salesforce/Tableau acquiring Narrative Science (Dec), integrating narrative generation into the BI mainstream; Gartner predicted 75% of data stories would be automatically generated by 2025; enterprise surveys showed strong demand (93% see revenue value in data storytelling) but academic research continued highlighting hallucination and faithful output generation as critical unresolved challenges in data-to-text systems.
  • 2022-H1: Narrative generation reached mainstream feature parity with Microsoft Power BI Smart Narratives achieving general availability and Tableau announcing Data Stories (from Narrative Science acquisition) at May 2022 conference; both major BI platforms now offered automated narrative generation as core features. Academic research intensified focus on hallucination detection and mitigation (major February 2022 survey). Practitioner evaluations showed deployments working for common use cases but revealed limitations in complex scenarios; industry analysis highlighted persistent tension between scaling automation and maintaining reliability in mission-critical analytical narratives.
  • 2022-H2: Research output on hallucination and omission problems accelerated with major studies (IBM NAACL reporting 60%+ hallucination rates in benchmarks, an INLG meteorology use case, radiology report generation improvements). Tableau Data Stories moved toward general availability by year-end, but Power BI Smart Narratives remained in preview for on-premises deployments, indicating uneven platform rollout. Academic and domain-specific work continued to demonstrate that narrative generation quality remained a constraint on production adoption despite vendor platform integration and enterprise demand.
  • 2023-H1: Tableau Data Stories achieved platform-wide GA in Server and Desktop (expanding from Cloud-only launch in 2022); both Power BI Smart Narratives and Tableau Data Stories positioned as production features with documented technical constraints (timeouts, data point limits). Standalone narrative generation ecosystem expanded to 15+ competing tools. Academic research documented systemic hallucination challenges in LLM-based systems; practitioner analysis identified implementation barriers beyond vendor features (domain knowledge, audience understanding, visualization competency). Deployment patterns remained focused on analytical augmentation with mandatory human validation rather than autonomous narrative generation.
  • 2023-H2: Major vendor acceleration with Microsoft announcing Fabric GA and Copilot-powered Narrative visual (public preview by Q1 2024), extending narrative generation beyond traditional BI into broader data platforms. Academic research intensified focus on hallucination mitigation with large-scale benchmarks (HaluEval showing 19.5% ChatGPT hallucination rate) and novel technical solutions (58% error reduction via fine-tuning, RAG-based hallucination detection). Domain-specific adoption emerged in regulated environments (clinical report automation) and niche sectors (library data storytelling), though real-world implementations revealed persistent adoption barriers: practitioners identified visualization competency, domain knowledge, and audience understanding as critical success factors beyond vendor platform maturity. LLM-based narrative generation remained positioned as analytical augmentation requiring human validation rather than autonomous decision support.
  • 2024-Q1: Microsoft Copilot narrative visual reached GA in Power BI (Feb), accelerating LLM-based narrative generation in enterprise BI; concurrent academic research on interactive narrative generation (Socrates user study, 18-person evaluation) and large-scale hallucination surveys (79-paper synthesis, 171-researcher audit) confirmed improved user relevance alongside persistent reliability challenges. Evaluation frameworks for NLG systems advanced with LLM-based metrics, though practitioner reporting indicated hallucinations remained a key adoption barrier (3-10% rates documented by industry analysts).
  • 2024-Q2: Vendor ecosystem consolidated with Google promoting AI-powered storytelling in Looker (June MQ announcement); real-world deployments scaled in financial services (FactSet, portfolio commentary in GA; Associated Press earnings narratives at 3,750 quarterly reports). Academic research accelerated focus on hallucination mitigation—Oxford Nature paper on semantic entropy detection and JMIR peer-reviewed study documenting 28.6%-91.4% hallucination rates in LLM narrative tasks. Production maturity advanced while reliability remained the primary constraint; Australian journalism case study documented unsustainability despite initial enthusiasm, signaling sector-specific adoption barriers beyond technical platform capability.
  • 2024-Q3: Academic work intensified on narrative generation mechanics—DataNarrative (1,449-story benchmark) and Compendia (user study) showed progress but persistent challenges in coherence and fact extraction. Empirical research directly measured hallucination impact on data quality; Northwestern CASMI published critical perspective reframing hallucinations as fundamental LLM property, advocating paradigm shift toward data-guided approaches (Satyrn). United Robots expanded deployments in newsrooms for weather and real-estate automation (6-7 hours daily coverage). Academic consensus shifted from mitigation hopes toward acceptance that reliability barriers require architectural changes, not technical tuning.
  • 2024-Q4: Ecosystem expansion continued with Oracle EPM Cloud achieving GA of GenAI narrative summaries for financial reporting in November, extending narrative generation to adjacent enterprise domains. Research shifted focus from hallucination mitigation to architectural redesign—knowledge graph integration proposed as promising direction to anchor LLMs in verified data. OpenAI SimpleQA study confirmed systemic overconfidence in generative AI systems (November), reinforcing consensus that autonomous narrative generation requires mandatory human validation in mission-critical contexts. Deployment patterns remained cautious; vendor platform feature parity achieved but real-world adoption concentrated in structured, compliance-driven sectors with continued emphasis on augmentation rather than autonomy.
  • 2025-Q1: Vendor ecosystem stabilized with Tableau retiring Data Stories in January 2025 (version 2025.1) in favor of Tableau Pulse, signaling strategic consolidation toward conversational analytics. Microsoft Power BI Copilot narrative visual maintained GA with documented production constraints (30,000-row limits, field truncation). Adoption focus shifted from feature exploration to governance and capacity planning, with practitioners addressing cost management and pilot-to-production scaling challenges. Academic and practitioner research continued emphasizing hallucination as a binding constraint, with comprehensive surveys and real-world examples (fabricated legal citations, hallucinated case references) reinforcing that narrative generation requires mandatory human validation in production deployments. Deployment remained concentrated in structured, compliance-adjacent sectors with mandatory validation protocols.
  • 2025-Q2: Academic research formalized hallucination as an architectural constraint rather than a solvable engineering problem. New research (April-June 2025) introduced a "corrosive hallucination" framework and a comprehensive LLM hallucination taxonomy, documenting inherent inevitability in LLM-based systems. Real-world failures were documented: the Mata v. Avianca legal brief with fabricated case citations exemplified the risks of unreviewed AI narrative output. Scaling challenges surfaced, with Gartner data showing 30% of successful AI pilots abandoned before production due to organizational barriers. The product ecosystem sent mixed signals: Power BI Copilot narrative remained GA with user-reported failures in production (regional limitations, configuration dependencies), while the Tableau Pulse transition signaled vendor evolution beyond dedicated narrative generation toward conversational analytics. Practitioner focus intensified on governance, capacity planning, and pilot-to-production challenges rather than feature capability expansion. Hallucination research consensus hardened: reliability barriers require architectural redesign, not incremental tuning, positioning narrative generation as a mandatory-validation augmentation tool rather than a path to autonomous decision support.
  • 2025-Q3: Vendor platform consolidation continued, with Microsoft extending narrative generation beyond traditional BI into project management (Planner Agent preview generating status reports from structured task data). Research focus intensified on evaluation frameworks: the NarraBench taxonomy and survey documented that only 27% of narrative understanding benchmarks fully capture narrative tasks, exposing critical gaps in assessing narrative generation quality. Regulatory and governance discourse advanced, with research proposing layered frameworks for hallucination risks encompassing epistemic instability, user misdirection, and social-scale effects. Real-world deployment evidence expanded into regulated sectors: a pharmacovigilance case study demonstrated production use of AI-generated reports in the medical domain, with an explicit hybrid human-AI model acknowledging reliability and oversight requirements. The adoption trajectory showed vendors extending into adjacent domains (project management, regulated reporting) while core narrative generation remained an augmentation tool requiring mandatory human validation, with reliability barriers positioned as architectural rather than incremental engineering challenges.
  • 2025-Q4: Vendor platform maturity was reinforced, with Microsoft continuing GA support for Power BI Copilot narrative visuals (November-December 2025 documentation updates) and expanding mobile accessibility (iOS/Android preview). Deployment reliability challenges surfaced: an NHS service alert documented a Copilot outage affecting a production healthcare environment due to traffic throttling and policy regression, exemplifying scalability constraints in enterprise narrative generation. Research consensus hardened on hallucination as a fundamental architectural barrier: an October 2025 comprehensive survey confirmed hallucination causes, detection approaches, and mitigation limitations. Real-world failures (legal briefs with fabricated citations, healthcare failures) reinforced mandatory validation requirements. The adoption pattern remained stable: narrative generation as an augmentation tool with human oversight in structured, compliance-adjacent sectors. Vendor ecosystem consolidation was complete; feature parity had been achieved, but reliability barriers remained the core constraint on autonomous deployment. By end-2025, the practice had reached stable maturity with broad platform availability but narrow, validation-required deployment windows.
  • 2026-Jan: Academic and vendor research on narrative theory and hallucination mitigation accelerated. A new research survey (Narrative Theory-Driven LLM Methods) advanced theoretical foundations for narrative generation systems, while parallel work (Idea2Story) proposed knowledge graph anchoring to reduce hallucinations in autonomous narrative pipelines. Microsoft maintained GA status for Power BI Copilot narrative visuals with documented multilingual and sovereign cloud constraints (Jan 2026 documentation). Industry analysis (79% hallucination rates) and critical perspectives (Duke University survey: 94% of users concerned about accuracy) reinforced hallucination as a persistent adoption barrier. More nuanced research (Engineering of Hallucination) suggested reframing hallucination as a feature for creative applications. Focused language models were proposed as a technical route to accuracy improvement through task-specific training.
  • 2026-Feb: Vendor consolidation accelerated with Microsoft announcing Power BI Q&A discontinuation (December 2026), replacing it with Copilot-driven narrative summarization. Academic research (StoryScore) advanced evaluation frameworks to distinguish creative adaptation from hallucination. Deployment in regulated sectors expanded: Narrativa reported production use in pharmaceutical clinical trials with knowledge graph grounding. OpenAI research confirmed hallucinations are mathematically inevitable in LLMs, hardening consensus on architectural constraints. Security vulnerability discovered in Copilot (email summarization bypass) highlighted real-world deployment risks. Adoption patterns remained validation-required with governance and cost management challenges emerging as pilots scaled toward production.
  • 2026-Mar: Clinical narrative generation reached scale: Tandem Health processed 375,000 clinical notes and AWS HealthScribe reached GA for ambient documentation. Power BI Copilot shipped standalone narrative email summaries in production. Critical deployment friction quantified: 40% of Copilot deployments stall within six months with only 3% achieving meaningful ROI — yet where data governance is solved, adoption can be rapid (one SaaS case study showed 12% to 84% adoption in 12 days with 40% cycle-time reduction). Practitioners operating at scale document five-layer mitigation stacks (RAG grounding, guardrails, automated evals, human-in-the-loop review, observability) as the architectural prerequisite for moving from pilot to trusted production.
  • 2026-Apr: The narrative generation ecosystem extended beyond BI platforms: Microsoft Power Platform Copilot added data narrative generation to low-code model-driven apps (summarizing table data, recapping record history, generating documents). Vendor consolidation continued: Power BI smart narratives with auto-refresh were confirmed GA in the March 2026 updates. Academic research on reliability barriers intensified. EMNLP 2024 retrospective: the DataNarrative multi-agent framework with a 1,449-story benchmark demonstrated technical progress on coherence and hallucination mitigation. An EACL 2026 paper revealed a critical evaluation gap: 50%+ of hallucinations involve consistency failures rather than correctness errors, requiring fundamentally different assessment approaches. A comprehensive hallucination survey (100+ papers) confirmed architectural inevitability. A large-scale empirical study (172B tokens, 35 models) quantified the hallucination baseline at 1.19%-10%+ depending on context length. The GCAN framework showed a 27.8% hallucination reduction vs. baseline RAG, indicating continued technical innovation. Practitioner analysis documented real-world failures (legal briefs with fabricated citations, judicial sanctions) and proposed a 4-layer risk assessment framework. An independent benchmark (Halluhard) showed Claude Opus at ~33% hallucination in legal/research domains. An organizational adoption barrier was identified, the analyst bottleneck: narrative generation solves the communication of statistical insight, but organizational adoption depends on solving data governance, cost uncertainty, and change management, not on platform capability.
  • 2026-May: Microsoft pushed Power BI Copilot narrative generation further into defaults—April 2026 update forces Copilot mode for licensed users, raises the prompt character limit to 10,000, and expands in-report narratives to mobile. Hallucination risk quantification sharpened: industry benchmarking across 40+ models shows 0.7%-0.8% error rates on summarization but 15.6%-18.7% in medical and legal domains, with no model immune; CFO-focused risk analysis cites $67.4B annual AI hallucination cost and documents RLS bypass and audit verification gaps in deployed narrative systems. Real-world testing of finance narratives confirms Copilot competent for basic narration but unreliable for causal attribution, hardening the governance requirement for human review before organizational use.