Perly Consulting │ Beck Eco

The State of Play

A living index of AI adoption across industries — where established practice meets the bleeding edge
UPDATED DAILY

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

The Daily Dispatch

A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.

AI Maturity by Domain

Each dot marks the weighted maturity of practices within a domain — hover for a brief summary, click for more detail

DOMAIN
BLEEDING EDGE ↔ ESTABLISHED

Personal research & reading acceleration

BLEEDING EDGE

TRAJECTORY

Stalled

AI that accelerates personal research and reading through summarisation, synthesis, and intelligent highlighting of key content. Includes article distillation and research compilation; distinct from deep research tools which autonomously gather sources rather than processing provided ones.

OVERVIEW

AI-powered reading acceleration (summarising articles, distilling research, synthesising sources) remains stuck in bleeding-edge territory despite mainstream adoption signals and three years of vendor investment. The tools demonstrably work: Google NotebookLM holds a 4.8/5 rating across 240K+ app store reviews; 88% of UK undergraduates use AI to "explain concepts, summarise articles"; Readwise Reader has consolidated around power readers; and Adobe's contract review tools achieve a 77% time reduction. Yet adoption growth is stalled by an unresolved core tension: verification barriers and the confidence trap.

Independent evaluation reveals the production-reality gap. Oumi's April 2026 analysis found that only 39% of Google AI Overviews are both correct and source-supported, and hallucination severity jumps 3-10x on enterprise datasets (legal 18.7%, medical 15.6%) compared to benchmarks. More critically, practitioner research documents that 95% of enterprise AI investments deliver zero ROI, and 40% of US workers experience "workslop": polished-but-wrong outputs costing 2-3.5 hours of rework per incident. Domain expertise amplifies the risk, because experienced professionals trust confident-sounding outputs without verification. Mitigation strategies exist (source grounding reduces hallucinations 30-50%; disciplined prompting achieves 80% improvement), but all require active human verification workflows that offset adoption gains.

The bifurcation has hardened: routine document review (contract summaries, legal brief prep) has consolidated around major vendors with proven ROI, while research synthesis, academic reading, and knowledge work remain blocked not by capability gaps but by the adoption paradox: tools keep improving while organizations systematically withhold scaling until ROI can be demonstrated.

CURRENT LANDSCAPE

Ecosystem consolidation and feature momentum signal category maturity. Google NotebookLM (May 2026) shows mainstream adoption: 240K+ app store reviews at a 4.8/5 rating and a #27 US Productivity ranking, with audio overview generation as the primary adoption driver. April 2026 releases (auto-source-labeling, bulk sharing, flashcard mastery tracking) address documented friction points from user feedback, indicating product-market fit. Student deployments scaled: 88% of 1,041 UK undergraduates (HEPI survey, Feb 2025) use AI to "explain concepts, summarise articles, suggest research ideas". Readwise Reader consolidated as the premium unified reading app targeting knowledge workers (highlights, annotations, Obsidian/Notion/Roam exports) with a 7.4/10 ecosystem positioning. Market segmentation deepened, showing ecosystem depth: free skimmers (Apricot), power analysts (Feedly AI Pro), executive digests (Readless), semantic search retrieval (Surface + Readwise), and open-source alternatives (Karakeep, Linkwarden, Wallabag).

Practitioner workflows scaled: content researchers centralize sources in NotebookLM for pattern discovery; newsletter consumption is optimized via hot-topic detection (190 min/week reclaimed from a 50-subscription digest); and semantic search surfaces relevant reading at writing time (18K+ highlights managed via vector retrieval).

Institutional warnings intensified: universities issued systematic guidance restricting research use due to verification barriers, and BBC research documented 45% misrepresentation in AI-generated news summaries. Adoption metrics, however, reveal the core constraint: 95% of enterprise AI investments delivered zero ROI (Harvard/MIT 2025-2026), and only 12-18% of companies captured meaningful returns. "Workslop", polished-but-wrong output, affected 40% of US workers monthly, costing 2-3.5 hours of rework per incident and offsetting tool-reported speedups. Domain expertise amplified the risk: experienced professionals trusted confident-sounding output without verification.

The practice bifurcation hardened: routine document review (contract summaries at a 77% speedup, legal brief prep) consolidated around Adobe and Readwise with demonstrated ROI, while research synthesis, academic reading, and knowledge work remained blocked by verification barriers and the adoption paradox: organizations demanded hard ROI evidence while tools required manual verification workflows that erased productivity gains.

TIER HISTORY

Research: Nov-2022 → Nov-2022
Bleeding Edge: Nov-2022 → present

EVIDENCE (102)

— Ecosystem consolidation: 7 newsletter readers compared across AI features and integrations; Readwise Reader positioned as premium unified reading app for knowledge workers.

— Mainstream adoption confirmed: 240.5K+ app store reviews, 4.8/5 rating, #27 US Productivity ranking. Audio overview generation identified as primary value driver; sync friction and rate limits documented.

— Ecosystem maturity with tiered solutions: free skimmers (Apricot), power-user feed intelligence (Feedly AI Pro), executive digests (Readless); shows market differentiation at category scale.

— Critical adoption barrier: 95% of enterprise AI investments zero ROI; confidence trap escalates errors; 40% of US workers experience 'workslop' costing 2-3.5 hours rework—systematically erases adoption gains.

— Practitioner deployment: content research, audience pattern discovery, SEO research accelerated via centralized source organization and AI-driven first-layer synthesis.

— Addresses reading bottleneck at scale: hot-topic detection clusters cross-source overlap; models time savings of 190 minutes weekly from 50-subscription digest consolidation.

— Semantic search integration surfacing relevant reading during writing via vector similarity; deployed at scale (18k+ highlights), demonstrates agentic retrieval pattern for research synthesis.

— Maps research workflow integration with verification discipline: Perplexity for orientation/discovery only, not formal databases or synthesis; explicit warnings on verification requirements and hallucination risks.
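The semantic-search retrieval pattern cited above (surfacing relevant highlights at writing time via vector similarity) can be sketched in miniature. This is purely illustrative: the bag-of-words "embedding" is a toy stand-in for a real embedding model, and the highlight data is invented, not drawn from any product named here.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; real systems use dense vectors from an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_highlights(query: str, highlights: list[str], k: int = 2) -> list[str]:
    """Rank saved highlights by similarity to what the user is currently writing."""
    q = embed(query)
    return sorted(highlights, key=lambda h: cosine(q, embed(h)), reverse=True)[:k]

# Invented sample highlights standing in for a personal reading library.
highlights = [
    "Hallucination rates jump on enterprise legal datasets",
    "Audio overviews drive NotebookLM adoption",
    "Source grounding reduces hallucinations in summarisation",
]
print(top_highlights("why do models hallucinate on legal documents", highlights))
```

At the 18K-highlight scale described above, production systems swap the linear scan for an approximate-nearest-neighbour index, but the retrieval contract is the same: query vector in, top-k highlights out.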

HISTORY

  • 2022-H2: Summarization evaluation methodology emerged as a critical research gap (RoSE, APPLS benchmarks); hallucination problems and mitigations published across EMNLP venues; Readwise Reader launched in public beta with commercial pricing; medical domain analysis revealed fundamental limits of source-only summarization.
  • 2023-H1: Hallucination research deepened with peer-reviewed case studies (ChatGPT fabricating medical summaries) and systematic surveys documenting hallucination as fundamental LLM limitation; product development continued (Readwise Reader feature updates, AI Reader preview release) but consumer adoption remained stalled—only 19% had tried ChatGPT by June 2023; practice blocked by reliability concerns rather than capability gaps.
  • 2023-H2: Hallucination research intensified with new peer-reviewed papers on detection methodologies and entity-specific errors across ACL, arXiv, and medical journals; commercial vendors accelerated deployment with Adobe expanding Acrobat AI into public beta and Readwise shipping performance improvements; scholarly debate emerged questioning pedagogical impact of reading assistants on comprehension; consumer mainstream adoption remained limited despite 66% enterprise adoption of generative AI tools broadly.
  • 2024-Q1: Workplace adoption accelerated to 23% of employed Americans using AI for research with documented time savings (1.4% of work hours); consumer chatbot use for research reached 91% of survey respondents. Adobe released AI Assistant from beta in Reader and Acrobat. Hallucination remained the critical blocker: Stanford study documented 69-88% hallucination rates on legal queries, while academic research showed LLMs fabricate scholarly citations. Practice split between low-stakes research (where time savings drive adoption) and specialized/academic contexts (where hallucination risks block deployment).
  • 2024-Q2: Adobe demonstrated 4x task completion speedup with Acrobat AI Assistant in production and expanded to multi-format documents; launched free trial period to drive adoption. Field evidence from PhD researchers revealed persistent usability friction: AI outputs still required substantial manual rework, often negating claimed productivity gains. Vendor confidence in scaling contrasted with user skepticism about actual value delivery; hallucination barriers persisted in specialized domains while commoditized document review consolidated around major vendors.
  • 2024-Q3: Vendor responses to hallucination problem escalated with Microsoft launching Correction tool (September), signaling admission that post-hoc fact-checking was necessary. Research documented new hallucination patterns: concentration at end of long summaries and domain-specific failures in medical summarization. Elsevier survey found researcher interest in AI research tools remained high but actual adoption far lower, indicating persistent deployment barriers. Legal domain continued shaped by Stanford's January 2024 findings through Q3. Practice stratification hardened: corporate document review consolidating around major vendors, while academic and specialized research remained largely blocked by reliability constraints.
  • 2024-Q4: Deployment in routine corporate document review stabilized with Readwise Reader and Adobe Acrobat AI Assistant shipping incremental improvements and sustained usage. However, adoption growth plateaued: Adobe survey showed 68% of employed Americans still had not tried AI for document tasks despite 80% saying they would if time savings exceeded 10 hours weekly—revealing persistent gap between vendor claims and user value perception. Enterprise AI project deployment declined overall (47.4% in 2024 vs 55.5% in 2021) with ROI demonstration cited as primary blocker. Research continued addressing hallucination through new mitigation techniques (Entity Hallucination Index reinforcement learning), but proliferation of mitigation papers signaled problem remained unresolved. Two-tier market matured: low-stakes document review consolidated around major vendors; specialized and academic contexts remained blocked by reliability and adoption friction.
  • 2025-Q1: Vendor product development accelerated with Readwise adding Chat With Highlights (January) and Frontmatter Summaries, while Adobe's Pfeiffer benchmark found 4x productivity gains on document tasks and Forrester TEI projected 176-415% ROI. Yet the hallucination barrier persisted unchanged: a computational biologist's January tests confirmed that ChatGPT, Claude, and GPT-4o mini all fabricated academic references; Google's AI Overviews rollout faced public accuracy issues with CEO acknowledgment of no foolproof solution. Technical progress continued (March 2025 paper on GPT-based hallucination reduction) but reliability remained the critical blocker. Low-stakes corporate document review continued consolidating around major vendors; academic and research-intensive contexts remained blocked by hallucination risks.
  • 2025-Q2: Research evidence intensified on hallucination severity: a NAACL 2025 peer-reviewed study found 75% of content in LLM multi-document summaries is hallucinated, with GPT models fabricating 44-79% of non-existent topics; concurrent tracking showed hallucination rates worsening (33-79% for latest OpenAI models) despite AI capabilities advancing. Academic and library institutions issued cautionary guidance (Johns Hopkins, Utrecht University) warning researchers against trusting AI outputs for factual accuracy due to persistent citation fabrication and information invention. Practitioner testing confirmed wide variance in summarization quality across models, with no clear leader and context-dependent results. No new deployment announcements from major vendors during this window; prior period's productivity claims (4x speedup) remained the primary positive signal, but evidence from Q2 focused on deepening technical understanding of why hallucination remains unsolved.
  • 2025-Q3: Vendor product development accelerated with Adobe launching Acrobat Studio's PDF Spaces for multi-document summarization and Readwise shipping AI Themed Reviews; simultaneously, adoption momentum reversed sharply, with Census Bureau data showing large-firm AI adoption declining from 14% to 12% (June-August), MIT surveys finding 95% of AI pilots failing, and FDA's clinical document review AI assistant (Elsa) hallucinating extensively. The practice's bifurcation hardened further: commodity document review showed incremental productivity improvements (Adobe's claimed 4x speedup persisting), but reliability barriers deepened across domains, with hallucination research documenting up to 75% fabrication rates in multi-document summaries and institutional warnings from universities and research libraries amplifying, making Q3 an inflection point where vendor capability expansion collided with enterprise adoption contraction and user skepticism about hallucination risks.
  • 2025-Q4: Vendor development continued with Adobe reporting 4x YoY usage growth in AI productivity tools, but enterprise adoption collapsed with 42% of companies abandoning AI initiatives (up 250% from Q4 2024), Census Bureau data continuing decline to 12% large-firm adoption, and MIT research confirming 95% of pilots fail. Hallucination evidence intensified: November study found 56.2% of ChatGPT citations in mental health reviews fabricated (with domain-specific rates to 28-29%), FDA's Elsa clinical AI hallucinating extensively, and institutional guidance from universities warning against research use. Readwise Reader case study documented continued strong personal adoption with sustained production deployment. The market bifurcation inverted: vendors scaled capabilities while organizations withdrew; routine document review remained stable with 4x productivity gains; research, academic, and specialized domains remained blocked by unresolved hallucination risks and ROI demonstration barriers.
  • 2026-Jan: Enterprise adoption crisis deepened: only 18% of US workers use AI assistants weekly (Gallup) with most organizations in pilots or small-scale rollouts; 78% of enterprises use AI but only 23% measure ROI, with 40% of productivity gains lost to rework. Hallucination crises continued: 100+ AI-hallucinated citations detected in 53 NeurIPS 2025 papers (slipping past peer review), and critical analyses found 60%+ of AI-generated citations broken or fabricated. Enterprise spending reached $2.52 trillion (44% YoY increase) yet only 12% of CEOs report significant benefits; agentic AI deployment dropped from 42% to 26%, and 87% of enterprise AI projects fail to deliver P&L impact. The bifurcation persisted: vendors continued feature rollouts (Adobe's presentation generation, Readwise's themed reviews) and claimed 4x productivity gains; organizations deployed tools in pilots but systematically withheld scaling, citing inability to measure value and persistent hallucination risks. Research acceleration remained blocked by reliability barriers; document review use cases remained stable but adoption and expansion stalled.
  • 2026-Feb: ROI measurement crisis dominated discourse as organizations faced hard budget deadlines. Vendor deployment metrics continued accelerating: Adobe's Acrobat Studio reported 415% ROI and 45% efficiency gains (45 min → 9 min for legal document review), while a personal automation case study documented reading management freed 15-20 hours monthly via Readwise-reMarkable integration. However, negative signals intensified: 71% of CIOs reported budget-cut pressure to prove returns by mid-2026, and analyst research confirmed 95% of enterprise AI pilots delivered zero P&L impact with only 12-18% of companies capturing meaningful ROI. Hallucination remained persistent: Johns Hopkins peer-reviewed study (JMIR) compared ChatGPT summaries to human annotations on biomedical articles, finding AI matched human main-point capture but showed 3-10x higher error rates—a clear tradeoff between speed and reliability. A new product milestone emerged: GeoBarta AI News Summary reached GA, delivering 60-second briefings across 10,000+ news sources, demonstrating category maturity in specialized reading domains. The market bifurcation hardened further: reliable productivity gains existed in commodity document review (enterprise use cases with clear ROI); research and specialized domains remained blocked by hallucination risks; and adoption momentum continued reversing as organizations demanded hard ROI evidence by Q2 2026.
  • 2026-Mar: Market growth accelerated amid ongoing hallucination crises. Personal knowledge base AI market grew 30.3% YoY ($1.65B to $2.16B), while intelligent document processing projected 33.10% CAGR through 2032 ($2.30B to $12.35B), signaling strong market appetite despite adoption barriers. Platform-scale signals emerged: X (Twitter) launched AI article summaries as GA feature; Readwise announced MCP and CLI integrations enabling AI agent access to personal reading libraries, reflecting agentic AI workflow evolution. Hallucination evidence intensified at the research frontier: ICLR 2026 revealed 16% of peer-reviewed papers contain hallucinated references, fake authors, and fabricated data (21% of reviews may be AI-generated); BBC research documented 45% misrepresentation rates in AI news summaries with 40-90% traffic impact on content creators. Paradoxically, studies showed AI summaries increase purchase intent despite 60% hallucination rates, and an NRC editor's case study documented how domain expertise amplified hallucination risk through confident language and reduced verification. The practice bifurcation remained firm: commodity document review (contract review 77% time reduction, legal tasks 6-9x faster) continued consolidating around Adobe and Readwise; research, academic, and high-stakes domains remained blocked by unresolved reliability risks and the adoption paradox that users increasingly trust outputs they should distrust.
  • 2026-Apr (early): Model-level improvements accelerated: Vectara's April HEF benchmark documented frontier models (Gemini 2.0 Flash, Claude 4.1 Opus, GPT-4o) achieving 0.7-0.9% hallucination rates, roughly a 95% improvement from 2024 baselines. Enterprise vendors shipped new capabilities: DistillerSR's Smart Evidence Extraction module reached GA with an 8.3pp accuracy improvement on scientific literature extraction, trusted by 80%+ of top pharma/medical device companies; Adobe expanded Acrobat Spaces to the education market with 500-student testing at Harvard, Berkeley, and Brown. Yet deployment barriers persisted: Vectara data revealed that models without web search still hallucinate 30-60% of the time; Google's AI Overviews at global scale (100M+ monthly users) showed documented 9-15% error rates, producing tens of millions of false summaries hourly; and a legal database tracked 1,227+ documented hallucination cases in courts (5-6 new weekly), with 1,022 fabricated case citations. Research revealed unintended costs: MIT/Stanford studies showed AI interaction amplifies confirmation bias (AI affirmed users 49% more often than humans did) and that extended sessions increase delusional spiraling. Pattern analysis identified what actually works: a shift from reactive AI (chatbots) to proactive AI (overnight synthesis, persistent memory, cross-app integration) delivering measurable gains. The reliability-adoption paradox deepened: frontier models showed unprecedented accuracy improvement, yet real-world deployments revealed persistent failures; the gap between controlled benchmarks and production chaos widened further.
  • 2026-Apr (late): Independent evaluation and practitioner evidence revealed the production-reality gap in greater detail. Oumi's independent analysis of Google AI Overviews, using 4,000+ queries from OpenAI's SimpleQA benchmark, found that 91% returned correct answers but only 39% were fully trustworthy (correct AND source-supported); the hallucination rate actually increased in Gemini 3 despite accuracy gains in the base model. Suprmind's comprehensive benchmark compilation documented that hallucination severity jumps 3-10x on enterprise-scale datasets compared to controlled tests: frontier models sit at a 0.7-0.9% baseline, but domain-specific rates reached 18.7% (legal) and 15.6% (medical). Practitioner testing (40+ case studies) revealed hallucination predictability: severity scales linearly with knowledge-gap distance (1-2 months past cutoff yields hedged output; 6+ months yields fabricated narratives), and confidence inversely correlates with accuracy; the highest-risk categories are names, dates, financial figures, and URLs. Critical assessments documented that hallucination rates are worsening, not improving: OpenAI's o4-mini model hallucinated at 80% on general knowledge questions, and reasoning-based models amplify errors at each inference step. Mitigation strategies showed efficacy in controlled settings: source grounding reduces hallucinations 30-50%, prompting disciplines (single-focus, explicit refusal patterns) achieve 80% improvement, and RAG systems 70-80%; however, all strategies require active human verification workflows that offset adoption gains. Real-world deployment metrics confirmed proven productivity in controlled domains: semantic enterprise search achieves ~9x research task acceleration (45 min → <5 min). Ecosystem evolution continued: Readwise shipped MCP and CLI integrations enabling AI agent access, and reading tools integrated into agentic workflows.
The bifurcation persisted with sharper evidence: routine document review productivity proven at scale (77% contract review acceleration), but production hallucination barriers remain unresolved despite model capability advances; research, academic, and high-stakes domains remain blocked by fundamental reliability constraints and adoption-paradox dynamics where users increasingly trust outputs they should distrust.
  • 2026-May: Ecosystem consolidation and practitioner adoption signals confirmed category maturity alongside persistent ROI barriers. Google NotebookLM confirmed mainstream adoption at scale: 240K+ app store reviews at 4.8/5, #27 US Productivity ranking, with audio overview generation as the primary adoption driver; April 2026 updates (auto-source-labeling, bulk sharing, flashcard tracking) addressed documented friction. The AI reading tool market differentiated into tiers — free skimmers (Apricot), power-user feed intelligence (Feedly AI Pro), and executive digest services (Readless) — with Readwise Reader consolidating as the premium unified option for knowledge workers. Practitioner workflows scaled: hot-topic detection across 50+ newsletter subscriptions reclaims 190 minutes weekly; semantic search via Readwise+Obsidian surfaces relevant highlights at writing time across 18K+ annotations. However, the adoption ceiling remained intact: 95% of enterprise AI investments delivered zero ROI, and 40% of US workers reported "workslop" — polished-but-wrong outputs requiring 2-3.5 hours rework per incident — systematically erasing the tool-reported productivity gains. The bifurcation held: personal productivity use cases continue scaling, while research and high-stakes reading contexts remain blocked by verification burdens that offset acceleration gains.
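The "correct AND source-supported" criterion that recurs through this history can be sketched as a toy verification pass: flag any summary sentence whose content words are mostly absent from the provided source. This is a deliberately crude lexical-overlap stand-in; production graders use entailment or citation-checking models, and the sample source and summary text below are invented for illustration.

```python
import re

# Minimal stopword list; a real pipeline would use a proper one.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "are", "by", "for"}

def content_tokens(text: str) -> set[str]:
    """Lowercased content words, with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9']+", text.lower()) if t not in STOPWORDS}

def unsupported_sentences(summary: str, source: str, threshold: float = 0.5) -> list[str]:
    """Flag summary sentences with little lexical support in the source text.
    Token overlap is a crude proxy for the entailment checks real graders use."""
    src = content_tokens(source)
    flagged = []
    for sent in re.split(r"(?<=[.!?])\s+", summary.strip()):
        toks = content_tokens(sent)
        if toks and len(toks & src) / len(toks) < threshold:
            flagged.append(sent)
    return flagged

# Invented example: the second summary sentence has no basis in the source.
source = "The report says contract review time fell by 77 percent after the AI assistant rollout."
summary = ("Contract review time fell by 77 percent after the rollout. "
           "The assistant also won an industry award in Geneva.")
print(unsupported_sentences(summary, source))
```

The point of the sketch is the workflow shape, not the heuristic: a summary can be fluent and largely correct while individual claims lack source support, which is exactly the gap between the 91% "correct" and 39% "correct and source-supported" figures reported above.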