Perly Consulting │ Beck Eco

The State of Play

A living index of AI adoption across industries — where established practice meets the bleeding edge
UPDATED DAILY

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

The Daily Dispatch

A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.

🔬 Research & Knowledge

AI for finding, synthesising, verifying, and preserving organisational knowledge. Mostly leading-edge: literature review, competitive intelligence, and knowledge management tools are maturing quickly with five practices actively advancing. The main constraint is hallucination risk — fact-checking and source verification still require human oversight in high-stakes contexts.

14 practices: 3 good practice, 10 leading edge, 1 bleeding edge

Where AI Stands in Research & Knowledge

Research and knowledge work is where AI's productivity promise collides hardest with its reliability problem, and mid-2026 finds the domain locked in that collision rather than resolving it. The tooling is everywhere and deployed at genuine scale: Perplexity has crossed 50 million monthly active users and $454M in annual recurring revenue; Microsoft 365 Copilot and Google Gemini ship summarisation and search across billions of seats; AlphaSense passed $500M ARR with 85 percent of the S&P 100 as customers; Gong runs at over $500M ARR in revenue-operations meeting intelligence. Across literature analysis, due diligence, enterprise search, pharmacovigilance monitoring and competitive intelligence, the capability question is settled. The technology works, often impressively. What it does not do reliably is tell the truth without supervision, and that single fact governs the entire domain.

The result is a remarkably consistent bifurcation that recurs in practice after practice. Bounded, supervised, narrowly scoped tasks (patent triage, meeting transcription for sales calls, regulatory monitoring in pharma, single-document summaries with mandatory human review) have crossed into proven production with measurable ROI. Unsupervised synthesis, multi-document reasoning, autonomous determinations and anything mission-critical remain blocked by hallucination rates that no shipping product has resolved. The numbers are blunt: frontier models still land at roughly 75 percent accuracy on long-document synthesis, agents reach only the low-60-percent range on realistic long-horizon research tasks, and a study of five frontier models found 67 percent disagreement on factual claims with 34 percent in direct opposition. A Princeton-backed analysis captured the structural problem most starkly: eighteen months of model capability gains yielded zero reliability improvement for production research agents.

What distinguishes this domain from others is that the binding constraint has migrated from technology to organisation and, increasingly, to law. The frontier is no longer model capability; it is governance, data readiness, verification discipline and the courts. Judicial sanctions for AI-fabricated citations exceeded $145,000 in a single quarter, and a global database now tracks over 1,200 documented court hallucination cases across more than 30 countries. Gartner forecasts that 40 percent of agentic projects will be cancelled by 2027, for reasons of cost and governance rather than capability. The organisations winning here are not those with the best models; they are those that have built the verification frameworks, audit trails and human-in-the-loop processes (a person reviews each AI output before it ships) needed to deploy fallible tools safely. That is the real maturity gradient in research and knowledge work today.
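To make the idea of a human-in-the-loop process with an audit trail concrete, here is a minimal Python sketch. It is illustrative only: the `Draft` and `ReviewGate` names, their fields and the log format are assumptions made for this example, not a description of any vendor's product or of a specific organisation's workflow.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Draft:
    """An AI-generated output awaiting human review (hypothetical structure)."""
    draft_id: str
    text: str
    citations: list[str]
    approved: bool = False


@dataclass
class ReviewGate:
    """Holds drafts until a named reviewer signs off, logging every decision."""
    audit_log: list[dict] = field(default_factory=list)

    def review(self, draft: Draft, reviewer: str, approve: bool, note: str = "") -> Draft:
        # Record who decided what, and when, so the decision can be audited later.
        draft.approved = approve
        self.audit_log.append({
            "draft_id": draft.draft_id,
            "reviewer": reviewer,
            "approved": approve,
            "note": note,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        return draft

    def publish(self, draft: Draft) -> str:
        # Refuse to release anything a human has not explicitly approved.
        if not draft.approved:
            raise PermissionError(f"draft {draft.draft_id} has not been approved")
        return draft.text
```

The design choice worth noting is that `publish` fails closed: an unreviewed draft cannot ship by default, and every approval or rejection leaves a timestamped record naming the reviewer.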

What's New, 2026-05-18 to 2026-06-17

The scan's single tier-level movement is a downgrade in momentum, not capability: single-query research retrieval shifted from advancing to stalled. The signals that drove it are telling. Anthropic launched a Web Search API, and its Citations API showed real technical progress (overall error rates falling from 19 to 2 percent, and those for legal citations from 88 to 52 percent), yet the same window surfaced evidence that the underlying reliability ceiling is structural, not incremental. ACM research documented "retrieval collapse" when AI-generated content contaminates search indexes: accuracy metrics stay reassuring while systems invisibly drift onto synthetic evidence, which can snowball to 80 percent of top results. Wyoming DOT corpus tests showed vector-search accuracy collapsing from 75 percent to below 40 percent simply because the document set grew from 54 to 1,128 files. The verdict across the domain is that adoption is now an organisational practice (brands run quarterly hallucination audits and treat AI visibility as a managed channel) rather than a technology still improving on its core weakness.

Two reinforcing themes dominated everything else. First, the citation-integrity crisis crystallised into hard numbers: a Columbia-led audit of 2.5 million biomedical papers found 4,046 fabricated citations, a twelve-fold acceleration since 2023, with publishers acting on fewer than 2 percent of flagged cases. arXiv began issuing one-year submission bans for unchecked AI content. Second, governance matured on the regulatory side: the EMA's 2025 AI Observatory Report recognised continuous pharmacovigilance monitoring as mainstream practice, the UK MHRA opened a regulatory sandbox for AI safety monitoring, and the EU AI Act began producing enforcement signals (a €4.5M fine on a Frankfurt firm for opaque RAG attribution). Vendor architecture also diversified meaningfully: Amazon Science showed agentic keyword search matching 90 percent of vector-RAG performance without a vector database, and LazyGraphRAG demonstrated a 700-fold cost reduction over GraphRAG, signalling that the field is now competing on cost and governance, not just accuracy. The stability of the rest of the domain (most practices held position) is itself the signal: capability is no longer the bottleneck.
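One form a basic citation check could take is sketched below in Python: every cited identifier is compared against a reference set the organisation already trusts, and anything unmatched is flagged for a human to investigate. This is a hedged illustration, not the method used in the Columbia-led audit; the function name and the example identifiers are hypothetical.

```python
def unverified_citations(cited: list[str], trusted_index: set[str]) -> list[str]:
    """Return the cited identifiers that do not appear in the trusted index.

    Matching ignores case and surrounding whitespace. A miss does not prove
    fabrication; it only marks the citation for manual verification.
    """
    normalised = {ref.strip().lower() for ref in trusted_index}
    return [c for c in cited if c.strip().lower() not in normalised]


# Hypothetical identifiers, made up for illustration.
trusted = {"10.1000/example.001", "10.1000/example.002"}
cited = ["10.1000/example.001", " 10.1000/EXAMPLE.002 ", "10.1000/invented.999"]
flagged = unverified_citations(cited, trusted)
```

In this sketch `flagged` contains only the identifier that is missing from the trusted set, which is the item a reviewer would need to check by hand.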

Key Tensions

  • The reliability ceiling is structural, and capability gains no longer touch it. Eighteen months of frontier-model improvement produced zero reliability gain for production research agents (Princeton-backed analysis). Long-document synthesis sits at ~75 percent accuracy; five-model factual disagreement runs at 67 percent; ResearchClawBench scored Claude Code at 21.5 percent on real scientific re-discovery. The problem is not that models are weak but that the failure modes — confident fabrication, retrieval collapse under scale, multi-hop dependency breakdown — are intrinsic to the architecture.

  • The binding constraint has moved from technology to data readiness and governance. Enterprise RAG fails critically in roughly 80 percent of attempts, with 73 percent of failures originating at the retrieval layer and knowledge-base quality — not model quality — as the determinant. M&A due-diligence deployments are stalling on insufficient data-layer preparation (collection defensibility, deduplication, metadata integrity). The hard work is now janitorial and organisational, which is precisely the work most organisations underestimate.

  • Courts and regulators are now pricing the reliability gap directly. Judicial sanctions for AI-fabricated citations exceeded $145,000 in Q1 2026; a global database tracks over 1,200 court hallucination cases across 30+ countries; the Withers v City of Aberdeen case saw all four lawyers on both sides sanctioned for unverified AI citations. The EU AI Act's transparency and audit-trail mandates are producing fines. Verification is shifting from optional enhancement to legal table-stakes, and liability cannot be outsourced to the tool.

  • The information environment is poisoning its own inputs. Roughly 40 percent of web content is now AI-generated, degrading the signal-to-noise ratio that retrieval and horizon-scanning systems depend on. ACM research documented synthetic content snowballing to 80 percent of top search results while accuracy metrics stayed reassuring. The citation-fabrication crisis (twelve-fold rise since 2023) means the corpora that analysis tools ingest are themselves contaminated — a compounding feedback loop with no vendor solution.

  • Architecture is diversifying away from vector-centric defaults, on cost as much as accuracy. Amazon Science demonstrated agentic keyword search reaching 90 percent of vector-RAG performance without a vector database; Alibaba Cloud achieved a 95 percent cost reduction with billion-scale hybrid retrieval; LazyGraphRAG showed a 700-fold cost cut over GraphRAG. With 72 to 87 percent of RAG implementations reporting first-year failure on uncontrolled infrastructure costs, total cost of ownership — not benchmark accuracy — is increasingly the deciding factor in production design.

Top 10 Evidence Items

  1. AI Making Up References in Research Papers (research-paper) — The twelve-fold acceleration in fabricated biomedical citations since 2023, with publishers acting on fewer than 2 percent of cases, is the clearest single signal that the information-environment is poisoning its own inputs faster than any vendor mitigation is working. https://www.scienceunderattack.com/blog/2026/6/8/a-new-threat-to-science-ai-making-up-references-in-research-papers-193

  2. International AI-assisted audit finds fabricated references in nearly 3,000 peer-reviewed medical articles (case-study) — The Columbia-led team used AI to detect AI-caused damage at scale across 2.5 million PubMed papers; the 98.4 percent non-response rate from publishers illustrates precisely why the binding constraint is now governance, not capability. https://www.uef.fi/en/article/international-ai-assisted-audit-finds-fabricated-references-in-nearly-3000-peer-reviewed-medical

  3. ResearchClawBench: A Benchmark for End-to-End Autonomous Research Agent Evaluation (research-paper) — Claude Code scoring 21.5 percent on real scientific re-discovery tasks is the most direct evidence that the gap between market adoption of "agentic research" and measured production reliability is structural, not incremental. https://huggingface.co/papers/2606.07591

  4. When More Documents Hurt RAG: Mitigating Vector Search Dilution with Domain-Scoped, Model-Agnostic Retrieval (research-paper) — Accuracy collapsing from 75 percent to below 40 percent as a corpus scales from 54 to 1,128 documents is the empirical underpinning of why data readiness, not model quality, is the actual bottleneck for production enterprise search. https://arxiv.org/abs/2606.11350v1

  5. Retrieval Collapses When AI Pollutes the Web (news-coverage, 2026 ACM Web Conference) — Synthetic content snowballing to 80 percent of top results while accuracy metrics stay reassuring is the clearest demonstration of the compounding feedback loop: AI-generated content degrades the retrieval systems AI tools depend on, invisibly. https://www.linkedin.com/posts/sandeepde_retrieval-collapses-when-ai-pollutes-the-activity-7471683826130370560-B7DP

  6. Withers v City of Aberdeen: Bilateral AI Hallucination Failure (case-study) — All four lawyers on both sides sanctioned for unverified AI citations in the same case is the most concentrated illustration of why verification is now legal table-stakes, not optional enhancement. https://ediscoverytoday.com/2026/06/09/lawyers-on-both-sides-of-a-case-cite-fake-hallucinated-cases-artificial-intelligence-trends/amp/

  7. KPMG withdraws AI report over inaccuracies and hallucinations (case-study) — Forty hallucinations across 45 citations in a published professional intelligence product, with named enterprise clients publicly refuting false claims, shows that market-and-competitive intelligence is not immune to the reliability crisis even at Big Four deployment scale. https://founderoperator.com/founders/kpmg-ai-report-hallucinations-withdrawal

  8. Agentic RAG — Evolution, Challenges, and Decision Criteria (industry-report) — The finding that 51 percent of enterprise AI failures are RAG-related and that retrieval quality rather than model size drives hallucination directly backs the summary's claim that the binding constraint has migrated from technology to data and governance. https://anthonywest.co.uk/research/agentic-rag-evolution/summary

  9. The Agentwashing Crisis: Why 79% of Enterprises Claim AI Agents But Only 11% Ship to Production (adoption-metric) — The 79-percent-claim / 11-percent-production gap, combined with Gartner's 40-percent-cancellation forecast by 2027, quantifies exactly the bifurcation the summary describes between stated adoption and actual production maturity for autonomous research. https://agentmarketcap.ai/blog/2026/06/07/agentwashing-crisis-enterprise-ai-agents-2026

  10. Artificial Analysis Long Context Reasoning Benchmark Leaderboard (adoption-metric) — Frontier models plateauing at 75–76 percent accuracy on long-document synthesis is the benchmark anchor for the summary's claim that the reliability ceiling is structural; this is the number behind "eighteen months of capability gains, zero reliability gain." https://artificialanalysis.ai/evaluations/artificial-analysis-long-context-reasoning