The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
AI that retrieves relevant information from a single search query and synthesises a coherent answer with source attribution. Includes search-augmented generation and cited responses; distinct from deep research, which conducts multi-step autonomous investigation.
Single-query research retrieval has crossed into production at forward-leaning organisations, but a persistent reliability gap keeps it from becoming standard infrastructure. The practice combines information retrieval with generative AI: a system executes one or a few searches, retrieves relevant sources, and synthesises a cited answer in a single pass. Perplexity, You.com, ChatGPT with browsing, and enterprise RAG deployments all embody this pattern. Adoption is real and growing fast: Perplexity alone has reached roughly 45-50 million monthly active users, and two-thirds of B2B buyers report using AI search tools. Yet the same deployments surface a stubborn accuracy ceiling: hallucinations, misattributed citations, and query-specific failures that aggregate metrics routinely mask. Architectural advances such as hybrid retrieval and adaptive depth are narrowing the gap, not closing it. The defining tension at the leading edge is exactly this: organisations are deploying at scale while knowingly accepting systematic factuality risks that no shipping product has resolved.
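The single-pass pattern described above (one search, a handful of retrieved sources, one cited answer) can be sketched in a few lines. This is a toy illustration under loud assumptions: the `Source` type, the in-memory corpus, the keyword-overlap scoring, and the concatenating "synthesis" step are all hypothetical stand-ins, and no real product is claimed to work this way. A production system would put a search index and a language model where the toy functions sit.

```python
from dataclasses import dataclass


@dataclass
class Source:
    url: str
    text: str


def tokenize(s: str) -> set[str]:
    """Lowercase word set with surrounding punctuation stripped."""
    words = (w.strip(".,;:!?").lower() for w in s.split())
    return {w for w in words if w}


def retrieve(query: str, corpus: list[Source], k: int = 2) -> list[Source]:
    """One retrieval pass: rank sources by shared words, keep the top k."""
    q = tokenize(query)
    scored = [(len(q & tokenize(s.text)), s) for s in corpus]
    scored = [(score, s) for score, s in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]


def answer_with_citations(query: str, corpus: list[Source], k: int = 2):
    """Stand-in for synthesis: quote each retrieved source with a [n] marker."""
    sources = retrieve(query, corpus, k)
    if not sources:
        return "No supporting source found.", []
    body = " ".join(f"{s.text} [{i}]" for i, s in enumerate(sources, start=1))
    return body, [s.url for s in sources]


# Hypothetical corpus, for illustration only.
CORPUS = [
    Source("https://example.com/a", "Hybrid retrieval combines keyword and vector search."),
    Source("https://example.com/b", "Re-rankers reorder retrieved passages by relevance."),
    Source("https://example.com/c", "Payroll compliance rules vary by country."),
]

text, citations = answer_with_citations(
    "How does hybrid retrieval use vector search?", CORPUS
)
print(text)
print(citations)
```

Even in this toy form, the answer can only be as good as the single retrieval pass: if the right source is not in the top results, nothing downstream can cite it.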
The vendor ecosystem is broad and hardening toward enterprise infrastructure. Perplexity has reached 45-50 million monthly active users with $454M ARR (50% monthly growth) and has secured a $750M, three-year Microsoft Azure partnership for production-scale deployment; You.com processes over one billion API queries per month for enterprise customers including Alibaba and DuckDuckGo; Databricks has launched Instructed Retrieval, a hybrid deterministic-probabilistic search yielding 35-50% recall gains; and Google Cloud now offers a production RAG platform on Vertex AI with hybrid search and re-rankers.
Enterprise deployments confirm measurable productivity value with named evidence. Ontop, a global payroll company, deployed enterprise AI search that cut response time for legal compliance questions from 20 minutes to 20 seconds, saving its legal team 130 hours a month with a 60% query acceptance rate. A 300-person manufacturing firm switched from Google to Perplexity Enterprise and shortened competitive-analysis research cycles from 7 days to 2 days using enterprise index integration. The Cleveland Cavaliers use Perplexity across 15+ teams and report 10+ hours saved per employee per week; Databricks documents savings of 5,000 working hours a month. Yet a wide gap separates usage from bottom-line impact: 71% of organisations use generative AI regularly, but only 17% attribute more than 5% of earnings to it. Conversion analysis shows AI-cited traffic converting at 14.2% against a 2.8% organic baseline, which indicates high intent that is still constrained by reliability.
Reliability remains the binding constraint, with May 2026 evidence sharpening the gap between deployment scale and citation accuracy. Critical failures are documented across medical, legal, and peer-review contexts: the Royal College of Surgeons (April 2026) found that 25-34% of medical references in chatbot responses were fabricated; a Lancet study (May 2026) identified a 12-fold rise in fake citations across 2.5 million biomedical papers since 2023; and attorneys in four US jurisdictions (Pennsylvania, Northern California, Georgia, California) have been sanctioned for AI-generated false citations in court filings. Platform divergence matters: Perplexity shows 78% citation coverage on complex research queries versus ChatGPT's 62%, but citation overlap between systems on identical queries is only 11%, which suggests that results reflect each platform's own architecture rather than a shared standard. Quality is improving, with frontier models reaching 1.0-2.5% hallucination on summarization tasks (down from 3-8% in 2023), yet hallucination rates vary 5-15x by topic class. The CRAG benchmark (Meta/HKUST, 4,409 QA pairs) documents the capability ceiling: advanced LLMs achieve ≤34% accuracy, basic RAG 44%, and state-of-the-art solutions 63% without hallucination. Architectural research (SIRA, May 2026) shows single-pass retrieval can be optimized through LLM-guided corpus discrimination, compressing multi-round search into single queries while outperforming dense retrievers. Practitioner case studies nonetheless remain clear: audit findings show that in 29% of citations the stated conclusion does not match the source content, even though the reference itself is nominally correct. Standard evaluation metrics continue to mask query-specific catastrophes: a system can score 78% precision on average while failing systematically on the queries users care most about. No published breakthrough has closed this gap between deployment momentum and operational reliability.
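The masking effect is simple arithmetic, and a short sketch makes it concrete. The numbers below are invented purely for illustration (they are not taken from any audit cited here): 100 graded queries are split into a hypothetical "routine" slice and a hypothetical "high_stakes" slice, chosen so that the headline rate lands on 78%.

```python
from collections import defaultdict


def overall_rate(results):
    """Share of all graded queries judged correct."""
    return sum(int(ok) for _, ok in results) / len(results)


def rate_by_slice(results):
    """Correctness rate per query class, from (slice_name, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for slice_name, ok in results:
        totals[slice_name] += 1
        correct[slice_name] += int(ok)
    return {name: correct[name] / totals[name] for name in totals}


# Hypothetical graded results, for illustration only:
# 80 routine queries (76 correct) and 20 high-stakes queries (2 correct).
results = (
    [("routine", True)] * 76
    + [("routine", False)] * 4
    + [("high_stakes", True)] * 2
    + [("high_stakes", False)] * 18
)

print(f"overall: {overall_rate(results):.0%}")
for name, rate in rate_by_slice(results).items():
    print(f"{name}: {rate:.0%}")
```

In this made-up split the headline figure is 78%, while the routine slice sits at 95% and the high-stakes slice at 10%. Reporting per-slice rates alongside the aggregate is one way to keep that kind of failure visible.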
— Empirical analysis of 28,870 source events reveals 71% of sources exclusive to single model; 16–59% pairwise overlap across engines. Documents structural divergence in single-query retrieval architecture rather than convergence toward standard.
— 2026 ACM Web Conference study: high-quality synthetic content snowballs into 80%+ of top results while accuracy metrics stay reassuring—retrieval systems drift onto synthetic evidence invisibly. Critical systemic failure mode of single-query systems at scale.
— Anthropic web search API enables Claude to autonomously decide when to search, refine queries, and return cited results. Customizable domain allowlists, web search integrated into Claude Code beta. Shows major vendor expanding into single-query research space.
— Empirical study documents vector search dilution failure: Wyoming DOT corpus scaling 54→1,128 documents reduced accuracy 75%→below 40%. Proposes MASDR-RAG and identifies precision-faithfulness paradox—demonstrates adoption barriers when retrieval scales to large, noisy collections.
— Author's testing demonstrates hallucination reduction from 19% to 2% error rate with Citations API; legal domain goes 88%→52%, healthcare 56%→21%. Deployed on Anthropic API, Bedrock, Vertex AI with measured ROI (4 hours → 35 min audit trail).
— Ecosystem maturity signal: brands now running quarterly hallucination audits, optimizing entity schema for AI citation. Wikipedia overweights at 47.9% of ChatGPT top-10 sources. Organizations have normalized single-query retrieval as business infrastructure requiring active management.
— Reka AI released 374-question benchmark replacing saturated SimpleQA, achieving performance discrimination across 26.7–59.1% accuracy range. Signals ecosystem recognition that single-query search-augmented LLMs warrant dedicated rigorous evaluation.
— Hands-on testing across six AI tools (Gemini, ChatGPT, Copilot, Claude, Perplexity, DeepSeek) shows massive variation in citation coverage and UX; Gemini in-text complete, ChatGPT/Perplexity inconsistent, Claude lacks sources pane. Reveals citation attribution is non-standardized across platforms.
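Several of the notes above report how little the sources cited by different engines overlap. The studies' exact overlap definitions are not given here, so the sketch below assumes one common choice, Jaccard overlap (shared sources divided by all sources cited by either engine), plus a simple "exclusive share" count. The engine names and URLs are hypothetical placeholders.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared sources divided by all sources cited by either engine."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(citations: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard overlap for every pair of engines, keyed by sorted name pair."""
    return {
        (x, y): jaccard(citations[x], citations[y])
        for x, y in combinations(sorted(citations), 2)
    }


def exclusive_share(citations: dict[str, set[str]]) -> float:
    """Fraction of distinct sources that exactly one engine cited."""
    all_sources = set().union(*citations.values())
    if not all_sources:
        return 0.0
    exclusive = [
        s for s in all_sources
        if sum(s in cited for cited in citations.values()) == 1
    ]
    return len(exclusive) / len(all_sources)


# Hypothetical citation sets for one identical query, for illustration only.
citations = {
    "engine_a": {"u1", "u2", "u3"},
    "engine_b": {"u2", "u3", "u4"},
    "engine_c": {"u5"},
}

print(pairwise_overlap(citations))
print(exclusive_share(citations))
```

Running the same query set through each engine and tracking these two numbers over time is one way an organisation could see whether the platforms it relies on are converging or drifting apart.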