Single-query research retrieval & summary

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

[Chart: AI Maturity by Domain. Each dot marks the weighted maturity of practices within a domain. Axis labels: Bleeding Edge, Leading Edge, Established.]

TRAJECTORY: ↑ Advancing

AI that retrieves relevant information from a single search query and synthesises a coherent answer with source attribution. Includes search-augmented generation and cited responses; distinct from deep research which conducts multi-step autonomous investigation.

OVERVIEW

Single-query research retrieval has crossed into production at forward-leaning organisations, but a persistent reliability gap keeps it from becoming standard infrastructure. The practice combines information retrieval with generative AI: a system executes one or a few searches, retrieves relevant sources, and synthesises a cited answer in a single pass. Perplexity, You.com, ChatGPT with browsing, and enterprise RAG deployments all embody this pattern. Adoption is real and growing fast: Perplexity alone has surpassed 50 million monthly active users, and two-thirds of B2B buyers report using AI search tools. Yet the same deployments surface a stubborn accuracy ceiling: hallucinations, misattributed citations, and query-specific failures that aggregate metrics routinely mask. Architectural advances such as hybrid retrieval and adaptive depth are narrowing the gap, not closing it. The defining tension at the leading edge is this: organisations are deploying at scale while knowingly accepting systematic factuality risks that no shipping product has resolved.
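The retrieve-then-synthesise pattern described above can be sketched in a few lines. This is a toy illustration only, not any vendor's implementation: the `Source` type, the keyword-overlap scoring, and the example URLs are all assumptions, and the "synthesis" step simply quotes each source where a production system would call a language model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    url: str
    text: str


def tokenize(text: str) -> set[str]:
    # Lowercased word set; a real system would use a proper tokenizer.
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(query: str, corpus: list[Source], k: int = 2) -> list[Source]:
    # Rank sources by shared query terms; keep the top k that overlap at all.
    q = tokenize(query)
    scored = [(len(q & tokenize(s.text)), s) for s in corpus]
    scored = [(score, s) for score, s in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]


def answer(query: str, corpus: list[Source], k: int = 2) -> str:
    # Single pass: one retrieval, one synthesis step, no follow-up searches.
    hits = retrieve(query, corpus, k)
    if not hits:
        return "No supporting sources found."
    # Placeholder synthesis: quote each source and attach a numbered citation.
    body = " ".join(f"{s.text} [{i}]" for i, s in enumerate(hits, start=1))
    refs = "\n".join(f"[{i}] {s.url}" for i, s in enumerate(hits, start=1))
    return f"{body}\n{refs}"


# Hypothetical corpus used only for this illustration.
corpus = [
    Source("https://example.com/a", "Hybrid retrieval improves recall."),
    Source("https://example.com/b", "Rerankers improve precision after retrieval."),
    Source("https://example.com/c", "Basketball teams track player statistics."),
]
print(answer("How does hybrid retrieval affect recall?", corpus))
```

Because every citation here is attached mechanically to the text it came from, the sketch cannot misattribute. The failure modes discussed in this section arise once a generative model paraphrases or merges sources in the synthesis step.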

CURRENT LANDSCAPE

The vendor ecosystem is broad and scaling. Perplexity has reached 50 million monthly active users and ships production APIs (Agent API, Embeddings API) to general availability; You.com processes over one billion API queries per month for enterprise customers including Alibaba and DuckDuckGo; Databricks has launched Instructed Retrieval, a hybrid deterministic-probabilistic search yielding 35-50% recall gains; and Google Cloud now offers a production RAG platform on Vertex AI with hybrid search and re-rankers. Perplexity's acquisition of Carbon signals a push into enterprise grounding across internal data sources, complemented by a 200-seat law enforcement deployment that extends adoption into the public sector.
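As a rough illustration of what "hybrid search" can mean in practice, the sketch below merges a keyword ranking and an embedding ranking with reciprocal rank fusion (RRF). RRF is chosen here as a common, simple fusion method; it is an assumption for illustration and is not claimed to be what Databricks or Google Cloud ship. The document ids and both rankings are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each ranking is a list of document ids, best first.
    # A document earns 1 / (k + rank) from every ranking it appears in.
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical lexical ranking
vector_hits = ["doc_c", "doc_a", "doc_d"]   # hypothetical embedding ranking
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that both retrievers rank highly rise to the top, while a document found by only one retriever still survives into the fused list, which is the intuition behind recall gains from combining methods.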

Enterprise deployments confirm productivity value. The Cleveland Cavaliers use Perplexity across 15+ teams, reporting 10+ hours saved per employee per week. Databricks documents 5,000 working hours of monthly savings. B2B buyer research finds AI search shortens research cycles by 34%. Yet a wide gap separates usage from bottom-line impact: 71% of organisations use generative AI regularly, but only 17% attribute more than 5% of earnings to it.

Reliability remains the binding constraint, and April 2026 evidence sharpens the gap between adoption and performance. Large-scale accuracy testing (NP Digital, BBC/EBU) puts Perplexity at 49.3% accuracy with a 12.2% outright error rate; 51% of AI answers contain significant issues, and 13% fabricate attributed quotes entirely. The CRAG benchmark from Meta/HKUST, built on 4,409 question-answer pairs, documents the capability ceiling: advanced LLMs achieve ≤34% accuracy, basic RAG improves that to 44%, and state-of-the-art solutions answer 63% of questions without hallucination. A PRISMA review of 128 RAG studies documents persistent data quality failures across pipeline stages.

A Nature-published Bixonimania experiment showed AI systems elaborating a fake disease into false statistics that contaminated peer review through citation laundering. Practitioners report that the Charlotin database had tracked 1,200+ hallucination cases by early 2026 (growing by 5-6 daily), with Perplexity showing a 37% citation error rate.

Practitioner analysis finds that standard evaluation metrics mask query-specific catastrophes: a system can score well on average while hallucinating on the queries that matter most. Adaptive retrieval techniques show promise, with production implementations reporting 40-60% latency improvements, but these are engineering workarounds for a deeper problem: single-pass systems treat every query identically, under-serving complex questions while over-processing simple ones. No published breakthrough has closed this gap.
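The point about averages hiding query-specific failures is easy to see with a small worked example. The evaluation log below is invented for illustration (the slice names and counts are assumptions, not data from any study cited here); it shows how a 90% headline accuracy can coexist with 20% accuracy on the queries that matter most.

```python
from collections import defaultdict


def accuracy_by_slice(results: list[tuple[str, bool]]) -> dict[str, float]:
    # results: (slice label, whether the answer was judged correct)
    totals: dict[str, int] = defaultdict(int)
    correct: dict[str, int] = defaultdict(int)
    for label, ok in results:
        totals[label] += 1
        correct[label] += int(ok)
    return {label: correct[label] / totals[label] for label in totals}


# Hypothetical evaluation log: 90 routine lookups, 10 high-stakes queries.
results = (
    [("routine", True)] * 88
    + [("routine", False)] * 2
    + [("high_stakes", True)] * 2
    + [("high_stakes", False)] * 8
)

overall = sum(ok for _, ok in results) / len(results)
per_slice = accuracy_by_slice(results)
worst = min(per_slice, key=per_slice.get)

print(f"overall accuracy: {overall:.2f}")
print(f"worst slice: {worst} at {per_slice[worst]:.2f}")
```

Reporting the worst slice alongside the mean is one simple way to keep such failures visible; which slices to track is a judgment call that depends on the deployment.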

TIER HISTORY

Research: Nov-2022 → Nov-2022

Bleeding Edge: Nov-2022 → Jan-2025

Leading Edge: Jan-2025 → present

EVIDENCE (97)

ChatGPT vs Perplexity vs Google: Citation Differences - GroMach | Opinion | 2026-05-02

Comparative analysis of retrieval and citation behavior across three major single-query research platforms; documents platform-specific ranking architectures and source selection biases.

What Hallucination Actually... (Stanford 2026 AI Index) | Industry Reports | 2026-05-01

Stanford AI Index analysis documenting a structural hallucination failure: the knowledge-belief distinction collapses when users assert false premises, and models fail at an 86% rate under this condition.

ChatGPT vs Google Search in 2026: Market Share, User Data & What It Means for SEO | Adoption Metrics | 2026-04-30

Comprehensive market analysis: ChatGPT at 900M weekly users and 17% of all digital queries; AI-referred visitors convert 23x higher than organic; citations are a new ranking signal.

AI Search Statistics (2025-2026): 55+ Data Points on GEO, Buyer Behavior, and Citation Rates | Adoption Metrics | 2026-04-30

Aggregated adoption metrics from primary sources (OpenAI, Adobe, Gartner); comprehensive evidence of mainstream single-query AI research adoption, search behavior shifts, and user quality expectations.

Why AI Search Is 60% Hallucination (And How To Be The Real StoryBrand Guide) | Industry Reports | 2026-04-30

Critical negative signal from a Tow Center for Digital Journalism study. Shows real failure rates in production single-query research tools: Perplexity 37% wrong, ChatGPT 40% wrong, overall 60% inaccuracy.

AI Search Engines: 2026 Market Report & Key Trends | Industry Reports | 2026-04-29

Market analysis of AI search platforms (ChatGPT, Gemini, Perplexity); describes the market shift from keyword search to AI-synthesized answers with citations, directly addressing the single-query research transition.

AI Hallucination Rate Benchmarks 2026: 5-Model Study | Research Papers | 2026-04-23

Original benchmark: 5,000 prompts across frontier models measuring hallucination on factual recall, citation accuracy, and code reference; citation accuracy is worst, at a 12.4% average.

Search Engine Market Share 2026: Global Data Report | Adoption Metrics | 2026-04-22

Measures AI search referral adoption at 0.9% of web traffic (5x YoY growth) and names platforms (ChatGPT, Perplexity, Gemini, Claude); projects 3-5% share by the end of 2027, with a steeper growth curve than organic.

HISTORY

2022-H2: Dense retrieval foundations mature (300+ paper survey). QUILL system deployed at billion scale using retrieval augmentation for query understanding. You.com launches enterprise AI product with source attribution. Early adoption accelerating (ChatGPT 1M users by Dec 4). Fact hallucination documented as key reliability challenge; data protection concerns cited as adoption barrier.
2023-H1: Perplexity AI reaches 10M monthly visits with 100% MoM growth; secures Series A and explores partnerships with Instacart, Klarna. You.com adds multimodal chat. Academic research confirms widespread hallucination and inaccurate citation in production engines; large-scale MIT study (12K+ queries) shows users distrust AI search but false citations increase perceived trustworthiness. Critical reliability gap persists despite explosive adoption.
2023-H2: You.com launches web search APIs for LLM integration ($100/month) with enterprise adoption by LlamaIndex, Anthropic, Cohere—API-first strategy expands beyond consumer products. Professional adoption accelerates in healthcare and content workflows despite documented hallucination failures (arithmetic errors, citation fabrication). Practitioner guides compare citation quality to ChatGPT; international adoption signals emerge. Developer APIs and tooling infrastructure mature while reliability concerns remain unresolved.
2024-Q1: You.com scales production infrastructure to handle 1B+ monthly API calls across Search, Content, News, and Images endpoints. Perplexity continues expanding adoption despite critical failures: documented medical errors (wrong post-surgery guidance) reveal persistent accuracy gaps in real-world deployment, highlighting the reliability-adoption paradox where users embrace single-query systems despite known hallucination risks.
2024-Q2: Perplexity secures enterprise contracts with Zoom, HP, Stripe, and Cleveland Cavaliers, demonstrating willingness to deploy at scale despite reliability concerns. Academic research (NAACL, Google Cloud AI) identifies persistent technical limitations: retrieval augmentation inconsistently helps LLMs and can hurt performance, imperfect retrieval is widespread (70% of passages don't contain true answers), and enterprise deployment faces unresolved barriers around accuracy validation. Independent studies reveal systematic trustworthiness failures: citation of AI-generated sources and second-hand hallucinations detected in production use, defining the core adoption tension.
2024-Q3: Vendor ecosystem hardens: Coveo launches production-grade Relevance-Augmented Passage Retrieval API (September GA) to address precision and hallucination. Real-world deployment evidence emerges (HP salesforce adopting Perplexity for prospect research). However, peer-reviewed and independent evidence of quality failures intensifies: JMIR study shows 61.6% citation irrelevancy in medical chatbot evaluation (July); Cornell/UW/Waterloo benchmarking shows top models achieve only 35% hallucination-free responses (August); technical research highlights single-query limitations (RQ-RAG, July; EuroPython 2024 talk on RAG failure modes). Maturation visible but fundamental reliability gaps persist—systems scaling operationally while failing visibly in production.
2024-Q4: Enterprise deployment accelerates at scale: Cleveland Cavaliers adopt Perplexity across 15+ teams saving 10+ hours/week per employee; Amplitude deploys for market research; Perplexity launches Election Information Hub for fact-checked real-time election information. However, critical reliability evidence hardens: Columbia University Tow Center finds ChatGPT Search misattributes sources 76.5% of the time; peer-reviewed analysis identifies 16 design failures (bias, hallucination, misattribution); consultant case studies document seven failure modes in production RAG systems. Operational maturity confirmed but reliability-adoption paradox crystallizes: enterprises deploy at scale while accepting systematic factuality and attribution failures.
2025-Q1: Infrastructure consolidation accelerates: Perplexity hardens pplx-api on NVIDIA infrastructure for production throughput; enterprise risk tolerance normalizes despite API outages (January 23, Perplexity). Technical landscape stalls—practitioner analysis shows most production systems remain at stage 1-2 (chatbots/reasoners) rather than advancing to autonomous agents. Reliability paradox deepens: 70% of enterprises depend on LLM-based research tools, yet persistent gaps in context understanding, sentiment analysis, and service availability define adoption ceiling. No breakthrough solutions emerge; the practice consolidates operationally while remaining technically constrained.
2025-Q2: Quality degradation and investor skepticism emerge. Databricks reports 5,000 working hours monthly savings with Perplexity, validating enterprise ROI, but counterbalanced by tech journalism reporting model collapse in AI search tools, documented finance accuracy failures (97% error rates), WhatsApp bot scaling outages, and investor skepticism (conference vote: Perplexity "most likely to flop"). Market reports 42% of users encounter misleading content. The practice shifts from "proven but unreliable" to "deployed but questioned"—deployment momentum persists but quality concerns begin eroding organizational confidence.
2025-Q4: Vendor RAG ecosystem accelerates at scale—Google Cloud ships production RAG platform with Vertex AI and hybrid search (Dec); RAGAS evaluation framework reaches 4,000+ GitHub stars and 5M+ monthly evaluations. Research adoption expands (Qualtrics: 72% of AI-using teams report increased organizational dependence). However, systematic evidence of limitations hardens: PRISMA systematic review of 128 RAG studies documents persistent data quality failures across pipeline stages; ICIS practitioner study identifies 15 data quality dimensions and failure modes; legal domain research reveals Document-Level Retrieval Mismatch failure in production systems. Quality degradation signals from Q2 persist with no published breakthrough solutions. Deployment breadth has hardened as incumbent infrastructure—single-query retrieval now standard in enterprise research workflows—but technical maturation has plateaued.
2026-Jan: Vendor innovation accelerates in single-query architecture: Databricks launches Instructed Retrieval combining deterministic and probabilistic search for 35-50% recall gains; Perplexity acquires Carbon to enable enterprise grounding across internal sources with mid-market rollout; Perplexity expands public-sector adoption (200-seat law enforcement deployment). Analysis of production failures deepens: practitioner research identifies persistent evaluation blind spots where metrics mask query-specific catastrophes; advocates for adaptive retrieval (query-aware dynamic depth) showing 40-60% latency improvements in production. Market survey shows 71% of orgs use GenAI regularly but only 17% realize 5%+ earnings impact—broad deployment with productivity-to-ROI gap persists. Single-query systems now incumbent infrastructure but reliability constraints remain unresolved despite architectural innovations addressing static retrieval limitations.
2026-Feb: Enterprise adoption metrics harden: 67% of B2B buyers use AI search tools (Perplexity 29% preference), with AI search shortening research cycles by 34%; Perplexity reaches 50+ million monthly active users (up from 15M in early 2025); You.com operates at billion-scale infrastructure (1B+ API queries/month) with enterprise customers (Alibaba, Amazon, DuckDuckGo). Vendor ecosystem matures: Perplexity releases Agent API and Embeddings API to general availability, enabling production-grade custom applications. Technical depth clarified: arXiv research confirms retrieval essential for accuracy (0% without retrieval vs. 79% with in SQL/API generation), validating core architectural premise. However, production challenges persist: practitioner analysis identifies systematic failure modes where standard metrics mask query-specific catastrophes—reliable single-query retrieval remains constrained by evaluation blind spots and deployment complexity despite market-scale adoption.
2026-Mar: Deployment evidence deepens across sectors. Conversion metrics accelerate: AI-referred B2B traffic converts 49-63% across industries (vs. organic 28-42%); 796% AI traffic growth YoY with 6,432% conversion growth and 87.4% of AI referral traffic from ChatGPT, signalling platform concentration. Healthcare deployment case study: a 65-person SaaS deployed RAG + ensemble + verifier pipeline reducing operational hallucination rate from 4.2% to 3.4% (FACTS benchmark, 90-day timeline). Critical limitations surface: Apple/Duke research identifies over-searching as a systematic failure mode (first search provides 0.874% accuracy ROI, subsequent searches show diminishing returns and hallucinations); Washington State University study finds ChatGPT achieves only 73% consistency across identical prompts and 16.4% accuracy identifying false hypotheses. 451 Research confirms vector database maturation now underpins RAG infrastructure at enterprise scale. Signal persists: broad adoption at scale with unresolved reliability constraints and evaluation blind spots that aggregate metrics continue to mask.
2026-Apr: Ecosystem maturation accelerates while reliability gaps harden. Major partnerships demonstrate infrastructure confidence: Microsoft commits to a $750M, 3-year cloud partnership with Perplexity providing multi-model access (OpenAI, Anthropic, xAI); Samsung integrates Perplexity at OS level on Galaxy S26 (1B+ device ecosystem), signaling major vendor endorsement. Adoption metrics deepen: 73% of B2B buyers use AI tools for purchase research (multi-source meta-study across 680M citations); Perplexity enterprise product launched March 2026 with 100+ customers acquired in its first weekend. The CRAG benchmark (Meta/HKUST, 4,409 QA pairs) quantified the capability ceiling: advanced LLMs achieve ≤34% accuracy, basic RAG 44%, state-of-the-art solutions 63% without hallucination. A Nature-published Bixonimania experiment demonstrated citation laundering at scale: AI systems elaborated a fake disease into false statistics that contaminated peer review, illustrating how single-query systems amplify fabricated sources. Perplexity's Amazon Bedrock deployment documented production quality improvements (Claude 3 reducing hallucinations by half vs. Claude 2.1), while independent brand accuracy research found AI answers wrong about brands 40% of the time. Reliability constraints persist and sharpen: EACL 2026 peer-reviewed research formalizes error taxonomy for realistic RAG deployments; independent citation accuracy testing shows 78% precision across 847 queries; only 30% of AI-generated answer sources reappear in an identical follow-up query. Research confirms hybrid retrieval substantially outperforms single-stage methods; architectural advances address known failure modes but production systems remain constrained by aggregation blind spots that mask query-specific catastrophes. Broad adoption accelerates (900M weekly ChatGPT users, 1.2-1.5B Perplexity monthly queries) with 23x conversion advantage over organic search, but technical progress has plateaued around reliability ceilings no shipping product has systematically resolved.
2026-May: Adoption metrics hardened further (ChatGPT at 900M weekly users and 17% of all digital queries) while reliability evidence continued to accumulate against it. Stanford AI Index documented models failing at 86% when users assert false premises, and a 5,000-prompt benchmark found citation accuracy averaging just 12.4% across frontier models. Tow Center for Digital Journalism found Perplexity at 37% wrong and ChatGPT at 40% wrong overall (60% inaccuracy rate), reinforcing the deployment paradox: single-query AI search is mainstream infrastructure with adoption growing 5x YoY, but production accuracy in retrieval remains structurally unresolved.