The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
Each dot marks the weighted maturity of practices within a domain — hover for a brief summary, click for more detail
AI for finding, synthesising, verifying, and preserving organisational knowledge. Mostly leading-edge: literature review, competitive intelligence, and knowledge management tools are maturing quickly with five practices actively advancing. The main constraint is hallucination risk — fact-checking and source verification still require human oversight in high-stakes contexts.
The headline: AI research tools are now standard infrastructure at most large organizations, but their accuracy on the tasks that matter most -- synthesizing sources, verifying citations, conducting multi-step analysis -- has not materially improved in eighteen months.
Most large enterprises have deployed AI for research and knowledge work. Email summarization now ships by default to three billion Gmail users. Enterprise search with AI-generated answers runs in 70-80% of large organizations, and meeting-transcription vendors have crossed $300M in annual revenue. These are not experiments -- they are embedded in daily workflows, and early adopters are reporting real productivity gains (10+ hours per employee per week in some cases).

But a sharp divide persists between what these tools do well (finding information, triaging documents, summarizing bounded content) and what they do poorly (verifying claims, attributing sources accurately, conducting autonomous multi-step research). No major vendor has closed that gap this cycle. Organizations that treat AI research tools as "good enough" without verification workflows are accumulating liability exposure, particularly in legal and regulated contexts where courts are now imposing six-figure fines for AI-generated citation errors.
Autonomous research agents scored below 10% on a new end-to-end benchmark. AutoResearchBench tested frontier models on 1,000 multi-step research tasks requiring state tracking and evidence synthesis. Claude Opus 4.6 scored 9.39%, GPT-5.4 scored 7.44%, and Gemini 3.1 Pro scored 7.93%. The failures are architectural: the models have retrieval access, but they cannot reliably track what has already been established or verify constraints across steps. For any team evaluating agentic AI (software that plans and carries out multi-step tasks on its own) for research workflows, this means human review at every synthesis step remains non-negotiable.
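To make the state-tracking gap concrete, here is a minimal sketch (not drawn from AutoResearchBench or any vendor's agent) of a research workflow that keeps an explicit record of what has been established and holds every finding for human review before synthesis. The ResearchState and Finding structures and the reviewed flag are illustrative assumptions.

```python
# Illustrative sketch only: explicit state tracking with a human-review gate.
from dataclasses import dataclass, field

@dataclass
class Finding:
    claim: str
    sources: list[str]          # citations offered in support of the claim
    reviewed: bool = False      # flipped only by a human reviewer

@dataclass
class ResearchState:
    question: str
    findings: list[Finding] = field(default_factory=list)

    def record(self, claim: str, sources: list[str]) -> None:
        # Refuse to record claims with no supporting source at all.
        if not sources:
            raise ValueError(f"unsupported claim rejected: {claim!r}")
        self.findings.append(Finding(claim, sources))

    def pending_review(self) -> list[Finding]:
        return [f for f in self.findings if not f.reviewed]

state = ResearchState("How mature is enterprise meeting transcription?")
state.record("Adoption is broad but verification gaps persist",
             ["vendor earnings call", "analyst survey"])

# Synthesis is blocked while anything is still awaiting human sign-off.
for finding in state.pending_review():
    print("Needs human review before synthesis:", finding.claim)
```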
Citation hallucination enforcement reached a new peak. A global database now tracks 1,348 cases of AI-fabricated citations in court filings across 30+ countries, with sanctions reaching $109,700 in a single case. Australia's Federal Court mandated AI disclosure and manual citation verification. Sullivan & Cromwell, among the world's most elite law firms, publicly acknowledged that it failed to catch AI hallucinations (passages where an AI tool confidently makes things up) in a bankruptcy filing. General counsel and compliance teams should assume that citation verification requirements will become standard across jurisdictions within 12-18 months.
Domain-specific RAG proved viable at genuine enterprise scale. A health system deployed AI-powered question answering across 1.68 million patients and 166 million clinical notes, achieving 94.6% accuracy at 237ms response time for $4,000 per month. Separately, a mid-market firm improved RAG (retrieval-augmented generation -- retrieving relevant documents for the model to ground its answers in) accuracy from 62% to 94% through data preparation changes alone, with no model upgrades. The lesson for technology leaders: accuracy problems in AI research tools are usually data problems, not model problems.
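As a rough illustration of that lesson, here is a minimal sketch of the data-preparation side of a RAG pipeline, where the levers are cleaning and chunking rather than the model. The chunk size, overlap, and boilerplate pattern are assumptions for illustration, not the configuration either organization actually used.

```python
import re

def clean(text: str) -> str:
    # Strip boilerplate that pollutes embeddings (page footers, stray whitespace).
    text = re.sub(r"Page \d+ of \d+", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def chunk(text: str, size: int = 800, overlap: int = 200) -> list[str]:
    # Overlapping windows so an answer that spans a chunk boundary
    # still lands whole inside at least one chunk.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def prepare(documents: dict[str, str]) -> list[dict]:
    # Keep the source id attached to every chunk so answers can cite it later.
    return [
        {"source": doc_id, "text": piece}
        for doc_id, raw in documents.items()
        for piece in chunk(clean(raw))
    ]

corpus = prepare({"clinical-guideline-042": "Page 1 of 40  Dosage guidance ..."})
```

Every knob in this sketch sits on the data side, which is where the reported 62%-to-94% improvement came from.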
Gartner published its first Magic Quadrant for competitive intelligence platforms. Crayon, Klue, and AlphaSense were named Leaders, validating competitive intelligence as a recognized enterprise software category. Klue's Compete Agent demonstrated a 28% win rate lift for customers. Yet broader adoption remains confined to about 6-10% of enterprises -- the rest struggle with data readiness and the organizational discipline to act on insights, not just collect them.
Meeting intelligence litigation expanded. Cruz v. Fireflies.ai became the second active class-action lawsuit targeting a meeting transcription vendor for collecting voiceprints without statutory consent (under BIPA, Illinois's biometric privacy law). This follows the consolidated Brewer v. Otter.ai class action. Organizations using meeting AI bots should review their consent frameworks now, particularly for cross-state and international meetings where privacy laws vary.
Verification requirements will spread beyond courts. NIST AI 600-1 already designates confabulation (AI fabrication) as a Tier 1 risk requiring pre-deployment testing. As more jurisdictions follow Australia's mandate for AI disclosure and manual citation checks, organizations using AI for any externally facing research or analysis should build verification workflows into their standard operating procedures within the next six months.
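A hedged sketch of what such a workflow could look like in code, assuming a simple internal registry of trusted sources rather than any specific product: every citation in AI-assisted output must resolve against the registry before the draft moves to human sign-off.

```python
# Illustrative pre-publication gate; the registry entries below are made up.
TRUSTED_SOURCES = {
    "doi:10.1000/example-123": "Illustrative Journal, 2024",
    "case:smith-v-jones-2023": "Court record, verified by counsel",
}

def unresolved_citations(citations: list[str]) -> list[str]:
    # Return every citation that cannot be resolved against the registry;
    # an empty list means the draft may proceed to manual review.
    return [c for c in citations if c not in TRUSTED_SOURCES]

draft = ["doi:10.1000/example-123", "case:doe-v-acme-2024"]
blocked = unresolved_citations(draft)
if blocked:
    print("Held for manual verification:", blocked)
```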
Enterprise RAG spending will face ROI scrutiny. With 51% of enterprise AI failures attributed to RAG implementations and only 200 of roughly 1,000 attempted deployments succeeding, CFOs will demand sharper cost-benefit analysis. The emerging "RAGOps" discipline -- treating AI retrieval pipelines as production systems requiring monitoring, governance, and lifecycle management -- will separate organizations that sustain value from those that waste infrastructure budgets. Technology leaders should audit their RAG deployments for silent degradation (embedding drift, stale knowledge bases, chunking inconsistencies) before the next budget cycle.
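As one example of a RAGOps-style check, here is a minimal sketch that flags embedding drift by comparing the centroid of freshly embedded content against the centroid the index was originally built around. The toy vectors and the 0.90 threshold are placeholders, not tuned values.

```python
import math

def centroid(vectors: list[list[float]]) -> list[float]:
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def drift_alert(baseline: list[list[float]], recent: list[list[float]],
                threshold: float = 0.90) -> bool:
    # Alert when newly embedded content no longer resembles the content
    # the index was built against.
    return cosine(centroid(baseline), centroid(recent)) < threshold

# Toy vectors standing in for real embeddings.
baseline = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]]
recent = [[0.1, 0.9, 0.1], [0.2, 0.8, 0.2]]
if drift_alert(baseline, recent):
    print("Embedding drift detected: re-chunk and re-index before the next budget cycle")
```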
Knowledge infrastructure will become a board-level conversation. Deloitte found 60% AI adoption but only 40% data management maturity. Stanford confirmed 88% of organizations use AI, but an emerging "presence versus execution gap" means most cannot move from pilots to production. The organizations pulling ahead are investing in knowledge graphs, governed taxonomies, and data quality pipelines -- not better models. Boards should ask their technology leaders a simple question: do we have the data foundations to make our AI investments productive, or are we spending on capabilities we cannot reliably use?
Accuracy improvements are not accumulating. Princeton-backed analysis found that eighteen months of model capability gains produced zero reliability improvement for production research agents. Hallucination rates on citation tasks remain at 12.4% across frontier models. Organizations cannot wait for the next model release to solve their accuracy problems -- they need verification infrastructure now.
Organizational readiness lags technology capability by years. Only 23% of organizations scale AI past pilot stage, and just 6% report measurable profit impact. Cisco found 85% piloting AI agents but only 5% trusting them in production. The bottleneck is governance, data quality, and process redesign -- not the tools themselves. This is a leadership problem, not a technology problem.
The information environment is degrading. Roughly 40% of web content is now AI-generated, contaminating the signal streams that research and horizon scanning tools depend on. Deep research agents treat citation volume as credibility, making them vulnerable to poisoned sources -- a fabricated article was cited as fact by Perplexity within 24 hours. The tools that find information are increasingly unable to distinguish real signals from synthetic noise, and no vendor has a structural solution.
Go deeper: the full Research & Knowledge briefing -- the longer analytical write-up, plus every practice we track in this domain with its maturity rating, the tools to consider, and the evidence behind our assessment.