The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in one or two domains, delivered to your inbox while the index updates in the background.
Each dot marks the weighted maturity of practices within a domain — hover for a brief summary, click for more detail
AI that translates natural language questions into SQL queries and performs semantic search across structured and unstructured data. Includes text-to-SQL tools and embedding-based retrieval; distinct from enterprise RAG, which retrieves from document collections rather than databases.
Natural language data querying has reached critical adoption momentum alongside persistent production barriers. Snowflake's May 2026 earnings inflection (50% of the customer base, 5,200+ weekly active Cortex AI accounts, and 200% growth in AI-related workloads) signals mainstream traction for the vendor platform. Yet success hinges on upfront data engineering: context quality (semantic layer, metadata governance, schema curation), not model capability, determines whether text-to-SQL achieves production accuracy. The field consensus has crystallized: the benchmark-to-reality gap remains unbridged despite four years of research, and only enterprises that invest heavily in semantic-layer development and schema preparation achieve reliable deployments.
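The "semantic layer" this consensus keeps pointing to can be as modest as a curated mapping from business terms to vetted tables and expressions, injected into the model's context before SQL generation. A minimal sketch, assuming a simple in-memory registry; every table, column, and term name here is illustrative, not taken from any vendor's product:

```python
# Hypothetical semantic-layer entries: business terms mapped to vetted
# tables, expressions, and synonyms. All names are invented for illustration.
SEMANTIC_LAYER = {
    "net revenue": {
        "table": "finance.fct_orders",
        "expression": "SUM(gross_amount - refund_amount)",
        "synonyms": ["sales", "turnover"],
        "grain": "one row per order line",
    },
    "active customer": {
        "table": "crm.dim_customers",
        "expression": "COUNT(DISTINCT customer_id)",
        "synonyms": ["live account"],
        "grain": "one row per customer",
    },
}

def build_context(question: str) -> str:
    """Select only the entries whose term or synonyms appear in the
    question, so the prompt stays small and on-topic."""
    q = question.lower()
    lines = []
    for term, spec in SEMANTIC_LAYER.items():
        if term in q or any(s in q for s in spec["synonyms"]):
            lines.append(
                f"- '{term}' -> {spec['expression']} on {spec['table']} "
                f"(grain: {spec['grain']})"
            )
    return "Business definitions:\n" + "\n".join(lines)
```

The returned string would be prepended to the generation prompt; real semantic layers add join paths, row-level filters, and ownership metadata on top of this skeleton.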
Production deployments (Uber at 1.2M queries/month, Tapestry's feedback analysis, finance teams using multi-turn conversational analytics for modeling and forecasting) demonstrate viability at scale in well-governed data environments. However, the limiting factor is consistently enterprise context: 70% of real SQL queries follow just 13% of templates (Cornell), yet 50% of frontier-model failures stem from context and domain gaps rather than model capability (Berkeley Data Agent Benchmark). Academic text-to-SQL benchmarks report 79-87% accuracy; frontier models achieve 86.6% on Spider 1.0 but collapse to 10% on complex, real-world enterprise schemas. April 2026 security research quantifies deployment risks: generated SQL can violate permissions, leak sensitive fields, and return semantically wrong results despite syntactic correctness, requiring deterministic validation layers in production.
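A deterministic validation layer of the kind that research calls for sits between the LLM and the database: fixed rules, not another model. A minimal sketch using crude token scanning; the allowlists and the checks are illustrative assumptions, and a production system would use a real SQL parser plus bind-time permission checks rather than regexes:

```python
import re

# Illustrative allowlists; in practice these would be derived from the
# data catalog and the caller's row/column-level permissions.
ALLOWED_TABLES = {"orders", "customers"}
FORBIDDEN_COLUMNS = {"ssn", "salary"}

def validate_sql(sql: str) -> list[str]:
    """Deterministic checks applied after LLM generation, before
    execution. Returns a list of violations; empty means proceed."""
    violations = []
    stripped = sql.strip().rstrip(";")
    # Read-only enforcement: anything that is not a single SELECT is rejected.
    if not re.match(r"(?is)^\s*select\b", stripped):
        violations.append("only SELECT statements are allowed")
    if ";" in stripped:
        violations.append("multiple statements are not allowed")
    # Table allowlist: every FROM/JOIN target must be a permitted table.
    for tbl in re.findall(r"(?is)\b(?:from|join)\s+([a-z_][a-z0-9_]*)", stripped):
        if tbl.lower() not in ALLOWED_TABLES:
            violations.append(f"table '{tbl}' is not permitted")
    # Column blocklist: a crude token scan for restricted fields.
    tokens = set(re.findall(r"[a-z_][a-z0-9_]*", stripped.lower()))
    for col in FORBIDDEN_COLUMNS & tokens:
        violations.append(f"column '{col}' is restricted")
    return violations
```

The design point is that these rules fail closed and are auditable, which the probabilistic generation step is not.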
Vendors have consolidated around agentic architecture and mandatory semantic layers, an acknowledgment that pure text-to-SQL is insufficient. Cost-efficient fine-tuning is emerging ($0.80/month for 22,000 queries with LoRA), and research momentum persists (ACL 2026 papers show 70%+ accuracy on specialized benchmarks via agentic approaches). Yet mainstream adoption without substantial implementation investment remains elusive. The practice reaches leading-edge maturity: production-ready for organizations with the resources to invest in data governance and schema engineering, with early-adopter advantage shifting from capability innovation to operational execution.
The vendor ecosystem has consolidated around agentic architecture and semantic layers as non-negotiable requirements. Google Cloud's QueryData (GA April 2026, #1 BiRD benchmark ranking, with a Hughes Network Systems deployment), Snowflake Cortex Analyst, AWS Quick Suite, ThoughtSpot Spotter for Industries, and emerging players signal ecosystem maturity. The April 2026 Snowflake earnings inflection (50% of the customer base, 5,200+ weekly active users, on Cortex AI, with 27% YoY growth in $1M+ spenders) validates mainstream enterprise traction. Scale AI deployed TextQL's Ana at scale: 1,900 requests/week across Finance, Ops, and HR on 1.9T rows, with 74.9% monthly adoption growth. Uber's QueryGPT handles 1.2M queries monthly and cuts query authoring from 10 minutes to 3; Dream11's platform achieved 98.4% execution accuracy with fine-tuned 8B models serving 250M users. These successes all required substantial upfront investment in semantic layers, schema curation, and business-context systems.
However, April 2026 evidence sharpens the benchmark-to-production reality gap. Practitioners document silent failures: queries that execute without error but return wrong results due to semantic errors (fan-out traps in joins, NULL inconsistencies, ambiguous business terminology), performance failures, and SQL injection vulnerabilities. The gap is quantified: the Spider benchmark uses 146 clean databases with 5-30 tables each; production systems have 400+ tables with opaque naming conventions. GPT-4o achieves 90%+ accuracy on synthetic benchmarks but drops to 51% on real enterprise BI questions, a 39-point accuracy collapse. Amazon Science's PRACTIQ dataset addresses a core gap: production chatbots receive ambiguous and unanswerable questions that existing benchmarks never test. Practitioners favour agentic function-calling over direct text-to-SQL because of SQL dialect complexity and model limitations. Cost-efficient fine-tuning patterns are emerging (AWS's LoRA approach: $0.80/month for 22,000 queries), but fundamental deployment barriers persist: only 10-20% of AI-generated answers meet business decision thresholds on heterogeneous enterprise systems without semantic-layer governance and extensive schema curation.
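The function-calling pattern practitioners favour, which also fits the finding that most real queries follow a small number of templates, replaces free-form SQL generation with a registry of vetted, parameterized templates the model invokes by name. A minimal sketch under that assumption; the template, its SQL, and the tool name are all invented for illustration:

```python
# Hypothetical registry of vetted query templates. The model chooses a
# tool and arguments; the SQL text itself is fixed and human-reviewed.
TEMPLATES = {
    "top_products_by_revenue": {
        "sql": ("SELECT product_id, SUM(amount) AS revenue "
                "FROM orders WHERE order_date >= :start "
                "GROUP BY product_id ORDER BY revenue DESC LIMIT :n"),
        "params": {"start": str, "n": int},
    },
}

def run_tool(name: str, args: dict):
    """Validate a model-issued tool call against the registry and return
    the fixed SQL plus bound parameters for the database driver."""
    spec = TEMPLATES.get(name)
    if spec is None:
        raise ValueError(f"unknown tool: {name}")
    for param, typ in spec["params"].items():
        if param not in args:
            raise ValueError(f"missing parameter: {param}")
        if not isinstance(args[param], typ):
            raise TypeError(f"{param} must be {typ.__name__}")
    # Parameters travel separately from the SQL text, so injection
    # through argument values is structurally impossible.
    return spec["sql"], args
```

The tradeoff is coverage: questions outside the template set must either fall back to guarded free-form generation or be declined.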
— Independent analysis reporting Snowflake's 9,100+ weekly active Cortex AI accounts with 200% growth in AI workloads; 50% customer adoption of Cortex Code since November 2025 launch.
— Finance organizations using multi-turn NLQ for conversational financial modeling, dynamic scenario planning, and variance analysis, showing NLQ practice maturing beyond simple Q&A.
— Production security assessment: text-to-SQL risks extend beyond SQL injection; generated SQL can bypass access controls, leak sensitive fields, or answer the wrong question. Deterministic validation is essential after LLM generation.
— Production evaluation framework enabling continuous monitoring without schema access, addressing a critical gap: current evaluations require ground-truth queries and schemas, which are rarely available in deployment.
— Tapestry (parent of Coach and Kate Spade) deployed NLQ feedback analysis on AWS Bedrock, collecting 30,000 pieces of feedback and achieving 10x faster AI application development, enabling faster business decisions.
— ACL 2026 research aggregation: semantic layers boost accuracy 17-23 percentage points across frontier models (Opus 4.7, Sonnet 4.6, GPT-5.4); R³-SQL reaches 75% BIRD-dev execution accuracy.
— Technical architecture analysis: NL-BI requires a four-layer design (intent parsing, semantic layer, SQL generation, validation); vendors are consolidating around semantic layers and deterministic validation as production requirements.
— Production deployment guide from a Snowflake partner documenting three semantic-layer architectures and their tradeoffs (modularity vs. accuracy vs. scalability) in multi-tenant Cortex Analyst rollouts.
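The four-layer NL-BI design cited above (intent parsing, semantic layer, SQL generation, validation) can be sketched as a thin orchestration skeleton. Every layer here is a stub with invented names; the point is the control flow, where a validation failure stops the pipeline rather than surfacing a wrong answer:

```python
from dataclasses import dataclass

@dataclass
class QueryResult:
    sql: str
    ok: bool
    reason: str = ""

def parse_intent(question: str) -> dict:
    # Layer 1: intent parsing, stubbed as a keyword check.
    return {"metric": "revenue"} if "revenue" in question.lower() else {}

def resolve_semantics(intent: dict) -> dict:
    # Layer 2: semantic layer, mapping business terms to curated schema.
    mapping = {"revenue": ("orders", "SUM(amount)")}
    table, expr = mapping.get(intent.get("metric"), (None, None))
    return {"table": table, "expr": expr}

def generate_sql(sem: dict) -> str:
    # Layer 3: in production, an LLM constrained by the semantic layer.
    return f"SELECT {sem['expr']} FROM {sem['table']}"

def validate(sql: str) -> bool:
    # Layer 4: deterministic checks before execution (stubbed).
    return sql.upper().startswith("SELECT")

def answer(question: str) -> QueryResult:
    sem = resolve_semantics(parse_intent(question))
    if sem["table"] is None:
        return QueryResult("", False, "intent not covered by semantic layer")
    sql = generate_sql(sem)
    if not validate(sql):
        return QueryResult(sql, False, "failed deterministic validation")
    return QueryResult(sql, True)
```

Declining out-of-scope questions at layer 2, rather than guessing, is what separates this shape from direct text-to-SQL.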