The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
AI that performs initial exploration of datasets, identifying distributions, correlations, missing values, and notable patterns. Includes automated profiling reports and insight suggestion; distinct from predictive modelling which builds models rather than exploring data.
Automated exploratory data analysis (autoEDA) has solved the profiling problem but is stalled at the insight boundary. Forward-leaning organisations embed automated dataset profiling, quality checks, and visualization into production ML pipelines. The tooling—both enterprise and open-source—is mature. Yet most deployments remain narrowly focused on descriptive reporting and data quality checking. The defining tension is scope: automation excels at telling you what your data looks like but has not generalised to discovering novel, actionable insights that require domain judgment. Agentic approaches (autonomous query generation, multi-step reasoning, SQL code synthesis) are advancing in the vanguard—Meta, OpenAI, and Ramp have deployed production agentic analytics systems—but broader adoption faces two hard barriers: reliability (55% of agentic systems in real-world datasets reach unsupported conclusions) and cost control (96% of organisations deploying GenAI report unexpected cost overruns). Until these constraints are addressed, the practice will remain leading-edge but stalled.
The autoEDA ecosystem has matured into three distinct tiers: IDE-integrated tooling, open-source profiling libraries, and LLM-augmented enterprise platforms. PyCharm 2026.1 now embeds AI-powered automated data issue detection directly in Jupyter notebooks, signaling mainstream IDE vendor adoption. Enterprise platforms (DataRobot, H2O Driverless AI, AWS SageMaker Data Wrangler, Qlik Cloud, Decube, Google Meridian) embed profiling and quality checks directly into ML workflows and data platforms. DataRobot's two-stage EDA (schema detection, feature association, data quality) is standard in its pipeline, and its Gartner Peer Insights recognition cites a 90% reduction in model development time. Open-source libraries serve a parallel audience: ydata-profiling (13.4k GitHub stars, 1.57M monthly downloads), Sweetviz, DataPrep, and AutoViz let practitioners generate comprehensive profiling reports in a single line of code.
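As a rough illustration of what such a profiling report computes per column (missing values, distinct counts, basic distribution statistics, pairwise correlation), here is a minimal sketch in plain Python. It is not the API of ydata-profiling or any other library named above; the function names `profile_column`, `profile_table`, and `pearson`, and the toy table, are hypothetical.

```python
import math
import statistics
from typing import Optional


def profile_column(values: list) -> dict:
    """Summarise one column: missing count, distinct count, numeric stats."""
    present = [v for v in values if v is not None]
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    # Only report numeric statistics when every present value is numeric.
    if numeric and len(numeric) == len(present):
        summary["min"] = min(numeric)
        summary["max"] = max(numeric)
        summary["mean"] = statistics.fmean(numeric)
        summary["stdev"] = statistics.stdev(numeric) if len(numeric) > 1 else 0.0
    return summary


def profile_table(columns: dict) -> dict:
    """Profile every column of a table given as {column name: list of values}."""
    return {name: profile_column(vals) for name, vals in columns.items()}


def pearson(xs: list, ys: list) -> Optional[float]:
    """Pearson correlation over rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 2:
        return None
    mean_x = statistics.fmean([x for x, _ in pairs])
    mean_y = statistics.fmean([y for _, y in pairs])
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data; None marks a missing value.
table = {
    "age": [34, 45, None, 29, 51],
    "income": [48000, 61000, 52000, None, 70000],
    "segment": ["a", "b", "a", None, "b"],
}
report = profile_table(table)
age_income_corr = pearson(table["age"], table["income"])
```

The sketch stops at description: it reports what the data looks like and leaves the interpretation of any pattern to the analyst, which is the scope boundary discussed above.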
The emergent development is agentic exploratory analysis pushing past profiling into autonomous reasoning. Databricks Genie Agent Mode (April 2026, Public Preview) autonomously breaks down complex analytical questions into multi-step workflows and returns structured narrative reports. Azure Databricks Sample Data Explorer (April 2026, GA) translates natural language questions directly into SQL queries. dbt Labs reports that Meta, OpenAI, and Ramp have deployed production agentic analytics systems; Meta's internal agent scaled from weekend prototype to company-wide tool in six months with autonomous SQL writing and notebook generation. PingCAP's TiInsight achieves 86.3% accuracy on SQL generation in production. Edison Scientific's autonomous analysis of 242,000 drug sensitivity records identified biomarker relationships with statistical rigour (p = 1.7 × 10^-62). Enterprise AI analytics adoption stands at 59% (Gartner 2025, up from 33% in 2022), with agentic EDA as a core component of the shift from query-driven to proactive, AI-initiated insight delivery.
These advances remain the vanguard. Agentic systems face material reliability constraints: peer-reviewed research shows 55% of agentic data science systems reach unsupported conclusions on real-world datasets, with poorly calibrated confidence. Cost overruns compound adoption barriers: 96% of organisations deploying GenAI report unexpected costs, 92% specifically for agentic workflows. The market ($4B+, 8% CAGR through 2033) reflects investment, but production deployments cluster narrowly around profiling and data quality within established platforms rather than extending to autonomous insight generation. Organisations treat automated profiling as an operational capability (validation, deduplication, anomaly detection), yet its scope remains bounded to descriptive analysis. Scaling agentic EDA requires solving reliability and cost visibility before the practice can advance to the good-practice tier.
— In May 2026, Databricks added native data profiling to the SQL Editor and Notebooks, providing automated statistical summarization (null counts, distributions, ranges) directly in query result exploration without external tools.
— Databricks Genie Chat moved to public preview (April 29), followed by scheduled tasks (April 30) for recurring automated exploratory prompts and weekly digest generation, representing a production agentic interface for autonomous exploratory analysis.
— Telecom enterprise replaced 25 manual Excel formulas with automated feature importance analytics identifying which customer feedback topics drive NPS movement, eliminating 8 monthly hours of manual exploration and enabling proactive decision-making.
— Browser-based ReportMedic data profiler automates distribution analysis, cardinality assessment, null detection, and outlier identification with zero manual setup—demonstrating production-ready automated profiling across CSV, Excel, and enterprise data sources.
— Production automation workflow for Snowflake: Data Metric Functions (DMFs) track NULL_COUNT, ROW_COUNT, and domain-specific metrics with real-time execution via Tasks/Streams, enabling continuous profiling and schema drift detection at scale.
— Technical comparison reveals LLM-based exploratory analysis (ChatGPT) omits data profiling, cleaning documentation, and feature engineering; competing specialized tools took 3x longer but achieved a 0.7% R² improvement and documented multicollinearity and interaction terms.
— dbt Labs co-founder reports agentic analytics now in production at Meta (deployed across company in 6 months), OpenAI, and Ramp, with agents autonomously writing SQL and publishing notebooks without human input.
— Peer-reviewed study shows agentic data science systems reach unsupported conclusions in 55% of real-world datasets, with poorly calibrated confidence, revealing material reliability limitations in automated exploratory analysis.
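The continuous-profiling pattern in the Snowflake item above (tracking row counts and null counts over time and flagging schema drift) can be sketched in plain Python. This is an assumption-laden illustration of the general idea, not Snowflake's Data Metric Function interface: the `Snapshot` class, the `take_snapshot` and `detect_drift` helpers, the 10-point null-rate threshold, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Metrics captured for one table at one point in time."""
    schema: dict       # column name -> observed type name
    row_count: int
    null_counts: dict  # column name -> number of null values


def take_snapshot(rows: list) -> Snapshot:
    """Compute schema, row count, and per-column null counts from dict rows."""
    schema = {}
    null_counts = {}
    for row in rows:
        for col, value in row.items():
            null_counts.setdefault(col, 0)
            if value is None:
                null_counts[col] += 1
            else:
                # Record the type of the first non-null value seen.
                schema.setdefault(col, type(value).__name__)
    for col in null_counts:
        schema.setdefault(col, "unknown")  # column held only nulls
    return Snapshot(schema, len(rows), null_counts)


def detect_drift(prev: Snapshot, curr: Snapshot,
                 max_null_rate_increase: float = 0.10) -> list:
    """Compare two snapshots and describe schema or null-rate drift."""
    findings = []
    for col in sorted(prev.schema.keys() - curr.schema.keys()):
        findings.append(f"column dropped: {col}")
    for col in sorted(curr.schema.keys() - prev.schema.keys()):
        findings.append(f"column added: {col}")
    for col in sorted(prev.schema.keys() & curr.schema.keys()):
        old_type, new_type = prev.schema[col], curr.schema[col]
        if "unknown" not in (old_type, new_type) and old_type != new_type:
            findings.append(f"type changed: {col} {old_type} -> {new_type}")
        prev_rate = prev.null_counts[col] / prev.row_count if prev.row_count else 0.0
        curr_rate = curr.null_counts[col] / curr.row_count if curr.row_count else 0.0
        if curr_rate - prev_rate > max_null_rate_increase:
            findings.append(
                f"null rate rose: {col} {prev_rate:.2f} -> {curr_rate:.2f}"
            )
    return findings


# Hypothetical loads of the same table on two consecutive days.
yesterday = [
    {"id": 1, "email": "a@x.io", "plan": "pro"},
    {"id": 2, "email": "b@x.io", "plan": "free"},
]
today = [
    {"id": "3", "email": None, "region": "eu"},
    {"id": "4", "email": "d@x.io", "region": "us"},
]
findings = detect_drift(take_snapshot(yesterday), take_snapshot(today))
```

In a scheduled pipeline the previous snapshot would be persisted between runs, so each new load is compared against the last known state rather than recomputed from history.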
2019: AutoEDA emerges with commercial (H2O Driverless AI) and open-source (R packages, pandas-profiling) tooling. R ecosystem maturity confirmed by systematic review of 15 packages. Adoption metrics show significant downloads but focus remains on profiling and basic visualization rather than deep insight discovery.
2020: Ecosystem expands with new Python tools (Sweetviz, continued pandas-profiling adoption) and major vendor commitment (AWS SageMaker Data Wrangler GA December 2020). ACM SIGMOD paper reviews ML approaches to EDA automation. Deployment barriers emerge: tool reliability issues, integration complexity, and persistent questions about automating insight discovery beyond profiling.
2021: Competitive maturation across open-source Python libraries (AutoViz, Pandas Profiling, SweetViz, D-Tale, Dataprep) with documented performance tradeoffs. Critical academic commentary questions whether EDA/CDA distinction can survive automation. Production deployments at scale (e.g., EDF Lab preventive maintenance) show AutoML integration but reveal feature engineering limitations in open-source tools. Fundamental scope question remains: can automation extend beyond profiling to genuine insight discovery, or is the practice limited to data quality checking?
2022-H1: LLM integration emerges as new research direction (InsightPilot from HKUST and Microsoft Research) proposing to automate insight discovery via natural language prompts and production-quality insight tools. Open-source ecosystem consolidation continues: ydata-profiling secures corporate backing from YData (Feb 2022) with 50M downloads and broad enterprise adoption (FAANG, banks, insurance), expanding support for time-series and Spark workloads. Core tension persists: profiling automation is mature and widely deployed, but advancing to genuine insight discovery remains blocked by domain expertise requirement and LLM reliability concerns.
2022-H2: Commercial ecosystem expands with YData SDK launch (Nov 2022) offering automated profiling beyond open-source tools. Open-source tooling matures with continued ydata-profiling adoption but quality challenges surface (duplicate detection bugs, edge-case handling). Practitioner engagement deepens as vendors release advanced tutorials and feature expansions. AutoEDA remains bifurcated: robust profiling for data quality checking in production, but scaling challenges and LLM integration experiments are still research-stage with uncertain ROI.
2023-H1: Ecosystem consolidation continues with ydata-profiling reaching 10k GitHub stars milestone. Enterprise deployment advances: H2O Driverless AI integrates with Snowflake Snowpark for at-scale EDA without data movement. YData extends ydata-profiling with Spark support (April 2023), enabling distributed profiling. Academic validation emerges: research framework demonstrates 2x productivity gains from automated EDA in user studies. Practice remains production-focused on data quality and descriptive profiling rather than insight discovery; scope limitations persist despite wider enterprise availability.
2023-H2: No major tool releases or deployment breakthroughs documented. Ecosystem consolidation continues with ydata-profiling and H2O Driverless AI as dominant platforms; DataRobot and other vendors advance AutoML capabilities but not specifically AutoEDA. Open-source community contribution continues with incremental tool development. LLM-based insight discovery (InsightPilot prototype from early 2022) shows no evidence of production adoption or maturity by year-end. Practice remains at leading-edge with mature tooling but constrained scope—automation delivers on profiling and data quality checking but has not expanded to general-purpose insight discovery or causal inference.
2024-Q1: Observability vendors integrate LLM-powered EDA tools; Dynatrace announces Davis CoPilot for natural language data exploration in Grail. Practitioner discourse highlights persistent methodological tension: automated profiling tools improve efficiency but cannot substitute for purpose-driven, hypothesis-linked exploration. AutoEDA ecosystem remains bifurcated between mature open-source profiling libraries and AI-augmented vendor platforms, with scope limited to descriptive analysis and data quality workflows.
2024-Q2: Real-world deployments advance: Pecan AI case study demonstrates automated EDA catching data quality issues (duplicates, join errors) on 10 TB datasets; Actian launches data profiling GA in Data Platform; H2O.ai recognized as 2x Visionary in Gartner Magic Quadrants serving Fortune 500. Negative signal surfaces: critical security vulnerability (CVE-2024-37062) in ydata-profiling 4.0+ raises reliability concerns for enterprise adoption. Practical deployment optimization shows Spark profiling scalability solutions (25-minute reduction). Ecosystem maturity confirmed but adoption barriers persist around tool reliability and methodological limitations.
2024-Q3: Integration challenges emerge: practitioner reports Streamlit application crashes when using ydata-profiling, exposing tool compatibility and deployment reliability issues. Enterprise adoption patterns narrow: automated EDA remains focused on data quality checking and profiling within larger data platforms (SageMaker, Driverless AI, Snowflake integration) rather than expanding toward autonomous insight discovery. Ecosystem consolidation continues with H2O and YData as dominant platforms.
2024-Q4: LLM-powered EDA research advances with TiInsight (PingCAP production deployment) achieving 86.3% SQL accuracy; major vendors strengthen EDA capabilities (DataRobot Workbench enhancements, DagsHub RepoViz for unstructured data). R ecosystem remains mature with established packages (skimr, SmartEDA, DataExplorer). Deployment barriers persist: dependency and compatibility issues (ydata-profiling in Streamlit) continue to limit production adoption. Practice remains at leading-edge: mature profiling and data quality automation are widely available, but scope remains narrowly focused on descriptive analysis rather than autonomous insight discovery.
2025-Q1: Sustained ecosystem adoption confirmed: ydata-profiling maintains 1.57M monthly downloads and 13.4k GitHub stars; pandas-profiling legacy library still at 194k monthly downloads. Market data shows EDA tools market at $15 billion with 15% projected CAGR through 2033, indicating continued enterprise investment and competitive growth despite persistent tool reliability and integration challenges.
2025-Q2: Research innovation advances with QUIS system automating question generation and insight synthesis without human curation, signaling academic progress toward autonomous EDA. Real-world deployment shows continued integration into enterprise platforms (Pricefax EDA workflows for customer analysis). Tool maturity remains constrained by persistent limitations: sweetviz visualization degradation with high-cardinality datasets (100+ columns), ydata-profiling memory consumption issues with specific data patterns. Negative signals balance positive adoption, indicating practice remains at leading-edge with mature profiling capabilities but constrained scope on insight discovery and edge-case robustness.
2025-Q3: Ecosystem consolidation continues with sustained adoption: open-source EDA tools (ydata-profiling, sweetviz, Rath, great-expectations) show strong community traction via GitHub (9,469 repos tagged 'eda'). AI-augmented EDA approaches gain practitioner interest (Observable blog, vendor integration) but reliability concerns persist—LLM-based data exploration shows promise yet cannot fully substitute for domain-specific exploration. Practice remains at leading-edge: automated profiling is production-standard across enterprise platforms, but scope remains bounded to descriptive analysis and data quality checking rather than autonomous insight discovery.
2025-Q4: Sustained vendor investment in platform maturity: H2O releases v25.08.2 with enhanced Driverless AI capabilities; Google integrates automated data checks into Meridian's production MMM workflows; DataRobot achieves Gartner Peer Insights recognition with 90% reduction in model development time. Negative signal surfaces: IDC research reveals 96% of organizations deploying GenAI report unexpected cost overruns, 92% for agentic AI workflows, highlighting adoption barriers at scale. AI-augmented EDA research (QUIS, TiInsight) achieves production deployment maturity but remains a specialized domain. Traditional profiling tools face persistent edge-case quality challenges (sweetviz high-cardinality visualization, ydata-profiling memory patterns, Streamlit compatibility). Practice remains at leading-edge with proven efficiency gains in model development but constrained scope on cost control, reliability, and autonomous insight discovery.
2026-Jan: LLM-powered EDA maturity advances with TiInsight arXiv preprint demonstrating production deployment at PingCAP; practitioner case study shows real-world adoption of Gemini-based automated survey coding (400 responses). Ecosystem remains focused on profiling and structured analysis; scope expansion into unstructured data exploration signals emerging capability but not yet mainstream. Leading-edge tier sustained.
2026-Feb: DataRobot and Edison Scientific document production-grade automated EDA with real-world deployments: DataRobot two-stage EDA (schema detection, quality checks, feature association) embedded in standard ML workflows; Edison Analysis autonomous EDA on 242k drug sensitivity records identifies biomarker patterns with statistical validation. Eric Ma practitioner analysis reports 5-10x speedup with AI coding agents while cautioning against loss of rigor (an emerging best practice). ISEDA 2026 conference (Singapore, May) dedicates track to 'AI & Open Source EDA'. Market analysis confirms $4B+ market with 8% CAGR through 2033. Ecosystem bifurcation continues between mature profiling tools (ydata-profiling, Sweetviz) and AI-augmented research systems (TiInsight, Edison Analysis).
2026-Mar: Cloud-native EDA maturation confirmed: Azure Databricks takes automated data profiling to GA, with continuous metric computation across time-series, inference, and snapshot modes, enabling drift detection without manual setup. LLM-assisted production workflows emerge: DS Stream case study (Databricks) achieves 92% PII detection precision and reduces manual audit time from weeks to hours. Editorial and practitioner surveys confirm mainstream adoption of automated EDA (5-10x speedup documented across multiple tools). Agentic EDA architectures proliferate as emerging pattern for autonomous exploration. Ecosystem validation: YData Profiling tutorial updated March 2026 in Real Python; comprehensive ecosystem surveys (Analytics Insight) document mature tooling across six major libraries. Negative signals persist: domain experts report that GenAI-powered EDA still requires constant human validation. Practice remains at leading-edge with proven cloud integration and LLM-augmented profiling, but scope remains bounded to descriptive analysis rather than autonomous insight discovery.
2026-Apr (early): IDE integration accelerates: PyCharm 2026.1 embeds AI-powered data issue detection in Jupyter notebooks, extending automated profiling beyond standalone tools into mainstream development workflows. Data catalog and governance platforms make automated profiling generally available as a standard feature: Decube Profiler supports major data warehouses (Snowflake, Redshift, BigQuery, Databricks); Qlik Cloud enables field-level analysis without manual exploration. Enterprise AI analytics adoption reaches 59% (Gartner 2025, up from 33% in 2022), with EDA as a core component. Research validates tool performance: DataPrep.EDA research demonstrates declarative interface outperforming pandas-profiling on speed and UX. Operational maturity evident: organizations embed profiling as continuous capability in pipelines (validation, deduplication, anomaly detection). Practice remains at leading-edge with broadened vendor ecosystem and IDE adoption, but scope remains bounded to descriptive analysis and data quality checking rather than autonomous insight discovery.
2026-Apr (late): Agentic analytics deployment accelerates at leading-edge orgs: Databricks Genie Agent Mode reaches Public Preview for autonomous multi-step exploratory analysis; Azure Databricks Sample Data Explorer reaches GA with natural language-to-SQL translation. dbt Labs reports production agentic systems at Meta (scaled from prototype to company-wide in 6 months), OpenAI, and Ramp. However, reliability and cost barriers prevent broader adoption: peer-reviewed research (Sanity Checks for Agentic Data Science) documents 55% failure rate on real-world datasets with unsupported conclusions; IDC survey confirms 96% of GenAI deployments face unexpected cost overruns, 92% for agentic workflows. Practice remains at leading-edge but the trend is stalled: agentic EDA advances show promise, but cost control and reliability constraints block advancement to good-practice tier.
2026-May: Native profiling integrates deeper into major platforms: Databricks added native data profiling directly in the SQL Editor and Notebooks for automated statistical summarization on query results (null counts, distributions, ranges); Snowflake customers automate profiling at scale via native Data Metric Functions with schema drift detection; Genie Chat public preview (April 29) and Genie scheduled tasks (April 30) enable recurring automated exploratory prompts and weekly digest generation. Agentic EDA direction solidifies as leading-edge practice while traditional profiling becomes operational standard across platforms. Bifurcation is widening: specialized purpose-built tools outperform generic LLMs on rigor and reproducibility (R² improvement, multicollinearity documentation, feature engineering), while an enterprise telecom case shows automated feature importance analytics eliminating 8 monthly hours of manual exploration. Trend remains stalled: profiling automation is broadly adopted across major platforms, but barriers to reliable autonomous insight discovery persist despite agentic advancement.