The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
Each dot marks the weighted maturity of practices within a domain.
AI that generates realistic but artificial datasets for testing, training, and privacy-preserving data sharing. Includes tabular, text, and image synthetic data. Distinct from data augmentation, which modifies real data rather than generating new data from scratch.
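To make the distinction between generation and augmentation concrete, here is a minimal sketch. It assumes a tabular dataset with two numeric columns and uses a fitted bivariate Gaussian as a deliberately simple stand-in for a real generator. The function names (`fit_bivariate_gaussian`, `generate_synthetic`, `augment`) are hypothetical and do not belong to any vendor's API.

```python
import math
import random
import statistics

# Illustrative sketch; all names are hypothetical, not a library API.


def fit_bivariate_gaussian(rows):
    """Estimate means, sample standard deviations and Pearson correlation of two columns."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    # Pearson correlation: sample covariance divided by the product of sample stdevs
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    rho = max(-1.0, min(1.0, cov / (sx * sy)))
    return mx, my, sx, sy, rho


def generate_synthetic(rows, n, seed=0):
    """Generation: draw n brand-new rows from a Gaussian fitted to the real rows.
    No output row is a modified copy of an input row."""
    rng = random.Random(seed)
    mx, my, sx, sy, rho = fit_bivariate_gaussian(rows)
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # 2x2 Cholesky factor: couples z1 and z2 so the samples reproduce rho
        x = mx + sx * z1
        y = my + sy * (rho * z1 + math.sqrt(max(0.0, 1.0 - rho * rho)) * z2)
        out.append((x, y))
    return out


def augment(rows, noise=0.05, seed=0):
    """Augmentation, for contrast: jitter each existing real row with small noise."""
    rng = random.Random(seed)
    return [(x + rng.gauss(0.0, noise), y + rng.gauss(0.0, noise)) for x, y in rows]
```

In this sketch `generate_synthetic` can return any number of rows and none is a perturbed copy of a real record, whereas `augment` returns exactly one jittered row per real row. Production tabular generators (copula, GAN, or diffusion based) capture far richer structure than a single Gaussian, and fitting a model to real data does not by itself guarantee privacy.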
Synthetic data generation promises to break the deadlock between data access and data privacy, but after seven years of development the practice remains experimental, with production use confined to a handful of high-governance verticals. The core idea, generating artificial datasets that preserve the statistical properties of real data across tabular, text, and image domains, has attracted substantial vendor investment and regulatory attention. NVIDIA's acquisition of Gretel and Microsoft's integration of synthetic data into Phi-4 training signal genuine commercial confidence. Yet independent research from EPFL and Max Planck has formalised hard limits: for many use cases, the trade-off between fidelity and privacy cannot be overcome algorithmically. Vendor consolidation reinforces the caution: multiple funded startups have shut down or been acqui-hired, and the survivors are pivoting toward embedding in larger platforms rather than selling standalone tools.
ICML research from June 2026 has sharpened the picture of model collapse, showing that quality-assurance verifiers tuned to local datasets (in healthcare consortia or financial institutions, for example) accelerate collapse rather than prevent it when those datasets cover the true distribution incompletely, turning a safeguard into a systemic risk. Where synthetic data works (fraud detection in banking, clinical trial augmentation in pharma, QA in regulated software, high-fidelity simulation training in aviation), it works under tightly bounded conditions and with careful mixing of real and synthetic data. Broader enterprise scaling is blocked not by a lack of tooling but by unresolved privacy-validation standards, quality gaps on relational data, and newly characterised model-collapse risks in siloed operational environments.
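The collapse dynamics described above can be illustrated with a toy loop. This is a hypothetical sketch, not the setup of the ICML paper: a one-dimensional Gaussian is refitted, generation after generation, to samples drawn from the previous generation's fit. An optional verifier (hypothetical, like every name here) accepts only samples inside a local reference window, standing in for a siloed dataset that covers the true N(0, 1) distribution incompletely. An optional `real_fraction` mixes fresh real samples into each training set.

```python
import random
import statistics

# Toy illustration only; names and parameters are hypothetical.


def run_chain(generations, n, seed, verifier_range=None, real_fraction=0.0):
    """Recursive training on a 1-D Gaussian; returns the fitted std per generation.

    verifier_range: optional (lo, hi) window from a local reference dataset;
        synthetic samples outside it are rejected before refitting.
    real_fraction: share of each training set drawn fresh from the true N(0, 1).
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # the starting model matches the true distribution
    history = []
    for _ in range(generations):
        n_real = int(n * real_fraction)
        synthetic = [rng.gauss(mu, sigma) for _ in range(n - n_real)]
        if verifier_range is not None:
            lo, hi = verifier_range
            # hypothetical verifier: keep only samples the local reference "recognises"
            synthetic = [x for x in synthetic if lo <= x <= hi]
        # fresh real data anchors the fit (empty when real_fraction is 0)
        real = [rng.gauss(0.0, 1.0) for _ in range(n_real)]
        train = synthetic + real
        mu, sigma = statistics.fmean(train), statistics.stdev(train)
        history.append(sigma)
    return history


plain = run_chain(20, 2000, seed=1)
filtered = run_chain(20, 2000, seed=1, verifier_range=(-1.0, 1.0))
mixed = run_chain(20, 2000, seed=1, verifier_range=(-1.0, 1.0), real_fraction=0.5)
```

With these settings the unfiltered chain's spread stays close to 1 over 20 generations, the verifier-filtered chain's spread shrinks well below that, and mixing in fresh real data holds it at a markedly higher level. The toy captures only the direction of the effect, not the power-law decay rate the paper reports.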
— Critical analysis of ICML 2026 research proving that quality verifiers with incomplete reference distributions accelerate model collapse, exposing a fundamental flaw in synthetic data pipeline practices that are widespread across healthcare, finance, and government.
— IEEE published three coordinated standards projects on synthetic data fidelity, quality, and pre-training assessment, signalling industry-wide convergence on standards: a leading indicator of maturation from leading-edge toward mainstream.
— ICML 2026 peer-reviewed paper proving that selection bias in siloed domains (healthcare consortia, finance) causes verifiers meant to prevent collapse to accelerate it instead, through power-law diversity decay.
— Dana-Farber Cancer Institute deployment generating synthetic cohorts from 19,164 metastatic breast cancer patients, with <2% re-identification risk and Kaplan–Meier curves matching the real data, enabling data sharing and trial optimization.
— Large-scale empirical study (504 configurations) showing that expert-validated synthetic rationale data degrades clinical prediction relative to label-only fine-tuning, owing to a structural conflict between narrative plausibility and discriminative optimization.
— Peer-reviewed multi-dimensional evaluation framework showing that good distributional fidelity does not ensure clinical validity: models with strong fidelity can still exhibit poor calibration and distorted relationships.
— Three-year SINTEF institutional research project generating synthetic flight data with 97% accuracy, 0.99 feature alignment, and a 274x trajectory improvement, validating synthetic data performance in high-fidelity operational domains.
— Technical benchmark of seven synthetic data generators on a 70,000-sample validation set with standardized metrics (fidelity, utility, privacy); shows clear vendor differentiation and maturing evaluation infrastructure, with no single dominant generator.
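Several of the items above hinge on how synthetic data is scored. The sketch below shows three simple, commonly used checks, assuming small in-memory datasets. The first is a two-sample Kolmogorov–Smirnov statistic for per-column fidelity. The second is the share of synthetic rows lying within a distance threshold of some real row, a crude re-identification proxy. The third is the maximum vertical gap between Kaplan–Meier survival curves for time-to-event cohorts. These are illustrative choices with hypothetical helper names. The items do not say which metrics Dana-Farber, the clinical evaluation framework, or the seven-generator benchmark actually used, and utility testing (train on synthetic, test on real) is omitted.

```python
import bisect
import math

# Illustrative evaluation helpers; all names are hypothetical, not a library API.


def ks_statistic(real, synth):
    """Fidelity: two-sample Kolmogorov-Smirnov statistic, the largest gap
    between the empirical CDFs of a real and a synthetic column (0 = identical)."""
    a, b = sorted(real), sorted(synth)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def share_too_close(real_rows, synth_rows, threshold):
    """Privacy proxy: fraction of synthetic rows whose nearest real row lies
    within `threshold` (Euclidean distance). Brute force, O(len(real) * len(synth))."""
    close = sum(
        1 for s in synth_rows if min(math.dist(s, r) for r in real_rows) <= threshold
    )
    return close / len(synth_rows)


def kaplan_meier(times, events):
    """Kaplan-Meier estimate S(t) = product over event times t_i <= t of (1 - d_i / n_i).
    events[i] is 1 for an observed event and 0 for a censored record.
    Returns [(t, S(t))] at each distinct event time, in time order."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # events and censored records both leave the risk set
    return curve


def survival_at(curve, t):
    """Read the KM step function at time t (S = 1.0 before the first event)."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


def max_km_gap(real, synth):
    """Largest vertical gap between the KM curves of two (times, events) cohorts."""
    c_real, c_synth = kaplan_meier(*real), kaplan_meier(*synth)
    grid = sorted({t for t, _ in c_real} | {t for t, _ in c_synth})
    return max(
        (abs(survival_at(c_real, t) - survival_at(c_synth, t)) for t in grid),
        default=0.0,
    )
```

A KS statistic or curve gap near 0 indicates close agreement, and a low too-close share suggests few synthetic rows sit on top of real records. As the clinical evaluation item warns, passing such fidelity checks still says nothing about calibration or clinical validity.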