The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
Each dot marks the weighted maturity of practices within a domain, with a brief summary and further detail available for each.
AI that generates realistic but artificial datasets for testing, training, and privacy-preserving data sharing. Covers tabular, text, and image synthetic data; distinct from data augmentation, which modifies real data rather than generating it from scratch.
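The distinction from augmentation is easiest to see in a toy sketch. The snippet below is illustrative only (the single Gaussian column and all values are assumptions, not any vendor's pipeline): augmentation perturbs rows you already hold, while generation fits a model and samples entirely new rows from it.

```python
# Minimal sketch of augmentation vs. generation on one numeric column
# (standing in for e.g. customer ages; all values are made up).
import numpy as np

rng = np.random.default_rng(1)
real = rng.normal(40, 10, size=1_000)

# Data augmentation: perturb records you already hold; every output
# row still derives from a specific real row.
augmented = real + rng.normal(0, 0.5, size=real.shape)

# Synthetic generation: fit a model to the data, then sample new
# records from it; no output row derives from any single real row.
mu, sigma = real.mean(), real.std()
synthetic = rng.normal(mu, sigma, size=1_000)
```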
Synthetic data generation promises to break the deadlock between data access and data privacy, but after seven years of development the practice remains experimental, with production use confined to a handful of high-governance verticals. The core idea, generating artificial datasets that preserve the statistical properties of real data across tabular, text, and image domains, has attracted substantial vendor investment and regulatory attention. NVIDIA's acquisition of Gretel and Microsoft's integration of synthetic data into Phi-4 training signal genuine commercial confidence. Yet independent research from EPFL and Max Planck has formalised hard limits: for many use cases, the trade-off between fidelity and privacy cannot be overcome algorithmically. Vendor consolidation reinforces the caution: multiple funded startups have shut down or been acqui-hired, and the survivors are pivoting toward platform embedding rather than standalone tools. Where synthetic data works (fraud detection in banking, clinical-trial augmentation in pharma, QA in regulated software), it works within tightly bounded conditions. Broader enterprise scaling is blocked not by a lack of tooling but by unresolved privacy-validation standards, quality gaps in relational data, and a growing risk of model collapse that turns synthetic convenience into a systemic liability.
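The fidelity-privacy tension cited above is visible even in a toy pipeline. The sketch below is a set of assumptions, not the EPFL/Max Planck methodology or any vendor's product: it fits a multivariate Gaussian to made-up "real" tabular data, samples synthetic rows at increasing noise levels, and scores each batch for fidelity (mean per-column Kolmogorov-Smirnov statistic, lower is better) and privacy (median distance to the closest real record, higher is safer). Turning the noise up improves the privacy proxy and degrades fidelity, and vice versa.

```python
# Toy fidelity-vs-privacy trade-off for tabular synthetic data.
# Generator, columns, and metrics are illustrative assumptions only.
import numpy as np
from scipy.stats import ks_2samp
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

# Stand-in for a real dataset: two correlated numeric columns
# (e.g. salary and age; values are fabricated for the example).
real = rng.multivariate_normal([50_000, 40],
                               [[1e8, 9e3], [9e3, 100]], size=5_000)

# "Train" the generator: fit a multivariate Gaussian to the real data.
mean, cov = real.mean(axis=0), np.cov(real, rowvar=False)

def generate(n, noise_scale=1.0):
    """Sample synthetic rows; noise_scale > 1 trades fidelity for privacy."""
    return rng.multivariate_normal(mean, cov * noise_scale**2, size=n)

def fidelity(synth):
    """Mean per-column KS statistic vs. the real data (0 = perfect match)."""
    return np.mean([ks_2samp(real[:, j], synth[:, j]).statistic
                    for j in range(real.shape[1])])

def privacy_dcr(synth):
    """Median distance-to-closest-real-record; smaller = riskier.
    (Columns would normally be standardised first; skipped for brevity.)"""
    dists, _ = cKDTree(real).query(synth, k=1)
    return float(np.median(dists))

for scale in (1.0, 1.5, 2.0):
    s = generate(2_000, noise_scale=scale)
    print(f"noise={scale:.1f}  fidelity(KS)={fidelity(s):.3f}  "
          f"DCR={privacy_dcr(s):.1f}")
```

Production evaluations replace these proxies with stronger tests, for instance membership-inference attacks rather than raw distance-to-closest-record, but the direction of the trade-off is the same.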
— UK Financial Conduct Authority multi-stakeholder project deploying fully synthetic datasets containing money-laundering typologies for AML testing and compliance innovation.
— Authoritative EU regulatory framework from a data protection authority, defining synthetic data techniques, use cases, governance requirements, and privacy/fairness implications.
— FDA framework and regulatory guidance on the acceptance of synthetic data and digital twins in medical device validation, with deployment pathways by domain.
— CHIMERA framework demonstrating that high-quality synthetic data (9K samples) outperforms larger models on reasoning tasks via data-centric design; validates the quality-over-scale paradigm.
— Google framework achieving independent control over quality, diversity, and complexity in synthetic data; signals major platform-vendor confidence in the practice's maturity.
— Healthcare synthetic data market projected to grow from $658M (2025) to $5.88B (2033), a 31.5% CAGR; adoption spans clinical trials, AI training, and privacy-preserving analytics.
— Qualitest + Synthesized platform deployment at a multi-billion-dollar insurer: 60% faster test-data production, 28M+ rows secured, 100% referential integrity, zero security waivers; full enterprise production deployment.
— Named practitioner panel (Laurion Capital, T. Rowe Price, Jupiter Research Capital) on synthetic data in quantitative finance; documents operational deployment with clear scope limitations and an ontology-bias framing.