The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
Each dot marks the weighted maturity of practices within a domain — hover for a brief summary, click for more detail
AI techniques that go beyond correlation to estimate causal effects and identify which interventions drive outcomes. Includes treatment effect estimation and counterfactual analysis; distinct from predictive modelling, which forecasts outcomes without inferring causation.
Causal inference and uplift modelling occupy a frustrating position: the tooling is production-ready, but production adoption remains narrow. Unlike predictive modelling, which forecasts outcomes, causal inference estimates what would happen under a specific intervention — the incremental effect of a marketing campaign, a product change, or a credit offer. Uplift modelling applies this at the individual level, identifying which customers will actually respond to treatment rather than converting regardless.
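The distinction between persuadables and customers who convert regardless can be made concrete with a minimal two-model ("T-learner") sketch on synthetic RCT data. Everything here — segment names, conversion rates — is invented for illustration:

```python
import random

random.seed(0)

# Synthetic RCT with two customer segments:
#   "engaged":      converts at 50% with or without treatment (sure things)
#   "fence_sitter": 10% base rate, lifted to 30% by treatment (persuadables)
TRUE_RATES = {
    ("engaged", 0): 0.50, ("engaged", 1): 0.50,
    ("fence_sitter", 0): 0.10, ("fence_sitter", 1): 0.30,
}

def simulate(n_per_cell=5000):
    rows = []
    for (segment, treated), p in TRUE_RATES.items():
        for _ in range(n_per_cell):
            rows.append((segment, treated, 1 if random.random() < p else 0))
    return rows

def uplift_by_segment(rows):
    """Two-model estimate collapsed to segment level: model the conversion
    rate separately under treatment and control, then take the difference."""
    stats = {}
    for segment, treated, y in rows:
        n, s = stats.get((segment, treated), (0, 0))
        stats[(segment, treated)] = (n + 1, s + y)
    rate = {k: s / n for k, (n, s) in stats.items()}
    return {seg: rate[(seg, 1)] - rate[(seg, 0)] for seg in {s for s, _ in rate}}

uplift = uplift_by_segment(simulate())
# Persuadables show large positive estimated uplift; sure things show ~zero,
# so a targeting policy would treat only the fence-sitters.
```

In practice the two per-arm models are gradient-boosted or similar learners over rich covariates rather than segment means, but the targeting logic — rank by estimated individual uplift, not by conversion probability — is the same.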
Major cloud vendors now ship causal inference as GA components, open-source libraries like DoWhy have millions of downloads, and the methodology continues to advance. Yet deployment concentrates almost entirely in e-commerce and marketing, where randomised experiment infrastructure already exists. Recent benchmarks confirm that 62% of modern treatment-effect models underperform trivial baselines on real-world data; robustness under structural biases remains a critical gap. Healthcare, policy, and other observational domains generate growing research interest but zero clinical or operational integration. The practice is leading-edge in the precise sense: forward-leaning teams extract real value, but most organisations have not started, and the barriers that prevent broader adoption — data volume requirements, assumption validation complexity, and tool consistency gaps — have proven resistant to the vendor investment thrown at them.
Platform and tool investment accelerated into May 2026: Microsoft shipped Azure and Fabric GA components (2025), Alembic raised a $171M Series B for its real-time causal AI platform serving airline, CPG, and finance customers, and Causalens launched enterprise GA with named deployments across asset managers, investment banks, transportation, and energy. These platforms have reached production maturity with credible enterprise adoption signals. Netflix's decade of production causal infrastructure — spanning localisation, recommendations, pricing, and retention optimisation — demonstrates maturity at scale, though it required PhD-level teams and multi-year investment.

Production deployment is gaining momentum. Remerge's 20+ documented uplift test case studies in mobile marketing (2023-2026) show sustained, scaled RCT-based incremental measurement delivering consistent 30-60% CPA reductions across 100+ campaigns. The Cassandra.app platform now serves 100+ marketing teams with geo-based uplift testing, with the Gina Tricot case demonstrating consistent ROI improvement across markets. Adyen's May 2026 Uplift product (GA) reports a 10% conversion lift from causal inference applied to trillions of payment transactions. Meta's incremental attribution adoption in DTC shows 18% incremental sales growth in geo-test cases, and multi-channel attribution research demonstrates that uplift modelling can surface a 30% budget discrepancy in traditional attribution.

These deployments concentrate in e-commerce and marketing, where randomised experiment infrastructure exists; healthcare research interest intensifies, but clinical workflow integration remains absent. Real-world deployment barriers persist: an April 2026 Amazon Science benchmark confirms that 62% of CATE models perform worse than trivial predictors on heterogeneous data, and peer-reviewed evidence from major pharmaceutical companies shows single-robust ML estimators can underperform parametric regression without doubly robust methodology and sample splitting, requiring TMLE or AIPW approaches.
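The doubly robust point deserves a concrete illustration. The sketch below estimates an average treatment effect with AIPW on synthetic confounded data; a single binary confounder keeps the nuisance models down to stratum means, so no ML library is needed. This is an invented toy under those assumptions, not any vendor's or paper's methodology:

```python
import random
import statistics

random.seed(1)
TRUE_ATE = 2.0

def simulate(n=20000):
    rows = []
    for _ in range(n):
        x = 1 if random.random() < 0.5 else 0   # binary confounder
        e = 0.7 if x == 1 else 0.3              # true propensity P(T=1|X=x)
        t = 1 if random.random() < e else 0
        y = 1.0 * x + TRUE_ATE * t + random.gauss(0, 1)
        rows.append((x, t, y))
    return rows

def aipw_ate(rows):
    """Augmented inverse-propensity-weighted (doubly robust) ATE estimate.
    With one binary confounder, the plug-in outcome models and propensity
    model reduce to stratum means and stratum treatment rates."""
    mu, eh = {}, {}
    for x in (0, 1):
        for t in (0, 1):
            mu[(x, t)] = statistics.mean(y for xi, ti, y in rows if xi == x and ti == t)
        eh[x] = (sum(1 for xi, ti, _ in rows if xi == x and ti == 1)
                 / sum(1 for xi, _, _ in rows if xi == x))
    psi = []
    for x, t, y in rows:
        m1, m0, e = mu[(x, 1)], mu[(x, 0)], eh[x]
        psi.append(m1 - m0 + t * (y - m1) / e - (1 - t) * (y - m0) / (1 - e))
    return statistics.mean(psi)

rows = simulate()
naive = (statistics.mean(y for _, t, y in rows if t == 1)
         - statistics.mean(y for _, t, y in rows if t == 0))
ate = aipw_ate(rows)
# naive is biased upward by the confounder (~2.4); AIPW recovers ~2.0
```

The estimator stays consistent if either the outcome model or the propensity model is correct; production TMLE/AIPW pipelines add sample splitting so that flexible ML nuisance models do not contaminate the effect estimate.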
Production causal inference systems also face hidden operational risks: model upgrades can shift causal risk estimates by 0.12-0.19 points and widen confidence intervals by 23% on protected cohorts, creating deployment instability. Methodological advancement toward practitioner accessibility accelerates: TU Delft research formalises methods to detect assumption violations and ensure robustness; hierarchical causal models validated on 3M active users demonstrate recovery of incremental effects under treatment overlap; and agentic AI frameworks now automate variable selection and graph construction, cutting expert timelines from weeks to hours. Yet these advances have not expanded adoption beyond core marketing and e-commerce contexts despite years of vendor investment.
The research frontier is moving toward reliability assessment and practitioner accessibility. New benchmarks at ICLR 2026 exposed critical LLM weaknesses in causal reasoning: evaluated across Pearl's causal ladder (discovery, intervention, counterfactual), LLMs achieve 93.5% on discovery but degrade sharply to 81.9% on intervention and 73% on counterfactual reasoning, limiting autonomous causal method selection. Methodological work continues to push boundaries: theoretical advances in heterogeneous treatment effects (HTE) clarify the assumptions needed for mechanism testing, multi-treatment effects are now identifiable under unmeasured confounding with √n-consistent estimators, and HTE estimation extends to survival outcomes with clinical application. NSF-funded educational tooling (thinkCausal with stan4bart) has been validated in a randomised study showing superior accuracy and speed over alternative methods, advancing practitioner accessibility. Yet these advances remain concentrated in academic and research settings rather than driving operational adoption.
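The ladder's three rungs — and why the upper two are harder — can be shown with a hand-built structural causal model. This toy SCM and its variables are an invented illustration, not drawn from the benchmark itself:

```python
import random

random.seed(2)

# Toy structural causal model:
#   U ~ Bernoulli(0.5)   exogenous noise (a hidden unit-level trait)
#   X := U               treatment follows the trait observationally
#   Y := X AND U         outcome requires both

def sample_u():
    return 1 if random.random() < 0.5 else 0

def scm(u, do_x=None):
    x = u if do_x is None else do_x   # do() severs X from its cause
    y = x & u
    return x, y

# Rung 2 (intervention): P(Y=1 | do(X=1)), averaged over fresh units.
n = 100_000
interventional = sum(scm(sample_u(), do_x=1)[1] for _ in range(n)) / n  # ~0.5

# Rung 3 (counterfactual): a specific unit was observed with X=0, Y=0.
# Abduction: X=0 implies U=0. Action: do(X=1). Prediction:
_, y_counterfactual = scm(0, do_x=1)   # 0: this unit still would not convert
```

Intervention only needs the population-level answer (here 0.5), while the counterfactual requires abducing the unit's own exogenous noise before intervening — the extra inferential step on which the benchmarked models degrade most.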
The core adoption barriers are well-documented and persistent: minimum data volumes of 10,000+ per treatment arm, 10-20% divergence in ATE estimates between major libraries run with identical configurations, and model generalisation failure across campaigns. Healthcare has generated over 4,300 clinical publications referencing causal methods, yet systematic reviews find zero integration into clinical workflows. Analyst surveys project enterprise interest — 62% plan a shift toward causal decision intelligence within 18 months — but the gap between stated intent and operational deployment defines this practice's stalled trajectory.
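The 10,000-per-arm figure is consistent with a textbook two-proportion power calculation; the sketch below uses the standard formula, with base rate and uplift values chosen for illustration:

```python
from math import ceil

def n_per_arm(p_control, uplift, z_alpha=1.96, z_power=0.8416):
    """Smallest n per arm to detect an absolute `uplift` over `p_control`
    at alpha=0.05 (two-sided) and 80% power; z values are the standard
    normal quantiles for those settings."""
    p_t = p_control + uplift
    var = p_control * (1 - p_control) + p_t * (1 - p_t)
    return ceil((z_alpha + z_power) ** 2 * var / uplift ** 2)

# Detecting a 1-point absolute uplift on a 10% conversion base:
n = n_per_arm(0.10, 0.01)   # roughly 15,000 users per arm
```

A two-point uplift on the same base needs under 4,000 per arm, which is why uplift measurement is tractable for high-traffic marketing campaigns and punishing for small effects, low base rates, or the per-segment splits that individual-level targeting requires.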
— Agentic AI framework automates causal variable selection and graph construction; reduces expert timeline from weeks to hours—expanding practitioner accessibility.
— Platform serves 100+ marketing teams with geo-based uplift testing; Gina Tricot case demonstrates consistent ROI improvement across markets.
— ICLR 2025 Amazon/UCLA benchmark: 62% of contemporary CATE models underperform trivial baseline on real-world heterogeneous data—critical reliability limitation.
— Adyen Uplift GA product reports 10% conversion lift using causal inference on trillions of payment transactions; independent Nord Security customer validation.
— Methodological advance validated on ~3M active users; demonstrates recovery of incremental effects under treatment overlap—addresses real-world multi-channel complexity.
— Meta's incremental attribution adoption in DTC segment; geo-test case study showed 18% incremental sales growth (NY +28% vs CA +10% baseline).
— TU Delft dissertation formalizes methods to detect assumption violations and ensure robustness in causal inference—advancing practitioner safety and reliability.
— Theoretical framework clarifying assumptions for using HTEs to test causal mechanisms; reveals theory-practice gap in HTE interpretation foundational to uplift modeling.