The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
AI models that forecast future values from historical time series data across demand, revenue, usage, and other metrics. Includes deep learning forecasting and automated model selection; distinct from financial forecasting, which applies time series methods to a specific finance context.
AI-driven time series forecasting has reached the point where forward-leaning organisations extract real value from it -- but most have not yet started, and the field's central question remains unresolved. Neural and foundation model approaches (Transformers, TimeGPT, TimesFM) promise zero-shot generality across demand, revenue, and operational metrics, yet empirical evidence stubbornly shows that simpler methods -- gradient boosting, ARIMA, exponential smoothing -- match or beat them on most production workloads. The M4 Competition, repeated benchmarking studies, and practitioner case studies all converge on the same finding: model performance is task-dependent, not architecture-dependent. What makes this a leading-edge practice is not proof that deep learning wins, but that a mature vendor ecosystem, cloud-managed services, and confirmed multi-sector deployments have made automated forecasting accessible at scale. The tension that defines this tier is method selection: organisations can deploy forecasting today, but choosing when neural complexity justifies its cost over classical alternatives still requires domain expertise and empirical validation rather than default architectural commitment.
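The empirical validation described above can be sketched as a rolling-origin backtest that scores any candidate against a seasonal naive baseline using MASE. This is a minimal illustration, not a method taken from the studies cited: the function names, the simple exponential smoothing candidate, and the synthetic series are all assumptions made for the example.

```python
from statistics import mean

def seasonal_naive(history, horizon, season):
    """Baseline: repeat the most recent seasonal cycle."""
    start = len(history) - season
    return [history[start + (h % season)] for h in range(horizon)]

def ses(history, horizon, season=None, alpha=0.3):
    """Hypothetical candidate: simple exponential smoothing (flat forecast)."""
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon

def mase(history, actual, forecast, season):
    """Mean absolute error scaled by the in-sample seasonal naive error."""
    scale = mean(abs(history[i] - history[i - season])
                 for i in range(season, len(history)))
    if scale == 0:
        raise ValueError("in-sample seasonal naive error is zero; MASE undefined")
    return mean(abs(a - f) for a, f in zip(actual, forecast)) / scale

def rolling_origin(series, model, horizon, season, min_train):
    """Average MASE over successive train/test splits of one series."""
    scores = []
    for cut in range(min_train, len(series) - horizon + 1, horizon):
        train, test = series[:cut], series[cut:cut + horizon]
        scores.append(mase(train, test, model(train, horizon, season), season))
    return mean(scores)

# Synthetic series (an assumption for illustration): period-4 pattern plus trend.
series = [[10, 20, 30, 40][t % 4] + t for t in range(24)]
for name, model in {"seasonal_naive": seasonal_naive, "ses": ses}.items():
    score = rolling_origin(series, model, horizon=4, season=4, min_train=12)
    print(name, round(score, 3))
```

A MASE below 1 means the candidate beats the seasonal naive baseline on that series; a neural or foundation model can be dropped into the same loop by wrapping it in a function with the same `(history, horizon, season)` signature.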
The vendor ecosystem is consolidating around foundation models even as evidence mounts against their universal superiority. AWS completed its deprecation of Amazon Forecast, retreating from specialised forecasting-as-a-service, a significant signal from the category's largest cloud provider. Foundation model vendors filled the gap: Google released TimesFM 2.5 (March 2026) with 200M parameters and a 16k context length (an 8x expansion), integrated into BigQuery ML and Google Sheets for consumer-grade accessibility; Amazon Chronos-2 passed 600M Hugging Face downloads and added multivariate and covariate support; Salesforce released Moirai-MoE, whose sparse mixture-of-experts design outperforms larger rivals at 28x parameter efficiency. Datadog released Toto 2.0 (May 2026), an open-weights TSFM scaling from 4M to 2.5B parameters, with accuracy reported to keep improving across that range without saturating, signalling an ecosystem pivot toward scaling-driven architectures.
Real-world deployments confirm adoption breadth: retail (The Very Group: 9.9% SKU management improvement across 8M+ forecasts; o9 Solutions with AB InBev and Kraft Heinz achieving 60% stock-out reduction and 87% forecast accuracy at 99.5% service levels), manufacturing (Foxconn: 8% accuracy gain and $553K annual savings; statworx case: 10% accuracy on 20K products), energy (renewable forecasting with a 14% balancing cost reduction; Belgian grid operators validating Chronos-2 and TimesFM 2.5 on volatile electricity pricing), and healthcare (ICLR 2026 work confirms TSFM calibration superiority for risk-sensitive deployment; GlucoFM-Bench validates zero-shot transfer on diabetes prediction yet documents domain-specific challenges in T1D cohorts).
Yet May 2026 production benchmarks reveal critical limitations. ARFBench, built on 63 real Datadog production incidents, shows that current TSFMs, LLMs, and VLMs achieve only 62.7% accuracy versus 87.2% oracle performance, documenting substantial gaps in multi-step reasoning over production data. Evidence from infrastructure-scale deployments also surfaces barriers: data architecture gaps (nShift's analysis of missing returns and cancellation data), misaligned optimisation metrics (the Expectations vs. Realities paper finds that MSE-optimal point forecasts systematically produce under-dispersed distributions that fail in production), and empirical electricity market studies finding that foundation models underperform on volatile real-world pricing signals.
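The under-dispersion problem noted above can be checked directly by comparing the empirical coverage of a model's prediction intervals with their nominal level. The sketch below is a generic calibration check, not the procedure used in the cited paper; the function names, the 0.05 tolerance, and the toy numbers are assumptions for illustration.

```python
def empirical_coverage(actuals, lowers, uppers):
    """Share of actual values that fall inside their prediction interval."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(actuals, lowers, uppers))
    return hits / len(actuals)

def pinball_loss(actuals, quantile_forecasts, q):
    """Average quantile (pinball) loss for forecasts of quantile q."""
    total = 0.0
    for y, f in zip(actuals, quantile_forecasts):
        diff = y - f
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(actuals)

def is_under_dispersed(actuals, lowers, uppers, nominal=0.8, tolerance=0.05):
    """Flag intervals that cover noticeably fewer actuals than promised."""
    return empirical_coverage(actuals, lowers, uppers) < nominal - tolerance

# Toy data (assumed): intervals that are too narrow versus suitably wide ones.
actuals = [10, 12, 9, 15, 11]
narrow = ([10] * 5, [12] * 5)
wide = ([8] * 5, [16] * 5)
print(empirical_coverage(actuals, *narrow), is_under_dispersed(actuals, *narrow))
print(empirical_coverage(actuals, *wide), is_under_dispersed(actuals, *wide))
```

Tracking coverage and pinball loss alongside MSE makes it visible when a model that looks accurate on point forecasts is overconfident about its uncertainty.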
June 2026 brought further signs of ecosystem maturity: production observability tooling is emerging (ForecastOps, an open-source tool for TSFM monitoring), alongside multi-model routing frameworks (TimeRouter achieving SOTA on GIFT-Eval without LLM overhead), domain-specific TSFM validation (APEX on 4,500 wireless networks demonstrating an 18% MAE improvement over generic Toto), and next-generation benchmarking (the TIME benchmark, with 50 fresh datasets and zero-shot data-integrity validation). These advances indicate operational readiness, but they also reveal fragmentation: specialist models (APEX for networks, domain-tuned GlucoFM instances) outperform generic TSFMs within their domains, while zero-shot universality remains unproven. Amazon's SCOT (a proprietary supply chain optimiser refined over a decade) outperforms Chronos on domain data but is non-transferable, suggesting that domain-specific excellence and generic deployability are still misaligned.
Peer-reviewed research continues to converge on the core finding: traditional accuracy metrics (MAPE, MAE) correlate poorly with economic outcomes, and method selection remains the primary adoption barrier. The open question is less which model architecture to choose than whether forecasting teams optimise for business value, and whether zero-shot generic foundation models offer genuine ROI over domain-specific fine-tuning. Portfolio approaches (Amazon Science: specialist models outperforming single monolithic TSFMs) and hybrid routing strategies (Complexity Router assigning domains to optimal model classes) are emerging as pragmatic production patterns, reducing inference cost by 70% while maintaining accuracy.
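The routing idea can be illustrated with a very small selector that assigns each series to whichever candidate model has the lowest error on a recent holdout window. This is a generic sketch and does not reproduce TimeRouter or Complexity Router; the three candidate models, the function names, and the example series are assumptions chosen to keep the example self-contained.

```python
def mae(actual, forecast):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def naive_last(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon

def mean_forecast(history, horizon):
    """Forecast the historical mean."""
    avg = sum(history) / len(history)
    return [avg] * horizon

def drift(history, horizon):
    """Extend the straight line from the first to the last observation."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (h + 1) for h in range(horizon)]

def route(series, candidates, holdout):
    """Pick the candidate with the lowest MAE on the last `holdout` points."""
    train, valid = series[:-holdout], series[-holdout:]
    scores = {name: mae(valid, model(train, holdout))
              for name, model in candidates.items()}
    return min(scores, key=scores.get), scores

candidates = {"naive_last": naive_last, "mean": mean_forecast, "drift": drift}

# Two assumed series with different behaviour: a steady trend and a noisy level.
trending = list(range(1, 21))
noisy_flat = [10, 12, 8, 11, 9, 10, 13, 7, 10, 11, 9, 10]
print(route(trending, candidates, holdout=4)[0])
print(route(noisy_flat, candidates, holdout=4)[0])
```

In a production setting the candidates would be the specialist and generic models under consideration, and the selection score could be a business-cost measure instead of MAE, in line with the point above that accuracy metrics correlate poorly with economic outcomes.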
Open-source observability tool for production TSFM deployments (PyPI, Apache 2.0) that validates forecasts, detects leakage, and scores against baselines. Signals maturity: the ecosystem is moving from deployment to production monitoring and observability.
Named case studies (AB InBev, Kraft Heinz): 60% stock-out reduction, 53% inventory loss decrease, 70-90% touchless planning adoption, and 87% forecast accuracy with 99.5% service levels. Demonstrates production adoption across the Fortune 500 food and beverage sector.
Production deployment of a domain-specific TSFM on 4,500 wireless networks: APEX-Large reduces MAE by 18% vs Toto and 38% vs SARIMA, with F1=0.93 for anomaly detection; APEX-Edge enables sub-second edge inference. Demonstrates domain-specific value over generic TSFMs.
SOTA routing framework achieving a GIFT-Eval leaderboard MASE of 0.6765 without expensive LLM controllers. Demonstrates ecosystem maturity: complementary specialist TSFMs benefit from lightweight selection routing, which reduces inference cost and enables agentic systems.
Amazon research on Chronos-2 financial forecasting: learned event impact covariates achieve a 21% WAPE reduction and a 78% anomaly-detection improvement. Demonstrates a practical enhancement technique for production financial deployments.
Amazon Science: specialist model portfolios consistently outperform single monolithic TSFMs at scale, achieving competitive performance with significantly fewer parameters. Supports operational hybrid approaches.
Comprehensive healthcare domain benchmark: TSFMs (Chronos-2, TimesFM) show strong zero-shot transfer, but a lightweight LSTM outperforms them by 4-21% when full task-specific data is available. A negative signal on zero-shot universality in specialised domains.
AWS Solutions Architect: SCOT (a proprietary, decade-refined supply chain optimiser) vs Chronos (a generic foundation model). SCOT excels but is non-transferable; Chronos is broadly deployable. Argues that domain-specific excellence and zero-shot generality have different value.