The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
Each dot marks the weighted maturity of practices within a domain — hover for a brief summary, click for more detail
AI that forecasts resource demand and automatically scales infrastructure ahead of load, rather than reactively. Includes predictive scaling based on traffic patterns and business events; distinct from reactive autoscaling which responds to current metrics only.
Predictive autoscaling is a proven, mature infrastructure practice, available in GA from every major cloud provider and deeply integrated into the Kubernetes ecosystem. Rather than reacting to CPU or memory spikes, it forecasts demand from historical patterns and provisions capacity ahead of load. CNCF reports 74% enterprise adoption; documented deployments show 22-70% cost reductions and consistent 99.99% availability. Market analysis projects $6.47B by 2030 (24.4% CAGR), driven by AI-powered predictive analytics.

Vendor innovation continued in May 2026: Google Cloud released intent-based autoscaling for GKE with 5x faster reaction time (25s→5s) and native custom metrics that eliminate external monitoring dependencies; AWS confirmed ongoing platform investment; Cast AI and Zesty expanded coordinated HPA/VPA optimization at production scale with named customer results (40% cluster optimization, 65% infrastructure reduction).

The practice has cleared the "does it work" threshold; the question now is deployment reliability across layered systems. Capacity limits are the primary operational bottleneck preventing AI scaling: Datadog analysis of thousands of production systems attributes 60% of AI request failures directly to capacity constraints. Operationally, simple single-service scenarios remain reliable and straightforward. Multi-tier architectures expose harder problems: scaling the wrong bottleneck, thrashing from misconfigured thresholds, forecast blindness to business events, and GPU cold-start delays (30-120s) that render reactive HPA inadequate for LLM serving. Accurate prediction demands stable traffic history; anomalous spikes and unforecastable time series still require reactive fallbacks, with forecastability testing as a pre-deployment prerequisite. The practice is table stakes, but operational discipline separates teams that capture value from those that create new failure modes.
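The predictive-versus-reactive distinction above can be sketched in a few lines: forecast demand from historical patterns, then provision replicas with headroom before the load arrives. This is a toy seasonal-naive forecaster, not any vendor's algorithm; the headroom factor and per-replica throughput are illustrative assumptions.

```python
import math
from statistics import mean

def seasonal_forecast(history, period, horizon):
    """Seasonal-naive forecast: predict demand `horizon` steps ahead by
    averaging the same phase of previous cycles in the history."""
    phase = (len(history) + horizon - 1) % period
    samples = [history[i] for i in range(phase, len(history), period)]
    return mean(samples)

def replicas_needed(predicted_rps, rps_per_replica, headroom=1.2):
    """Provision ahead of load, with a safety margin rather than an exact fit."""
    return max(1, math.ceil(predicted_rps * headroom / rps_per_replica))

# Two days of hourly request rates with a spike every day at hour 18.
history = [100 + (400 if h % 24 == 18 else 0) for h in range(48)]

# Forecast 19 hours ahead (hour 66, the next daily peak) and pre-scale.
predicted = seasonal_forecast(history, period=24, horizon=19)
print(replicas_needed(predicted, rps_per_replica=100))  # → 6, before the spike hits
```

A reactive autoscaler sees only the current metric and starts scaling at hour 66, after requests are already queuing; the predictive version has the six replicas warm beforehand.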
Vendor maturity and AI-specific acceleration (May 2026). Google Cloud launched intent-based autoscaling for GKE with native custom metrics that eliminate external monitoring stacks and a 5x faster reaction time (25s→5s). AWS maintains predictive scaling across EC2, ECS, and Auto Scaling Groups, with API-level documentation confirming April 2026 product status. Azure continues to integrate KEDA into AKS, though VMSS operational reliability gaps persist in production. The cloud-native ecosystem has consolidated around KEDA (CNCF Graduated) for event-driven and GPU inference workloads, with production deployments at Alibaba, Grab, Calendly, Blizzard, and Grafana. Cast AI and Thoras now offer ML-powered predictive workload scaling as GA products. Specialized vendors (StormForge, Baseten, Kedify) focus on optimizing predictive autoscaling for AI inference, addressing the distinct challenge of GPU cold-start delays (5-8 minutes for 70B models).
AI-specific capacity constraints drive adoption. Industry-wide utilization analysis (Cast AI, April 2026) across tens of thousands of Kubernetes clusters reveals structural underutilization (GPU 5%, CPU 8%, memory 20%): the predictive autoscaling capability exists, but deployment challenges prevent efficient use of it. Datadog analysis of production AI systems confirms capacity limits as the primary bottleneck, attributing 60% of AI request failures directly to capacity constraints and establishing this practice as essential infrastructure for AI scaling. Practitioner benchmarks show concrete wins: KEDA queue-depth scaling for vLLM achieved a 40% GPU spend reduction and 60% p99 latency improvement; Simplismart.ai deployed warm pools that cut inference scale-up to 60-70 seconds from 5-6 minutes; Grab deployed ML predictive autoscaling for Kafka consumers, cutting cost 55% while raising utilization from 15% to 57%.
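The queue-depth pattern behind the vLLM result can be sketched as a replica calculation: size the fleet so pending inference requests per replica stay near a target, instead of reacting to CPU, which barely moves while a GPU queue backs up. A minimal KEDA-style illustration; the target and bounds here are hypothetical.

```python
import math

def desired_replicas(queue_depth, target_per_replica=8, min_r=1, max_r=16):
    """Queue-depth scaling: keep pending requests per replica near a
    target (hypothetical value), bounded by fleet min/max. CPU-based
    scaling misses this signal entirely for GPU-bound LLM serving."""
    want = math.ceil(queue_depth / target_per_replica)
    return max(min_r, min(max_r, want))

print(desired_replicas(0))    # 1  (idle floor; true scale-to-zero needs an activator)
print(desired_replicas(50))   # 7  (ceil(50 / 8))
print(desired_replicas(400))  # 16 (capped at the fleet maximum)
```

In a real KEDA deployment the queue depth would come from a scaler such as a Prometheus metric on the inference server's pending-request gauge; the arithmetic above is what the ScaledObject target encodes.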
Deployment complexity and operational prerequisites remain barriers. Sedai CTO analysis documents the timing lag of reactive autoscaling (2-4 minutes) and the production cost of feedback-loop engineering. Practitioner assessments flag a critical structural barrier: many teams build forecasting models on inherently unforecastable time series, so diagnostic forecastability testing is now recommended as a pre-deployment prerequisite to avoid failed optimization investments. GPU cold-start mechanics are now well-documented: container image pull (4-6 minutes) dominates, while weight loading and CUDA graph capture are secondary; this establishes why predictive pre-scaling is the minimum viable approach for LLM serving. Market research projects capacity management reaching $6.47B by 2030 (24.4% CAGR), driven by AI-powered predictive analytics and automation. ROI is strong where deployment is straightforward (70% cost reductions, 99.99% availability in documented case studies), but operational discipline is required to avoid wrong-bottleneck scaling, threshold oscillation, and over-provisioning in multi-tier orchestration.
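The forecastability-testing prerequisite can be made concrete with a crude holdout check: does a seasonal-naive model actually beat a last-value baseline on recent data? If not, predictive scaling built on that model will mislead, and a reactive fallback should stay in place. This is an illustrative diagnostic, not the procedure any cited vendor uses.

```python
def mae(forecasts, actuals):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)

def seasonal_beats_naive(series, period=24, holdout=24):
    """Holdout check: compare a seasonal-naive forecast (value one period
    ago) against a last-value baseline on the final `holdout` points.
    If the seasonal model does not win, this predictor is a poor fit."""
    train, test = series[:-holdout], series[-holdout:]
    seasonal = [series[len(train) + i - period] for i in range(holdout)]
    last_value = [train[-1]] * holdout
    return mae(seasonal, test) < mae(last_value, test)

daily_cycle = [hour % 24 for hour in range(96)]  # clean daily pattern
steady_trend = list(range(96))                   # pure trend, no seasonality

print(seasonal_beats_naive(daily_cycle))   # True:  predictive scaling viable
print(seasonal_beats_naive(steady_trend))  # False: this model would mislead
```

Production diagnostics would use proper cross-validation and multiple candidate models, but even this single comparison catches the failure mode the practitioner assessments describe: fitting a seasonal forecaster to a series that has no stable seasonal structure.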
— Market research: capacity management market projected $6.47B by 2030 (24.4% CAGR), driven by AI-powered predictive analytics, cloud deployment, and automation adoption; signals widespread enterprise adoption.
— Baseten AI inference platform autoscaling uses concurrency-target with asymmetric scale-up/down behavior; demonstrates modern AI workload autoscaling practice including scale-to-zero.
— Practitioner benchmark: KEDA queue-depth scaling for vLLM achieved 40% GPU spend reduction and 60% p99 latency improvement by scaling on inference queue depth instead of CPU metrics.
— Cast AI ML-powered predictive workload scaling forecasts future resource needs from historical patterns, moving beyond reactive scaling; represents ecosystem adoption of ML-based predictive autoscaling.
— Tampere University master's thesis: ARIMA predictive autoscaling forecasts CPU utilization 45s ahead and successfully scaled 1→8 replicas during a demand spike, validating the computational lightness of classical forecasting.
— Cast AI AI Enabler for vLLM autoscaling: replica-based scaling, intelligent hibernation for zero-cost idle periods, SaaS fallback routing; addresses AI-specific capacity management challenges.
— Kedify maintainer at DevOpsCon 2026: practical KEDA strategies for AI/LLM workload autoscaling and real-time traffic handling; reflects emergence of AI-specific autoscaling as distinct practitioner challenge.
— Sedai customer outcomes: typical 30%+ cost reduction through application-aware intelligent autoscaling; adoption metric showing commercial viability of predictive capacity optimization platforms.
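The asymmetric scale-up/down behavior noted in the Baseten entry above can be sketched as a concurrency-target controller: scale up immediately when in-flight requests exceed capacity, but scale down only after a sustained quiet period, eventually reaching zero. All parameters and the stabilization mechanism here are hypothetical, not Baseten's actual implementation.

```python
import math

class ConcurrencyAutoscaler:
    """Concurrency-target scaling with asymmetric behavior: scale up
    right away, scale down only after load stays low for a delay window
    (hypothetical parameters for illustration)."""

    def __init__(self, target=4, scale_down_delay=300.0):
        self.target = target            # desired in-flight requests per replica
        self.delay = scale_down_delay   # seconds of sustained low load before shrinking
        self.replicas = 1
        self._below_since = None        # when load first dropped below capacity

    def observe(self, in_flight, now):
        want = math.ceil(in_flight / self.target)
        if want > self.replicas:                 # scale up immediately
            self.replicas = want
            self._below_since = None
        elif want < self.replicas:               # scale down only after the delay
            if self._below_since is None:
                self._below_since = now
            elif now - self._below_since >= self.delay:
                self.replicas = want             # may reach 0 (scale-to-zero)
                self._below_since = None
        else:
            self._below_since = None
        return self.replicas

scaler = ConcurrencyAutoscaler(target=4, scale_down_delay=300.0)
print(scaler.observe(in_flight=20, now=0))    # 5: scale up immediately
print(scaler.observe(in_flight=4, now=60))    # 5: low load, but wait out the delay
print(scaler.observe(in_flight=4, now=400))   # 1: sustained low load, scale down
print(scaler.observe(in_flight=0, now=800))   # 1: idle timer starts
print(scaler.observe(in_flight=0, now=1200))  # 0: scale-to-zero
```

The asymmetry is the point: eager scale-up protects latency during bursts, while the stabilization window prevents the thrashing and threshold oscillation flagged as failure modes above.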