The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organizational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
AI applied from user research through to shipped product experience. Wide maturity spread: A/B testing and analytics are established, prototyping and design systems are good practice, but nearly half the domain is bleeding-edge — generative UI, autonomous UX research, and AI-native product frameworks are experimental. Most practices are stalled, with more energy in tooling announcements than production adoption.
Product and design has entered the accountability phase. The domain's thirteen practices span a maturity arc from established infrastructure (A/B testing, session replay) through broadly deployable capability (accessibility auditing, competitive intelligence, user research synthesis, journey mapping) to areas where only disciplined organizations extract reliable value (design systems, personalization, requirements generation, UX copy) and practices that remain fragile in production (wireframe generation, product analytics interpretation, feature prioritization). The defining dynamic is no longer adoption -- designer AI usage sits above 70%, product manager weekly usage at 73%, and platform vendors from Adobe to Amplitude are shipping AI as core product rather than add-on -- but whether organizations can convert that adoption into business outcomes without compounding the risks that come with premature automation.
The evidence this cycle makes the accountability case concrete. Adobe's Firefly ARR crossed $250M with 45% quarter-on-quarter generative credit growth, proving that production-scale AI in creative tools generates real revenue. McKinsey reports clients earning $3 back for every $1 spent on AI -- but only when they focus on three or fewer domains and invest in measurement infrastructure first. The contrast is Terminal X's synthesis of twelve research reports showing 95% of AI pilots deliver zero P&L impact. Spotify operates 10,000+ experiments per year at 761 million monthly active users with sophisticated algorithmic controls including user-steerable personalization features. Lovable crossed $400M ARR in the prompt-to-product category. These are not pilot-stage numbers. They are the revenue and operational metrics of organizations that have built the governance, data, and process infrastructure to make AI productive. The gap between these leaders and the majority -- where 43% of AI-generated code fails in production, 47% of enterprise users have acted on hallucinated content, and only 5% of organizations achieve substantial AI ROI -- is not closing. It is hardening into a structural divide.
One tier change occurred this cycle: design system generation and enforcement advanced from the earliest production stage to a more proven position, driven by evidence that constraint-first architectures work. A B2B SaaS case study documented roughly 10x feature throughput using JSON-encoded component metadata (purpose, variants, anti-patterns, tokens), and Salesforce's enterprise AI design deployment shifted its success metric from speed to verification cost management. The practice crossed the threshold where structured, machine-readable design systems -- not just design tokens, but executable contracts for AI agents -- reliably improve output quality. But production failure rates (hallucination at 2-8%, agent loops at 12%) underscore that the advancement is conditional on organizational investment in system documentation, not on tool capability alone.
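The constraint-first architecture the case study describes can be sketched as a machine-readable component contract that an AI agent must validate its output against before emitting UI. The field names below (purpose, variants, anti-patterns, tokens) mirror the metadata categories the study names; the component, token values, and validation function are illustrative assumptions, not taken from the study.

```python
# Illustrative sketch of a design-system contract as an executable check,
# not just documentation. All identifiers here are hypothetical.

BUTTON_CONTRACT = {
    "component": "Button",
    "purpose": "Trigger a single, immediate action.",
    "variants": ["primary", "secondary", "danger"],
    "anti_patterns": [
        "navigation",    # use Link, not Button, for navigation
        "toggle-state",  # use Switch for persistent on/off state
    ],
    "tokens": {"color": "action.primary", "radius": "radius.sm"},
}

def validate_usage(contract: dict, usage: dict) -> list[str]:
    """Return contract violations for a proposed component usage."""
    errors = []
    if usage.get("variant") not in contract["variants"]:
        errors.append(f"unknown variant: {usage.get('variant')!r}")
    if usage.get("intent") in contract["anti_patterns"]:
        errors.append(f"anti-pattern: {usage['intent']}")
    return errors

# An agent proposing a Button for navigation is rejected before render:
print(validate_usage(BUTTON_CONTRACT, {"variant": "primary", "intent": "navigation"}))
```

The point of the pattern is that the rejection happens mechanically, at generation time, which is what distinguishes an executable contract from design tokens alone.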
The most consequential development was the sharpening of the ROI accountability narrative across the entire domain. Three independent analyses (Terminal X across five sectors, ViviScape surveying 1,250 companies, KPMG's executive research) converged on the same finding: only 5% of organizations achieve substantial AI ROI, with 65% reporting difficulty scaling use cases (double the rate of a year ago) and 62% citing skills gaps as the primary barrier. This is no longer a technology adoption story. It is an organizational execution story, and measurement priorities are shifting accordingly -- the weight given to productivity gains as a metric declined from 23.8% to 18.0%, while financial impact rose to 21.7%.
The Figma-Anthropic competitive dynamic continued to develop. Claude Design's design-system-aware prototype generation from text kept Figma's stock under pressure (now down 55% YTD from IPO highs), and Anthropic's Claude Code reached $2.5B ARR, demonstrating that AI model providers can profitably move up the application stack. But practitioner assessments provided a corrective: Figma's AI features are uneven in production, with the MCP Server and Replace Content delivering genuine value while Make an Image and Interactions miss. The design-to-code category is bifurcating into two distinct markets -- design-to-code (requiring developer expertise and component discipline) and prompt-to-product (no developer needed) -- and the latter carries documented security risks, as Lovable's Broken Object Level Authorization vulnerability exposed over a million projects' source code and credentials.
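Broken Object Level Authorization, the vulnerability class behind the Lovable exposure, is easy to illustrate: an endpoint looks up a record by ID without checking that the requester owns it. A minimal sketch, with hypothetical data and function names (not Lovable's actual code):

```python
# Hypothetical in-memory store standing in for a multi-tenant project database.
PROJECTS = {
    101: {"owner": "alice", "source": "alice's source code"},
    102: {"owner": "bob",   "source": "bob's source code"},
}

def get_project_broken(requester: str, project_id: int) -> dict:
    """BOLA: any authenticated user can fetch any project by enumerating IDs."""
    return PROJECTS[project_id]

def get_project_fixed(requester: str, project_id: int) -> dict:
    """Object-level check: the record's owner must match the requester."""
    project = PROJECTS[project_id]
    if project["owner"] != requester:
        raise PermissionError("requester does not own this project")
    return project

# "bob" reading alice's project succeeds against the broken endpoint...
assert get_project_broken("bob", 101)["owner"] == "alice"
# ...and is rejected by the fixed one.
try:
    get_project_fixed("bob", 101)
except PermissionError:
    pass
```

The fix is a single ownership comparison, which is precisely why the prompt-to-product category is risky: generated backends tend to omit checks that no one asked for in the prompt.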
Regulatory enforcement became tangible. The European Accessibility Act entered its enforcement phase, with six EU member states issuing fines between 5,000 and 40,000 euros to non-compliant organizations. Combined with the ADA Title II deadline extensions to 2027-2028 and California's CIPA litigation targeting session replay tools (projected at 3,500+ filings annually), the compliance cost of AI-generated content and behavioral tracking has shifted from theoretical to operational. Accessibility failure rates on the web's top million sites rose to 95.9% -- the first increase in six years -- with practitioners documenting that AI-assisted code generation is a contributing factor.
Across analytics and research practices, the pattern of tool maturation outpacing organizational readiness held firm. Amplitude shipped four autonomous analytics products with Q1 ARR at $374M, but a Sisense survey of 267 product leaders found teams spend 40% of their time validating AI insights before acting on them. User research platforms now run 68% of AI interviews in production (up from 31%), compressing synthesis time from 11 days to 4 hours, but 66% of employees trust AI outputs without verification -- sustaining hallucination risk even as throughput scales. Klarna's public reversal on AI copy (from "saved $10M/year" to "too much efficiency focus damaged quality") provided the highest-profile evidence yet that throughput without quality governance damages brand value.
The 5% problem is structural, not transitional. Three independent research syntheses this cycle converged on the same number: 5% of organizations achieve substantial AI ROI. KPMG found 65% report difficulty scaling AI use cases, double the prior year. The successful minority -- McKinsey's cohort earning $3 per $1 invested -- share a common profile: narrow focus (three or fewer domains), measurement infrastructure built before deployment, and dedicated governance teams. This is not a maturity curve that the majority will naturally ascend. It is a capability threshold that most organizations lack the infrastructure, talent, and process discipline to cross. The measurement shift from productivity metrics to financial impact signals that boards are beginning to recognize the distinction.
Platform disintermediation is real but messier than the stock market suggests. Figma's 55% YTD stock decline from IPO highs reflects investor conviction that AI-native generation will bypass traditional design tools. Claude Design, v0 (4M users), and Lovable ($400M ARR) are producing complete interfaces from natural language. But practitioner evidence complicates the narrative: Figma's MCP Server is emerging as the most valuable AI integration point, positioning design systems as the governance layer that generation tools depend on. The market is bifurcating into design-to-code (where developer expertise and component discipline determine quality) and prompt-to-product (where speed matters more than maintainability). Lovable's security breach -- exposing a million projects' source code -- illustrates the risk profile of the latter. The outcome depends less on which tool generates faster and more on whether organizations build the system infrastructure that makes any generation tool's output production-safe.
AI is making accessibility worse, not better. The most counterintuitive finding in this domain: WCAG failure rates on the web's top million sites rose to 95.9% in 2026, up from 94.8% -- the first increase in six years. Practitioners attribute the reversal partly to AI-assisted code generation producing semantically incorrect markup at scale. AI auditing tools now achieve 94.5% precision on WCAG violations, but automated tools catch only 25-30% of issues, and the European Accessibility Act's enforcement phase (fines issued, compliance mandatory) makes the gap between detection capability and remediation execution a direct financial risk. Only 12% of frontend engineers have screen reader experience. The detection-remediation gap is not a technology problem; it is a workforce capability problem that AI generation is actively exacerbating.
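The 25-30% ceiling on automated detection is inherent to what such tools can see. A representative structural check, sketched below with Python's standard-library HTML parser, flags a common AI-generated anti-pattern (a click handler on a `<div>` with no role or tabindex) but can say nothing about whether the resulting interaction is actually usable with a screen reader; the audit class and markup are illustrative, not from any named tool.

```python
from html.parser import HTMLParser

class ClickableDivAudit(HTMLParser):
    """Flag <div>s with click handlers that lack the role/tabindex a
    keyboard or screen-reader user needs. Structural checks like this are
    why automated tools top out around 25-30% of WCAG issues: they detect
    markup violations, not whether the experience works."""

    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "div" and "onclick" in a:
            if "role" not in a or "tabindex" not in a:
                self.violations.append(self.getpos())

audit = ClickableDivAudit()
audit.feed('<div onclick="save()">Save</div>'
           '<button onclick="save()">Save</button>')
print(audit.violations)  # the <div> is flagged; the semantic <button> is not
```

The correct remediation (use `<button>`, or add `role="button"`, `tabindex="0"`, and keyboard handlers) still requires a human who knows why the rule exists, which is the workforce gap the 12% screen-reader-experience figure quantifies.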
Throughput gains are real but create new failure modes. User research synthesis compressed from 11 days to 4 hours. AI-assisted teams run 4.7x more experiments per quarter. Design-to-code tools save 30-60 minutes per scaffold. These productivity numbers are genuine and reproducible. But they come with documented costs: DoorDash found 4.3% accuracy drops when A/B testing AI systems in production due to stochastic output variation; Kameleoon data shows 84% of marketers test monthly but only 33.5% achieve statistical significance; and OneStream's survey found organizations scaling to 10+ AI tools are 4x more likely to act on demonstrably bad data. Speed without verification infrastructure does not produce better decisions. It produces more decisions, faster, with higher variance in quality.
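The DoorDash finding generalizes: a nondeterministic system's measured accuracy is itself a random variable, so a single offline eval number does not pin down production behavior, and an A/B test must absorb that run-to-run variance into its noise floor. A minimal simulation (the 90% correctness rate and prompt count are illustrative assumptions, not DoorDash's figures):

```python
import random
import statistics

def stochastic_model(prompt: str, rng: random.Random) -> bool:
    """Stand-in for a nondeterministic GenAI system: answers the same
    prompt correctly on only ~90% of calls (illustrative rate)."""
    return rng.random() < 0.9

def run_eval(rng: random.Random, n_prompts: int = 200) -> float:
    """Score one eval run: fraction of prompts answered correctly."""
    return sum(stochastic_model(f"q{i}", rng) for i in range(n_prompts)) / n_prompts

rng = random.Random(0)  # seeded so the simulation is reproducible
accuracies = [run_eval(rng) for _ in range(50)]

# Same system, same prompts: measured accuracy still varies run to run.
print(f"mean={statistics.mean(accuracies):.3f} "
      f"spread={max(accuracies) - min(accuracies):.3f}")
```

With deterministic software the spread would be exactly zero; here it is not, which is why eval suites and A/B tests answer different questions and teams need both.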
The governance-as-product thesis is being tested. Adobe, Figma, and Amplitude are all converging on the same strategic bet: that governance infrastructure -- brand enforcement, design system contracts, evaluation suites, verification workflows -- is the product, not the generation capability. Adobe's Firefly is explicitly positioned for "approval-ready" workflows rather than creative exploration. Figma's MCP Server makes design systems the API for AI agents. Amplitude's insight that "evals become the new PRDs" redefines requirements as executable verification. Klarna's reversal validates the thesis from the failure side. The question is whether organizations will pay for governance tooling or continue defaulting to the lowest-friction generation tool -- and whether governance vendors can demonstrate ROI before the 95% pilot-failure rate calcifies into institutional skepticism about AI investment altogether.
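The "evals become the new PRDs" idea can be made concrete: each requirement is written as an executable check against model output, and release is gated on the pass rate rather than on a prose document. The requirements, stub model, and threshold below are hypothetical, a sketch of the pattern rather than any vendor's implementation.

```python
# Sketch of requirements-as-evals: a PRD line item becomes a (prompt, check)
# pair, and shipping is gated on the suite's pass rate.
from typing import Callable

Requirement = tuple[str, str, Callable[[str], bool]]  # (name, prompt, check)

REQUIREMENTS: list[Requirement] = [
    ("mentions refund window", "Summarize our returns policy.",
     lambda out: "30 days" in out),
    ("no invented discounts", "Summarize our returns policy.",
     lambda out: "discount" not in out.lower()),
]

def stub_model(prompt: str) -> str:
    """Placeholder for the real model under evaluation."""
    return "Returns are accepted within 30 days of purchase."

def gate(model: Callable[[str], str], threshold: float = 1.0) -> bool:
    """Release gate: the fraction of passing requirements must meet threshold."""
    passed = sum(check(model(prompt)) for _, prompt, check in REQUIREMENTS)
    return passed / len(REQUIREMENTS) >= threshold

print(gate(stub_model))  # True: both requirements hold for the stub
```

The design choice worth noting is that a requirement which cannot be expressed as a check forces a conversation before deployment rather than after an incident, which is the verification-first posture the governance vendors are betting on.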
AI content marketing in 2026: Why 94% of teams fail (case study) — The Klarna reversal — from "$10M/year saved" to "quality decline forced us back" — is the single clearest production evidence that throughput without voice governance damages brand value, directly validating the summary's governance-as-product thesis from the failure side. https://blog.yourtenet.com/ai-content-marketing/
Accessibility Industry Update: April 2026 (industry report) — The first increase in WCAG failure rates in six years (to 95.9%) is causally linked here to AI-assisted code generation, making this the primary evidence for the counterintuitive "AI is making accessibility worse" tension in the summary. https://www.qualitylogic.com/knowledge-center/accessibility-industry-update-april-2026/
The European Accessibility Act Is Live — Developer Guide (news coverage) — Documents the transition from theoretical to operational compliance risk: real fines (€5k–€40k) issued to named SMEs, with automated tools catching only ~30% of issues — the enforcement gap that makes the WebAIM regression a direct financial exposure. https://dev.to/toolkitonline/the-european-accessibility-act-is-live-developer-guide-1pe6
April 2026: When the AI Labs Stopped Renting Shelf Space (opinion) — The most direct practitioner account of the Lovable BOLA security breach, Figma's 55% YTD stock decline, and the design-to-code vs. prompt-to-product market bifurcation — three of the summary's headline competitive dynamics in a single piece. https://eidosdesign.substack.com/p/april-2026-when-the-ai-labs-stopped
Amplitude Announces First Quarter 2026 Financial Results (product GA) — The simultaneous GA of four autonomous analytics products (Global Agent, Specialized Agents, AI Assistant, Agent Analytics) at $374M ARR is the concrete market signal that product analytics interpretation has moved from experiment to core vendor product. https://amplitude.com/press/first-quarter-2026-financial-results
The state of analytics 2026: Operationalizing AI insights and shifting to embedded analytics (industry report) — The 40%-of-time-validating-AI-insights finding from 267 product leaders quantifies the verification cost the summary identifies as the binding constraint on analytics AI scaling — adoption is real, operationalization is not. https://www.sisense.com/reports/state-of-analytics-2026/
Agentic Design System: How to Build a Component Library AI Agents Can Actually Use (case study) — The B2B SaaS deployment achieving ~10x feature throughput via JSON-encoded component metadata is the primary production evidence for the design system tier advancement described in the summary — constraint-first architecture delivering measurable output. https://designproject.io/blog/agentic-design-system/
Companies Are Scaling AI on Data They Don't Trust, New Study Finds (adoption metric) — The finding that organizations scaling to 10+ AI tools are 4x more likely to act on demonstrably bad data makes this the sharpest illustration of why the summary's "throughput creates new failure modes" tension intensifies rather than resolves as adoption grows. https://lagrangeceo.com/news/2026/05/companies-are-scaling-ai-data-they-dont-trust-new-study-finds/
AI Evals vs. A/B Testing: Why You Need Both to Ship GenAI (opinion) — The DoorDash case study — strong test performance collapsing to a 4.3% accuracy drop in production due to stochastic output variation — is the cleanest single-example proof that AI systems require fundamentally different testing infrastructure than deterministic software. https://www.growthbook.io/blog/ai-evals-vs-a-b-testing-why-you-need-both-to-ship-genai
The State of AI Customer Interviews: 2026 Mid-Year Update (adoption metric) — 68% of enterprises running AI interviews in production (up from 31%), synthesis compressed from 11 days to 4 hours, while 66% of employees trust outputs without verification — captures in a single source both the productivity gain and the hallucination-risk coexistence the summary identifies as the domain's defining paradox. https://getperspective.ai/blog/the-state-of-ai-customer-interviews-2026-mid-year-update