Perly Consulting │ Beck Eco

The State of Play

A living index of AI adoption across industries — where established practice meets the bleeding edge
UPDATED DAILY

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

The Daily Dispatch

A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.

BLEEDING EDGE

⌨️ SOFTWARE ENGINEERING
✍️ CONTENT & MARKETING
🔬 RESEARCH & KNOWLEDGE
⚖️ LEGAL, COMPLIANCE & RISK
🎧 CUSTOMER OPERATIONS
🏛️ AI GOVERNANCE & SAFETY
📊 DATA & ANALYTICS
🛡️ IT OPERATIONS & SECURITY
🎯 PRODUCT & DESIGN
💼 SALES & REVENUE
🎬 CREATIVE & GENERATIVE MEDIA
👁️ COMPUTER VISION & SENSING
💹 FINANCE & ACCOUNTING
🔄 OPERATIONS & PROCESS AUTOMATION
🚗 AUTONOMOUS SYSTEMS & VEHICLES
🦾 PHYSICAL AI & ROBOTICS
🎓 EDUCATION & LEARNING
PERSONAL EFFECTIVENESS

LEADING EDGE

⌨️ SOFTWARE ENGINEERING
✍️ CONTENT & MARKETING
🔬 RESEARCH & KNOWLEDGE
⚖️ LEGAL, COMPLIANCE & RISK
🎧 CUSTOMER OPERATIONS
🏛️ AI GOVERNANCE & SAFETY
📊 DATA & ANALYTICS
🛡️ IT OPERATIONS & SECURITY
🎯 PRODUCT & DESIGN
💼 SALES & REVENUE
🎬 CREATIVE & GENERATIVE MEDIA
👁️ COMPUTER VISION & SENSING
💹 FINANCE & ACCOUNTING
🔄 OPERATIONS & PROCESS AUTOMATION
👥 PEOPLE & TALENT
🚗 AUTONOMOUS SYSTEMS & VEHICLES
🦾 PHYSICAL AI & ROBOTICS
🎓 EDUCATION & LEARNING
PERSONAL EFFECTIVENESS

GOOD PRACTICE

⌨️ SOFTWARE ENGINEERING
✍️ CONTENT & MARKETING
🔬 RESEARCH & KNOWLEDGE
⚖️ LEGAL, COMPLIANCE & RISK
🎧 CUSTOMER OPERATIONS
🏛️ AI GOVERNANCE & SAFETY
📊 DATA & ANALYTICS
🛡️ IT OPERATIONS & SECURITY
🎯 PRODUCT & DESIGN
💼 SALES & REVENUE
🎬 CREATIVE & GENERATIVE MEDIA
👁️ COMPUTER VISION & SENSING
💹 FINANCE & ACCOUNTING
🔄 OPERATIONS & PROCESS AUTOMATION
👥 PEOPLE & TALENT
🚗 AUTONOMOUS SYSTEMS & VEHICLES
🦾 PHYSICAL AI & ROBOTICS
🎓 EDUCATION & LEARNING
PERSONAL EFFECTIVENESS

ESTABLISHED

⌨️ SOFTWARE ENGINEERING
✍️ CONTENT & MARKETING
🛡️ IT OPERATIONS & SECURITY
🎯 PRODUCT & DESIGN
💹 FINANCE & ACCOUNTING
👥 PEOPLE & TALENT

🎯 Product & Design

AI applied from user research through to shipped product experience. The maturity spread is wide: A/B testing and analytics are established, and prototyping and design systems are good practice, but just over half the domain (7 of 13 practices) sits at the leading or bleeding edge, where generative UI, autonomous UX research, and AI-native product frameworks remain experimental. Most practices are stalled, with more energy in tooling announcements than production adoption.

13 practices: 2 established, 4 good practice, 4 leading edge, 3 bleeding edge

Where AI Stands in Product & Design

Product and design is the domain where AI has most visibly outrun the organisations meant to wield it. The tooling is no longer the question. Across the board, from A/B testing and behavioural analytics (now established infrastructure) through to generative design systems and autonomous product analytics at the bleeding edge, vendors ship capabilities that were research-grade three years ago. Figma's Model Context Protocol (MCP) server reached general availability; Anthropic's Claude Design hit a million users in its first week; Amplitude, Mixpanel and Google Analytics all ship autonomous agents that generate hypotheses and investigate anomalies from raw usage data. What stalls, repeatedly and predictably, is the organisation. The binding constraint has migrated from model capability to governance infrastructure, specification discipline and the unglamorous work of acting on insight rather than merely surfacing it. The MIT-cited figure that haunts every practice in this domain, roughly 95% of organisations realising zero measurable return from generative AI investment, is not a statement about the tools. It is a statement about the organisations deploying them.

This produces a domain defined by bifurcation rather than uniform progress. A small vanguard — Spotify, Meta, Uber, Netflix, Stripe, a clutch of well-governed enterprises — extracts documented, compounding value: Uber's design-system enforcement delivers 3x faster development and 4x fewer visual parity issues; Spotify reframes twenty years of behavioural data as a "Large Taste Model" moat; Shopify's experiment portfolio compounds $2.3M of monthly revenue lift. Meanwhile the majority sit in what one practice file aptly calls pilot purgatory — armed with the same tools, blocked by data fragmentation, weak specification habits and the absence of the spec files, closed token sets and audit scripts that make AI output trustworthy. The gap between the two cohorts is widening, and almost nothing in this scan suggests it is closing. The most repeated observation across the domain is some variant of "the technology works; the organisation does not."
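To make "closed token sets and audit scripts" concrete, here is a minimal sketch of what such an audit might look like. It is an illustration only, not the method used by any organisation named above: the token values, the `audit_css` function and the sample stylesheet are all hypothetical, and the check covers hex colours only.

```python
import re

# Hypothetical closed token set: the only colour values the design system permits.
ALLOWED_COLOURS = {"#0b5fff", "#111827", "#ffffff"}

# Matches 6-digit or 3-digit hex colour literals. A real audit would also need to
# expand 3-digit shorthand and to skip ID selectors such as "#abc { ... }".
HEX_COLOUR = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")


def audit_css(css: str) -> list[tuple[int, str]]:
    """Return (line_number, colour) for each hex colour outside the token set."""
    violations = []
    for line_number, line in enumerate(css.splitlines(), start=1):
        for match in HEX_COLOUR.finditer(line):
            colour = match.group(0).lower()
            if colour not in ALLOWED_COLOURS:
                violations.append((line_number, colour))
    return violations


# Hypothetical stylesheet: line 2 uses a colour that is not in the token set.
sample = (
    ".button { background: #0B5FFF; color: #ffffff; }\n"
    ".banner { background: #ff00aa; }\n"
)

for line_number, colour in audit_css(sample):
    print(f"line {line_number}: {colour} is not a design token")
```

Run in continuous integration, a check of this kind fails the build when generated or hand-written code drifts from the token set, which is one plausible reading of how enforcement differs from documentation.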

What distinguishes Product & Design from adjacent domains is the sharpness of the quality paradox now visible in the data. Adoption is not the problem — designer use of AI is near-universal (91% weekly), half of designers now ship AI-generated code to production, and 88% of product teams use AI for research synthesis. The problem is that adoption and quality have decoupled. AI raises throughput while raising defects faster: a 22,000-developer study found task completion up 33.7% but incidents per pull request up 242.7%. AI-generated UI is accessible by default only 66% of the time. Over half of LLM-personalised responses are no better than a generic baseline. The domain has comprehensively solved "can the tool produce something fast" and comprehensively failed to solve "can you trust what it produced" — and the second question is now where all the cost lives.
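A rough worked calculation shows why the two headline figures compound rather than offset. The sketch below uses only the two percentages quoted above; the assumption that pull-request volume grows one-for-one with task completion is ours, not the study's, and the function name is hypothetical.

```python
# Figures quoted in the text above.
TASK_COMPLETION_CHANGE = 0.337    # task completion up 33.7%
INCIDENTS_PER_PR_CHANGE = 2.427   # incidents per pull request up 242.7%


def total_incident_multiplier(throughput_change: float,
                              incidents_per_pr_change: float) -> float:
    """Multiplier on total incident volume.

    ASSUMPTION (not from the study): the number of pull requests grows in
    direct proportion to task completion.
    """
    return (1 + throughput_change) * (1 + incidents_per_pr_change)


multiplier = total_incident_multiplier(TASK_COMPLETION_CHANGE,
                                       INCIDENTS_PER_PR_CHANGE)
print(f"Total incident volume: {multiplier:.2f}x the baseline")
```

Under that assumption, total incident volume lands at roughly 4.6 times the baseline against a throughput gain of about a third, which is the sense in which the cost has moved to review and triage.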

What's New, 2026-06-06 to 2026-06-20

This fortnight reinforced existing positions rather than moving them — every practice held its maturity level and its stalled or plateaued trend. The signal is consolidation, not breakthrough. The clearest movement was in design-system infrastructure: Google Labs open-sourced the DESIGN.md specification (a machine-readable standard for design-system rules that lets AI generate deterministically), Claude Design reached general availability with import, locking and bidirectional code sync, and Apple's Xcode 27 made Figma's connector its first native design-tool integration. Together these mark design systems crystallising as the control layer for AI generation across vendors — what practitioners increasingly call the "API that lets AI build your product safely."

Against that infrastructure progress, the quality evidence sharpened uncomfortably. New data quantified the throughput-quality inversion in design-to-code (incidents per pull request up 242.7%; senior engineers losing up to a third of their week triaging AI failures); OverlayQA's audit of 276 production sites found 94% have design tokens but only 3.2% achieve genuine compliance; and accessibility re-emerged as an unresolved production blocker, with Figma's own design-to-web beta shipping demo content carrying 210-plus WCAG violations. Several fresh enterprise studies converged on the same structural finding — that governance maturity, not model intelligence, is the deployment ceiling, with only around 11% of claimed agent projects reaching production. The fortnight's net message: the vanguard's infrastructure got better while the evidence against careless adoption got harder to ignore.

Key Tensions

  • The vanguard pulls away while the majority stay stuck — and the dividing line is governance, not tooling. Identical tools produce 18% higher margins and 16x more deployed agents for organisations with designed-in enforcement (IBM's 2,000-executive study), and zero return for everyone else. Uber, Currents and a Korean fintech (35% cycle reduction, $250K annual savings) show what governance-first looks like; the OverlayQA audit showing 97% component drift across production sites shows what its absence looks like. Pilot-to-production conversion has held flat at roughly 10% for months.

  • Adoption and quality have decoupled, and quality is where the cost now sits. Designer confidence is soaring (89% report faster workflows, 91% better designs) while production reality diverges hard: 78% of enterprises adopting AI code report production incidents and a 1.7x defect multiplier. The same inversion recurs everywhere — AI usability research generates 64% false alarms; 47% of enterprise AI users have made major decisions on hallucinated research synthesis; synthetic research panels produce false conclusions 60% of the time. Review capacity, not generation capacity, is the new bottleneck.

  • Specification discipline has become the rate-limiter across every generative practice. As the cost of generating code and copy collapses, the bottleneck shifts upstream to requirements. Anthropic now has Claude authoring 80%+ of its own merged code — but only because spec writing became the primary deliverable. The same pattern holds for PRDs (43% production failure rates without requirements discipline), design systems (spec files mandatory beyond tokens) and UX copy (governance scaffolding separates the teams seeing returns from the large majority that do not). Vague intent is now expensive precisely because rework feels free.

  • Regulation is hardening into a forcing function — unevenly across the domain. Accessibility is furthest along: more than 5,100 ADA lawsuits in 2025 (up 20%), live DOJ Title II deadlines, and a legal record establishing that overlays provide zero protection while documented source-code remediation moots claims. Personalisation faces tightening pressure too — a Texas probe of Spotify's pay-for-play discovery, EU Digital Services Act transparency mandates, 40-plus algorithmic-pricing bills across 24 states. Session replay draws CIPA litigation (800-plus claims in 2025). Where regulation bites, adoption is increasingly driven by liability rather than ROI.

  • Hallucination is increasingly understood as structural, not a quality bug to be patched. The domain has largely stopped waiting for the next model to fix accuracy. Stanford's benchmark of 26 foundation models shows hallucination rates spanning 22–94%; enterprise text-to-SQL accuracy collapses to 17% on real systems versus 85–90% on benchmarks; RAG systems hold 78% consistency in production versus 95% in the lab. The consensus response — deterministic architecture, retrieval grounding, eval sets, human review gates — is an admission that the binding constraint is data and process infrastructure, not intelligence, and that better models alone will not clear it.
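As an illustration of the "eval sets, human review gates" response named in the last tension, the following is a minimal sketch under stated assumptions. The questions, expected answers, the 0.95 threshold and the `stub_answer` stand-in for a model call are all hypothetical; a production system would substitute its own cases and scoring.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    question: str
    expected: str


def pass_rate(answer: Callable[[str], str], cases: Sequence[EvalCase]) -> float:
    """Fraction of cases where the system's answer matches the expected one."""
    if not cases:
        raise ValueError("eval set must not be empty")
    passed = sum(
        1 for case in cases
        if answer(case.question).strip().lower() == case.expected.strip().lower()
    )
    return passed / len(cases)


def gate(rate: float, threshold: float = 0.95) -> str:
    """Route output to human review whenever the pass rate is below the threshold."""
    return "auto-approve" if rate >= threshold else "human-review"


# Stand-in for a real model call: a lookup table with one deliberately wrong answer.
CANNED = {
    "weekly active users?": "1200",
    "checkout conversion?": "3.1%",
    "top churn reason?": "pricing",
    "median session length?": "9 min",  # wrong on purpose
}


def stub_answer(question: str) -> str:
    return CANNED.get(question, "")


EVAL_SET = [
    EvalCase("weekly active users?", "1200"),
    EvalCase("checkout conversion?", "3.1%"),
    EvalCase("top churn reason?", "pricing"),
    EvalCase("median session length?", "4 min"),
]

rate = pass_rate(stub_answer, EVAL_SET)
decision = gate(rate)
print(f"pass rate {rate:.2f}, decision: {decision}")
```

The design choice worth noting is that the gate is deterministic and sits outside the model: accuracy is measured against a fixed set and the routing rule does not depend on the model's own confidence.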

Top 10 Evidence Items

  1. Faros study finds AI coding throughput rose while bugs and incidents rose faster (adoption-metric) — The single sharpest quantification of the throughput-quality inversion: task completion up 33.7% in a 22,000-developer study, but incidents per pull request up 242.7%, making this the empirical spine of the summary's central claim that adoption and quality have decoupled. https://vibegraveyard.ai/story/faros-ai-acceleration-whiplash-study/

  2. Senior engineers are spending their week cleaning up AI-generated code (adoption-metric) — Puts a human-cost figure on the defect multiplier: senior engineers losing up to a third of their week to triage, reframing the throughput gains as a cost transfer rather than a net saving. https://www.helpnetsecurity.com/2026/06/15/ai-generated-code-review-issues/

  3. We Audited 276 Agency Websites. 8 Used Their Own Design System. (adoption-metric) — OverlayQA's audit finding that 94% of production sites have design tokens but only 3.2% achieve genuine compliance is the hardest data point on the gap between documented intent and enforced reality — the "97% component drift" figure the summary cites. https://overlayqa.com/blog/agency-design-system-study/

  4. Google Stitch Open-Sources DESIGN.md: The Spec That Makes AI Agents Consistent With Your Brand (product-ga) — The most concrete infrastructure move of the fortnight: a machine-readable spec standard that lets AI generate deterministically from design-system rules, directly instantiating the summary's claim that design systems are crystallising as the control layer for AI generation. https://pasqualepillitteri.it/en/news/1251/google-stitch-design-md-open-source-spec-2026

  5. Building a design system specced for engineers and agents (Currents/Evil Martians) (case-study) — A practitioner account of what governance-first looks like in the vanguard: spec files, closed token sets, and audit scripts treated as mandatory infrastructure rather than optional documentation — the pattern the summary says separates returners from the stuck majority. https://evilmartians.com/chronicles/building-a-design-system-specced-for-engineers-and-agents

  6. Is AI-generated UI accessible by default? (research-paper) — Derek Featherstone's experiment quantifying that AI-generated interfaces are accessible only 66% of the time by default is the accessibility-specific instance of the quality paradox, and the evidence behind the summary's claim that careless adoption raises defects faster than throughput. https://feather.ca/experiments/ai-ui-accessibility-baseline/

  7. Q1 2026 ADA Lawsuit Report — 1,037 Cases (adoption-metric) — 1,037 ADA digital-accessibility cases in a single quarter demonstrates that regulation has hardened into a genuine forcing function, not a distant threat, which is why the summary treats accessibility enforcement as the domain where liability now drives adoption more than ROI does. https://www.ecomback.com/ada-website-lawsuits-recap-report/q1-2026

  8. AI Found 11 Usability Problems Humans 'Missed.' 10 Were Wrong. (opinion) — A practitioner's controlled test finding a 91% false-alarm rate from AI usability analysis, anchoring the summary's claim that AI research synthesis generates 64% false alarms and that half of enterprise users have made major decisions on hallucinated findings. https://www.heykaleb.com/musings

  9. The Synthetic Data Trap: Is Your AI-Driven Strategy Based on a Lie (adoption-metric) — Examines how synthetic research panels produce false conclusions at scale, giving structural grounding to the summary's observation that the domain has solved "can the tool produce something fast" while comprehensively failing to solve "can you trust what it produced." https://briefglance.com/articles/the-synthetic-data-trap-is-your-ai-driven-strategy-based-on-a-lie

  10. The new bottleneck (opinion) — Stack Overflow's analysis that specification discipline — not model capability — is now the rate-limiter for AI-assisted development directly supports the summary's third key tension, and its timing (June 2026) marks the moment this observation crossed from practitioner folklore into mainstream engineering discourse. https://stackoverflow.blog/2026/06/18/the-new-bottleneck/