The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
AI for individual productivity, communication, organisation, and self-directed learning. The most polarised domain: writing assistance and meeting summarisation are good practice, but nearly half the practices are bleeding-edge — personal AI agents, life planning, and autonomous scheduling lack reliable implementations. Most trajectories are stalled, reflecting a gap between consumer hype and sustained daily utility.
This is the domain where almost everyone has adopted AI and almost no one can prove it paid off. Writing assistance, email drafting, brainstorming, scheduling, research summarisation, spreadsheet work — the consumer-facing tools that promised to give knowledge workers their hours back — are now ubiquitous. Microsoft 365 Copilot reaches 90%-plus of the Fortune 500; Grammarly serves 40 million daily users; Google's Gemini sits inside Gmail for three billion people; Gamma's presentation generator crossed 100 million users. The capability is real and the adoption is genuine. Yet the defining fact of this domain in mid-2026 is the chasm between that adoption and any measurable return. Across a now-overwhelming body of independent research, only a low-double-digit percentage of organisations report meaningful business value from these tools, while the rest cannot reliably distinguish real productivity gains from the perception of them.
The reason is no longer a mystery, and it is not a capability gap that a better model will close. Two structural costs sit underneath the measurement problem, and further investment in frontier models (the most capable current AI) cannot buy them away. The first is verification: fluent AI output masks error rather than reducing it, and study after study finds that the time saved drafting is largely reabsorbed into reviewing, correcting, and re-running. The Work AI Index 2026, the headline study of this scan, gave the phenomenon a name, "botsitting", and a number: over half of the time AI appears to save. The second cost is cognitive: longitudinal evidence now shows AI assistance can erode the very capabilities it appears to augment, from a 26.9% collapse in study time on susceptible maths problems (with a 25% drop in learning) to a documented "augmentation trap" in which human-plus-AI decision teams perform worse than AI alone. The domain has finished its generation phase. It is stuck in a verification phase that nobody has costed properly.
What distinguishes Personal Effectiveness from flashier AI domains is that the bottleneck is human and organisational, not technical. The tools work; the barriers — authenticity erosion, trust collapse, skill atrophy, governance gaps, the sheer overhead of checking — are resistant to vendor investment. The result is a sharp, durable bifurcation. A handful of bounded, high-governance workflows extract genuine value: translation has matured into proven enterprise infrastructure (DeepL won 94% of blind matchups against the leading frontier models); finance teams report 250% returns on spreadsheet automation where verification gates are tight. Almost everywhere else, individual power users gain real time while organisational value remains elusive, and the gap is widening, not closing.
The defining theme of this scan is the hardening of a single insight: AI's measurable productivity return is being absorbed by hidden labour, and the field finally has the data to quantify it. The Work AI Index 2026 — 6,000 digital workers studied across Stanford and UC Berkeley — found that workers gain roughly 11 hours a week from AI but spend 6.4 of them on "botsitting": fixing output, re-running failed prompts, and debugging. Sixty-nine percent admit shipping unverified work, and only 13% of organisations realise meaningful business value despite 87% adoption. This puts a name and a number on the verification tax that prior scans could only describe. It converged with a wave of corroborating data: BCG found 42% of regular users now save a full workday a week (the gains are real for someone), while GoTo's Pulse of Work documented 39% of users reporting that AI reliance erodes their capabilities — rising to 46% among Gen Z — and 50% admitting overreliance.
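The arithmetic behind the verification tax can be checked in a few lines. The sketch below uses only the two rounded figures quoted above (roughly 11 hours gained, 6.4 hours spent botsitting); the function names are hypothetical and chosen for illustration. Because the 11-hour figure is rounded, the share it yields (about 58%) sits slightly above the 55% the index reports. The assumption made here is that the difference is a rounding effect, since a gross gain of about 11.6 hours would reproduce 55% exactly.

```python
def net_hours_saved(gross_hours: float, botsitting_hours: float) -> float:
    """Hours per week that remain after subtracting the time spent checking AI output."""
    return gross_hours - botsitting_hours


def botsitting_share(gross_hours: float, botsitting_hours: float) -> float:
    """Fraction of the gross time saving that is consumed by botsitting."""
    if gross_hours <= 0:
        raise ValueError("gross_hours must be positive")
    return botsitting_hours / gross_hours


# Rounded figures as quoted in the text (Work AI Index 2026).
GROSS_HOURS = 11.0       # "roughly 11 hours a week"
BOTSITTING_HOURS = 6.4   # hours spent fixing, re-running, debugging
REPORTED_SHARE = 0.55    # share reported by the index

net = net_hours_saved(GROSS_HOURS, BOTSITTING_HOURS)
share = botsitting_share(GROSS_HOURS, BOTSITTING_HOURS)

# Assumption: the gap between the computed and reported share is rounding.
# This is the gross gain that would make the reported 55% exact.
implied_gross = BOTSITTING_HOURS / REPORTED_SHARE

print(f"Net hours kept per week: {net:.1f}")
print(f"Share lost to botsitting (rounded inputs): {share:.0%}")
print(f"Gross hours implied by the reported 55%: {implied_gross:.1f}")
```

Either way the conclusion holds: the net gain is somewhat under five hours a week, less than half of the headline figure.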
Two further signals sharpened the domain's profile rather than moving its maturity. First, who benefits is becoming clear: Anthropic's analysis of roughly 400,000 Claude Code sessions established that domain expertise — not job title — predicts AI effectiveness, with expert users triggering twelve or more actions per prompt versus novices' five, and novices abandoning a fifth of sessions on errors. AI is a skill-leveller that paradoxically rewards existing skill. Second, leadership remains blind to its own tool landscape: only 41% of HR leaders can name two tools their workforce actually uses, while 29% still believe adoption is at the pilot stage — even as shadow AI spreads. No tier or trend movements occurred this cycle; every practice held position. Stability here is itself the signal. After three years, the structural barriers have not yielded, and the conversation has shifted decisively from "does AI help?" to "can anyone afford the overhead of running it well?"
The verification tax is now the dominant cost, and it scales with use. "Botsitting" consumes 55% of the hours AI saves (Work AI Index 2026); independent measurement repeatedly finds that roughly 80% of recovered writing time is reabsorbed into review. In regulated settings this becomes disqualifying: legal tools hallucinate on 17–33% of domain queries, around 900 hallucination incidents have hit US court filings since 2023, and 25-plus federal courts now require AI-use certification before filing. More capable models do not fix this, because fluency masks error rather than reducing it.
Adoption has decoupled from value, and the gap is structural. Stanford's AI Index recorded inaccuracy overtaking cybersecurity as the top enterprise AI risk for the first time. Only 13% of organisations see business value at 87% adoption; 56% of CEOs report no AI ROI; only 21% of finance leaders who deployed automation report clear returns. The barrier is not the tool — it is unchanged workflows, absent governance, and the inability to measure what changed. Companies keep buying because everyone else is, not because the numbers close.
The gains are real but concentrate among the already-skilled. Anthropic's 400,000-session analysis and converging research establish that AI disproportionately rewards domain experts, who extract far more per interaction, while novices abandon sessions on errors and — critically — risk skill erosion. World Bank data on 26,000 students found autonomous AI raised practice rates 18% but cut exam performance 20%. The tool that promised to level the field instead widens the gap between those who can verify its output and those who cannot.
Authenticity and trust are eroding faster than capability is improving. Recipients rate supervisor sincerity at 40–52% for AI-drafted emails versus 83% without; audiences detect AI-generated content within 30 seconds via tone patterns; human-written content draws 5.44x more traffic than AI alternatives despite 75% adoption. Deloitte found confidence in agentic systems — software that acts on its own — collapsed 89%. The more these tools are used at scale, the more the homogenised, neutral-toned output they produce becomes a liability rather than an asset.
A narrow band of bounded, governed workflows is pulling decisively ahead. Translation is proven enterprise infrastructure: DeepL won 94% of blind matchups against frontier models, with live 60-language Vatican deployments and 70-million-word monthly throughput at KBC Bank. Finance spreadsheet automation delivers 250% returns where verification gates are tight. The common thread is a tightly scoped task, structured data, and a hard human-review gate — and the looming EU AI Act high-risk classification (binding from December 2027) is about to make that governance discipline mandatory in healthcare, legal, and critical-services translation rather than optional.
Work AI Index 2026: Botsitting Consumes 55% of AI-Saved Hours (adoption-metric) — This is the defining data point of the scan: 6,000 workers across Stanford and UC Berkeley finally put a name and a number on the verification tax, revealing that the productivity story is real but the net gain is roughly half what the headline figure suggests. https://productimpactpod.com/news/botsitting-work-ai-index-2026/
Agentic Coding and Persistent Returns to Expertise — Anthropic (research-paper) — 400,000 Claude Code sessions showing domain experts trigger 12+ actions per prompt versus novices' 5, with novices abandoning 19% of sessions on errors, directly evidencing why AI widens rather than closes the skill gap the summary describes. https://www.anthropic.com/research/claude-code-expertise?ms=email
GoTo's Pulse of Work 2026: AI Adoption Has Outrun Work Design (adoption-metric) — 82% adoption but 39% of users report eroding skills and 50% admit overreliance, with 43% knowingly delivering suspected-low-quality output — the human cost of the adoption-value gap rendered in worker self-report. https://www.seriousinsights.net/gotos-pulse-of-work-2026/
AI Tools at Work — What 50,000 Employees Actually Use vs What HR Thinks (adoption-metric) — Only 41% of HR leaders can name two tools their workforce actually uses and 29% still believe deployment is at pilot stage, documenting the leadership blindness that makes governance reform structurally difficult. https://www.we360.ai/blog/ai-tools-at-work-survey-2026
KPMG's AI Report Had 40 of 45 Fabricated Citations — GPTZero Calls It 'Vibe Citing' (case-study) — A major consulting firm's hallucinated report on agentic AI (40/45 citations fabricated, contradicting KPMG's own prior data) is the sharpest possible illustration that fluent output masks error even at the professional tier where verification should be strongest. https://nilsliu.dev/en/insights/2026-06-14-kpmg-ai-hallucination-vibe-citing/
AI Hallucinated Citations Are Flooding Courts (opinion) — Approximately 900 hallucination incidents in US court filings since 2023, six-figure sanctions, and 25+ federal courts now requiring AI-use certification: the verification tax in regulated settings is no longer tolerable overhead but a disqualifying liability. https://chatgptdisaster.com/0615-ai-hallucinated-court-citations-lawyer-sanctions-record-fines-2026.html
AI-Assisted Emails May Put Trustworthiness at Risk in Workplace Communications — USC Marshall (research-paper) — Academic confirmation that AI-drafted email measurably collapses perceived sincerity (40–52% versus 83% without AI), grounding the summary's authenticity-erosion claim in peer-reviewed evidence rather than anecdote. https://www.marshall.usc.edu/news/ai-assisted-emails-may-put-trustworthiness-at-risk-in-workplace-communications
Helped by GPT-5, Then Left to Their Own Devices: A Randomized Trial Measures the Learning Cost of AI Assistance (research-paper) — A 1,222-participant multi-site RCT showing unguided AI assistance reduces persistence and undermines independent problem-solving within roughly 10 minutes, establishing the behavioral mechanism behind the skill-atrophy pattern the summary identifies. https://www.actuia.com/en/news/helped-by-gpt-5-then-left-to-their-own-devices-a-randomized-trial-measures-the-learning-cost-of-ai-assistance/
AI-Assisted Learning and the Illusion of Competence (research-paper) — 1,498 undergraduates showing a 7.62/10 AI-assisted output quality against 5.55/10 independent mastery, quantifying the gap between what AI-augmented work looks like and what users actually learned — the mechanism behind the "augmentation trap" named in the summary. https://ojed.org/jise/article/view/10832
Customer Story: KBC Bank — DeepL (case-study) — Belgium's largest bank processing 70 million words monthly across 55 language pairs with a 20% in-house translator productivity gain is the domain's clearest positive case: bounded scope, structured data, measurable throughput — the exact conditions the summary identifies as prerequisites for genuine value. https://www.deepl.com/en/customer-stories/kbc-bank