The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
AI for teaching, tutoring, assessing, and managing learning experiences. Mostly leading-edge: adaptive pacing and skills assessment have reached good practice, but broader institutional adoption is slow due to academic integrity concerns and uneven infrastructure. Two practices remain bleeding-edge: AI content detection and educational content summarization. Most trajectories are stalled; policy and pedagogy lag behind the technology.
Education is the domain where AI adoption most dramatically outpaces institutional readiness. Across fifteen practices spanning tutoring, assessment, content generation, accessibility, and administration, the pattern is remarkably consistent: the technology works in controlled settings, vendors have scaled to millions of users, and yet the organizations responsible for deploying these tools -- schools, universities, districts -- remain structurally unprepared to use them well. Eighty percent of U.S. students use AI for schoolwork; only 6% attend schools with clear AI policies. Sixty-eight percent of K-12 teachers use AI weekly, up from 29% in January 2025, but only 34% believe it makes them more effective. Khan Academy's chief learning officer conceded in May 2026 that "so far I am not seeing the revolution in education," despite Khanmigo reaching 1.4 million cumulative users and processing 108 million interactions. These are not early-stage signals. They describe a domain where the gap between tool availability and institutional capacity has become the defining constraint on impact.
The maturity profile reflects this tension. Two practices -- adaptive pacing and skills assessment -- have reached the point where most organizations could adopt them with standard effort. The bulk of the domain sits at leading-edge: technically proven, deployed by forward-leaning institutions, but not yet standard practice. Conversational tutoring, automated grading, curriculum design, formative feedback, question generation, learning analytics, language learning, accessibility support, simulated practice environments, coding education, and education administration all occupy this tier. Two practices -- AI content detection and educational content summarization -- remain experimental, stuck at bleeding-edge by barriers that have not materially changed in over a year. Only two practices show forward momentum: education administration (where CRM and SIS automation delivers measurable ROI) and simulated practice environments (where sales and contact center training has crossed into mainstream enterprise deployment with 58% Fortune 500 penetration). The rest are stalled -- not because the tools have stopped improving, but because the institutional, pedagogical, and governance infrastructure required to deploy them responsibly has not kept pace.
What distinguishes education from other AI domains is the stakes of getting deployment wrong. In most industries, a failed AI pilot wastes money. In education, it can degrade learning outcomes. A Turkish RCT demonstrated that unrestricted AI access caused students to score 17% worse on exams than controls. An ICLR 2026 paper showed 39% accuracy degradation across 15 major language models in multi-turn conversations -- the exact format used by AI tutors. Stanford research documented systematic demographic bias in AI feedback: Black students received praise emphasizing "leadership," Hispanic students triggered grammar corrections, and white students received structural critique on argument quality, regardless of submission quality. A widely cited meta-analysis claiming ChatGPT boosts learning was retracted by Springer Nature in May 2026 for methodological discrepancies, after accumulating 262 peer-reviewed citations. The evidence base itself is unreliable. The domain is not stalled because the technology fails -- it is stalled because the conditions for responsible deployment remain unmet at most institutions.
This scan cycle produced substantial new evidence across every practice in the domain, though no practice changed its maturity tier or trend direction. The most consequential developments center on three themes: the widening gap between adoption and efficacy, the collapse of AI content detection as a viable strategy, and new evidence crystallizing the design constraints that determine whether AI helps or harms learning.
Khan Academy's public acknowledgment that only 15% of students with access regularly engage with Khanmigo -- triggering a full summer 2026 platform redesign -- is the cycle's most important signal. It confirms what scattered evidence has suggested for months: tool availability does not translate to sustained learning engagement. A parallel meta-analysis of 23 coding education studies quantified the productivity-versus-learning gap directly, finding moderate productivity gains (g=0.33) but no significant learning outcome improvements (g=0.14). Springer Nature's retraction of a widely cited ChatGPT education meta-analysis further weakened the evidence base that proponents have used to justify rapid deployment. On the positive side, a massive RCT on South Africa's Siyavula platform (160,000 students, 17 million practice problems) demonstrated that careful UI design -- written prompts and visual nudges -- measurably improves student persistence after failure, with combined effects of +11%. A meta-analysis of 72 AI teaching studies confirmed a significant positive effect size (g=0.586), though heterogeneity was driven by AI type and implementation context, not by AI presence alone.
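For readers who do not think in effect sizes, a quick way to read these g values is Cohen's U3: assuming roughly normal outcome distributions, a shift of g standard deviations places the average treated student at the Phi(g) percentile of the control distribution. The sketch below applies that standard conversion to the figures above; it is an interpretive aid, not a re-analysis of the underlying studies, and the helper function is purely illustrative.

```python
# Cohen's U3 reading of the Hedges' g values cited above: with roughly normal
# outcomes, a shift of g standard deviations puts the average treated student
# at the Phi(g) percentile of the control distribution. A sketch, not a
# re-analysis of the studies themselves.
from math import erf, sqrt

def u3(g: float) -> float:
    """Percentile of the control distribution reached by the treatment-group mean."""
    return 0.5 * (1 + erf(g / sqrt(2)))

for label, g in [
    ("coding productivity (g=0.33)", 0.33),
    ("coding learning outcomes (g=0.14)", 0.14),
    ("AI teaching overall (g=0.586)", 0.586),
]:
    print(f"{label}: U3 = {u3(g):.0%}")
```

On that reading, g=0.33 puts the average student at roughly the 63rd percentile of controls, g=0.14 at roughly the 56th, and g=0.586 at roughly the 72nd, which is why the coding learning-outcome figure reads as practically negligible while the overall teaching effect does not.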
In AI content detection, the University of Texas at Austin formalized a ban on all third-party detection software, joining a growing list of institutions (Curtin, Vanderbilt, UCLA, Yale, Johns Hopkins, Northwestern) abandoning these tools. Independent testing showed detector accuracy has plateaued since 2023 with no improvement, while vendor claims diverge sharply from audited performance. Expert humans detect AI-generated text at 92.7% accuracy with 4% false positives -- significantly outperforming all commercial detectors on humanized text. The market for simulated practice environments continued to expand, with TDCX's 600-agent e-commerce deployment showing 20% customer satisfaction gains and 50% faster proficiency ramp, and insurance firms deploying compliance training simulations with learning gains of 0.73-1.3 standard deviations. Stability across the rest of the domain is itself a signal: the structural barriers -- governance gaps, training deficits, equity concerns, the persistent gap between identification and effective intervention -- remain unchanged.
The adoption-efficacy gap is now quantified and undeniable. Sixty-eight percent of K-12 teachers use AI weekly but only 34% believe it increases their effectiveness, per a RAND survey of 4,200 teachers. Khan Academy reports only 15% active engagement among students with Khanmigo access. A coding education meta-analysis found productivity gains (g=0.33) but no learning improvements (g=0.14). The gap is not between believers and skeptics -- it is between organizations that have deployed AI and organizations that have deployed AI effectively. The difference lies in implementation design, pedagogical guardrails, and institutional support structures that most schools and districts have not built.
The evidence base for AI in education is itself unreliable. Springer Nature retracted a widely cited meta-analysis claiming large positive ChatGPT learning effects, after it had accumulated 262 peer-reviewed citations. A meta-analysis of 936 learning analytics papers found 70% lacked any learning outcome measures. Independent analysis of 1,000+ student success initiatives found 40% showed little or no measurable impact. Meanwhile, the U.S. Department of Education's research infrastructure has been gutted -- DOGE terminated $881 million in IES contracts, reducing staff from 200 to 31 and eliminating the capacity to measure whether programs work. Institutions are making deployment decisions against a backdrop of weak evidence and disappearing evaluation capacity.
Demographic bias is systematic across every assessment-adjacent practice. Stanford research documented consistent, directional bias across four AI grading models: different feedback tone and pedagogical expectations by student race, gender, and achievement level. AI content detectors produce false positive rates two to three times higher for non-native English speakers -- Stanford measured 61.3% false positives on TOEFL essays. Learning analytics systems show 19-21% false negative rates for Black and Hispanic students versus 6-12% for white and Asian students. Skills assessment tools used in hiring select resumes with white-associated names at 85% versus 9% for Black-associated names. These are not isolated findings; they reflect training data biases that current architectures reproduce at scale.
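One way to put the hiring figure in context is the EEOC's four-fifths rule, the conventional screen for adverse impact: a selection-rate ratio below 0.8 is the standard flag. The sketch below applies that rule to the 85% versus 9% rates cited above; the rule and the arithmetic are standard, and the code itself is only illustrative.

```python
# Illustrative adverse-impact check using the EEOC "four-fifths" rule:
# a selection-rate ratio below 0.8 is the conventional flag for adverse impact.
# The selection rates are the ones cited above (85% vs 9%); the rest is
# arithmetic, not a claim about any specific tool.
def impact_ratio(protected_rate: float, reference_rate: float) -> float:
    return protected_rate / reference_rate

ratio = impact_ratio(0.09, 0.85)   # Black-associated vs white-associated names
print(f"selection-rate ratio: {ratio:.2f}")          # ~0.11
print(f"below four-fifths threshold (0.8): {ratio < 0.8}")
```

A ratio near 0.11 is not a close call by that standard; it sits far below the threshold regulators treat as presumptive adverse impact.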
Governance has shifted from optional to regulatory, but infrastructure lags. Twenty-five states have active AI-in-education bills; three (Idaho, Ohio, Georgia) have signed laws mandating AI governance by July 2026. COPPA 2026 took effect April 22, requiring parental consent for AI-powered learning analytics in K-12. The EU AI Act classifies educational AI as high-risk from August 2026. Yet 87% of U.S. schools use AI tools without formal governance frameworks, 71% of teachers received no formal AI training, and Los Angeles USD -- the nation's second-largest district -- reversed its AI adoption after a vendor tool generated inappropriate imagery for fourth graders and a separate $6.2 million vendor fraud was uncovered. The regulatory framework is arriving; the institutional capacity to comply is not.
The "teacher-amplifier" model has won, but its economics are unresolved. Across tutoring, grading, feedback, and curriculum design, the field has converged on a consensus: AI works when it drafts and humans validate. Formative feedback platforms maintain mandatory human review. Essay grading systems require teacher override. Curriculum tools produce "structurally competent and educationally generic" output requiring substantial revision. This model delivers value but does not deliver the cost savings that justified procurement. Every AI artifact requires validation, creating what practitioners call "supervision debt." The question facing institutions is whether the teacher-amplifier model saves enough time to justify its total cost of ownership -- including training, governance, and quality assurance -- or whether it simply redistributes workload without reducing it.
Only 15% of students use Khanmigo, Khan Academy reveals redesign (adoption-metric) — Khan Academy's public admission that only 15% of students with access regularly use the platform, despite 108 million cumulative interactions, is the clearest quantification of the adoption-efficacy gap that defines this domain: availability does not produce engagement. https://www.edtechinnovationhub.com/news/only-15-percent-of-students-with-access-to-khanmigo-actually-use-it-khan-academy-admits
A meta-analysis of the effect of generative AI on productivity and learning in programming (research-paper) — Maier et al.'s 23-study meta-analysis finding moderate productivity gains (g=0.33) but no significant learning improvement (g=0.14) directly quantifies the distinction between making students faster and making them better — the central unresolved tension across every AI education practice. https://arxiv.org/abs/2605.04779
Publisher withdraws study claiming ChatGPT boosts learning (news-coverage) — Springer Nature's retraction of a widely cited meta-analysis (262 peer-reviewed citations) for methodological discrepancies illustrates why the domain's evidence base cannot support the deployment pace institutions have adopted: the literature optimists cite is itself unreliable. https://www.computing.co.uk/news/2026/ai/publisher-pulls-study-claiming-chatgpt-boosts-learning
AI gives more praise, less criticism to Black students (research-paper) — The Hechinger Report's coverage of the Stanford LAK best-paper nominee documents that demographic bias in AI feedback is not incidental but directional and consistent across four models, making it a structural feature of current architectures rather than a correctable edge case. https://hechingerreport.org/proof-points-ai-bias-feedback/
AI Detection Software Guidance by UT Austin (case-study) — UT Austin's formal ban on all third-party AI detection software, joining a cohort of major research universities, marks the institutional collapse of detection as a viable governance strategy and forces the field toward assessment redesign — a harder, slower path. https://ailearninsights.substack.com/p/ai-detection-software-guidance-by
Are AI Detectors Getting Better in 2026? An Evidence-Based Year-in-Review (opinion) — Independent analysis showing detector accuracy has plateaued since 2023 with no improvement, while vendor claims diverge sharply from audited performance, explains why institutional abandonment is accelerating: the tools never worked at the accuracy levels marketed. https://tohuman.io/blog/are-ai-detectors-getting-better-2026
AI-Powered Roleplay Simulator Accelerates CX Training for a Global E-Commerce Leader (case-study) — TDCX's 600-agent deployment achieving 20% CSAT gain and 50% faster proficiency ramp is the strongest production-scale counterpoint in the domain: where AI training operates in bounded, measurable professional contexts rather than open-ended learning, it delivers on its promises. https://www.tdcx.com/insights/case-studies/ai-roleplay-simulator-cx-training/
Design Tweaks That Keep Students Learning (research-paper) — The Siyavula RCT (160,000 students, 17 million problems), showing that written prompts and visual nudges after failure increase persistence by a combined 11%, demonstrates that adaptive pacing outcomes are highly sensitive to UI design decisions most institutions are not equipped to make. https://www.cmu.edu/news/stories/archives/2026/may/design-tweaks-that-keep-students-learning
Universities use AI to cut costs, predict student dropouts and align programs with labor market demand (adoption-metric) — Concrete retention gains from named institutions (IU Pennsylvania +4pp, Georgia State +7pp graduation) confirm that learning analytics applied to retention is the practice area with the clearest ROI, and contrast sharply with practices where comparable evidence does not exist. https://completeaitraining.com/news/universities-use-ai-to-cut-costs-predict-student-dropouts/
The 5 Hour Difference: What Early Research Reveals About AI Quality in Schools (case-study) — Panorama's district deployment data showing 55% first-year adoption, 5 hours/month saved, and 79-84% quality ratings for AI-generated lesson plans illustrates exactly what the teacher-amplifier model looks like in practice — and why its economics remain contested when supervision costs are included. https://www.panoramaed.com/blog/ai-quality-research