The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
Each dot marks the weighted maturity of practices within a domain.
AI that supports individuals in acquiring new skills through structured practice, feedback, and progression tracking. Includes practice exercise generation and personalised feedback; distinct from L&D recommendations which suggest resources rather than structuring practice.
AI-powered deliberate practice works in controlled settings, but June 2026 evidence reveals a critical performance-learning paradox, with identified behavioral mechanisms, that challenges the entire category. Efficacy research remains strong: Google's Gemini study (Sierra Leone, Italy) showed gains of +0.26 to +0.38 SD with learning-science grounding; Wharton's adaptive sequencing RCT demonstrated +0.15 SD improvements through proactive pedagogical design; and a Harvard physics RCT found AI tutoring effects of d=0.73-1.3. Yet the same period exposed the core mechanism: the 1,222-participant RCT by Liu, Christian, Bakker, and Dubey documents that unguided AI assistance reduces persistence and independence within ~10 minutes, establishing behavioral dependency as an intrinsic risk. A World Bank analysis of 26,000 Chinese students shows that autonomous AI use for homework raises practice 18% but reduces exam performance 20% (1.4 SD effect), evidence of metacognitive laziness at scale. The categorical tension is no longer research vs. deployment but performance vs. learning: unguarded AI enables visible short-term task completion while degrading durable skill acquisition through three mechanisms: (1) cognitive offloading (reduced persistence and independent problem-solving), (2) illusion of competence (AI-assisted output quality masks poor mastery), and (3) cognitive load penalties (higher load without compensating skill gains in high-stakes domains). Properly designed systems (guardrails, effort preservation, Socratic structure) can preserve learning, per the EFFORT-AI framework, but they require pedagogical expertise and institutional commitment that most organizations lack. Khan Academy founder Sal Khan admitted in June 2026 that Khanmigo's real-world impact is "mixed bag / neutral-to-marginal positive." The enterprise context shows an identical pattern: 85% of workers cannot apply AI training to their roles because the training context is disconnected from the work context. Duolingo (135M monthly users, 21% YoY DAU growth) and Khan Academy (700K+ users, 380+ U.S. districts) prove infrastructure scale; limited adoption gains prove that design matters more than deployment volume. The defining barrier is behavioral and institutional, not technical.
The June 2026 market bifurcation is stark: consumer platforms (Duolingo, Khan Academy) prove infrastructure viability, while institutional adoption remains bottlenecked by three critical gaps identified in the Brookings Institution synthesis: (1) teacher involvement (gains appear only when teachers remain present), (2) infrastructure readiness (40% of primary and 50% of secondary schools globally lack internet), and (3) rigorous independent evaluation (missing in most rollouts). Duolingo scaled to 56.5M DAU (+21% YoY) with 12.5M paid subscribers, accelerating investment in AI-driven Speaking Adventures, Video Call (2x spoken words/user), and spoken tokens, which demonstrates a sustained commitment to deliberate practice at scale and a profitable model. Khan Academy reached 700K+ students across 380+ U.S. school districts with Canvas LMS integration. The critical engagement reality: only 15% of students with Khanmigo access actively use the AI tutor despite 108M+ total interactions since the 2023 launch, prompting a platform redesign for a June 2026 rollout. Market signals diverge: AgentConn's May 2026 analysis identifies a 95% engagement exclusion rate in the Khanmigo pilot design and a 60% engagement cliff after 3 weeks of unfacilitated use; teacher tools (MagicSchool with 6M educators, Brisk with 1M) won 2024-25 market share from student tutors. Pedagogical design determines outcome, and the OECD's May 2026 meta-analysis isolates the mechanism: generic "fast AI" tools (e.g., ChatGPT for homework) boost immediate practice accuracy (+48-127%) while reducing exam performance (-17%) due to cognitive offloading; "slow AI" with safeguards (step-by-step scaffolding, restricted hints, Socratic questioning) preserves learning. Behavioral evidence at scale: ChatGPT's Nov 2022 release caused a 26.9% collapse in college study time on AI-susceptible math problems, with a 25% decline in learning gains, establishing population-level "cognitive surrender" as an adoption risk. High-stakes domain evidence: a meta-analysis of surgical skill training shows AI tutoring produces small technical gains (0.20 SD) but significantly higher cognitive load, indicating that design quality matters more than technology capability. The enterprise context mirrors the K-12 pattern: 85% of corporate learners cannot apply AI training to their roles because training is disconnected from the work context, indicating that skill transfer fails without environmental alignment. The core barrier is not technology but institutional design: teacher training burden, infrastructure readiness, student behavioral patterns (passive tool use instead of active help-seeking), and the absence of mechanisms for preserving the cognitive effort essential to durable skill development.
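The "slow AI" safeguards named above (step-by-step scaffolding, restricted hints, Socratic questioning) can be pictured as a small policy that sits between the learner and the model. The following is a minimal sketch under stated assumptions, not any platform's implementation: the class name `SlowTutor`, the three hint levels, and the default budget of three hints are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical hint ladder, ordered from least to most revealing.
# None of the levels hands over the final answer.
HINT_LEVELS = ("socratic_question", "pointer_to_step", "worked_step")


@dataclass
class SlowTutor:
    """Illustrative guardrail policy; names and thresholds are assumptions."""

    max_hints: int = 3              # restricted hints: budget per problem
    min_attempts_per_hint: int = 1  # learner must try before each escalation
    attempts: int = 0
    hints_given: int = 0
    log: list = field(default_factory=list)

    def record_attempt(self, correct: bool) -> None:
        """Count a learner attempt so later help requests can be gated on effort."""
        self.attempts += 1
        self.log.append(("attempt", correct))

    def request_help(self) -> str:
        """Return the next permitted kind of help, or a refusal reason."""
        # Effort preservation: each new hint requires at least one fresh attempt.
        required = (self.hints_given + 1) * self.min_attempts_per_hint
        if self.attempts < required:
            return "try_first"
        # Restricted hints: stop escalating once the budget is spent.
        if self.hints_given >= self.max_hints:
            return "hint_budget_exhausted"
        # Scaffolding: escalate one rung at a time, starting with a question.
        level = HINT_LEVELS[min(self.hints_given, len(HINT_LEVELS) - 1)]
        self.hints_given += 1
        self.log.append(("hint", level))
        return level
```

In this sketch a learner who asks for help before attempting the problem is told to try first, and repeated requests climb the ladder one rung per attempt. A "fast AI" tool, by contrast, would correspond to returning the full solution on the first request.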
— World Bank analysis of 26,000 Chinese students: autonomous AI use for homework improved practice 18% but reduced exam performance 20% (1.4 SD effect), with entrance exam scores dropping 18-24%, establishing metacognitive laziness at population scale.
— Peer-reviewed study of 1,498 undergraduates: gap between AI-assisted output quality (7.62/10) and independent knowledge mastery (5.55/10) documents illusion of competence, showing AI assistance improves perceived but not actual skill acquisition.
— Multi-site RCT (1,222 participants) by Liu, Christian, Bakker, Dubey: unguided AI assistance reduces persistence and undermines independent problem-solving; effect occurs within ~10 minutes, establishing behavioral mechanism critical to deliberate practice design.
— Brookings synthesis of 20+ years EdTech research: success requires three conditions (teacher involvement, infrastructure readiness, rigorous evaluation); cites RCTs from Nigeria and Ghana showing gains only when teachers remain present; establishes deployment prerequisites.
— Peer-reviewed framework directly addressing AI-supported deliberate practice: EFFORT-AI specifies six phases that preserve cognitive effort and keep retrieval, explanation, monitoring, and transfer as learner-owned processes rather than delegating them to AI.
— Meta-analysis of 4 RCTs (268 surgical trainees): AI tutoring showed small OSATS skill gain (0.20 SD) but significantly higher cognitive load, with authors concluding findings do not support replacing human tutors and indicating hybrid model required.
— Khan Academy founder candid assessment: AI tutoring outcomes are 'mixed bag' / 'neutral-to-marginal positive'; disconfirms 2023 leading-edge claims; acknowledges cheating concerns and low adoption despite infrastructure scaling.
— Large independent survey (1,660 teachers, 10 countries): gap between adoption (50% positive) and meaningful impact (13% 'very positive'); teacher training critical (75% well-trained vs 38% untrained report positive impact).
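Several of the findings above are reported as standardized effect sizes (for example 0.20 SD or d=0.73-1.3). As a reminder of what such a figure measures, here is a minimal sketch of Cohen's d with a pooled standard deviation. The score lists are hypothetical numbers invented for illustration, and the cited studies may use different estimators (such as Hedges' g) or adjust for covariates.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Difference in group means, expressed in units of the pooled sample SD."""
    n_t, n_c = len(treatment), len(control)
    s_t, s_c = stdev(treatment), stdev(control)  # sample SDs (n - 1 denominator)
    pooled = sqrt(((n_t - 1) * s_t**2 + (n_c - 1) * s_c**2) / (n_t + n_c - 2))
    return (mean(treatment) - mean(control)) / pooled


# Hypothetical exam scores, NOT data from any study cited here.
ai_tutored = [72, 75, 78, 81, 84]
comparison = [70, 73, 76, 79, 82]
effect = cohens_d(ai_tutored, comparison)
print(f"d = {effect:.2f}")
```

A two-point difference in means looks small in raw terms, but the size of d depends on how spread out the scores are. That is why the entries above quote SD units rather than raw score changes.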
2023-H1: Duolingo launched Max subscription (GPT-4 powered Video Call and Roleplay features, Mar 2023) and reported 20.3M daily active users in Q1. Khanmigo early pilot in Newark (Jun 2023) showed accuracy issues including fabricated misinformation, highlighting deployment reliability risks. Tutor CoPilot RCT (1,800 underserved K-12 students) validated AI-augmented tutoring with 4pp mastery gains (9pp for lower-rated tutors). Research demonstrated AI dialogue improves physics misconception correction by 10.5pp. Corporate L&D adoption remained minimal: CIPD survey found only 5% of training professionals using AI for learning, signaling enterprise adoption barriers despite consumer platform growth.
2023-H2: Duolingo reached 21M+ daily active users with new AI-powered roleplay and explanation features (Nov 2023); LearnLingo launched conversational AI language tutor on HN (Jul 2023); critical assessment raised questions about whether gamification-heavy practice produces durable learning outcomes.
2024-Q1: Duolingo deployed GPT-4 for lesson personalization and conversation practice, cutting contractor costs while scaling AI capabilities (Jan 2024); controlled studies showed ChatGPT-supported dental education outperforming traditional research-based learning on exams; Khan Academy's Khanmigo reported improved student comprehension and engagement in mathematics and science; engineering education studies revealed AI strength in conceptual teaching but limitations in numerical problem-solving.
2024-Q2: Duolingo published four-step human-AI course creation methodology in production (Jun 2024); Khan Academy secured Microsoft partnership to scale Khanmigo for Teachers free to all U.S. educators (May 2024). Empirical research confirmed AI-generated evaluative feedback enhances skill acquisition in sequential tasks (PLOS ONE, May 2024). Teacher adoption survey (RAND, n=1,020) showed 33% overall AI adoption, 18% regular use for skill-building; Temple University faculty integrated AI for formative feedback with positive student outcomes. Algorithm aversion emerged as critical adoption barrier despite proven AI reliability (UCL, Jun 2024); pedagogical critiques raised concerns about AI tutoring mechanics.
2024-Q3: Khanmigo expanded to five additional U.S. states with state education department partnerships; Microsoft partnership extended free access to 49 countries via Azure OpenAI (Aug 2024). Empirical validation continued with quasi-experimental studies showing Duolingo's impact on learner engagement; randomized trials validated deliberate practice methodology with structured peer feedback in professional training (Sep 2024). Teacher sentiment toward AI in classrooms shifted more positive (Clever survey, Sep 2024), though critical gaps in inclusive implementation persisted. AI hallucinations in mathematics tutoring emerged as concrete reliability concern, with documented Khanmigo failures on algebra problems (Aug 2024), threatening adoption credibility among institutions.
2024-Q4: Khan Academy's large-scale efficacy study of ~350K students confirmed 30+ minutes weekly usage associated with ~20% greater learning gains; 266 U.S. school districts piloting Khanmigo with documented teacher/student success. Philippines Department of Education partnership enabled nationwide free AI access with government data infrastructure. Stanford RCT validated human-AI collaboration model (Tutor CoPilot), with AI-augmented tutoring improving mastery by 4-9pp. However, enterprise AI scaling challenges persisted (deployment declining from 55.5% to 47.4%; ROI realization declining from 56.7% to 47.3%), and educators highlighted core limitation of curriculum-aligned problem-solving in addressing learning motivation and meaningful context necessary for durable skill development.
2025-Q1: Khanmigo engineering improvements accelerated (math calculator integration, GPT-4 Omni upgrade for numerical reasoning, improved error detection) alongside named school deployments (Enid High School in Oklahoma with documented ELL/special education benefits). Duolingo launched Lily, AI video call feature for real-time conversational speaking practice with personality and adaptive difficulty. Empirical research strengthened guardrails argument: controlled study showed properly designed AI tutors achieved 127% mastery improvement vs unrestricted ChatGPT's 17% loss (Penn, 1,000 students). News coverage documented Tutor CoPilot and Class Companion deployments showing 4pp mastery gains in 1,800-student studies. Critical assessment persisted: educator Dan Meyer documented specific Khanmigo interaction failures and argued AI tutors fail to approximate human teaching, highlighting limitations in current design patterns.
2025-Q2: Khanmigo expanded Canvas LMS integration across universities (UNMC, Rutgers). Duolingo's AI course generation accelerated to 150 new courses in 12 months but triggered labor displacement concerns about human curriculum designer roles; Lily feature engineering documented guardrailed design patterns for conversational practice. Research validated AI dialogue's effectiveness at correcting misconceptions in learning contexts. Intensifying pedagogical critique questioned whether gamification and habit-formation mechanics in Duolingo actually drive language fluency or primarily psychological dependence on streaks. Labor displacement emerged alongside continued enterprise ROI challenges (47.3%) as dual headwinds to broader institutional acceleration.
2025-Q3: Duolingo achieved 51% year-over-year DAU growth to 47M daily active users with a $1B revenue forecast, demonstrating production-scale success of its AI-first strategy. Khanmigo's student user base expanded from 40K (one year prior) to 700K, with 1M+ expected by the 2025-26 school year. However, adoption barriers intensified: a University of Iowa pilot revealed faculty usage below once per week, despite positive reception of the concept, because of integration friction; teacher anxiety about AI errors persisted (42% of untrained users); and 51% of professionals reported AI training as an excessive burden. Equity concerns surfaced: a critical assessment documented Khanmigo failing underrepresented students who lack self-efficacy, raising questions about whose skill development is actually supported. Survey data showed 60% of US K-12 teachers using AI weekly with documented time savings, but deployment economics and training gaps remained structural bottlenecks to scaled adoption.
2025-Q4: Rigorous efficacy research strengthened the case for AI tutoring: Harvard physics RCT (published Scientific Reports, Nov 2025) demonstrated pedagogically-designed AI tutors outperforming human instruction with effect sizes d=0.73-1.3 and 70% efficiency gain; systematic literature review of 48+ studies confirmed comparative effectiveness across global contexts; multi-university qualitative studies provided adoption insights. However, critical limitations emerged: analysis documented Duolingo failing to prepare learners for real-world conversational deployment—grammar drills do not transfer to authentic speech with natural slang and context-dependent patterns. Khanmigo reached 1M+ users with sustained Canvas LMS expansion. Adoption barriers persisted (training burden, institutional friction, equity gaps), and enterprise ROI remained flat, indicating efficacy validation has not yet translated into scaled institutional breakthrough.
2026-Jan: Duolingo's DET practice test demonstrated efficacy across 25,000 test-takers with improved confidence and performance through adaptive AI practice generation; Khanmigo continued geographic expansion into New Mexico districts with new pedagogical features. However, critical research from Anthropic revealed that heavy AI assistance impairs skill acquisition (17% decrease in library-specific mastery despite productivity gains), and economic surveys documented persistent adoption failures—56% of CEOs reported no effect from AI investments on costs or revenue. Infrastructure scaling continued but real-world benefit realization remained misaligned with deployment growth.
2026-Feb: Regional adoption accelerated with LATAM higher education reaching 92% student and 79% faculty AI engagement, while financial pressures mounted on core platforms: Duolingo experienced 23% stock correction due to growth deceleration (18-20% target) and margin compression, signaling economic challenges in AI-first strategy. Critical analysis documented narrow scope of AI tutoring (16% of learning dimensions) and systemic adoption barriers: industry synthesis found 95% GenAI pilot failure-to-scale rate and 80.3% overall AI project failure. SIAI research balanced efficacy gains (Nature-published AI tutor outperformance) against persistent inequality and error risks, indicating maturation beyond early-stage hype toward recognition of real-world adoption constraints.
2026-Mar: Pedagogical design mechanisms came into sharp focus. Khanmigo case study documented Socratic method pedagogy (guiding with questions rather than direct answers) driving 68K→700K user growth across 380+ US school districts with Harvard/Stanford RCT validation. Harvard physics RCT with 194 students showed AI tutoring learning gains >2x versus active learning when designed with active problem-solving-first and cognitive load management. University economics RCT (334 students) found unrestricted GPT-4 access raised exam performance 0.23 SD vs control, contradicting premature-reliance concerns when students adopted scaffolding strategies. Reinforcement learning-enhanced problem sequencing (Taipei high schools, 770 students) improved exam performance 0.15 SD. Systematic review of 2013-2025 literature documented AI adaptive systems yield g=0.50–0.70 effects moderated by study quality and professional development availability. Duolingo reached 135M monthly users (+40% revenue growth YoY, March 2026). Institutional scaling pressures remained: teacher training burden, integration friction, and equity gaps concentrating benefits among high-self-efficacy learners.
2026-Apr: Deployment reality diverged sharply from efficacy research. Khan Academy founder Sal Khan admitted Khanmigo became "a non-event" for most students, with adoption stalling below initial 2023 promises (Chalkbeat, April 9). Named teachers at flagship schools (Hobart HS) reduced usage, citing student frustration; teacher sentiment skewed toward administrator enthusiasm rather than classroom uptake. Systematic evidence of limitations emerged: (1) an RCT (N=1,222) showed AI assistance reduces persistence and impairs unassisted performance after just 10 minutes, directly undermining durable skill acquisition (Liu et al., arXiv April 6); (2) a University of Sheffield analysis documented AI tutors delivering errors with the same confidence as correct answers (45% error rate, indistinguishable from correct responses); (3) a Stanford preprint found systematic bias in AI tutor feedback by race, gender, and ability, with high-achieving/White students receiving development-focused critique and Hispanic/ELL students receiving grammar-only correction; (4) an umbrella review (102 systematic reviews, Huang et al., Elsevier April 11) identified critical gaps in teacher training, student AI literacy, ethical frameworks, and theoretical grounding. Pedagogical design showed measurable impact: comparative testing found the Socratic method (Khanmigo, 23% learning gain) outperformed direct-answer tutors (ChatGPT, 16%). Effectiveness is bounded by prerequisite knowledge, however: Socratic design fails when foundational knowledge is absent. An ITS meta-analysis (28 studies, 4,597 K-12 students, npj Science of Learning April 10) confirmed medium-to-large effects vs traditional instruction but mixed effects vs non-intelligent tutoring, with effectiveness contingent on teacher guidance and pedagogical design, not technology. Duolingo's April "AI-first" announcement triggered user backlash (24.5% positive vs 41.1% negative globally; users cited job displacement and quality concerns). Institutional scaling remained stalled: the University of Iowa pilot showed <1/week Khanmigo usage despite positive reception, and the 95% GenAI pilot failure-to-scale rate persisted. Equity gaps intensified: a critical assessment noted that Socratic design is ineffective for underrepresented students lacking pre-existing self-efficacy, and neither platform prepared learners for real-world deployment (authentic conversation, transfer to novel contexts).
2026-May: Production refinement and behavioral evidence emerged alongside new research documenting the performance-learning paradox at scale. Khan Academy released detailed A/B testing results (May 6) from 1.35M+ tutoring sessions showing that learning history (+3.4%), prerequisite skill review (+2.7%), and conversation context (+5.09% cognitive engagement) drive measurable skill transfer gains, indicating that iterative engineering improves outcomes. However, a critical redesign decision (May 5) publicly acknowledged a 15% active engagement rate despite 108M total interactions, triggering a shift from passive to proactive AI architecture. Duolingo's Q1 earnings (May 4) showed 56.5M DAU (+21% YoY) and 12.5M paid subscribers (+21%), with accelerated investment in Speaking Adventures, Video Call, and voice-first features. A large-scale longitudinal study (3.2M math problem interactions, arXiv May 2026) found ChatGPT's release caused a 26.9% collapse in college study time on AI-susceptible problems, with a 25% decline in learning gains on proctored assessment, establishing behavioral cognitive surrender at population scale. OECD analysis documented generic "fast AI" tools raising practice performance by up to +127% while reducing exam scores by 17%, naming "metacognitive laziness" as the core mechanism. The enterprise context mirrored this: a Docebo survey of 2,000+ learners found 85% cannot apply AI training to their roles, with 78% trained in systems disconnected from their work context, confirming that skill transfer fails without environmental alignment. Peer-reviewed research (AIED 2026, May 7) analyzing 10,235 code submissions revealed that engagement-based behavioral signals predict student learning better than pedagogical quality alone, identifying substantial tutor-to-tutor variation in effectiveness at driving student action. Critical assessment intensified: research from MIT, Carnegie Mellon, Oxford, and UCLA (May 8) documented that 10 minutes of AI assistance paradoxically impairs problem-solving persistence and analytical thinking, with students who rely on direct answers underperforming after tool removal. A vendor's engineering playbook confirmed that production ITS at 600K–1M+ students (Carnegie Learning MATHia, Khan Academy, Duolingo) requires a five-layer architecture (learner model, curriculum, pedagogical strategy, interface, evaluation) to achieve effect sizes of d=0.66–0.79; without guardrails, systems regress to unguarded chatbots. The picture stabilizes: efficacy gains remain real and replicable in controlled, pedagogically aligned contexts, but real-world deployment is bottlenecked by behavioral adoption barriers, cognitive dependency risks (skill atrophy without scaffolding), training burdens, and unresolved equity gaps.
2026-Jun: Multiple converging research streams documented the performance-learning paradox at behavioral and population scale. A World Bank analysis of 26,000 Chinese students found autonomous AI use for homework raised practice rates 18% but reduced exam performance 20% (1.4 SD)—establishing metacognitive laziness as a real population-level effect. A 1,222-participant multi-site RCT (Liu, Christian, Bakker, Dubey) documented that unguided GPT-5 assistance reduces persistence and impairs independent problem-solving within ~10 minutes, identifying the precise behavioral mechanism. A peer-reviewed study of 1,498 undergraduates quantified the illusion of competence gap: AI-assisted output quality 7.62/10 vs. independent mastery 5.55/10. The EFFORT-AI framework (Frontiers in Education) proposed six phases to preserve cognitive effort as a learner-owned process, and a surgical skills meta-analysis (4 RCTs, 268 trainees) found AI tutoring yielded only 0.20 SD gains with significantly higher cognitive load, concluding hybrid human-AI models remain necessary. Founder candor accompanied the research: Sal Khan admitted Khanmigo's real-world impact is "mixed bag / neutral-to-marginal positive." The Brookings synthesis identified three non-negotiable prerequisites for any deployment showing gains—teacher involvement, infrastructure readiness, and rigorous independent evaluation—none of which are consistently met at scale.
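The five-layer architecture named in the 2026-May entry (learner model, curriculum, pedagogical strategy, interface, evaluation) can be illustrated with a toy skeleton showing how the layers hand off to one another. This is a sketch under stated assumptions rather than a description of MATHia, Khanmigo, or Duolingo: every class, function, threshold, and update rule below is hypothetical, and a production system would replace each layer with something far richer.

```python
from dataclasses import dataclass, field


@dataclass
class LearnerModel:
    """Layer 1: per-skill mastery estimates in [0, 1] (toy moving average)."""
    mastery: dict = field(default_factory=dict)

    def update(self, skill: str, correct: bool, rate: float = 0.3) -> None:
        prior = self.mastery.get(skill, 0.0)
        self.mastery[skill] = prior + rate * ((1.0 if correct else 0.0) - prior)


@dataclass
class Curriculum:
    """Layer 2: skills and their prerequisites, in teaching order."""
    prerequisites: dict  # skill -> list of prerequisite skills

    def next_skill(self, learner: LearnerModel, threshold: float = 0.8):
        for skill, prereqs in self.prerequisites.items():
            mastered = learner.mastery.get(skill, 0.0) >= threshold
            ready = all(learner.mastery.get(p, 0.0) >= threshold for p in prereqs)
            if ready and not mastered:
                return skill
        return None  # nothing left to teach


def choose_move(mastery: float) -> str:
    """Layer 3: pedagogical strategy; support fades as mastery rises."""
    if mastery < 0.4:
        return "worked_example"
    if mastery < 0.8:
        return "guided_question"
    return "independent_practice"


def render(skill: str, move: str) -> str:
    """Layer 4: interface; here just a text label for the chosen activity."""
    return f"[{move}] {skill}"


@dataclass
class Evaluation:
    """Layer 5: outcome logging, so design changes can be measured."""
    events: list = field(default_factory=list)

    def record(self, skill: str, move: str, correct: bool) -> None:
        self.events.append((skill, move, correct))

    def accuracy(self) -> float:
        if not self.events:
            return 0.0
        return sum(1 for _, _, ok in self.events if ok) / len(self.events)


def run_step(learner, curriculum, evaluation, correct: bool):
    """One tutoring turn: pick a skill, pick a move, show it, log the result."""
    skill = curriculum.next_skill(learner)
    if skill is None:
        return None
    prompt = render(skill, choose_move(learner.mastery.get(skill, 0.0)))
    move = prompt[1:prompt.index("]")]
    learner.update(skill, correct)
    evaluation.record(skill, move, correct)
    return prompt
```

Removing the curriculum and strategy layers from this skeleton leaves only a function that responds to whatever the learner types. That is one way to read the entry's warning that systems without guardrails regress to unguarded chatbots.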