Perly Consulting │ Beck Eco

The State of Play

A living index of AI adoption across industries — where established practice meets the bleeding edge
UPDATED DAILY

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

The Daily Dispatch

A daily newsletter distilling the past two weeks of movement in one or two domains — delivered to your inbox while the index updates in the background.

AI Maturity by Domain

Each dot marks the weighted maturity of practices within a domain — hover for a brief summary, click for more detail
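To make "weighted maturity" concrete, here is a minimal sketch of how one dot could be computed from a domain's practices. The tier values, weights, and `practices` structure are illustrative assumptions, not the index's actual methodology.

```python
# Illustrative sketch: collapse a domain's practices into one maturity score.
# Tier values and weights are assumptions, not the index's real methodology.

TIER_VALUES = {
    "research": 0.0,
    "bleeding edge": 1.0,
    "leading edge": 2.0,
    "established": 3.0,
}

def domain_maturity(practices):
    """Weighted mean of per-practice tier values.

    `practices` is a list of (tier, weight) pairs, where a weight might
    reflect evidence volume or adoption scale for that practice.
    """
    total_weight = sum(w for _, w in practices)
    if total_weight == 0:
        return None
    return sum(TIER_VALUES[t] * w for t, w in practices) / total_weight

# A domain dominated by leading-edge practice, with outliers on both sides:
example = [("leading edge", 5.0), ("established", 1.0), ("bleeding edge", 2.0)]
print(round(domain_maturity(example), 3))  # → 1.875
```

A score between the "leading edge" and "established" anchor values would place the dot between those columns on the chart above.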

DOMAIN
BLEEDING EDGE ↔ ESTABLISHED

Skill acquisition & deliberate practice support

LEADING EDGE

TRAJECTORY

Stalled

AI that supports individuals in acquiring new skills through structured practice, feedback, and progression tracking. Includes practice exercise generation and personalised feedback; distinct from L&D recommendations which suggest resources rather than structuring practice.

OVERVIEW

AI-powered deliberate practice works in controlled settings, but deployment reality reveals a wide gap between efficacy research and real-world adoption. Rigorous trials tell a compelling story: a Harvard physics RCT found AI tutors outperforming human instruction with effect sizes up to d=1.3; a Penn study of 1,000 students showed 127% mastery improvement when AI tutors were properly guardrailed; and a large-scale international RCT (14,892 students across 4 countries) found MathMentor-GPT produced measurable math achievement gains, with teacher integration amplifying effects. Pedagogical design matters enormously: Khanmigo's Socratic method and careful guardrailing show doubled learning gains over active-only instruction, and independent testing shows Socratic questioning (23% improvement) outperforming direct answers (16% improvement). Yet April 2026 interviews with Khan Academy founder Sal Khan documented a dramatic reversal: Khanmigo "was a non-event" for most students despite deployment to 700K+ students and 380+ U.S. school districts, with adoption waning at flagship schools. New systematic evidence reveals fundamental tensions: a 1,222-student RCT shows AI assistance reduces persistence and impairs unassisted performance (undermining durable skill acquisition); a peer-reviewed working paper establishes cognitive harm as a foreseeable, documented risk backed by neuroimaging evidence; a University of Sheffield analysis documents AI tutors delivering errors with the same confidence as correct answers (45% of AI responses contain significant inaccuracies); a Stanford study finds AI tutors give unequal feedback based on race and ability (an equity failure); and a peer-reviewed umbrella review identifies critical gaps in teacher training and pedagogical theory. The OECD's 2026 policy framework reframes AI in education: benefits depend entirely on how AI is embedded in pedagogical design, not on whether students have access.
Consumer platforms (Khanmigo at 700K+ students with only 15% engagement, Duolingo at 135M monthly users) prove infrastructure viability, but most institutions remain blocked by training burdens, integration friction, equity gaps, and motivational barriers. The defining tension is no longer efficacy research but deployment reality: proven gains in controlled contexts with proper design, yet persistent cognitive dependency risks, adoption barriers, and hidden failure modes in real schools.
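For readers parsing the effect sizes quoted above (d=1.3, d=0.73, and later g=0.586), this is a quick sketch of how Cohen's d is computed from two groups' scores; the exam scores below are invented purely for illustration.

```python
import math

def cohens_d(group_a, group_b):
    """Cohen's d: standardized mean difference using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd

# Invented exam scores: AI-tutored group vs control.
ai_tutored = [82, 88, 75, 91, 84, 79]
control = [70, 74, 68, 81, 72, 77]
print(round(cohens_d(ai_tutored, control), 2))
```

By convention d≈0.8 is already "large", which is why the Harvard trial's d=1.3 drew so much attention; Hedges' g (as in the meta-analyses cited below) is the same statistic with a small-sample correction applied.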

CURRENT LANDSCAPE

Duolingo and Khanmigo continue at deployment scale, but April 2026 signals reveal widening awareness of adoption barriers and learning-design limitations. Duolingo has scaled to 135M monthly active users (March 2026), 11.5M paying subscribers, and 150 new courses; yet its April 2026 "AI-first" strategic pivot triggered massive user backlash (sentiment: 24.5% positive vs 41.1% negative globally), with users citing concerns about job displacement and quality degradation. A 23% stock correction in early 2026 reflects investor concern about decelerating growth and margin compression. Khanmigo has grown to 700K+ students across 380+ U.S. school districts, with Canvas LMS integration at universities (UNMC, Rutgers); however, Khan Academy founder Sal Khan admitted in April 2026 that the platform was "a non-event" for most students. New engagement metrics reveal the depth of adoption friction: Khanmigo reports 108M+ total interactions since its 2023 launch, but only 15% of students with access actively engage with it. Khan Academy's operational evolution (documented April 2026) shows a four-phase maturation from intuition-based prompt engineering to production A/B testing, revealing the technical and pedagogical improvements required for meaningful efficacy—yet these engineering advances have not translated into adoption breakthroughs. A teacher at Hobart High School (featured on 60 Minutes) reports students "didn't really care for the bot" and found Khanmigo frustrating; the teacher no longer uses it in class.

Efficacy research remains strong in controlled trials, but deployment reality shows critical friction. The OECD's 2026 policy framework (April) reframes the landscape: "Benefits depend entirely on how AI is embedded in pedagogical design, not on whether students have access." Key finding: augmentation models (AI extends professional judgment) outperform substitution approaches, and Socratic AI tutoring shows medium effects on subject learning with substantial critical-thinking gains—but this methodology fails for students lacking foundational knowledge. April 2026 systematic reviews of ITS effectiveness (28 studies, 4,597 K-12 students) confirm medium-to-large positive effects versus traditional instruction but mixed results versus non-intelligent tutoring, with effectiveness contingent on pedagogical design and teacher guidance. An umbrella review (102 systematic reviews) identifies severe gaps: inadequate teacher training, insufficient student AI literacy, underexplored ethical risks, and theoretical research lagging behind technology deployment. Hidden failure modes are now documented: an April 2026 RCT (N=1,222) shows AI assistance paradoxically reduces persistence and impairs unassisted performance after just 10 minutes of use, directly undermining durable skill acquisition—practice gains of +48% reverse to -17% on exams once the tool is removed. Cognitive-harm research (April 2026) establishes this as a foreseeable, documented risk with neuroimaging evidence and institutional-liability implications. A Stanford study (April 2026) documents systematic bias in AI tutor feedback by race, gender, and ability—high-achieving/White students receive development-focused critique; Hispanic/ELL students receive grammar-only feedback. LATAM higher education reports 92% student AI engagement, but institutional scaling remains the bottleneck: a University of Iowa pilot showed Khanmigo usage of less than once per week despite positive sentiment, and 95% of GenAI pilots fail to move beyond proof of concept.
The defining barrier is not efficacy but integration: teacher training burden, motivational friction (students must actively seek help from passive tutors), cognitive dependency risks, equity concentration among high-self-efficacy learners, and evidence that current AI systems lack mechanisms for metacognitive and social learning dimensions essential for durable skill development.

TIER HISTORY

Research: Mar-2023 → Jul-2023
Bleeding Edge: Jul-2023 → Oct-2024
Leading Edge: Oct-2024 → present

EVIDENCE (103)

— Research from Carnegie Mellon, MIT, Oxford, UCLA shows 10 minutes of AI assistance impairs analytical thinking and problem-solving persistence; identifies cognitive dependency as foreseeable risk directly undermining durable skill development.

— AIED 2026 peer-reviewed analysis of 10,235 code submissions from deployed AI tutors reveals engagement-based behavioral signals more strongly predict student learning than pedagogical quality alone; identifies substantial differences in tutor effectiveness in driving student action.

— Practitioner documentation of classroom Khanmigo implementation: students skip guiding prompts, seek direct answers, avoid productive struggle; reveals behavioral adoption gaps and weak knowledge transfer despite system design for deliberate practice.

— 6-month A/B testing (Oct 2025–April 2026) on 1.35M+ tutoring threads shows: learning history +3.4%, prerequisite review +2.7%, conversation context +5.09% engagement; demonstrates iterative refinement driving measurable skill transfer gains.
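Lifts like the +5.09% above are the kind of small differences that large-sample A/B tests exist to distinguish from noise. A minimal sketch of the standard two-proportion z-test follows; the thread counts are invented, and treating "engagement" as a binary per-thread outcome is a simplifying assumption.

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z-statistic for the difference between two conversion/engagement rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Invented counts: engaged threads out of total, variant vs control.
z = two_proportion_z(26_300, 500_000, 25_000, 500_000)
print(round(z, 2))  # |z| > 1.96 ⇒ significant at the 5% level
```

At the million-thread scale reported above, even fractional-point lifts clear the significance bar easily, which is why the interesting question shifts from "is the lift real?" to "does it transfer to learning?".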

— Critical deployment signal: Khan Academy's own Chief Learning Officer admits only 15% of students with Khanmigo access actually engage with it, triggering redesign from passive to proactive AI; reveals adoption gap despite 108M interactions.

Duolingo Q1 Earnings Call Highlights │ Adoption Metrics

— Official metrics: 56.5M DAU (+21% YoY), 12.5M subscribers (+21%); product pivot to Speaking Adventures, Video Call, and spoken tokens demonstrates sustained deliberate practice investment; 20,500 AI-generated course units show content velocity for skill progression.

Can AI Replace the Human Tutor? │ Case Studies

— Harvard RCT (Kestin et al., Scientific Reports 2025) documents AI tutors outperforming active learning on engagement and motivation; emphasizes that effectiveness is contingent on pedagogical alignment and has primarily been demonstrated in STEM-structured domains.

— Frontiers in Psychology meta-analysis of 72 studies shows AI-enabled teaching yields effect size g=0.586 but is fundamentally conditional on implementation type and application context; emphasizes importance of appropriate AI deployment over presence alone.

HISTORY

  • 2023-H1: Duolingo launched Max subscription (GPT-4 powered Video Call and Roleplay features, Mar 2023) and reported 20.3M daily active users in Q1. Khanmigo early pilot in Newark (Jun 2023) showed accuracy issues including fabricated misinformation, highlighting deployment reliability risks. Tutor CoPilot RCT (1,800 underserved K-12 students) validated AI-augmented tutoring with 4pp mastery gains (9pp for lower-rated tutors). Research demonstrated AI dialogue improves physics misconception correction by 10.5pp. Corporate L&D adoption remained minimal: CIPD survey found only 5% of training professionals using AI for learning, signaling enterprise adoption barriers despite consumer platform growth.

  • 2023-H2: Duolingo reached 21M+ daily active users with new AI-powered roleplay and explanation features (Nov 2023); LearnLingo launched conversational AI language tutor on HN (Jul 2023); critical assessment raised questions about whether gamification-heavy practice produces durable learning outcomes.

  • 2024-Q1: Duolingo deployed GPT-4 for lesson personalization and conversation practice, cutting contractor costs while scaling AI capabilities (Jan 2024); controlled studies showed ChatGPT-supported dental education outperforming traditional research-based learning on exams; Khan Academy's Khanmigo reported improved student comprehension and engagement in mathematics and science; engineering education studies revealed AI strength in conceptual teaching but limitations in numerical problem-solving.

  • 2024-Q2: Duolingo published four-step human-AI course creation methodology in production (Jun 2024); Khan Academy secured Microsoft partnership to scale Khanmigo for Teachers free to all U.S. educators (May 2024). Empirical research confirmed AI-generated evaluative feedback enhances skill acquisition in sequential tasks (PLOS ONE, May 2024). Teacher adoption survey (RAND, n=1,020) showed 33% overall AI adoption, 18% regular use for skill-building; Temple University faculty integrated AI for formative feedback with positive student outcomes. Algorithm aversion emerged as critical adoption barrier despite proven AI reliability (UCL, Jun 2024); pedagogical critiques raised concerns about AI tutoring mechanics.

  • 2024-Q3: Khanmigo expanded to five additional U.S. states with state education department partnerships; Microsoft partnership extended free access to 49 countries via Azure OpenAI (Aug 2024). Empirical validation continued with quasi-experimental studies showing Duolingo's impact on learner engagement; randomized trials validated deliberate practice methodology with structured peer feedback in professional training (Sep 2024). Teacher sentiment toward AI in classrooms shifted more positive (Clever survey, Sep 2024), though critical gaps in inclusive implementation persisted. AI hallucinations in mathematics tutoring emerged as concrete reliability concern, with documented Khanmigo failures on algebra problems (Aug 2024), threatening adoption credibility among institutions.

  • 2024-Q4: Khan Academy's large-scale efficacy study of ~350K students confirmed 30+ minutes weekly usage associated with ~20% greater learning gains; 266 U.S. school districts piloting Khanmigo with documented teacher/student success. Philippines Department of Education partnership enabled nationwide free AI access with government data infrastructure. Stanford RCT validated human-AI collaboration model (Tutor CoPilot), with AI-augmented tutoring improving mastery by 4-9pp. However, enterprise AI scaling challenges persisted (deployment declining from 55.5% to 47.4%; ROI realization declining from 56.7% to 47.3%), and educators highlighted core limitation of curriculum-aligned problem-solving in addressing learning motivation and meaningful context necessary for durable skill development.

  • 2025-Q1: Khanmigo engineering improvements accelerated (math calculator integration, GPT-4 Omni upgrade for numerical reasoning, improved error detection) alongside named school deployments (Enid High School in Oklahoma with documented ELL/special education benefits). Duolingo launched Lily, AI video call feature for real-time conversational speaking practice with personality and adaptive difficulty. Empirical research strengthened guardrails argument: controlled study showed properly designed AI tutors achieved 127% mastery improvement vs unrestricted ChatGPT's 17% loss (Penn, 1,000 students). News coverage documented Tutor CoPilot and Class Companion deployments showing 4pp mastery gains in 1,800-student studies. Critical assessment persisted: educator Dan Meyer documented specific Khanmigo interaction failures and argued AI tutors fail to approximate human teaching, highlighting limitations in current design patterns.

  • 2025-Q2: Khanmigo expanded Canvas LMS integration across universities (UNMC, Rutgers). Duolingo's AI course generation accelerated to 150 new courses in 12 months but triggered labor displacement concerns about human curriculum designer roles; Lily feature engineering documented guardrailed design patterns for conversational practice. Research validated AI dialogue's effectiveness at correcting misconceptions in learning contexts. Intensifying pedagogical critique questioned whether gamification and habit-formation mechanics in Duolingo actually drive language fluency or primarily psychological dependence on streaks. Labor displacement emerged alongside continued enterprise ROI challenges (47.3%) as dual headwinds to broader institutional acceleration.

  • 2025-Q3: Duolingo achieved 51% year-over-year DAU growth to 47M daily active users with $1B revenue forecast, demonstrating production-scale success of AI-first strategy. Khanmigo student user base expanded from 40K (one year prior) to 700K with 1M+ expected by 2025-26 school year. However, adoption barriers intensified: University of Iowa pilot revealed <1/week faculty usage despite positive concept reception due to integration friction; teacher anxiety about AI errors persisted (42% of untrained users); 51% of professionals reported AI training as excessive burden. Equity concerns surfaced—critical assessment documented Khanmigo failing underrepresented students lacking self-efficacy, raising questions about whose skill development is actually supported. Survey data showed 60% of US K-12 teachers using AI weekly with documented time savings, but deployment economics and training gaps remained structural bottlenecks to scaled adoption.

  • 2025-Q4: Rigorous efficacy research strengthened the case for AI tutoring: Harvard physics RCT (published Scientific Reports, Nov 2025) demonstrated pedagogically-designed AI tutors outperforming human instruction with effect sizes d=0.73-1.3 and 70% efficiency gain; systematic literature review of 48+ studies confirmed comparative effectiveness across global contexts; multi-university qualitative studies provided adoption insights. However, critical limitations emerged: analysis documented Duolingo failing to prepare learners for real-world conversational deployment—grammar drills do not transfer to authentic speech with natural slang and context-dependent patterns. Khanmigo reached 1M+ users with sustained Canvas LMS expansion. Adoption barriers persisted (training burden, institutional friction, equity gaps), and enterprise ROI remained flat, indicating efficacy validation has not yet translated into scaled institutional breakthrough.

  • 2026-Jan: Duolingo's DET practice test demonstrated efficacy across 25,000 test-takers with improved confidence and performance through adaptive AI practice generation; Khanmigo continued geographic expansion into New Mexico districts with new pedagogical features. However, critical research from Anthropic revealed that heavy AI assistance impairs skill acquisition (17% decrease in library-specific mastery despite productivity gains), and economic surveys documented persistent adoption failures—56% of CEOs reported no effect from AI investments on costs or revenue. Infrastructure scaling continued but real-world benefit realization remained misaligned with deployment growth.

  • 2026-Feb: Regional adoption accelerated with LATAM higher education reaching 92% student and 79% faculty AI engagement, while financial pressures mounted on core platforms: Duolingo experienced 23% stock correction due to growth deceleration (18-20% target) and margin compression, signaling economic challenges in AI-first strategy. Critical analysis documented narrow scope of AI tutoring (16% of learning dimensions) and systemic adoption barriers: industry synthesis found 95% GenAI pilot failure-to-scale rate and 80.3% overall AI project failure. SIAI research balanced efficacy gains (Nature-published AI tutor outperformance) against persistent inequality and error risks, indicating maturation beyond early-stage hype toward recognition of real-world adoption constraints.

  • 2026-Mar: Pedagogical design mechanisms came into sharp focus. Khanmigo case study documented Socratic method pedagogy (guiding with questions rather than direct answers) driving 68K→700K user growth across 380+ US school districts with Harvard/Stanford RCT validation. Harvard physics RCT with 194 students showed AI tutoring learning gains >2x versus active learning when designed with active problem-solving-first and cognitive load management. University economics RCT (334 students) found unrestricted GPT-4 access raised exam performance 0.23 SD vs control, contradicting premature-reliance concerns when students adopted scaffolding strategies. Reinforcement learning-enhanced problem sequencing (Taipei high schools, 770 students) improved exam performance 0.15 SD. Systematic review of 2013-2025 literature documented AI adaptive systems yield g=0.50–0.70 effects moderated by study quality and professional development availability. Duolingo reached 135M monthly users (+40% revenue growth YoY, March 2026). Institutional scaling pressures remained: teacher training burden, integration friction, and equity gaps concentrating benefits among high-self-efficacy learners.

  • 2026-Apr: Deployment reality diverged sharply from efficacy research. Khan Academy founder Sal Khan admitted Khanmigo became "a non-event" for most students, with adoption stalling below initial 2023 promises (Chalkbeat, April 9). Named teachers at flagship schools (Hobart HS) reduced usage, citing student frustration; teacher sentiment skewed toward administrator enthusiasm rather than classroom uptake. Systematic evidence of limitations emerged: (1) RCT (N=1,222) showed AI assistance reduces persistence and impairs unassisted performance after just 10 minutes, directly undermining durable skill acquisition (Liu et al., arXiv April 6); (2) University of Sheffield analysis documented AI tutors delivering errors with identical confidence as correct answers (45% error rate, indistinguishable from correct responses); (3) Stanford preprint found systematic bias in AI tutor feedback by race, gender, and ability—high-achieving/White students receive development-focused critique, Hispanic/ELL students receive grammar-only correction; (4) Umbrella review (102 systematic reviews, Huang et al., Elsevier April 11) identified critical gaps in teacher training, student AI literacy, ethical frameworks, and theoretical grounding. Pedagogical design showed measurable impact: comparative testing found Socratic method (Khanmigo, 23% learning gain) outperformed direct-answer tutors (ChatGPT, 16%), but effectiveness bounded by prerequisite knowledge—Socratic design fails when foundational knowledge absent. ITS meta-analysis (28 studies, 4,597 K-12 students, npj Science of Learning April 10) confirmed medium-to-large effects vs traditional instruction but mixed vs non-intelligent tutoring, with effectiveness contingent on teacher guidance and pedagogical design, not technology. Duolingo's April "AI-first" announcement triggered user backlash (24.5% positive vs 41.1% negative globally; users citing job displacement and quality concerns). 
Institutional scaling remained stalled: University of Iowa pilot showed <1/week Khanmigo usage despite positive reception; 95% GenAI pilot failure-to-scale rate persisted. Equity gaps intensified: critical assessment noted Socratic design ineffective for underrepresented students lacking pre-existing self-efficacy; neither platform prepared learners for real-world deployment (authentic conversation, transfer to novel contexts).

  • 2026-May: Production refinement and behavioral evidence emerged. Khan Academy released detailed A/B testing results (May 6) from 1.35M+ tutoring sessions showing learning history (+3.4%), prerequisite skill review (+2.7%), and conversation context (+5.09% cognitive engagement) drive measurable skill transfer gains—indicating iterative engineering improves outcomes. However, critical redesign decision (May 5) publicly acknowledged 15% active engagement rate despite 108M total interactions, triggering shift from passive to proactive AI architecture. Duolingo Q1 earnings (May 4) showed 56.5M DAU (+21% YoY), 12.5M paid subscribers (+21%), with accelerated investment in Speaking Adventures, Video Call, and voice-first features (spoken tokens, realistic conversational scenarios)—demonstrating sustained deliberate practice commitment at scale. Peer-reviewed research (AIED 2026, May 7) analyzing 10,235 code submissions revealed engagement-based behavioral signals predict student learning better than pedagogical quality alone, identifying substantial tutor-to-tutor effectiveness variation in driving student action. Meta-analysis (Frontiers Psychology, Apr 29) of 72 studies confirmed AI-enabled teaching effect size g=0.586, contingent on implementation type and application context (not mere presence). Critical assessment intensified: research from MIT, Carnegie Mellon, Oxford, UCLA (May 8) documented 10 minutes of AI assistance paradoxically impairs problem-solving persistence and analytical thinking, with students relying on direct answers underperforming after tool removal. Practitioner documentation (May 7) of classroom implementation revealed students bypass guiding prompts for shortcuts, avoiding productive struggle—behavioral adoption gap persisting despite system design. 
An engineering playbook from a vendor confirmed that production ITS at 600K–1M+ students (Carnegie Learning MATHia, Khan Academy, Duolingo) requires a five-layer architecture (learner model, curriculum, pedagogical strategy, interface, evaluation) to achieve effect sizes of d=0.66–0.79; without guardrails, systems regress to unguarded chatbots. The picture stabilizes: efficacy gains remain real and replicable in controlled, pedagogically aligned contexts, but real-world deployment is bottlenecked by behavioral adoption barriers, cognitive-dependency risks (skill atrophy without scaffolding), training burdens, and unresolved equity gaps. Investment and infrastructure scale continuously, but the adoption barriers are not primarily technical—they are behavioral, institutional, and pedagogical.
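The five-layer architecture named above can be sketched as a set of cooperating components. Every class, method, and number below is an illustrative assumption — a toy wiring of the layers, not any vendor's actual system.

```python
# Toy sketch of the five-layer ITS architecture: learner model, curriculum,
# pedagogical strategy, interface, evaluation. All names/values are invented.
from dataclasses import dataclass, field

@dataclass
class LearnerModel:
    """Layer 1: tracks per-skill mastery estimates."""
    mastery: dict = field(default_factory=dict)  # skill -> 0.0..1.0

    def update(self, skill, correct):
        prior = self.mastery.get(skill, 0.2)
        # Toy update: nudge mastery toward 1 on success, toward 0 on failure.
        self.mastery[skill] = prior + (0.3 * (1 - prior) if correct else -0.2 * prior)

@dataclass
class Curriculum:
    """Layer 2: orders skills by prerequisite."""
    skills: list

    def next_skill(self, learner):
        for s in self.skills:
            if learner.mastery.get(s, 0.0) < 0.8:
                return s
        return None  # everything mastered

class SocraticStrategy:
    """Layer 3: guides with questions rather than direct answers."""
    def prompt_for(self, skill):
        return f"What do you already know about {skill}?"

class TextInterface:
    """Layer 4: delivery surface (here, plain text)."""
    def show(self, message):
        return message

class Evaluation:
    """Layer 5: logs outcomes so the loop can be measured and A/B tested."""
    def __init__(self):
        self.log = []
    def record(self, skill, correct):
        self.log.append((skill, correct))

# Wire the layers into a single tutoring step.
learner = LearnerModel()
curriculum = Curriculum(["fractions", "ratios", "linear equations"])
strategy, ui, evaluation = SocraticStrategy(), TextInterface(), Evaluation()

skill = curriculum.next_skill(learner)          # "fractions"
print(ui.show(strategy.prompt_for(skill)))
learner.update(skill, correct=True)             # learner answers correctly
evaluation.record(skill, True)
```

The point of the separation is the playbook's own claim: remove the learner model, curriculum, and strategy layers and what remains is an unguarded chatbot attached to an interface.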

TOOLS