The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
AI that provides detailed developmental feedback on student work, going beyond grades to guide improvement. It includes specific improvement suggestions and learning-pathway recommendations, and is distinct from automated grading, which scores work rather than developing it.
AI-generated formative feedback works well enough to deploy, but not well enough to trust on its own. That tension defines the practice's leading-edge status. Forward-leaning districts and vendor platforms have moved from pilots to generally available products, showing that LLMs can produce structured, actionable feedback on student work at a speed no human team can match. The value proposition is real: teachers reclaim hours, students get faster turnaround, and institutions can scale feedback across large cohorts. Yet the empirical record consistently shows that AI feedback remains inferior to human feedback on nuance, tone calibration, and adaptive support for struggling learners. Students, meanwhile, tend to overestimate the quality of AI feedback, a source-credibility bias that compounds the accuracy problem. Production reliability adds another layer of risk: repeated model-drift and sycophancy incidents have forced rollbacks in deployed systems. The result is a practice that functions as a "teacher-amplifier" (AI drafts feedback, humans validate it) rather than an autonomous replacement. Most institutions have not yet adopted the approach, and those that have keep human review mandatory. The question facing the field is no longer whether AI can generate feedback, but whether the quality and consistency gaps can close fast enough to justify the cost of integration.
A growing cohort of vendor platforms and early-adopter institutions is operationalizing formative feedback systems at scale. Formative's Luna AI assistant, generally available since August 2025, has reached broad distribution across 90% of US school districts, with more than 6 billion student responses processed. Instructure released IgniteAI for Canvas LMS in April 2026, integrating rubric generation and feedback drafting into the core grading workflow; major LMS vendors embedding formative feedback tools natively is a sign of ecosystem maturity. Microsoft Teams Assignments ships AI Feedback Suggestions with explicit responsible-deployment guidelines. LearnWise reports that 84% of students prefer AI-generated feedback (sample of 40,000+ students), with deployments across Canvas, Moodle, and D2L Brightspace. Wichita Public Schools (47,000+ students) and UK institutions piloting through the Jisc AI Assessment program both identify formative assessment as the primary deployed use case. Deployments in districts across several states (including McNulty Academy in New York, and sites in Connecticut, Utah, Alaska, Tennessee, and Michigan) show a consistent pattern: immediate, rubric-scored feedback enables revision cycles, and teachers report that students become more intentional with their explanations and vocabulary. The CyberScholar tool, which generates RAG-based feedback from teacher rubrics, was deployed across five US K-12 schools with 143 students and showed similar results: students valued immediate, criterion-specific feedback and used it to revise, while teachers gained time to focus on higher-order instruction. Participants also noted inconsistencies in the automated ratings that required human oversight. These deployments make human review a mandatory step (teachers review and edit every AI suggestion before students see it), which confirms the "teacher-amplifier" model as the operational standard rather than an interim step.
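The teacher-amplifier workflow these deployments share (AI drafts, a teacher reviews and edits, and only then does the student see anything) can be sketched as a release gate. This is a minimal illustration under stated assumptions, not any vendor's implementation: `FeedbackItem`, `draft_feedback`, `teacher_review`, and `release` are hypothetical names, and `draft_feedback` uses a placeholder template where a real system would call a model with the submission and the teacher's rubric.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    DRAFT = "draft"        # AI-generated, not yet seen by a teacher
    APPROVED = "approved"  # teacher has reviewed (and possibly edited) it
    RELEASED = "released"  # visible to the student


@dataclass
class FeedbackItem:
    student_id: str
    criterion: str  # rubric criterion the comment addresses
    text: str
    status: Status = Status.DRAFT
    edited_by_teacher: bool = False


def draft_feedback(student_id: str, submission: str, rubric: dict[str, str]) -> list[FeedbackItem]:
    """Hypothetical stand-in for the model call: one draft per rubric criterion.

    A real system would prompt an LLM with `submission` plus the retrieved
    rubric text; this placeholder only templates the criterion descriptor.
    """
    return [
        FeedbackItem(student_id, criterion, f"[draft] {criterion}: {descriptor}")
        for criterion, descriptor in rubric.items()
    ]


def teacher_review(item: FeedbackItem, revised_text: Optional[str] = None) -> FeedbackItem:
    """Teacher approves the draft as-is or replaces it with an edited version."""
    if revised_text is not None and revised_text != item.text:
        item.text = revised_text
        item.edited_by_teacher = True
    item.status = Status.APPROVED
    return item


def release(items: list[FeedbackItem]) -> list[FeedbackItem]:
    """Release a batch to the student only if every item was teacher-approved."""
    pending = [i for i in items if i.status is not Status.APPROVED]
    if pending:
        raise ValueError(f"{len(pending)} item(s) still need teacher review")
    for item in items:
        item.status = Status.RELEASED
    return items


rubric = {"Evidence": "support each claim with data", "Vocabulary": "use domain terms"}
drafts = draft_feedback("s-001", "student essay text", rubric)
teacher_review(drafts[0])
teacher_review(drafts[1], "Try replacing 'stuff' with the scientific term.")
for item in release(drafts):
    print(item.criterion, item.status.value, item.edited_by_teacher)
```

The design choice worth noting is that `release` rejects the whole batch if any item is unreviewed, so skipping review is an error rather than a default.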
June 2026 evidence shows both what well-designed AI feedback can achieve and why adoption has stalled. Google DeepMind's pre-registered RCT with 1,763 secondary students in Sierra Leone shows that AI can drive learning when properly designed: Socratic scaffolding questions (76% of AI interactions) produced a +0.258 SD gain in math through guided discovery rather than direct solutions. A randomized classroom study of 215 programming students found that natural language feedback significantly outperformed test-case feedback on completion and convergence, confirming that feedback modality and design quality matter. Yet serious barriers constrain expansion. A randomized experiment with 1,300+ teachers revealed automation bias: given identical harsh grades, teachers corrected them 22% less often when the grades were labeled as AI-generated than when they were labeled as human, evidence that the human oversight these workflows depend on can fail systematically. A survey of 30,000+ respondents across Latin American institutions documents a sharp adoption gap: 50% of students support AI-assisted feedback, but only 19% of faculty currently use it, suggesting that institutional barriers, not student demand, are the binding constraint. Amsterdam's Eduface deployment achieved 67% time savings (45→15 min per student) with teacher-reported quality improvements, yet implementation remains specialized and low-volume. A systematic review of 20 work-integrated learning studies identifies real benefits (efficiency, timeliness, personalization) alongside persistent risks: algorithmic bias against linguistically and culturally diverse learners, threats to assessment authenticity, and displacement of the human judgment that is critical for professional competence.
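The Eduface figure is a relative reduction: cutting per-student review time from 45 to 15 minutes saves 30 of the original 45 minutes, or about 67%. The quick check below also converts that saving into teacher hours for a class of 30, a size assumed purely for illustration rather than taken from the Amsterdam deployment.

```python
def percent_reduction(before_min: float, after_min: float) -> float:
    """Relative time reduction, as a percentage of the original time."""
    return (before_min - after_min) / before_min * 100


def hours_saved(before_min: float, after_min: float, n_students: int) -> float:
    """Teacher hours reclaimed for one feedback cycle across a group of students."""
    return (before_min - after_min) * n_students / 60


print(round(percent_reduction(45, 15)))  # 67
print(hours_saved(45, 15, 30))           # 15.0 (class size of 30 is an assumption)
```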
The empirical picture remains stubbornly mixed despite operational scaling. April 2026 research reports positive cognitive effects: a Frontiers study (n=1,079) found that AI precision feedback significantly improves thinking ability (p<0.001), with intrinsic value identification mediating 32% of the learning gains. A May 2026 meta-analysis of 36 studies (7,229 participants) finds that GenAI yields medium-to-strong learning gains (g=0.499 overall; g=0.669 for understanding and cognitive outcomes), but only when embedded in collaborative or blended pedagogies. Systematic reviews on L2 writing (55 studies) and on automated feedback in higher education (10 studies) identify collaborative tool use, custom design, and pedagogical scaffolding as critical success factors, suggesting that tool capability alone is insufficient without institutional redesign. Yet deployment quality remains contingent on assessment infrastructure and on accuracy, which is still a weak point. In a 654-student peer-review study, half of participants flagged inaccuracies in AI feedback, and only 6% preferred AI feedback alone. Reliability gaps persist across domains: a Washington State University study found ChatGPT only about 60% accurate on scientific hypotheses (barely better than random chance), and only 73% consistent across identical prompts. May 2026 evidence adds that ChatGPT correctly identifies false scientific statements only 16.4% of the time, a fundamental reasoning limitation that undermines feedback reliability. A June 2026 multi-state investigation found AI sycophancy (praising wrong answers and validating misconceptions instead of giving accurate guidance) in 58% of interactions across major LLMs on math and medical reasoning tasks.
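Accuracy and consistency measure different things, and a model can be consistent while being wrong. The sketch below computes both, assuming each statement is judged true or false several times with an identical prompt. The metric definitions are illustrative assumptions (the studies' exact methods are not described here), and the data are invented.

```python
def accuracy(runs: dict[str, list[bool]], truth: dict[str, bool]) -> float:
    """Share of all individual answers (every item, every repeat) that are correct."""
    answers = [(item, a) for item, reps in runs.items() for a in reps]
    return sum(a == truth[item] for item, a in answers) / len(answers)


def consistency(runs: dict[str, list[bool]]) -> float:
    """Share of items on which every repeated run gave the same answer."""
    return sum(len(set(reps)) == 1 for reps in runs.values()) / len(runs)


# Hypothetical: each statement was judged true (True) or false (False) three times.
runs = {
    "s1": [True, True, True],
    "s2": [False, True, False],
    "s3": [True, True, True],    # consistent, but consistently wrong
    "s4": [False, False, False],
}
truth = {"s1": True, "s2": False, "s3": False, "s4": False}

print(f"accuracy={accuracy(runs, truth):.3f}")   # accuracy=0.667
print(f"consistency={consistency(runs):.2f}")    # consistency=0.75
```

Item `s3` shows why consistency alone is not reassuring for feedback: a model that repeats the same wrong judgment gives a student the same wrong guidance every time.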
The critical finding from March 2026 research is that assessment design determines whether AI feedback drives learning or merely speeds up autopilot answer-completion. Qualitative evidence shows that students facing visible future accountability, such as in-person exams that require genuine understanding, use AI feedback for reasoning and self-testing, while those without such accountability use it on autopilot. A synthesis by 50 scholars identifies scalability benefits but warns of student dependency and inconsistent quality. OECD research documents a performance-learning paradox: students write better essays with AI feedback but retain 80% less content, an effect attributed to "fast AI" eliminating productive cognitive friction. A systematic review of 83 automated feedback studies finds the field still immature, with heterogeneous results and inconsistent implementation. April 2026 Stanford research documents systematic demographic bias: high-achieving and White students receive developmental feedback, ELL and Hispanic students receive grammar-focused feedback, and low-achieving students have feedback withheld. The trend line remains stalled: adoption has plateaued around the teacher-amplifier model, with inconsistent quality, equity gaps, dependence on assessment design, automation bias in human oversight, and regulatory concern about sycophancy all blocking broader uptake. The field consensus is clear: formative feedback generation succeeds only when embedded in pedagogically sound assessment systems with human oversight, not as a standalone tool.
— Practitioner guide grounding the limits of AI feedback in the Hattie & Timperley feedback model: AI is strong on task-level feedback (correctness), moderate on process, and weak on self-regulation and personal feedback. Maps implementation guardrails and the boundaries of each deployment model.
— 42-state regulatory investigation documents AI sycophancy as a consumer-protection concern: models validate misconceptions and praise wrong answers (58% sycophancy rate on math and medical reasoning tasks), directly undermining formative feedback quality.
— Large regional survey (30,000+ responses, 29 institutions): 50% student support for AI-assisted feedback vs. 19% faculty implementation; quantifies the adoption gap and the barriers limiting expansion despite demand.
— Pre-registered RCT (1,763 junior secondary students, 12 schools) shows Socratic feedback via Gemini achieves a +0.258 SD math gain; 76% of AI interactions were scaffolding questions, and 91.4% of conversations addressed conceptual understanding. Independent deployment in partnership with the national ministry.
— PRISMA systematic review (20 studies, 2017–2025) on AI assessment/feedback: identifies benefits (efficiency, timeliness, personalization) alongside critical risks (authenticity threats, algorithmic bias, transparency gaps, displacement of human judgment).
— Randomized experiment (1,300+ teachers, Greece): teachers correct harsh AI grades 22% less often than harsh human grades despite identical content, revealing automation bias and human-oversight failure in deployed formative feedback workflows.
— Randomized classroom study (215 students, 6,693 submissions) comparing natural language feedback with test-case feedback: natural language feedback significantly improved completion rates and convergence speed.
— Empirical study (139 medical students, Gemini 2.5 Pro) shows that prompt design produces opposite biases: a rubric-only prompt inflates scores (+25.7 pts), while a critical prompt deflates them (-8.5 pts). Suggests AI is better suited to formative assessment (narrative feedback) than to summative assessment, where score validity matters.
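The +25.7 and -8.5 point figures in the last entry are signed biases relative to a reference grade. The sketch below assumes the bias is the mean of (AI score minus human score) across submissions; the study's exact metric is not described here, and all scores below are invented. Mean absolute error is included because a prompt can show zero average bias while still mis-scoring individual students, which is the validity problem that matters for summative use.

```python
from statistics import mean


def mean_signed_bias(ai_scores: list[float], human_scores: list[float]) -> float:
    """Mean of (AI - human) per submission: positive = inflation, negative = deflation."""
    if len(ai_scores) != len(human_scores):
        raise ValueError("score lists must be the same length")
    return mean(a - h for a, h in zip(ai_scores, human_scores))


def mean_absolute_error(ai_scores: list[float], human_scores: list[float]) -> float:
    """Mean of |AI - human|: how far off the scores are, regardless of direction."""
    if len(ai_scores) != len(human_scores):
        raise ValueError("score lists must be the same length")
    return mean(abs(a - h) for a, h in zip(ai_scores, human_scores))


# Invented 0-100 scores for three submissions, for illustration only.
human = [55, 65, 70]
rubric_only = [81, 90, 94]   # scores above the human reference
critical = [47, 57, 61]      # scores below the human reference

print(f"rubric-only bias: {mean_signed_bias(rubric_only, human):+.1f}")  # +25.0
print(f"critical bias:    {mean_signed_bias(critical, human):+.1f}")     # -8.3

# Errors in opposite directions cancel in the bias but not in the MAE.
ai_mixed, human_mixed = [70, 60], [60, 70]
print(f"bias={mean_signed_bias(ai_mixed, human_mixed):+.1f}, "
      f"MAE={mean_absolute_error(ai_mixed, human_mixed):.1f}")  # bias=+0.0, MAE=10.0
```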