Perly Consulting │ Beck Eco

The State of Play

A living index of AI adoption across industries — where established practice meets the bleeding edge
UPDATED DAILY

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

The Daily Dispatch

A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.


🎓 Education & Learning

AI for teaching, tutoring, assessing, and managing learning experiences. Mostly leading-edge: adaptive tutoring and automated grading are approaching good practice, but institutional adoption is slow due to academic integrity concerns and uneven infrastructure. Two practices are bleeding-edge, including AI-generated curricula and autonomous classroom agents. Most trajectories are stalled: policy and pedagogy lag behind the technology.

15 practices: 2 good practice, 11 leading edge, 2 bleeding edge

Where AI Stands in Education & Learning

Three and a half years after ChatGPT arrived in classrooms, the defining feature of AI in education is no longer capability — it is the widening gap between what the technology can demonstrably do and what institutions can actually make it deliver. The research base has matured to the point of near-consensus on the core question: carefully designed AI tutoring, with pedagogical guardrails and human oversight, produces real learning gains. A Harvard meta-analysis spanning 34 studies now puts the effect size at 0.795; a pre-registered randomised trial of Google's Gemini Guided Learning across 12 schools and 1,763 students recorded math gains of 0.258 standard deviations — the equivalent of 1.2 to 1.7 years of progress compressed into eight weeks. Bloom's celebrated "two-sigma problem" — the long-standing finding that one-to-one tutoring beats classroom instruction by two standard deviations — looks, for the first time, computationally tractable. That is the optimistic reading, and it is genuine.
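
The figures above are effect sizes, meaning differences between groups expressed in standard deviations. As a rough illustration of how such a number is derived, the sketch below computes a standardised mean difference (Cohen's d) with a pooled standard deviation. This is one common formulation and not necessarily the estimator used in the studies cited; the function name and the scores are hypothetical and invented purely for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Standardised mean difference using the pooled sample standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * stdev(treatment) ** 2 + (n_c - 1) * stdev(control) ** 2
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical test scores, invented for illustration only.
tutored = [62, 70, 68, 75, 71, 66]
untutored = [60, 65, 63, 72, 69, 61]

d = cohens_d(tutored, untutored)
print(f"effect size d = {d:.2f}")
```

A positive value means the tutored group scored higher, and a negative value means it scored lower. That sign is what separates the scaffolded results from the unstructured ones discussed in this section.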

The pessimistic reading sits directly alongside it, drawn from the same evidence base. The benefit is conditional on design, and the conditions are routinely violated in the wild. The same studies that validate scaffolded Socratic tutors also document that unrestricted access to a generic chatbot produces 17% worse exam performance than no tool at all. A World Bank analysis of a 2.5-year Chinese longitudinal study found that unstructured AI use for homework dropped exam scores by 1.4 standard deviations — four times larger, in the opposite direction, than the benefit careful adaptive systems confer. Stanford's SCALE Initiative, observing two real K-12 districts, found that students used scheduled AI tutors for 2 to 5 minutes a week against a 30-minute platform recommendation, with barely half engaging at all despite dedicated time. Khan Academy's founder has publicly described Khanmigo's reception among many students as "a non-event." The lesson of two decades of education technology, restated by Brookings this cycle, holds: the tool is never the intervention. Engagement infrastructure, teacher mediation, and rigorous measurement are.

This produces a domain that is institutionally busy and pedagogically cautious in equal measure. The clearest momentum is in the unglamorous operational layer — admissions automation, enrolment chatbots, transcript processing, accessibility captioning — where ROI is measurable and the stakes of error are low. The hardest-fought ground is anywhere AI touches a consequential judgement: grading essays, screening applicants, detecting cheating, replacing a human tutor. There the line holds, and increasingly it is being held not by technologists but by an organising profession. The American Federation of Teachers (1.7 million members) and the New York State United Teachers both formalised age-staged restriction frameworks this cycle. The question that will decide the next two years is not whether AI can teach — it can, under conditions — but whether the surrounding system can be engineered to reproduce those conditions at scale, and whether it will be permitted to.

What's New, 2026-05-20 to 2026-06-19

No tier or trend shifted this cycle, and that stability is itself the story: the domain has settled into a well-understood shape, and the new evidence deepens rather than disrupts it. The most consequential movement is the hardening of organised professional resistance. The American Federation of Teachers formalised a 10-point framework on 27 May — a K-2 screen ban, teacher-mediated AI only through grade 5, student-facing AI confined to grades 9–12, plus a proposed $5 billion levy on large technology firms — grounded explicitly in developmental psychology. Days later, on 1 June, NYSUT passed its own resolution backing age-staged limits, including a ban on social-companion chatbots for under-16s. Two of the largest teaching unions in the United States now hold a coordinated line against unrestricted early-childhood deployment. Against that, governance is also formalising on the enabling side: Maryland's Artificial Intelligence Ready Schools Act took effect on 1 June, mandating an AI-literacy curriculum statewide by mid-2027, and Indonesia, India, and Italy all advanced national AI-in-education frameworks this cycle.

The research wave reinforced the central paradox. On the upside, ETH Zurich's full-semester bioTutor deployment (400-plus students, 10,000-plus interactions) showed Socratic design driving deep-understanding gains above 77%, and a large interrater-reliability study across 13 institutions and 7,406 scored responses found AI grading agreement matching human-to-human agreement. On the downside, plagiarism detection's credibility crisis sharpened further — a Springer study confirmed Turnitin fails below 5–10% AI content and is defeated outright by a single paraphrase, while named institutions including UC Berkeley, Vanderbilt, and Johns Hopkins have now disabled detection. A 42-state regulatory investigation flagged AI "sycophancy" — models praising wrong answers, with a 58% sycophancy rate on math and medical reasoning — as a consumer-protection concern that directly undermines formative feedback. And the UK government's own contracting language conceded that AI tutoring tools remain "limited in quantity, scope and evidence base" even as it plans to roll them out to 450,000 disadvantaged students by the end of 2026.
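
The claim that AI grading agreement matches human-to-human agreement depends on an agreement statistic. The source does not say which one the study used, so the sketch below assumes Cohen's kappa, a widely used chance-corrected measure for two raters. The rubric scores are hypothetical and do not come from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Share of items on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores (1 to 4) for ten responses, invented for illustration.
human_1 = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
human_2 = [3, 2, 4, 2, 1, 2, 3, 4, 3, 3]
ai = [3, 2, 4, 3, 1, 3, 3, 4, 2, 3]

print(f"human vs human: {cohens_kappa(human_1, human_2):.2f}")
print(f"AI vs human:    {cohens_kappa(ai, human_1):.2f}")
```

Under this framing, "matching human-to-human agreement" would mean the AI-versus-human kappa is at least as high as the kappa between two human raters on the same responses.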

Key Tensions

  • The design-conditional benefit. AI tutoring works when scaffolded and fails — measurably, in the wrong direction — when it is not. A Wharton trial recorded 0.15 SD gains under pedagogical guardrails; the same access pattern unrestricted produced learning loss, and a peer-reviewed study of 1,498 undergraduates quantified an "AI-Learning Gap" of 2.07 standard deviations between AI-assisted output quality and actual independent mastery. The technology is not the intervention; the constraint regime around it is. This makes deployment quality, not model capability, the binding variable — and quality is precisely what is hardest to guarantee across thousands of classrooms.

  • Access is not engagement is not outcomes. The decisive failure mode this cycle was not bad AI but unused AI. Stanford's SCALE study found 2-to-5-minute weekly usage against a 30-minute recommendation and only 53–61% engagement despite scheduled time; Khanmigo reaches over a million students yet only 15% engage regularly. Procurement and rollout announcements systematically overstate impact because they measure availability, not adoption. That gap hands governments and districts a flattering metric and a disappointing result.

  • Organised labour as the binding constraint. The most durable limit on adoption is no longer technical or regulatory but professional. With the AFT and NYSUT now aligned on age-staged restrictions and developmental-psychology-grounded screen bans, the political economy of early-childhood and assessment automation has shifted. Teacher mediation is being written into policy as a requirement, not a recommendation — which is also what the evidence says the technology needs to work, making this an unusually load-bearing alliance between caution and efficacy.

  • Detection has structurally failed, but enforcement persists. AI-content detection is the clearest case of a practice that does not work being used anyway. False-positive rates reach 40–61% for non-native English writers (Stanford), a single paraphrase defeats Turnitin entirely, and courts (Newby v. Adelphi) have ruled detection scores "probabilistic guesses." Yet 60%-plus of higher-education institutions still deploy these tools, AI-related misconduct now fills 35% of disciplinary caseloads at major universities, and the harm — false accusations, an evasion market on Taobao — accrues to the most vulnerable students. The exit has begun but lags far behind the evidence.

  • The deployment chasm is organisational, not technical. Across grading, feedback, and skills assessment, the recurring finding is that capability has outrun institutional readiness. Only 4% of teachers use AI for grading despite 80% using AI broadly; only 5% of enterprises report measurable P&L impact from AI upskilling; a McKinsey survey found 88% deploying AI but 86% lacking operational readiness. The frontier of difficulty has moved decisively from "can the model do it" to "can the institution govern, validate, and trust it" — and that frontier moves at the speed of change management, not compute.
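
The detection tension above can be made concrete with a base-rate calculation. The sketch below applies Bayes' rule to estimate what share of detector flags would fall on human-written work in a cohort of non-native English writers. Only the 40% and 61% false-positive rates come from the text; the sensitivity and the prevalence of AI-written submissions are hypothetical placeholders, and the function name is illustrative.

```python
def share_of_flags_that_are_false(false_positive_rate, sensitivity, prevalence):
    """Fraction of flagged submissions that were human-written (Bayes' rule)."""
    for name, value in [
        ("false_positive_rate", false_positive_rate),
        ("sensitivity", sensitivity),
        ("prevalence", prevalence),
    ]:
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
    false_flags = false_positive_rate * (1 - prevalence)  # human work flagged
    true_flags = sensitivity * prevalence  # AI-written work flagged
    total = false_flags + true_flags
    if total == 0:
        raise ValueError("no submissions would be flagged under these inputs")
    return false_flags / total


# The 40% and 61% false-positive rates are quoted in the text above; the
# sensitivity (0.90) and prevalence (0.20) are hypothetical placeholders.
for fpr in (0.40, 0.61):
    share = share_of_flags_that_are_false(fpr, sensitivity=0.90, prevalence=0.20)
    print(f"FPR {fpr:.0%}: {share:.0%} of flags land on human-written work")
```

Under these assumed inputs, most flags would point at students who did nothing wrong. That is the mechanism behind the harm described in the detection tension, though the exact share depends heavily on the placeholder values.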

Top 10 Evidence Items

  1. Measuring the impact of learning with AI in Sierra Leone and beyond (case-study) — The pre-registered RCT anchoring the summary's "optimistic reading": 1,763 students, +0.258 SD math gains in 8 weeks, with 76% of interactions being scaffolding questions rather than direct answers — the design constraint that made it work. https://deepmind.google/blog/measuring-the-impact-of-learning-with-ai-in-sierra-leone-and-beyond/

  2. A Warning Shot for Human Capital: Evidence of an AI Learning Penalty (opinion) — The World Bank's analysis of a 2.5-year Chinese longitudinal study provides the sharpest quantification of the central paradox: unstructured AI homework use produced a 1.4 SD drop in exam scores, four times larger in the wrong direction than the benefit of well-designed adaptive systems. https://blogs.worldbank.org/en/investinpeople/a-warning-shot-for-human-capital--evidence-of-an-ai-learning-pen

  3. AI tutor access alone doesn't equate to student gains, study says (news-coverage) — The Stanford SCALE study is the primary evidence for the "access is not engagement is not outcomes" tension: 2–5 minutes of weekly usage against a 30-minute recommendation, barely 53–61% engagement despite scheduled time and dedicated deployment. https://www.k12dive.com/news/ai-tutor-access-alone-doesnt-equate-to-student-gains-study-says/823214/

  4. AI-assisted learning and the illusion of competence (research-paper) — Introduces the "AI-Learning Gap" construct (N=1,498, 2.07 SD between output quality and independent mastery) that gives precise empirical form to the summary's claim that the constraint regime around AI matters more than model capability. https://ojed.org/jise/article/view/10832

  5. Rethinking Scaffolding in LLM Tutors: The Interactional Mismatch Between Benchmarks and Real-World Deployments (research-paper) — Analysis of 9,490 real deployment conversations showing students systematically bypass Socratic scaffolding; benchmarks assume uptake that does not occur in practice — the mechanism behind why real-world results consistently underperform RCT results. https://arxiv.org/abs/2606.15766

  6. New study: How are combinations of human-written words and LLM-generated words detected by Turnitin? (research-paper) — The Springer peer-reviewed confirmation that Turnitin fails below 5–10% AI content and is defeated by a single paraphrase pass, grounding the summary's claim that "detection has structurally failed" in fresh experimental evidence rather than prior-cycle findings. https://maricruzgarciavallejo.substack.com/p/new-study-how-are-combinations-of

  7. ChatGPT Faces 42-State Probe: Sycophancy Design Flaw Named in Subpoena (news-coverage) — A 42-state attorney-general investigation naming AI sycophancy (58% rate on math and medical reasoning) as a consumer-protection concern directly connects the generic chatbot problem to the formative feedback failure mode the summary describes. https://www.techtimes.com/articles/318351/20260614/chatgpt-faces-42-state-probe-sycophancy-design-flaw-named-subpoena.htm

  8. 'Limited' evidence for AI tutoring tools, government admits (news-coverage) — The UK government's own assessment language — "limited quantity, scope and evidence base, with few providing full tutoring capacity" — appearing in contracts for a 450,000-student rollout is the clearest illustration of the gap between institutional momentum and evidentiary support. https://www.tes.com/magazine/news/general/limited-evidence-ai-tutoring-tools-government-admits?amp

  9. Scoring Students' Critical Thinking at Scale (research-paper) — The AACSB interrater reliability study across 13 institutions and 7,406 scored responses, showing AI agreement matching human-to-human agreement, is the strongest current evidence for the positive case on automated grading — and the contrast with the 4% teacher adoption rate underscores the deployment chasm the summary identifies. https://www.aacsb.edu/insights/articles/2026/06/scoring-students-critical-thinking-at-scale

  10. Teachers are using AI. Just not for instruction (adoption-metric) — The NPR/Ipsos poll finding that only 23% of teachers use AI for classroom instruction (versus 54% for admin) crystallises the domain's defining asymmetry: the operational layer is winning while the pedagogical layer stalls, which is precisely the summary's framing of where the clearest momentum lies. https://districtadministration.com/article/teachers-are-using-ai-just-not-for-instruction/