The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
AI tools that detect plagiarism and identify AI-generated content in student submissions. Includes text similarity matching and AI writing detection; distinct from content authentication in creative media, which verifies media provenance rather than academic integrity.
AI content detection remains an experimental capability caught between commercial momentum and documented unreliability. Vendors have scaled aggressively -- the education detection market reached an estimated $520 million -- yet the tools carry risks that forward-leaning institutions increasingly find unacceptable. Independent testing consistently shows detection accuracy collapsing from 99% on raw AI text to 70-80% after basic paraphrasing, while false positive rates for non-native English speakers run two to three times higher than for native writers. This equity gap has driven a growing cohort of research universities to formally discontinue detection altogether. The core tension is structural: human and AI-generated text may share enough statistical overlap that reliable, high-stakes detection is not achievable with current approaches. Institutions that continue deploying these tools do so largely out of administrative inertia, not demonstrated confidence in outcomes. The practice sits firmly in experimental territory -- commercially active but technically unproven for the academic sanctioning decisions it is being asked to support.
The vendor ecosystem continues to scale despite mounting institutional pushback. Turnitin, Copyleaks, and GPTZero maintain deep LMS integrations, and Copyleaks released its V9 detector with support for GPT-4o, Gemini, and Claude outputs. Turnitin added an "AI bypasser detection" feature targeting humanizer tools. These are arms-race responses, not capability breakthroughs -- and the evasion side is winning on economics. Six AI humanizer Custom GPTs in ChatGPT recorded 86,000+ user engagements, offering detection evasion at $5/month versus institutional subscriptions running $50-300.
Institutional rejections have accelerated. The University of Western Australia committed 350 staff to assessment redesign and ran 98,000 invigilated exams rather than rely on detection. San Francisco State discontinued Turnitin, citing both cost and reliability. They join Waterloo, Vanderbilt, UBC, and the University of Iowa in formal policy reversals. Vanderbilt's arithmetic is instructive: even a 1% false positive rate would have wrongly flagged about 750 of the 75,000 papers it submitted to Turnitin.
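Vanderbilt's figure is plain multiplication: expected wrongful flags equal the false positive rate times the number of human-written submissions screened. The minimal Python sketch below reproduces it and applies the other false positive rates quoted in this section to the same 75,000 submissions. It assumes, as Vanderbilt's worst-case framing does, that every submission is human-written and that a single rate applies uniformly to all writers; the function name `expected_false_flags` is hypothetical, chosen for illustration.

```python
# Expected wrongful flags = false positive rate x number of human-written submissions.
# Illustrative only: assumes every submission is human-written (Vanderbilt's worst case)
# and that a single false positive rate applies uniformly to all writers.


def expected_false_flags(human_submissions: int, false_positive_rate: float) -> float:
    """Expected number of human-written submissions wrongly flagged as AI-generated."""
    if not 0.0 <= false_positive_rate <= 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")
    return human_submissions * false_positive_rate


# False positive rates quoted in this section, applied to Vanderbilt's volume.
QUOTED_RATES = {
    "1% (Vanderbilt's hypothetical)": 0.01,
    "4% (UF director's best-detector figure)": 0.04,
    "11% (best performer, Feb 2026 comparison)": 0.11,
}

if __name__ == "__main__":
    for label, rate in QUOTED_RATES.items():
        flags = expected_false_flags(75_000, rate)
        print(f"{label}: about {flags:,.0f} of 75,000 submissions wrongly flagged")
```

Because the count scales linearly with both inputs, even a best-case rate produces hundreds of wrongful flags at institutional volume, and the higher rates reported for non-native English writers would push the figure much further for those students.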
Independent testing underscores the brittleness. A February 2026 comparison of eight detectors found the best performer (Originality.ai) at 89% accuracy with 11% false positives; false positive rates jumped to 40% for non-native English writing. Stanford research documented a 61.3% false positive rate on TOEFL essays. Meanwhile, California State University continues spending $1.1 million annually on Turnitin despite a CalMatters investigation describing the tools as offering "only a shadow of accurate detection." The gap between vendor claims and field-tested reality remains the defining feature of this market.
By April 2026, institutional strategy has shifted decisively. Turnitin's Learning Integrity Insights Report shows 60% of customers now prioritize transparency over automated flagging, signaling a move away from binary detection toward pedagogical integration. Independent benchmarking (Digital Applied, Feb-Mar 2026) confirms no tool exceeds 85% accuracy, with detectors missing 15-30% of AI-generated content and false positive rates rising to 12% or higher for non-native English speakers. Major institutions including Australian Catholic University have formally abandoned detection tools after documenting high dismissal rates for flagged cases. The market bifurcation visible since 2024 has solidified: leading research universities employ assessment redesign and human review; others continue tool reliance out of administrative inertia. Deployment remains broad (60%+ of higher education institutions maintain detection systems), but institutional confidence has eroded to a level where detection serves primarily as a triage signal, not dispositive evidence.
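The conclusion that detection can serve only as a triage signal follows from base-rate arithmetic. The short Python sketch below applies Bayes' rule to estimate the share of flags that point to genuinely AI-generated work (the positive predictive value). Its inputs are assumptions stitched together from figures quoted in this section rather than one measured deployment: 15% prevalence (the Quetext estimate), sensitivity of 0.70 to 0.85 (detectors missing 15-30% of AI content), and false positive rates of 1% (Vanderbilt's hypothetical) and 12% (the benchmark figure for non-native English speakers). The function name `positive_predictive_value` is hypothetical.

```python
# Base-rate sketch: what fraction of detector flags are actually AI-generated?
# All inputs are illustrative assumptions drawn from figures quoted in the text;
# they come from different studies, not from one measured deployment.


def positive_predictive_value(prevalence: float, sensitivity: float,
                              false_positive_rate: float) -> float:
    """Bayes' rule: P(text is AI-generated | detector flags it)."""
    true_flags = sensitivity * prevalence                    # AI texts correctly flagged
    false_flags = false_positive_rate * (1.0 - prevalence)   # human texts wrongly flagged
    return true_flags / (true_flags + false_flags)


PREVALENCE = 0.15                    # assumed share of essays with AI-generated content
SENSITIVITIES = (0.85, 0.70)         # detectors miss 15-30% of AI content
FALSE_POSITIVE_RATES = (0.01, 0.12)  # Vanderbilt's hypothetical vs. non-native benchmark

if __name__ == "__main__":
    for fpr in FALSE_POSITIVE_RATES:
        for sensitivity in SENSITIVITIES:
            ppv = positive_predictive_value(PREVALENCE, sensitivity, fpr)
            print(f"FPR {fpr:.0%}, sensitivity {sensitivity:.0%}: "
                  f"{ppv:.1%} of flags are AI-generated text")
```

Under these assumptions, a 1% false positive rate yields flags that are correct roughly 92-94% of the time, while a 12% rate drops that to about 51-56%, so close to half of all flags would land on human-written work. A flag on its own cannot reveal which group a given submission belongs to, which is why a score alone cannot be dispositive.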
— University of Texas at Austin bans all third-party AI detection software; emphasizes course design and assessment redesign over detection; cites student IP and instructor liability concerns.
— University of Florida misconduct cases surged from 0 (2021-23) to 66 (Spring 2025); UF director acknowledges even best detectors carry 4% false positive rate, creating fairness risks.
— Gartner survey of 2,500 higher ed IT decision-makers shows 18-24% of AI budgets allocated to assessment/detection tools; AI-assessment market growing at 28% CAGR; Turnitin processes 200M+ submissions annually.
— Russell et al. (2025) empirical study: expert humans detect AI at 92.7% accuracy with 4% FPR; commercial detectors collapse on humanized text (Binoculars 6.7% true positive rate); humans with AI experience outperform automated tools.
— Comprehensive 2026 analysis shows detector accuracy has plateaued since 2023 with no improvement; documents vendor-claims gap (Turnitin claims 98%, independent audits find 4-50%+ FPR) and institutional reversals.
— News aggregation documents institutional abandonment of Turnitin detection across multiple universities (Curtin, Vanderbilt, UCLA, Yale, Johns Hopkins, Northwestern) citing false positives and bias.
— Independent empirical testing shows ZeroGPT 23% false positive and 18% false negative rates; paraphrasing defeats detectors trivially; documents structural brittleness as LLMs improve.
— University of Sydney integrates Turnitin detection as a decision-support tool only, not dispositive evidence, within a transparency-based integrity framework.
2023-H1: ChatGPT adoption drives urgent institutional demand for AI detection tools. Turnitin scales to 65M papers analyzed by June; 43% of students use AI tools. Independent research confirms detectors have fundamental reliability limits: paraphrasing defeats detection, false positive rates are 4-10%, and the distributions of human and AI text may be inherently indistinguishable. Institutional responses diverge: some adopt tools despite caveats, others formally reject them as unreliable.
2023-H2: Detection ecosystem expands despite mounting evidence of unreliability. Turnitin reaches 76M papers by July; D2L integrates Copyleaks into Brightspace. OpenAI withdraws its own detector (26% true positives, 9% false positives) in July. Independent institutional testing and multi-course research confirm post-editing defeats detectors. Market continues scaling deployment despite technical ceiling; institutional responses diverge between policy-based approaches and continued tool reliance.
2024-Q1: Deployment keeps scaling as Turnitin reaches 200M papers analyzed by March; new products emerge (Turnitin paraphrasing detection, Binoculars detector) promising improved accuracy. Simultaneously, independent investigations document tools offering a "false sense of accuracy" and producing harmful false positives; educator accounts and institutional assessments conclude that detection tools are fundamentally ineffective. Student AI usage (59% regular users) continues outpacing educator adoption and tool capability.
2024-Q3: Turnitin launches its paraphrasing detection feature (July); deployment metrics hold at 200M+ papers reviewed, with 3% flagged as 80%+ AI-written. Peer-reviewed research presented at IDSTA 2024 (September) documents that detection tools fail entirely on paraphrased ChatGPT text, confirming adversarial vulnerability. Major institutions (UBC, University of Iowa) formally reject tool deployment in favor of pedagogical approaches, citing research showing no tool exceeds 80% accuracy and evidence of false positive harms. Market fragmentation deepens: vendors continue scaling despite institutional policy reversals and academic validation of fundamental unreliability.
2024-Q4: Vendor infrastructure scaling continues: an AWS case study (November) documents Turnitin processing 2M papers daily; Canisius University replaces Turnitin with Copyleaks via D2L (November); Kritik integrates GPTZero (December). Yet independent testing (University of Adelaide, October) finds Copyleaks at 85.2% detection but all tools failing on paraphrased text. Critical analysis documents Turnitin accuracy dropping to 31% and 0% after Quillbot and humanization tools; equity analysis shows non-native English speakers face 2-3x higher false positive rates. The market remains bifurcated: vendors scale through ecosystem integration despite institutional policy reversals and evidence that detection is sustained by administrative inertia, not pedagogical value.
2025-Q1: Professor adoption peaks (65% use detection tools, up from 30% two years prior) even as evidence hardens against the tools. Peer-reviewed testing (JALT, January) shows Turnitin's claimed 100% accuracy masks inconsistencies; vendor analysis documents evasion tactics already defeating detection (layered rewriting, hybrid drafting, translation). By March, independent testing exposes a 30% misclassification rate for Copyleaks; critical analysis argues that agentic AI systems render detection obsolete. Equity barriers harden with documented false accusations at multiple universities and non-native speakers facing 2-3x higher false positive rates. Institutional adoption remains driven by inertia, not confidence.
2025-Q2: Major institutional rejections accelerate: the University of Waterloo discontinues Turnitin AI detection effective September 2025, joining Vanderbilt, UBC, and the University of Iowa in formal policy reversals citing research showing no detector exceeds 80% accuracy. Paradoxically, market infrastructure keeps scaling: the global detection market stands at $1.79B, with education at $520M, and 48% of the top 100 universities have integrated detection into their learning platforms. Vendor partnerships deepen (Turnitin-GPTZero integration, Copyleaks analyst recognition). Independent testing continues to document flaws: GPTZero is effective on pure AI text (91-100%) but unreliable on human texts, and dental research shows human experts outperforming automated tools. Investigative reporting exposes institutional spending despite failures: CSU spends $163K annually on Turnitin AI detection. Equity gaps persist: a Stanford study found 61.2% false positives on TOEFL essays, and non-native speakers face 2-3x higher false positive rates. Institutional adoption reflects inertia: institutions with early deployments continue using the tools despite the evidence and known harm.
2025-Q3: Vendors escalate product development: Copyleaks releases AI Detector V9 (August) with support for GPT-4o, Gemini, and Claude; Turnitin launches an "AI bypasser detection" feature (September) targeting humanizer tools. The University of Waterloo formally discontinues Turnitin AI detection effective September 2025. Peer-reviewed research (August 2025, Acta Neurochirurgica) testing 1,000 texts finds detectors achieve AUCs of 0.75-1.00 but fall short of 100% reliability, with documented false positives. A Los Angeles Times experiment (September 2025) reveals critical detector inconsistencies: human text flagged as AI, AI content missed, paraphrased text misclassified. Use rises to 68% of teachers (up from ~38% the prior year) and 40% of four-year colleges, but this adoption reflects institutional inertia rather than confidence. Equity barriers harden: non-native speakers face 2-3x higher false positive rates. Vendors are locked in a product arms race against evasion, but the fundamental accuracy and bias problems remain unresolved.
2025-Q4: Real-world harm becomes quantifiable: a Guardian investigation documents nearly 7,000 confirmed cases of students caught using AI tools in 2023-24 (5.1 per 1,000 students), while reports of false accusations grow in parallel as detector deployment deepens. Institutional spending continues despite documented failures: CSU maintains $1.1M in annual Turnitin spending for what CalMatters called "only a shadow of accurate detection"; College of the Canyons spends $47K annually. By December, 86% of students globally use AI tools while 68% of teachers deploy detectors. Peer-reviewed evidence continues to document fundamental brittleness: a Skyline Academic analysis shows detection dropping from 74% to 26% on paraphrased AI text, and vendor claims of 98-99% accuracy are contradicted by documented failures on mixed human-AI content. Equity harm is documented: non-native English speakers face 2-3x higher false positive rates; Vanderbilt's earlier analysis estimated 750 papers at risk of wrongful flagging; a Stanford study found 61.2% false positives on TOEFL essays. The market continues scaling ($1.79B global, $520M education) despite institutional reversals by Waterloo, Vanderbilt, UBC, and Iowa. Bifurcation hardens as vendor investment and institutional inertia sustain deployment despite growing evidence that detection meets an administrative need rather than delivering pedagogical value.
2026-Jan: Comparative testing in early 2026 reveals persistent differences between tools: Copyleaks reaches 100% accuracy on human text in independent testing (95% post-paraphrasing), while Turnitin maintains 2-5% false positives on ESL submissions and misclassifies human academic work. Documented harm cases emerge: a University of North Georgia student lost a scholarship over a false detection flag. A meta-analysis of peer-reviewed studies confirms systemic failures: Stanford's 61.3% false positive rate on TOEFL essays, all 14 tools tested below 80% accuracy, and a Perkins average accuracy of 39.5% (17.4% post-editing). Evidence across tools documents racial disparities (20% of Black vs. 7% of white students falsely accused), and LMS integrations (Canvas, Brightspace) show field-wide reliance despite known limitations.
2026-Feb: Institutional rejections accelerate: the University of Western Australia and San Francisco State University formally discontinue AI detection tools, citing unreliability, cost, and equity concerns, and join Waterloo, Vanderbilt, UBC, and Iowa. Meanwhile, independent testing shows persistent brittleness: a Humaneer test finds 40% false positives for ESL writers, and Paper Checker documents detection collapsing from 99% (raw AI) to 70-80% (paraphrased). The evasion ecosystem surges: 86,000+ user engagements on low-cost ChatGPT humanizer tools undermine the utility of institutional detection. Market bifurcation hardens, with leading research institutions rejecting tools while vendors scale infrastructure through LMS partnerships and product iteration against evasion techniques.
2026-Mar: Vendor and institutional strategy shifts accelerate: Turnitin's CPO (BETT 2026 keynote) announces that the detection-only era is ending, reframing the company toward process transparency and learning integrity over product-based flagging. CSU continues institutional spending ($163K for the AI detection add-on, 2025) despite documented unreliability. Student harm is quantified: a UK survey (n=2,373) finds 75% of AI users report stress over false positive accusations, and 52% cite fear of wrongful charges. Toronto Metropolitan University reports that AI charges feature in 30% of integrity consultations; its Academic Integrity Office confirms detector limitations and emphasizes human review. Regional adoption persists: all Singapore universities (NUS, NTU, SMU, SUTD, etc.) maintain institutional Turnitin deployments with AI detection, though accuracy varies by language (English 98%, Chinese 85-90%). A survey of top-20 university policies reveals a shift away from detection-based enforcement; most emphasize transparency and disclosure over automated flagging. A systematic literature review (PRISMA, 18 studies) concludes that AI detection cannot replace human judgment in academic integrity. Deployment remains widespread (60%+ of HE institutions), but institutional confidence erodes as the arms race continues.
2026-Apr: Market data and independent research consolidate the evidence against detection-only approaches. A large-scale empirical analysis (Quetext, 37.8M submissions, 25.4B words) puts institutional adoption of detection at 68% (up from 38% in 2023) despite persistent technical brittleness, and AI-generated content prevalence at 15% of essays (a 5x increase from 2023). Independent systematic benchmark testing (Digital Applied, Feb-Mar 2026) finds that no detector exceeds 85% accuracy; detectors miss 15-30% of AI content, false positives reach 12%, and accuracy drops 20-30% on edited content. Peer-reviewed research (Journal of Academic Ethics, Leaton Gray et al., accepted April 2025) concludes that detection-based governance models are 'inherently limited' and that assessment design reform is structurally necessary. Legal and institutional signals sharpen: a NY court ruling in Newby v. Adelphi University determined that AI detection scores are "probabilistic guesses," not proof, establishing institutional liability for over-reliance without due process; the University of Queensland formally declares detection tools "flawed and unreliable" and mandates proctored or disclosure-based assessment instead. An empirical evasion study finds humanizer tools reduce Copyleaks detection accuracy from 91.3% to 27.8% (p<0.0001), confirming that evasion is decisively winning the arms race. Turnitin's inaugural Learning Integrity Insights Report (April 2026) reveals an institutional pivot: 60%+ of customers prioritize transparency over flagging, and fewer than 50% have formal AI policies. Detection remains broadly deployed (60%+ of HE institutions), but confidence has shifted toward human judgment and assessment redesign as the primary levers, with detection relegated to an administrative triage signal.
2026-May: Institutional rejections expand and evidence consolidates. The University of Texas at Austin formalizes a ban on all third-party AI detection software, emphasizing course design over policing and citing student IP and instructor liability risks. Independent research (TokenMix, May 2026) documents detector brittleness: ZeroGPT achieves 79.5% accuracy with 23% false positive and 18% false negative rates, and paraphrasing defeats detection trivially. An empirical comparison (Russell et al., May 2026) shows expert humans detect AI at 92.7% accuracy with 4% false positives, significantly outperforming commercial detectors on humanized text (Binoculars 6.7% true positive rate on o1-Pro). Analysis (ToHuman, May 2026) confirms the detector category has plateaued since 2023 with no accuracy improvement; vendor claims (Turnitin 98%) diverge sharply from audited performance (4-50%+ false positive rates). University of Florida misconduct cases reached 66 in Spring 2025, up from zero in 2021-2023, with its director acknowledging that detectors carry a 4% baseline false positive rate, a fairness risk for any institution relying on tools for enforcement. News aggregation (Opus, April 2026) documents accelerating abandonment: Curtin, Vanderbilt, UCLA, UC San Diego, Cal State LA, Yale, Johns Hopkins, and Northwestern all discontinuing Turnitin AI detection in 2026, citing false positives that disproportionately affect non-native English speakers. A Gartner survey (May 2026) shows universities allocating 18-24% of AI budgets to assessment/detection despite the evidence; Turnitin processes 200M+ submissions annually. The market remains bifurcated: widespread deployment (68% institutional adoption) coexists with eroded confidence in detection as a basis for high-stakes decisions.