The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
AI tools that detect plagiarism and identify AI-generated content in student submissions. Includes text similarity matching and AI writing detection; distinct from content authentication in creative media which verifies media provenance rather than academic integrity.
AI content detection in academic integrity has bifurcated sharply by June 2026. Vendors continue scaling infrastructure—the education detection market reached $520 million globally—yet institutional confidence has collapsed. Detection accuracy remains fundamentally unreliable: independent testing shows 80-90% accuracy on unedited AI text but collapses to 60-80% after basic paraphrasing, while false positive rates for non-native English speakers climb to 40-61% (Stanford research). This equity crisis has triggered an accelerating exodus: University of Sheffield, University College Cork, Indiana University Kelley School, and dozens of peers have formally rejected detection tools entirely. The core technical problem is unsolvable: AI and human text exhibit overlapping statistical distributions, making probabilistic detection intrinsically error-prone at scale. Courts (Newby v. Adelphi) have ruled detection scores are "probabilistic guesses" not proof, establishing institutional liability for over-reliance. The practice sits in a state of technical stagnation—commercially active and broadly deployed (60%+ of higher education institutions), but increasingly recognized as unsuitable for high-stakes enforcement decisions.
The vendor ecosystem continues to scale despite acute institutional rejection. Turnitin, Copyleaks, and GPTZero maintain deep LMS integrations, and Copyleaks released its V9 detector with GPT-4o, Gemini, and Claude support. Turnitin added an "AI bypasser detection" feature. These are arms-race responses, not capability breakthroughs, and the evasion side is winning on economics: six AI humanizer Custom GPTs in ChatGPT recorded 86,000+ user engagements, offering detection evasion at $5/month versus institutional subscriptions running $50-300.
Institutional rejections have accelerated dramatically. By June 2026, University of Sheffield and University College Cork formally rejected detection tools; Indiana University Kelley School (a major business education institution) explicitly banned all AI detection across its faculty, labeling tools "highly unreliable." These rejections join University of Western Australia (350-staff assessment redesign instead), San Francisco State, Waterloo, Vanderbilt, UBC, and University of Iowa in formal policy reversals. Court rulings (Newby v. Adelphi, 2026) have established that detection scores are "probabilistic guesses" not proof, creating institutional liability for over-reliance without due process.
Independent testing in the final days of May 2026 reveals systematic unreliability. The Authors Guild conducted independent professional testing of five major detectors on pre-2023 human articles (where no text should be flagged as AI): Pangram returned 0% false positives, but ZeroGPT flagged 18-76% of human text as AI. A comprehensive June 2026 analysis found native-speaker false positive rates of 1.6-12% but 61% for non-native English speakers, a fundamental equity barrier. Stanford research documented a 61.3% false positive rate on TOEFL essays. Performance collapses on realistic scenarios: detection rates fall from 80-90% on unedited AI text to 60-80% on paraphrased content.
Turnitin's own April 2026 Learning Integrity Insights Report shows 60% of customers now prioritize transparency over automated flagging. Independent benchmarking (Feb-Mar 2026) confirms no tool exceeds 85% accuracy, with 15-30% of AI-generated content missed and false positive rates of 12%+ for non-native writers. The institutional consensus has solidified: leading research universities employ assessment redesign and human review; others maintain tool deployment out of administrative inertia. Deployment remains broad (60%+ of higher education institutions), but institutional confidence has eroded to zero for high-stakes enforcement. Detection functions only as a triage signal triggering human review, never as dispositive evidence.
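One way to see why a flag works only as a triage signal is to convert the error rates quoted in this entry into the probability that a flagged submission is actually AI-written. The sketch below applies Bayes' rule. The function name `flag_precision` is hypothetical, and the inputs (15% of essays containing AI text, an 85% detection rate, and false positive rates of 1.6%, 12%, and 61%) are illustrative figures taken from the ranges reported here, not measurements of any single tool.

```python
def flag_precision(prevalence: float, detection_rate: float,
                   false_positive_rate: float) -> float:
    """Probability that a flagged submission is AI-written (Bayes' rule).

    prevalence: assumed share of submissions that are AI-written
    detection_rate: share of AI-written submissions the detector flags
    false_positive_rate: share of human-written submissions the detector flags
    """
    for name, value in (("prevalence", prevalence),
                        ("detection_rate", detection_rate),
                        ("false_positive_rate", false_positive_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    true_flags = prevalence * detection_rate                # AI text, flagged
    false_flags = (1.0 - prevalence) * false_positive_rate  # human text, flagged
    total_flags = true_flags + false_flags
    if total_flags == 0.0:
        raise ValueError("no submissions would be flagged with these inputs")
    return true_flags / total_flags


if __name__ == "__main__":
    # Illustrative inputs only; see the assumptions stated in the lead-in.
    scenarios = [
        ("native speakers, low estimate", 0.016),
        ("native speakers, high estimate", 0.12),
        ("non-native speakers", 0.61),
    ]
    for label, fpr in scenarios:
        share = flag_precision(prevalence=0.15, detection_rate=0.85,
                               false_positive_rate=fpr)
        print(f"{label}: {share:.1%} of flags are correct")
```

Under these assumptions, roughly 56% of flags are correct at a 12% false positive rate and only about 20% at 61%, so most flags raised against the most affected group would be wrong.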
— Independent testing: Turnitin (16K+ institutions) disabled by UC Berkeley, Vanderbilt, Johns Hopkins, Michigan State, Northwestern; GPTZero claims 99% but scores 63.77% in independent studies; accuracy gap between marketing and reality documented.
— Synthesis of 2023-2026 empirical studies: independent benchmark finds 99% accuracy claims collapse under realistic conditions; 280K+ assignment study shows accuracy near guessing on short coursework; Stanford-linked study with 94% non-detection of fully AI-generated exam answers.
— Peer-reviewed Springer study (Atamhenwan 2026) testing Turnitin on 81 scripts with 0-100% AI content: detection fails at 5-10% AI use; paraphrasing tools (RyneAI, QuillBot) bypass detection entirely; arms-race dynamic confirmed.
— Synthesis of 2025-2026 peer-reviewed evidence: 15-30% false positive rate across tools, disproportionate impact on multilingual writers; RAID Benchmark (ACL 2024) shows substantial accuracy shifts; University of Maryland: detectors approach random guessing as AI converges with human text.
— Indian national policy: 10-40% AI content requires resubmission; 40-60% triggers one-year bar; >60% cancels PhD registration. ShodhShuddhi infrastructure deploys DrillBit, Turnitin, iThenticate across all PhD submissions; major enforcement scale.
— Market leader (30M+ students, 15K institutions) faces trust crisis: 1.2/5 Trustpilot rating, 98% 1-star reviews. CPO admits 15% false-negative rate; Stanford: 61% of non-native English essays misclassified as AI; 12 universities disabled detection.
— Case study by writing pedagogy specialist (HTWG Konstanz): Wikipedia definition scored 86% similarity in Turnitin; after single ChatGPT paraphrase, dropped to 0%. Demonstrates viability of 'copy, shake, paste' evasion strategy.
— University of Sheffield policy explicitly rejects AI detection tools due to concerns over error rates and potential for false positives/negatives; continues plagiarism detection while excluding AI detection functionality.
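The banded penalties in the Indian national policy item above amount to a threshold lookup on a detector's reported AI-content percentage. A minimal sketch follows. The function name, the treatment of scores below 10% as carrying no listed penalty, and the assignment of the shared 40% boundary to the stricter band are all assumptions, since the summary does not specify them.

```python
def policy_action(ai_percent: float) -> str:
    """Map a reported AI-content percentage to the penalty band summarised above.

    Assumptions (not stated in the source summary):
      - below 10% carries no listed penalty
      - exactly 40% falls in the stricter 40-60% band
    The source gives ">60%" for cancellation, so exactly 60% stays in the
    one-year-bar band.
    """
    if not 0.0 <= ai_percent <= 100.0:
        raise ValueError("ai_percent must be between 0 and 100")
    if ai_percent > 60.0:
        return "cancel PhD registration"
    if ai_percent >= 40.0:
        return "one-year bar"
    if ai_percent >= 10.0:
        return "resubmission required"
    return "no penalty listed"


if __name__ == "__main__":
    for score in (5, 10, 39.9, 40, 60, 60.1):
        print(f"{score}% -> {policy_action(score)}")
```

Because each outcome depends on a hard cut-off, a detector error of a few percentage points near 40% or 60% changes the penalty, which is where the error rates reported in this entry become consequential.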
2023-H1: ChatGPT adoption drives urgent institutional demand for AI detection tools. Turnitin scales to 65M papers analyzed by June; 43% of students are using AI tools. Independent research confirms detectors have fundamental reliability limits: paraphrasing defeats detection, false positive rates are 4-10%, and distributions of human vs. AI text may be inherently indistinguishable. Institutional responses diverge: some adopt tools despite caveats, others formally reject them as unreliable.
2023-H2: Detection ecosystem expands despite mounting evidence of unreliability. Turnitin reaches 76M papers by July; D2L integrates Copyleaks into Brightspace. OpenAI withdraws its own detector (26% true positives, 9% false positives) in July. Independent institutional testing and multi-course research confirm post-editing defeats detectors. Market continues scaling deployment despite technical ceiling; institutional responses diverge between policy-based approaches and continued tool reliance.
2024-Q1: Deployment continues to scale as Turnitin reaches 200M papers analyzed by March; new products emerge (Turnitin paraphrasing detection, Binoculars detector) promising improved accuracy. Simultaneously, independent investigations document tools offering a "false sense of accuracy" and producing harmful false positives; educator experiences and institutional assessments conclude detection tools are fundamentally ineffective. Student AI usage (59% regular users) continues outpacing educator adoption and tool capability.
2024-Q3: Turnitin launches paraphrasing detection feature (July); deployment metrics hold at 200M+ papers reviewed with 3% flagged as 80%+ AI-written. Peer-reviewed research presented at IDSTA 2024 (September) documents that detection tools fail entirely on paraphrased ChatGPT text, confirming adversarial vulnerability. Major institutions (UBC, University of Iowa) formally reject tool deployment in favor of pedagogical approaches, citing research showing no tool exceeds 80% accuracy and citing evidence of false positive harms. Market fragmentation deepens: vendors continue scaling despite institutional policy reversals and academic validation of fundamental unreliability.
2024-Q4: Vendor infrastructure scaling continues: AWS case study (November) documents Turnitin processing 2M papers daily; Canisius University replaces Turnitin with Copyleaks via D2L (November); Kritik integrates GPTZero (December). Yet independent testing (University of Adelaide, October) finds Copyleaks at 85.2% detection but all tools failing on paraphrased text. Critical analysis documents Turnitin accuracy dropping to 31% and 0% after QuillBot/humanization tools; equity analysis shows non-native English speakers face 2-3x higher false positive rates. Market remains bifurcated: vendors scale through ecosystem integration despite institutional policy reversals and evidence that continued deployment reflects administrative inertia, not pedagogical value.
2025-Q1: Professor adoption peaks (65% use detection tools, up from 30% two years prior) even as evidence hardens against tools. Peer-reviewed testing (JALT, January) shows Turnitin's claimed 100% accuracy masks inconsistencies; vendor analysis documents evasion tactics already defeating detection (layered rewriting, hybrid drafting, translation). By March, independent testing exposes Copyleaks at 30% misclassification; critical analysis shows agentic AI systems render detection obsolete. Equity barriers harden with documented false accusations at multiple universities and non-native speakers facing 2-3x higher false positive rates. Institutional adoption remains driven by inertia, not confidence.
2025-Q2: Major institutional rejections accelerate: University of Waterloo discontinues Turnitin AI detection effective September 2025; joins Vanderbilt, UBC, and University of Iowa in formal policy reversals citing research showing no detector exceeds 80% accuracy. Paradoxically, market infrastructure scaling continues: global detection market at $1.79B with education at $520M; 48% of top 100 universities have integrated detection into learning platforms. Vendor partnerships deepen (Turnitin-GPTZero integration, Copyleaks analyst recognition). Independent testing continues documenting flaws: GPTZero effective on pure AI (91-100%) but unreliable on human texts; dental research shows human experts outperform automated tools. Investigative reporting exposes institutional spending despite failures: CSU $163K annually on Turnitin detection. Equity gaps persist: Stanford study found 61.2% false positives on TOEFL essays; non-native speakers face 2-3x higher false positive rates. Institutional adoption reflects inertia—institutions with early deployments continue use despite evidence and known harm.
2025-Q3: Vendors escalate product development: Copyleaks releases AI Detector V9 (August) with support for GPT-4o, Gemini, and Claude; Turnitin launches "AI bypasser detection" feature (September) targeting humanizer tools. University of Waterloo formally discontinues Turnitin AI detection effective September 2025. Peer-reviewed research (August 2025, Acta Neurochirurgica) testing 1,000 texts finds detectors achieve AUCs 0.75-1.00 but fail to reach 100% reliability, with documented false positives. Los Angeles Times experiment (September 2025) reveals critical detector inconsistencies: human text flagged as AI, AI content missed, paraphrased text misclassified. Adoption rises to 68% of teachers using detectors (up from ~38% prior year) and 40% of four-year colleges, but adoption reflects institutional inertia rather than confidence. Equity barriers harden: non-native speakers face 2-3x higher false positive rates. Vendors engage in product arms race responding to evasion, but fundamental accuracy and bias problems remain unresolved.
2025-Q4: Real-world harm becomes quantifiable: Guardian investigation documents nearly 7,000 confirmed cases of students caught using AI tools in 2023-24 (5.1 per 1,000 students), but parallel growth in false-accusation reports emerges as detector deployment deepens. Institutional spending continues despite documented failures: CSU maintains $1.1M annual Turnitin spending for a "shadow of accurate detection"; College of Canyons $47K annually. By December, 86% of students globally use AI tools while 68% of teachers deploy detectors. Peer-reviewed evidence continues documenting fundamental brittleness: Skyline Academic analysis shows detection drops from 74% to 26% on paraphrased AI text; vendor claims of 98-99% accuracy are undercut by documented failures on mixed human-AI content. Equity harm documented: non-native English speakers face 2-3x higher false positive rates; Vanderbilt's prior analysis estimated 750 students at wrongful accusation risk; Stanford study found 61.2% false positives on TOEFL essays. Market continues scaling ($1.79B global, $520M education) despite institutional reversals by Waterloo, Vanderbilt, UBC, and Iowa. Bifurcation hardens as vendor investment and institutional inertia sustain deployment despite growing evidence that detection serves administrative necessity rather than pedagogical value.
2026-Jan: Comparative testing in early 2026 reveals persistent tool differentiation: Copyleaks reaches 100% accuracy on human text in independent testing (95% post-paraphrasing), while Turnitin maintains 2-5% false positives on ESL submissions and misclassifies human academic work. Documented harm cases emerge: University of North Georgia student lost scholarship on false detection flag. Meta-analysis of peer-reviewed studies confirms systemic failures—Stanford 61.3% false positive rate on TOEFL essays, all 14 tools tested below 80% accuracy, Perkins average 39.5% (17.4% post-editing). Evidence across tools documents racial disparities (20% Black vs. 7% white students falsely accused) and deployment reality via LMS integrations (Canvas, Brightspace) showing field-wide reliance despite known limitations.
2026-Feb: Institutional rejections accelerate: University of Western Australia and San Francisco State University formally discontinue AI detection tools, citing unreliability, cost, and equity concerns, joining Waterloo, Vanderbilt, UBC, and Iowa. Simultaneously, independent testing shows persistent brittleness: a Humaneer test finds false positives of 40% for ESL writers, and Paper Checker documents detection collapse from 99% (raw AI) to 70-80% (paraphrased). The evasion ecosystem surges: 86,000+ user engagements on low-cost ChatGPT humanizer tools undermine institutional detection utility. Market bifurcation hardens: leading research institutions reject tools while vendors scale infrastructure through LMS partnerships and product iteration against evasion techniques.
2026-Mar: Vendor and institutional strategy shifts accelerate: Turnitin CPO (BETT 2026 keynote) announces detection-only era ending, reframes toward process transparency and learning integrity over product-based flagging. CSU continues institutional spending ($163K for AI detection add-on, 2025) despite documented unreliability. Student harm quantified: UK survey (n=2,373) finds 75% of AI users report stress over false positive accusations; 52% cite fear of wrongful charges. Toronto Metropolitan University documents 30% of integrity consultations involving AI charges; Academic Integrity Office confirms detector limitations and emphasizes human review. Regional adoption persists: all Singapore universities (NUS, NTU, SMU, SUTD, etc.) maintain institutional Turnitin with AI detection, though accuracy varies by language (English 98%, Chinese 85-90%). Top 20 university policy survey reveals institutional shift away from detection-based enforcement; most emphasize transparency and disclosure over automated flagging. Systematic literature review (PRISMA, 18 studies) concludes AI detection cannot replace human judgment in academic integrity. Deployment remains widespread (60%+ HE institutions) but institutional confidence erodes as arms race continues.
2026-Apr: Market data and independent research consolidate evidence against detection-only approaches. Large-scale empirical analysis (Quetext, 37.8M submissions, 25.4B words) confirms institutional adoption of detection at 68% (up from 38% in 2023) despite persistent technical brittleness; AI-generated content prevalence at 15% of essays (5x increase from 2023). Independent systematic benchmark testing (Digital Applied, Feb-Mar 2026) finds no detector exceeds 85% accuracy; detectors miss 15-30% of AI content; false positives reach 12% with accuracy drops of 20-30% on edited content. Peer-reviewed research (Journal of Academic Ethics, Leaton Gray et al., accepted April 2025) concludes detection-based governance models are 'inherently limited' and that assessment design reform is structurally necessary. Legal and institutional signals sharpen: a NY court ruling in Newby v. Adelphi University determined that AI detection scores are "probabilistic guesses" not proof, establishing institutional liability for over-reliance without due process; University of Queensland formally declares detection tools "flawed and unreliable" and mandates proctored or disclosure-based assessment instead. An empirical evasion study finds humanizer tools reduce Copyleaks detection accuracy from 91.3% to 27.8% (p<0.0001), confirming the evasion arms race is being won decisively. Turnitin's inaugural Learning Integrity Insights Report (April 2026) reveals institutional pivot: 60%+ customers prioritize transparency over flagging; fewer than 50% have formal AI policies. Institutional adoption remains broadly deployed (60%+ HE institutions) but confidence has shifted toward human judgment and assessment redesign as primary levers, with detection relegated to administrative triage signal.
2026-May: Institutional rejections expand and evidence consolidates. University of Texas at Austin formalizes ban on all third-party AI detection software; Curtin, Vanderbilt, UCLA, UC San Diego, Yale, Johns Hopkins, and Northwestern also discontinue Turnitin AI detection in 2026, citing false positives disproportionately affecting non-native English speakers. A Cornell study of 95,000 students at 20 U.S. public research universities (published in Science) finds 37% use GenAI monthly and 9% to cheat, generating pressure for assessment reform over detection-only approaches. GradPilot analysis of 66 universities' procurement records confirms tools in widespread use (Turnitin, Copyleaks, GPTZero, $2,768-$110,400/year) but documents many institutions disabling detectors due to ~4% false positive rates. IEEE Security & Privacy peer-reviewed research empirically confirms widespread commercial detector failure, including specific adversarial vulnerability data. Independent audits (TokenMix, Russell et al., ToHuman) confirm detector accuracy has plateaued since 2023 with claims-to-performance divergence. Market remains bifurcated: 68% institutional adoption persists alongside eroded confidence in detection as a basis for high-stakes enforcement decisions.
2026-Jun: Institutional policy rejection broadens: Indiana University Kelley School explicitly banned all AI detection tools, labeling them "highly unreliable"; University of Sheffield formally rejected detection tools; and UC Berkeley, Vanderbilt, Johns Hopkins, Michigan State, and Northwestern have all disabled Turnitin AI detection. New peer-reviewed research redefines the problem: a Springer study (Atamhenwan 2026) testing Turnitin on 81 scripts finds detection fails at 5-10% AI use and paraphrasing tools (RyneAI, QuillBot) bypass detection entirely; arXiv benchmarking confirms major detectors fail on realistic hybrid submissions; Stanford-linked synthesis documents 94% non-detection of fully AI-generated exam answers in one study and a 280K+ assignment analysis showing accuracy near guessing on short coursework. False positive data for non-native speakers hardens: comprehensive analysis documents 61% false positive rates for non-native English speakers (versus 1.6-12% for native speakers), and the Authors Guild's independent audit of five detectors on pre-2023 human articles found extreme variance (Pangram 0% vs. ZeroGPT 18-76%). India's UGC has moved in the opposite direction — revised norms treat unacknowledged AI use as plagiarism in PhD submissions and deploy DrillBit, Turnitin, and iThenticate across all national PhD programs, illustrating the divergence between institutional rejection in Anglophone research universities and regulatory enforcement expansion elsewhere.
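The false-accusation estimates in the timeline, such as the 750-student figure attributed to Vanderbilt, follow from simple expected-value arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the inputs (75,000 human-written submissions, a 1% false positive rate, and ten assignments per student) are assumptions chosen to show the calculation, alongside the ~4% rate mentioned in the 2026-May entry.

```python
def expected_false_flags(human_submissions: int, false_positive_rate: float) -> float:
    """Expected number of human-written submissions wrongly flagged as AI."""
    if human_submissions < 0:
        raise ValueError("human_submissions must be non-negative")
    if not 0.0 <= false_positive_rate <= 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")
    return human_submissions * false_positive_rate


def chance_of_any_false_flag(assignments: int, false_positive_rate: float) -> float:
    """Probability that an honest student is flagged at least once.

    Assumes detector errors are independent across assignments, which is a
    simplification: a writer whose style triggers one false flag is likely
    to trigger others.
    """
    if assignments < 0:
        raise ValueError("assignments must be non-negative")
    if not 0.0 <= false_positive_rate <= 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")
    return 1.0 - (1.0 - false_positive_rate) ** assignments


if __name__ == "__main__":
    # Assumed inputs for illustration; not figures reported by any institution here.
    for rate in (0.01, 0.04):
        flags = expected_false_flags(75_000, rate)
        risk = chance_of_any_false_flag(10, rate)
        print(f"rate {rate:.0%}: {flags:.0f} expected false flags, "
              f"{risk:.1%} chance of at least one flag over 10 assignments")
```

With these inputs the first function gives 750 expected false flags at 1% and 3,000 at 4%. At 4%, a student submitting ten honestly written assignments has roughly a one-in-three chance of at least one false flag, assuming errors are independent.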