The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
AI that understands complex documents, diagrams, handwriting, and degraded or historical texts using vision-language models and specialised OCR. Includes architectural drawing interpretation and historical manuscript digitisation; distinct from standard document processing, which handles structured forms and clear printed text.
Document and diagram understanding remains bifurcated between proven adoption in specialised contexts and unresolved limitations in horizontal AI approaches. Institutional deployments across cultural heritage, finance, and government continue to deliver measurable ROI, but the fundamental constraint that blocked horizontal adoption in 2025 persists into June 2026: vision-language models (VLMs) systematically fail at diagram comprehension and rely on textual priors rather than visual grounding. This structural gap explains both the practice's leading-edge status and its stalled trend. Forward-leaning organisations are scaling document understanding to institutional production use: archives transcribing medieval manuscripts at a 9.7% character error rate, governments digitising millions of handwritten records, and financial services firms processing loans in under 2 minutes. But most organisations remain on manual workflows, and the general-purpose AI that could unlock horizontal adoption does not yet exist.
VLMs have crossed the threshold on text extraction from complex documents (90-99% accuracy on invoices and forms is now routine), but diagram understanding remains categorically broken. Research from May and June 2026 confirms the problem: the best frontier model achieves 51% accuracy on architectural object counting while maintaining 95% on text extraction, a 44-point gap indicating that symbol-centric reasoning is unreliable. On engineering diagrams, frontier models recall 61-87% of parts, yet their descriptions score only 3-18% Token F1. VLM failure on diagrams is not a tuning problem but a fundamental misalignment between how language models encode visual information and how diagrams encode structural relationships. Seventy percent of manufacturers still extract engineering tolerance data by hand. Until this visual-language gap closes, the practice will remain segmented: specialised tools (Transkribus, fine-tuned models, hybrid approaches) will deliver production value, while manual workflows persist wherever diagram understanding is required.
Transkribus dominates the cultural heritage segment at scale: 90 million images processed across 227 cooperative members in 30 countries. June 2026 deployments reinforce sustained adoption. Inria's CoMMa project transcribed 32,763 medieval manuscripts in 4 months at a 9.7% character error rate, yielding a 3-billion-word corpus and confirming production-scale handwritten text recognition (HTR) for low-resource historical scripts. The University of Georgia's Hargrett Library used Transkribus plus a custom Python workflow to process 20,000+ Colonial-era pages in under two months with a 2-person team, establishing a reusable institutional model. University of South Carolina Libraries processed 100,000+ handwritten pages with JSTOR Seeklight AI at 97% accuracy, demonstrating mainstream adoption in academic institutions. The Vatican Library deployed ResNet-18 and Swin Transformer models on medieval manuscripts, achieving >80% accuracy on scribe identification, subject to the explainability requirements of humanistic scholarship. U of T/UCL researchers applied Transkribus to 13th-century Latin legal manuscripts, overcoming medieval abbreviations through collaborative retraining. Government deployments continue: India's Gyan Bharatam Mission has documented 4.4M+ manuscripts with ₹491.66 crore in funding through 2031; King County, WA cut document redaction time from 30 minutes to under five seconds at 96% accuracy.
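For readers less familiar with the metric, a character error rate such as the 9.7% cited above is usually the number of character-level edits needed to turn the model's transcription into the reference, divided by the reference length. The source does not say how the CoMMa project computed its figure, so the following is only a minimal sketch under that common definition; the function names and the sample strings are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where each insertion, deletion, or substitution costs 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: the model drops one letter from a 21-character line.
reference = "In nomine domini amen"
hypothesis = "In nomine domni amen"
print(f"CER: {character_error_rate(reference, hypothesis):.3f}")  # 1 edit / 21 chars
```

Under this definition, a 9.7% rate corresponds to roughly one character-level correction for every ten characters of reference text, which is why human review remains part of most production workflows.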
Enterprise market acceleration continued through June 2026. Gartner's inaugural Intelligent Document Processing (IDP) Magic Quadrant (September 2025) identified 5 Leaders (ABBYY, Hyperscience, Infrrd, Tungsten, UiPath), with accuracy converged at 90-99% and audit trails emerging as the primary differentiator. IDP adoption reached 63% of the Fortune 250, with the market sized at $4.31B (33% CAGR). Gartner data shows 67% of enterprises now evaluating agentic approaches, versus 23% two years ago. VLM-based invoice processing achieves 85-94% accuracy at $1.20 per document. Production scale-ups: ArcelorMittal processes 300,000+ invoices annually at 90% accuracy, with processing time reduced from 7-10 days to 1 day; M2P Fintech deployed a Document Intelligence Agent that cut processing time from 18-24 hours to under 2 minutes, raised accuracy from 85-90% to 95%+, and lowered per-application cost from ₹800-1,200 to ₹80-150 while handling 150,000 pages/hour; Nevada County deployed the Chandra model on 200K+ Gold Rush documents at a 150X speedup (3 weeks → 2 hours), with 95-98% accuracy on modern handwriting and 90% on complex historical handwriting.
Cloud platform maturation continued through Q2 2026. Microsoft Azure Content Understanding reached GA (March 2026), achieving a 40% accuracy improvement via labeled examples, with named customers (DataSnipper, FinHero, Wolters Kluwer) confirming deployment value. Databricks released its ai_extract and ai_classify functions as native GA capabilities (June 2026), integrating document understanding into core data platform workflows. The UiPath Helix model family reached GA (May 2026) with improved extraction and classification. The vendor ecosystem is expanding as well: ABBYY FineReader added layout analysis, handwriting and Chinese recognition, and LLM integration; Google Cloud Document AI released quality scoring, digital PDF support, and model versioning, with named deployments (Jack Henry, PwC, Mr. Cooper). Technical capabilities are expanding too: June 2026 research shows a frontier model (Gemini 3.1 Pro) achieving 97.91-98.51% character accuracy on classical Arabic scripts (naskh, ruq'ah, ta'liq), indicating HTR advances for specialized non-Latin paleography.
Critical limitations persist, and recent research crystallizes them. Diagram understanding remains fundamentally broken for general-purpose VLMs: AECV-bench (May 2026) shows the best model (Gemini 3 Pro) achieving 51% accuracy on architectural object counting versus 95% on text extraction, a 44-point gap that exposes symbol recognition as unreliable. The Enginuity benchmark (June 2026) confirms the pattern: frontier models reach Recall@all of 0.61-0.87 on engineering diagram parts but Token F1 of only 0.03-0.18 on descriptions, quantifying the relationship-reasoning failure. A June 2026 Vision-Grounded study documents that VLMs systematically rely on textual priors over visual grounding: proprietary models show 27-38% gaps between Vision-Grounded and baseline variants, signaling fundamental visual-language misalignment. Handwriting OCR accuracy varies from 63% to 99% across platforms (block ~95%, cursive ~45%) and depends heavily on writing style and document type. Layout analysis is emerging as a critical bottleneck: DFG/AHRC-funded research on Tibetan newspapers documents Transkribus failing on dense multi-script layouts and requiring a custom TransYolo solution. Non-Latin script accuracy remains dependent on fine-tuning; specialized deployments on Tamil, Arabic, and Urdu scripts confirm that HTR maturity is concentrated in high-resource domains. Hybrid human-in-the-loop workflows remain the production standard. Azure Document Intelligence reliability issues persist (May 2026 outages, extraction service hangs), constraining enterprise adoption. A systematic review of OCR evaluation (2006-2025) documents structural bias: evaluation frameworks center on modern Western documents, leaving historical and marginalized materials systematically underrepresented in maturity assessments.
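A toy example helps show how a model can score well on part recall and still score near the floor on Token F1. The sketch below assumes set-based recall over part labels and a bag-of-tokens F1 of the kind common in question-answering evaluation; the Enginuity benchmark's exact definitions are not stated here, and the part names and descriptions are invented purely for illustration.

```python
from collections import Counter


def part_recall(reference_parts: set, predicted_parts: set) -> float:
    """Fraction of reference parts that the model named at all."""
    if not reference_parts:
        raise ValueError("reference_parts must be non-empty")
    return len(reference_parts & predicted_parts) / len(reference_parts)


def token_f1(reference: str, prediction: str) -> float:
    """Harmonic mean of token precision and recall (assumed definition)."""
    ref_tokens = reference.lower().split()
    pred_tokens = prediction.lower().split()
    if not ref_tokens or not pred_tokens:
        return float(ref_tokens == pred_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    overlap = sum((Counter(ref_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical diagram: the model names three of four parts correctly...
ref_parts = {"bolt", "washer", "bracket", "housing"}
pred_parts = {"bolt", "washer", "bracket", "spring"}
print(part_recall(ref_parts, pred_parts))  # 0.75

# ...but its description misses how the parts relate to each other.
ref_desc = "the bolt fastens the bracket to the housing"
pred_desc = "a bolt is visible near a bracket"
print(round(token_f1(ref_desc, pred_desc), 2))  # 0.27
```

Naming the parts only requires recognising labels or shapes, while describing them requires stating the relationships the diagram encodes, which is where the two scores diverge.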
— Peer-reviewed evaluation of Gemini 3.1 Pro on Arabic classical scripts achieved 97.91%-98.51% character accuracy across naskh/ruq'ah/ta'liq, demonstrating frontier VLM effectiveness for specialized non-Latin paleography.
— University of South Carolina deployed JSTOR Seeklight AI on 100,000+ handwritten pages with 97% accuracy; production integration with student workflow demonstrates sustainable institutional adoption.
— Inria ALMAnaCH deployed CoMMa project transcribing 32,763 medieval manuscripts in 4 months with 9.7% character error rate; 3B+ word corpus confirms production-scale HTR for low-resource historical scripts.
— Central Institute of Classical Tamil digitized 48% of Thirukkural manuscripts with corpus explicitly developed as training data for handwritten Tamil text recognition; shows document digitization as foundational for specialized script HTR advancement.
— Government of India Gyan Bharatam Mission deployed AI for handwritten manuscript digitization at scale: 4.4M+ manuscripts documented, 800K+ digitized, 129K+ public access; ₹491.66 crore funding through 2031 demonstrates government backing.
— M2P Fintech deployed Document Intelligence Agent in loan origination: TAT 18–24 hours → <2 minutes; accuracy 85–90% → 95%+; per-application cost ₹800–1,200 → ₹80–150; handles classification, extraction, authenticity verification, fraud detection at 150,000 pages/hour.
— Archion deployed the Transkribus API at production scale (200K+ books, 32M images) with a text overlay in its research platform; outcomes: improved user experience, faster paleographic research, and automatic adoption of model improvements.
— Peer-reviewed benchmark showing that VLMs systematically rely on textual priors over visual grounding: all models degrade on the Vision-Grounded variant, and proprietary models show wider grounding gaps (27–38%), signaling a fundamental failure mode for document and diagram understanding.