The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
Each dot marks the weighted maturity of practices within a domain — hover for a brief summary, click for more detail
AI that understands complex documents, diagrams, handwriting, and degraded or historical texts using vision-language models and specialised OCR. Includes architectural drawing interpretation and historical manuscript digitisation; distinct from standard document processing, which handles structured forms and clear printed text.
Document and diagram understanding is a bifurcated practice: specialised systems deliver proven value in cultural heritage, finance, and government, while general-purpose vision-language models remain unable to cross the threshold from text extraction into genuine diagram comprehension. That split defines the practice's leading-edge status. Forward-leaning organisations -- archives running Transkribus at scale, government agencies automating redaction, financial processors cutting per-document costs by 90% -- are extracting real ROI. But most organisations have not started, and the technology that could unlock horizontal adoption does not yet exist.
The core constraint is structural. VLMs can recognise entities in diagrams at 85%+ accuracy, yet peer-reviewed research shows they achieve only 40-54% on relational reasoning -- performance driven by background knowledge rather than visual comprehension. This means document text extraction has a viable, cost-competitive AI path, while diagram and relationship understanding remains manually intensive. Seventy percent of manufacturers still extract engineering tolerance data by hand. Until VLMs close the modality gap between text and visual content, the practice will remain segmented: specialised tools for those who invest in them, and manual workflows for everyone else.
Transkribus dominates the cultural heritage segment, with 90 million images processed across 227 cooperative members in 30 countries. Volunteer transcribers working on New France manuscripts report 3-4% character error rates with modest training data, and the platform's 2026 roadmap adds LLM integration and named-entity recognition while preserving the data-sovereignty guarantees that heritage institutions require. Recent deployments include U of T/UCL researchers applying Transkribus to 13th-century Latin manuscripts, overcoming medieval abbreviations and hyphenation to produce accurate transcriptions of these specialist documents. In government, King County, WA cut document redaction time from 30 minutes to under five seconds at 96% accuracy. VLM-based invoice processing now achieves 85-94% accuracy with costs as low as $1.20 per document -- viable where human verification is built into the workflow.
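The character error rate (CER) figures quoted here are conventionally computed as edit distance between the model's output and a ground-truth transcription, divided by the length of the ground truth. A minimal sketch of that calculation (function names are our own, not part of Transkribus or any vendor API):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings: minimum insertions,
    deletions, and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / reference length.
    A 3-4% CER means roughly 3-4 character errors per 100 characters."""
    return levenshtein(reference, hypothesis) / len(reference)
```

Note that CER can exceed 1.0 on badly garbled output, since insertions count against a fixed reference length; word error rate (WER) is computed the same way over tokens instead of characters.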
The global OCR market reached $13.95 billion in 2024 and is projected to reach $46.09 billion by 2033, with production evidence mounting: ArcelorMittal processes 300,000+ invoices annually at 90% accuracy, reducing per-invoice processing time from 7-10 days to 1 day. Microsoft's production IDP pipeline demonstrates the maturation of orchestrated multi-model approaches, reducing manual document processing from 30-45 minutes to under 5 minutes through parallel extractors and human validation gates. Azure Content Understanding reached GA in March 2026 with a 40% accuracy improvement via labelled examples across tax forms, legal, medical, and employment documents.
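The orchestrated pattern described above fans a document out to several specialised extractors concurrently, then merges their outputs into one record before any validation gate. A minimal sketch of the fan-out step, assuming three illustrative extractors (these stand-ins are not Azure or Microsoft APIs; a real pipeline would call model or service endpoints):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in extractors; each returns one slice of the document record.
def extract_tables(doc: str) -> dict:
    return {"tables": f"tables<{doc}>"}

def extract_key_values(doc: str) -> dict:
    return {"key_values": f"kv<{doc}>"}

def extract_handwriting(doc: str) -> dict:
    return {"handwriting": f"hw<{doc}>"}

EXTRACTORS = [extract_tables, extract_key_values, extract_handwriting]

def process(doc: str) -> dict:
    """Run all extractors in parallel and merge their outputs
    into a single record for downstream human validation."""
    merged: dict = {}
    with ThreadPoolExecutor(max_workers=len(EXTRACTORS)) as pool:
        for result in pool.map(lambda fn: fn(doc), EXTRACTORS):
            merged.update(result)
    return merged
```

The design point is that extractors are independent, so wall-clock latency is bounded by the slowest extractor rather than the sum, which is how 30-45 minute manual passes compress to minutes.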
These successes sit alongside persistent limitations. Cloud reliability remains a barrier: Azure Document Intelligence continues to experience production outages and extraction service hangs, constraining enterprise adoption. Handwriting OCR accuracy shows substantial variance (63-99% across platforms; block letters ~95%, cursive ~45%), indicating real-world performance depends heavily on document type. Independent research confirms VLMs are "semantically strong but spatially fragile" -- geometric distortions cause a 34pp accuracy loss, critical for scanned and degraded documents. Layout analysis emerges as an underappreciated bottleneck: DFG/AHRC-funded research on Tibetan newspapers documents how Transkribus fails on non-Latin dense layouts, requiring custom vision models (TransYolo) to detect and assign text lines. Specialised hybrid approaches fare better in narrow domains -- engineering drawing parsing via YOLOv11 and Donut reached 97.3% F1 -- but general-purpose diagram reasoning remains out of reach. Non-Latin script accuracy still depends on fine-tuned models, and hybrid human-in-the-loop workflows remain the production norm rather than the exception. A PRISMA systematic review of OCR evaluation (2006-2025) documents structural bias: evaluation frameworks centre on modern Western documents, leaving historical and marginalised materials systematically underrepresented.
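The human-in-the-loop norm usually takes the form of a confidence gate: extracted fields above a threshold pass straight through, and the rest are queued for review. A minimal sketch, with threshold and field names chosen for illustration (not drawn from any deployment cited above):

```python
from dataclasses import dataclass

@dataclass
class Extraction:
    field: str
    value: str
    confidence: float  # model-reported score in [0, 1]

def route(extractions: list[Extraction],
          threshold: float = 0.90) -> tuple[list[Extraction], list[Extraction]]:
    """Split extracted fields into auto-accepted and human-review queues."""
    accepted, review = [], []
    for ex in extractions:
        (accepted if ex.confidence >= threshold else review).append(ex)
    return accepted, review

batch = [
    Extraction("invoice_number", "INV-4417", 0.98),
    Extraction("total_amount", "1,240.00", 0.71),  # e.g. a cursive or degraded scan
]
auto, queued = route(batch)
# auto holds invoice_number; queued holds total_amount for a reviewer
```

The threshold is the economic lever: raising it trades reviewer hours for error risk, which is why the invoice figures above are quoted as viable specifically "where human verification is built into the workflow".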
— Benchmarking study of frontier models for document processing with named vendors, specific accuracy metrics, and detailed cost/performance trade-offs, tested on 25,000 documents.
— Amazon Research benchmark directly evaluating VLMs on long, visually complex documents. High-quality research from a major vendor addressing scalability and performance on real-world document processing tasks.
— Google Document AI releases Intelligent Document Quality scoring, digital PDF support, and versioning (April 2026), demonstrating active vendor focus on production-grade document quality signals.
— Detailed technical analysis of production failure modes in document understanding systems, documenting the gap between benchmark (97%) and real-world performance across document types.
— Comprehensive leaderboard ranking 12+ models on OCR and document AI benchmarks, showing ecosystem breadth and saturation on structured tasks.
— Empirical comparison of OCR systems on historical handwritten manuscripts; a combined Transkribus + Gemini pipeline achieved a CER of 0.047, demonstrating that hybrid approaches outperform single models.
— Market sizing $8.4B (2026) → $16.6B (2034) at 8.8% CAGR; multimodal documents (tables, handwriting, images, mixed languages) identified as largest segment reflecting commercially significant challenges.
— Apryse (serving 20K+ companies, including 85% of the Fortune 100) reaches general availability with its ICR SDK for handwritten documents, addressing the production handwriting recognition gap in enterprise deployments.