Perly Consulting │ Beck Eco

The State of Play

A living index of AI adoption across industries — where established practice meets the bleeding edge
UPDATED DAILY

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

The Daily Dispatch

A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.

AI Maturity by Domain

Each dot marks the weighted maturity of practices within a domain.

[Chart: domains plotted on a maturity axis running from Bleeding Edge to Established]

Radiology — autonomous preliminary reads

TIER: Leading Edge
TRAJECTORY: Stalled

AI that generates preliminary radiology reads autonomously, with radiologist confirmation for final diagnosis. Includes automated report generation and critical finding alerting; distinct from assisted detection, which highlights findings for human interpretation rather than producing reports.

OVERVIEW

Autonomous preliminary radiology reads have moved from laboratory proof-of-concept into production deployment, but remain concentrated at a handful of high-volume operators and specialist vendors. Systems that generate draft reports for radiologist review show measurable efficiency gains in controlled settings, yet fundamental adoption barriers prevent broad clinical scaling. Radiology Partners processes 40-50 million cases annually through RADPAIR, achieving 2-5 second report turnaround with a 12% error reduction. Northwestern Medicine's in-house autonomous reporting system achieves a 40% productivity boost without accuracy loss, and a multicenter evaluation of MIRA across 1.87 million reports from 42 hospitals found 69% of AI-generated impressions rated equivalent to radiologist originals.

These results are credible, but they come from operationally mature organisations and domain-specific models trained on large radiology report corpora. Generic foundation models and multimodal LLMs show substantial capability gaps: vision-language models exhibit critical grounding failures (text-only baselines come within 5.7% of the accuracy of multimodal models, and some models ignore images entirely), cross-lingual systems see 98.7% physician-edit rates, and foundation models exhibit severe age-related diagnostic bias.

June 2026 evidence amplifies the leading-edge tension. The FDA granted Breakthrough designations to Aidoc First Read (2,000+ hospitals, 120M+ cases processed) and Cognita autonomous report generation systems, yet peer-reviewed research documents realism-reliability gaps, information degradation in autonomous rewriting (51% entity erosion), and safety validation gaps (missing clinical evidence is associated with a 1.39× higher recall hazard). The regulatory and liability landscape remains the binding constraint: until governance, legal, and accountability frameworks evolve, autonomous AI remains unlikely to scale beyond niche high-volume screening applications.

CURRENT LANDSCAPE

Deployment remains concentrated among operationally mature high-volume operators and specialist vendors, with regulatory acceleration in June 2026. Aidoc's First Read received FDA Breakthrough Device Designation (June 26) for autonomous chest X-ray report drafting and is deployed to 2,000+ hospitals processing 120M+ cases; Cognita simultaneously received Breakthrough status for autonomous report generation. Radiology Partners (3,900+ radiologists) runs RADPAIR in production, and Mosaic Reporting has been deployed to thousands of radiologists, with real-time draft generation during interpretation powered by Cognita foundation models. Yale New Haven Health System (700k+ annual exams across 16 centers) selected Rad AI for an infrastructure-scale autonomous reporting rollout. Real-world deployment metrics back this up: Northwestern Medicine's in-house autonomous reporting system achieves a 40% productivity boost; a 12-hospital academic health system reported a 15.5% documentation efficiency gain with autonomous draft reports; and autonomous TB screening in India increased case notifications by 80% (162 confirmed cases). A systematic review of 11 studies concludes that autonomous chest X-ray triage systems are ready for clinical implementation, with a 42.3% triage rate and 97.8% sensitivity across 500K+ real-world cases. These are genuine production deployments validating the business case for autonomous preliminary reads.

Technical capability gains compete with deepening reliability concerns. The Harrison.Rad 1.5 foundation model passed the FRCR 2B professional board exam (86.5 against a 73.2 cutoff), making it the first autonomous report-generation model validated against an independent professional standard. Yet peer-reviewed audits reveal critical robustness gaps. Vision-language models show fundamental grounding failures: text-only models come within 5.7% of the accuracy of multimodal models, and some models ignore images entirely, which raises the question of whether high accuracy scores reflect genuine multimodal understanding. Information degradation in autonomous report generation has also been quantified: LLM rewriting erodes 51.4% of clinical entities and 43.7% of hedging language (clinical uncertainty markers) in EHR summarization tasks. Domain-specific tuning proves critical: LLMs trained on 500M+ radiology reports outperform generic GPT-4, so autonomy quality depends on specialization, not architecture. Cross-lingual prospective validation, however, shows 98.7% physician-edit rates, with diagnostic performance degrading from 0.2938 (English) to 0.2149–0.2424 (non-English). Foundation models also exhibit severe age-related bias. This evidence pattern (strong performance on narrow tasks, critical gaps exposed by robustness audits) defines the actual capability frontier.
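The grounding concern above can be made concrete with a simple ablation: score the same model on the same cases with and without the image, and look at the difference. The following is a minimal sketch of that idea, not the method used in the cited audit. The names `grounding_gap`, `prior_only_model`, `image_reading_model`, and the toy `CASES` are all hypothetical, and images are represented by stand-in strings.

```python
from typing import Callable, Optional, Sequence, Tuple

# (question, image, reference answer). Images are stand-in strings here;
# a real audit would pass pixel data to an actual model.
Case = Tuple[str, Optional[str], str]
Predictor = Callable[[str, Optional[str]], str]


def accuracy(predict: Predictor, cases: Sequence[Case], use_image: bool) -> float:
    """Share of cases answered correctly, with or without the image."""
    correct = 0
    for question, image, answer in cases:
        prediction = predict(question, image if use_image else None)
        correct += int(prediction == answer)
    return correct / len(cases)


def grounding_gap(predict: Predictor, cases: Sequence[Case]) -> float:
    """Accuracy with images minus accuracy with images withheld."""
    return accuracy(predict, cases, True) - accuracy(predict, cases, False)


# Hypothetical toy models, for illustration only.
def prior_only_model(question: str, image: Optional[str]) -> str:
    return "normal"  # ignores the image and answers from a text prior


def image_reading_model(question: str, image: Optional[str]) -> str:
    return image if image is not None else "normal"


CASES: Sequence[Case] = [
    ("Any acute finding?", "normal", "normal"),
    ("Any acute finding?", "effusion", "effusion"),
    ("Any acute finding?", "nodule", "nodule"),
    ("Any acute finding?", "normal", "normal"),
]

if __name__ == "__main__":
    for name, model in [("prior-only", prior_only_model),
                        ("image-reading", image_reading_model)]:
        print(f"{name}: grounding gap = {grounding_gap(model, CASES):.2f}")
```

In this toy setup the prior-only model scores the same with and without images, so its gap is zero even though its headline accuracy is not. A small gap on a real benchmark would be a prompt to investigate further, since it can also reflect questions that are answerable from text alone.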

Governance infrastructure is maturing, but safety validation gaps persist. The American College of Radiology's June 2026 board chair statement endorses ARCH-AI (quality assurance), Assess-AI (post-deployment monitoring), and the Healthcare AI Challenge Consortium (foundation model evaluation) as deployment readiness prerequisites, signaling institutional consensus that governance infrastructure is the frontier. Yet structural barriers remain unresolved. A JAMA Network Open cohort study of 903 FDA-authorized devices found 30 radiology AI devices recalled (4.3%), with missing clinical evidence associated with a 1.39× higher recall hazard, documenting that validation gaps predict downstream safety failures. MIT CSAIL (June 2026) exposes a critical regulatory classification loophole: autonomous radiology AI functions as a clinical agent while being classified as decision support that is exempt from FDA review. Most FDA-cleared radiology AI is classified as 'aid in detection' or 'triage,' not autonomous screening, and a regulatory framework designed for fixed-functionality devices constrains authorization of autonomous systems. A peer-reviewed critical assessment recommends that autonomous systems be deployed as 'constrained, auditable, continuously monitored components of clinical practice' that augment, but do not replace, expert judgment. Fundamental questions about liability, accountability, and responsibility for post-deployment monitoring prevent scaling beyond high-volume screening. The field consensus, articulated by the Radiology Research Alliance in March 2026, emphasizes human-AI collaboration over full autonomy. This is the tension that defines the leading edge: FDA Breakthrough designations and production deployments at scale clash with documented validation gaps, robustness failures, generalization limits, and nascent governance frameworks that remain prerequisites for scaling beyond specialists.

TIER HISTORY

Research: Jan-2023 → Jan-2023
Bleeding Edge: Jan-2023 → Apr-2024
Leading Edge: Apr-2024 → present

EVIDENCE (122)

— FDA Breakthrough Device Designation for Aidoc's First Read autonomous chest X-ray report generation system deployed at 2,000+ hospitals processing 120M+ cases, with $150M Series E funding validating commercial viability.

— STAT News independent reporting on dual FDA Breakthrough designations (Cognita, Aidoc) for autonomous report generation. Highlights regulatory watershed from diagnostic assistance to full-image synthesis and narrative generation.

— American College of Radiology governance framework announcement: ARCH-AI first international sites, Healthcare AI Challenge Consortium for evaluating radiology report drafting foundation models, Assess-AI post-deployment registry.

— Northwestern Medicine in-house autonomous reporting system achieves 40% productivity boost without accuracy loss; published in JAMA Network. Demonstrates real-world deployment of autonomous preliminary reads at tier-1 academic medical center.

— MICCAI 2026 research on reinforcement learning framework for autonomous RRG with clinically-meaningful precision-recall control. Advances technical maturity of autonomous report generation beyond fluency optimization toward clinical safety.

— Critical VLM reliability audit: text-only models reach within 5.7% accuracy of multimodal models; three models ignore images entirely. Exposes fundamental grounding failure masking high accuracy scores for VLM-based autonomous reads.

— Peer-reviewed critical assessment of generative AI in medical imaging covering autonomous report drafting. Identifies realism-reliability gap, automation bias, and human-factor risks; proposes trustworthiness evaluation framework and constraints-based deployment model.

— JAMA Network Open cohort study of 903 FDA-authorized devices: 30 radiology AI devices recalled (4.3%), missing clinical evidence 1.39× higher recall hazard. Documents safety validation gaps for autonomous systems.

HISTORY

  • 2023-H1: Emerging commercial deployment with multiple vendors at scale (3100+ sites, 10.7M scans processed by single vendor). Vendor landscape includes 179 CE-certified products by March 2023, with 67% having peer-reviewed evidence though evidence weighted toward diagnostic accuracy rather than clinical impact. Technical challenges documented: data imbalance in training sets, inadequate evaluation metrics, specific failure modes in non-standard imaging. Safety concerns noted including radiographer override rates and position-dependent accuracy degradation.

  • 2023-H2: Substantial real-world deployment acceleration across multiple health systems and use cases. UK NHS trusts (Frimley, Greater Glasgow) deployed qXR for large-scale triage with 99.7% normal-case accuracy and 58% workload reduction potential. India-based TB screening program demonstrated 15.8% incremental yield improvement with autonomous systems. Teleradiology platforms (InHealth) integrated autonomous triage for critical finding alerting. Major commercial deployment at RSNA 2023: Microsoft/Nuance launched PowerScribe Smart Impression on platform used by 80% of radiologists. Adoption sentiment among clinicians remained positive (78% of Chinese residents surveyed support AI embrace), though replacement concerns persist (30% feared workforce reduction).

  • 2024-Q1: Consolidation of commercial adoption and emergence of safety frameworks. Providence health system deployed PowerScribe autonomous reporting in largest US platform rollout, signaling mainstream health system adoption. Research momentum continues with IEEE review documenting methodological advances in automatic report generation. Emerging LLM approaches (ChatGPT/GPT-4) explored for autonomous reporting, though professional societies (ACR, CAR, ESR, RANZCR, RSNA) established formal evaluation and monitoring frameworks for autonomous AI tools, emphasizing rigorous safety assessment requirements alongside deployment.

  • 2024-Q2: Major vendor expansion and critical deployment-readiness assessment. Microsoft Azure launches Radiology Insights preview service, signaling major cloud vendor entry into autonomous radiology tooling ecosystem. GPT-4 demonstrates capabilities matching radiologists on error detection (82.7% accuracy, lower cost and faster than human review). Independent academic assessment from Imperial College and Royal College of Radiologists documents widespread implementation barriers post-regulatory approval: reliability validation, accountability, trust, and safety governance gaps limit real-world adoption despite regulatory clearance. Asian Oceanian Society of Radiology formalizes regional adoption guidance. Open-source momentum continues with 100+ papers and tools curated in community repositories.

  • 2024-Q3: Consolidation of clinical evaluation frameworks and LLM maturation. RSNA and MICCAI publish joint expert consensus on deployment barriers emphasizing trust, reproducibility, and accountability frameworks. LLM-based approaches demonstrate specific clinical capabilities: GPT-4 for synoptic report generation (F1 0.997), Claude-2 for RADS categorization (57% accuracy with prompting), and patient-friendly report generation (improving understanding scores). Radiologist adoption sentiment remains strong (75%+ engagement in AI practices in major markets). NHS Greater Glasgow & Clyde initiates prospective stepped-wedge clinical trial (RADICAL) for rigorous qXR evaluation across 24 months, signaling major health system commitment to evidence-based autonomous read validation. Systematic reviews synthesize efficiency gains (30-40% reporting time reduction) across diverse deployments, though net clinical outcome questions persist.

  • 2024-Q4: Infrastructure consolidation and human factors maturation concerns. Largest US radiology practice (Radiology Partners, 3,900+ radiologists) partners with generative AI vendor for co-developed autonomous reporting, signaling shift toward strategic health system integration. Large-scale deployment validation continues: 1.3M chest X-rays processed across 33 UAE visa screening centers documented in peer-reviewed publication. Yet critical limitations emerge: multi-site prospective study documents over-reliance risk when AI provides local explanations (even incorrect ones), and practitioner surveys show mixed adoption sentiment with 100-radiologist cohort expressing concerns about reliability, job displacement, and ethical implications. JMIR viewpoint highlights persistent research-to-practice gap despite 190+ FDA-approved radiology AI devices, suggesting deployment maturity lags regulatory approvals.

  • 2025-Q1: Broad hospital adoption and implementation gap evidence. Danish large-scale study demonstrates AI-driven mammography screening with 248,000+ images achieving 48.8% workload reduction while maintaining cancer detection, expanding evidence beyond chest X-ray modality. Pew Charitable Trusts survey (2022 data) documents 44% of US hospitals adopted AI imaging tools, yet critical implementation gaps persist: only 26% piloted before rollout, 34% lack comprehensive validation information, 31% lack monitoring protocols. Over 200 EU-approved radiology AI tools exist with limited real-world adoption, indicating continued research-to-practice gap. Patient acceptance studies show AI-simplified reports improve understanding but patients prefer physician delivery. Major health systems continue infrastructure investments despite unresolved questions about clinical effectiveness versus workflow efficiency gains.

  • 2025-Q2: Continued deployment expansion and platform consolidation. UH Cleveland activated qXR for autonomous lung cancer detection on chest X-rays, demonstrating sustained adoption by major US health systems. Intelerad-RADPAIR partnership combines workflow orchestration with generative AI-driven reporting, signaling integration of autonomous reporting into mainstream clinical informatics platforms. Radiologist interviews reveal variability in AI monitoring practices and implementation approaches. Research continues on report generation advancements and faculty/trainee adoption sentiment.

  • 2025-Q3: Vendor ecosystem consolidation and agentic AI emergence. RADPAIR-Fireworks partnership demonstrates production-scale autonomous reporting infrastructure at Radiology Partners (40-50M cases annually) with specific metrics: report turnaround cut from 15-20s to 2-5s, 25% time savings per case, 12% error reduction. RADPAIR expands geographic reach via AdvaHealth partnership into Asian healthcare markets. Commentary in Diagnostic and Interventional Radiology explores agentic AI for autonomous radiology workflows (RadGPT example for CT scanning). Academic research examines feasibility and barriers of autonomous CXR reporting in UK context, highlighting unresolved accountability framework gaps, regulatory challenges (IR(ME)R, GDPR), and need for post-market surveillance. RSNA covers autonomous report generation in neuroradiology using LLMs (GPT-4), discussing benefits and limitations including hallucinations and generalizability challenges.

  • 2025-Q4: LLM performance maturation and validation gap emergence. Peer-reviewed study demonstrates efficiency gains in semi-automated AI reporting: turnaround reduced from 6.1 to 3.43 minutes on 100 complex cases with improved accuracy (3.81→4.65/5.0) and confidence (3.91→4.67/5.0, p<0.0001). Scoping review of 67 LLM studies shows strong structured-task performance (>94% accuracy in report simplification) but inconsistent diagnostic performance (16%-86%) with 79% single-center proof-of-concept designs. European adoption expands to 48% of radiologists using AI (up from 20% in 2018); 115+ FDA-approved algorithms by mid-2025. Agentic AI architectural shift: RADPAIR launches PAIRsdk developer framework with industry coalition (Fovia, Interlinx, deepc, Intelerad) planning open-source standards for voice-first autonomous workflows in 2026. Critical expert assessment documents widening validation gap: 100+ tools exhibited at RSNA 2025 lack rigorous clinical validation; algorithms perform significantly worse in diverse populations. Fundamental accountability and clinical outcome questions remain unresolved.

  • 2026-Feb: Continued deployment expansion and regulatory acceleration. Autonomous TB screening in India demonstrates 80.21% increase in case notifications with 162 confirmed cases (44.63% positivity) at deployment sites in Chhattisgarh tribal population. FDA clearances accelerated with 56 new radiology AI devices (1,039 total devices, 80% of all FDA-authorized AI), signaling sustained regulatory momentum. Systematic review of 11 studies concludes autonomous chest X-ray triage systems ready for clinical implementation, with weighted average 42.3% autonomous triage rate and 97.8% sensitivity across real-world datasets (500K+ cases). Evidence on implementation barriers: Brisbane hospital study identified 82 barriers and 33 enablers post-deployment using NASSS framework, with sustained adoption constrained by performance inconsistency, weak communication, and medicolegal uncertainty. Industry initiatives: Radiology Partners and Stanford AIDE Lab partnered to develop evidence-based validation and safety monitoring frameworks for autonomous AI tools. Practitioner perspective documents job displacement concerns and malpractice exposure constraints on adoption.

  • 2026-Feb: LLM capability maturation and infrastructure framework development. Blinded evaluation studies compare LLM-generated to radiologist-written reports across clinical relevance and accuracy, advancing evidence on LLM autonomous report quality. Research on fine-tuned Llama-3-70B demonstrates F1 0.780 error detection capability, outperforming GPT-4 (0.683), signaling LLM performance gains in structured report tasks. Technical research on web-based automated chest X-ray report generation systems achieves BLEU-4 0.482 and ROUGE-L 0.718, advancing autonomous generation methodologies. Industry frameworks emerge: Signify Research survey of 150 healthcare organizations identifies seamless workflow integration as 9-10/10 priority and integration barriers as deal-breakers; SATMED and HealthManagement publish frameworks documenting transition from scattered pilots to core infrastructure, citing 1,000+ FDA-cleared AI devices (75% radiology), cloud-native platforms, and agentic AI orchestration requirements. Commercial tool proliferation: Qure.ai expands qXR-Detect to 26 FDA indications; Report Rad AI launches claiming 60-95% faster reporting with 10K+ cases. Regulatory moment: ABR evaluates cautious AI adoption for internal functions while assessing competency frameworks for clinical tool use. Infrastructure maturity signal: adoption frameworks systematize integration, governance, and sustainability challenges underlying "pilot purgatory" barriers.

  • 2026-Mar: Market consolidation accelerated with RadNet acquiring Gleamer for up to $270M and merging it into DeepHealth (26 FDA-cleared devices, 2,700+ customer contracts across 50 countries), signalling that scale and device breadth are becoming competitive moats. Deployment evidence continued broadening: ARA Health (13 hospitals, 100k studies/month) achieved 20% reporting-time reduction with Rad AI across 79% of radiologists; United Imaging Intelligence unveiled CE-certified uAI Image-to-Report agents generating structured CT and brain MRI preliminary reports across five European countries; and a Ghana clinical study found autonomous AI TB screening achieved 91% accuracy versus 86% for radiologists in a resource-limited setting. The Radiology Research Alliance published multi-institutional consensus in March 2026 that human-AI collaboration produces stronger outcomes than full autonomy, reflecting field caution about replacing radiologist sign-off despite rising deployment velocity.

  • 2026-May: Commercial adoption deepened with Rad AI Omni deployed in 8 of 10 largest U.S. private radiology practices, and the RadNet-Gleamer merger creating 700+ customer contracts with autonomous draft reporting as a core capability. DeepHealth revenue grew 51.5% YoY to $29.1M with ARR reaching $96.9M (95% YoY growth) and guidance exceeding $140M, confirming production-scale commercial momentum across 2,890+ customers; RadNet reported record Q1 2026 results with 70%+ of studies expected to run through clinical AI by year-end. ACR launched ARCH-AI and Assess-AI governance programs — formal quality assurance and post-deployment monitoring registries — signalling field consensus that governance infrastructure now defines the frontier. Critical capability limits emerged: the ABRA agentic radiology benchmark found agents achieve 89% success on tool orchestration but only 0-25% on real annotation outcomes, locating the bottleneck in visual perception rather than reasoning; Stanford HAI policy research documented that most deployed radiology AI systems lack robust performance monitoring despite rapid clinical adoption. Independent analysis further documented the evidence gap between validated AI-assisted detection and autonomous-only reads lacking equivalent clinical trial validation.

  • 2026-Jun: Regulatory acceleration with dual FDA Breakthrough Device Designations for autonomous report generation. Aidoc's First Read (chest X-ray autonomous reports, 2,000+ hospitals, 120M+ cases, $150M Series E) and Cognita (acquired by Radiology Partners, autonomous report generation) both received Breakthrough status June 25-26, signaling regulatory recognition of autonomous preliminary reads as addressing unmet clinical need. Commercial deployment momentum confirmed: Yale New Haven Health System (700k+ annual exams) selected Rad AI for infrastructure-scale rollout; Mosaic Reporting (Radiology Partners subsidiary) deployed to thousands of radiologists with real-time autonomous draft generation; HOPPR Presto Agent launched with PowerScribe integration. Northwestern Medicine published case study showing 40% productivity boost with autonomous reporting (JAMA Network). Capability maturation signals include Harrison.Rad 1.5 passing FRCR 2B board exam and MICCAI 2026 research on clinically-controlled autonomous RRG with precision-recall tradeoffs. Yet robustness and safety concerns deepened: peer-reviewed audit documented vision-language models with critical grounding failures (text-only models within 5.7% of multimodal accuracy; some models ignore images); research on LLM-rewritten reports quantified 51.4% entity erosion and 43.7% hedging loss. JAMA Network Open revealed safety validation gaps: 30 radiology AI devices recalled (4.3%), with missing clinical evidence associated with 1.39× higher recall hazard. Peer-reviewed critical assessment identified realism-reliability gap in generative medical imaging and recommended constrained deployment with continuous monitoring. Governance maturation: ACR Board Chair formally endorsed ARCH-AI, Assess-AI, and Healthcare AI Challenge Consortium as deployment readiness prerequisites. Field consensus emphasizes that governance infrastructure, not capability maturation, now defines the frontier.