Perly Consulting │ Beck Eco

The State of Play

A living index of AI adoption across industries — where established practice meets the bleeding edge
UPDATED DAILY

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.


AI Maturity by Domain

[Chart: each dot marks the weighted maturity of practices within a domain, on a scale from Bleeding Edge to Established]

Operational documentation — runbooks & post-incident reports

LEADING EDGE

TRAJECTORY

Stalled

AI that generates and maintains operational runbooks and produces post-incident review reports. Includes automated playbook creation and blameless post-mortem drafting; distinct from incident response automation, which executes actions rather than documenting them.

OVERVIEW

AI-generated runbooks and post-incident reports have reached vendor maturity and shown quantified value in select deployments, yet organisational adoption remains constrained by gaps in governance and operational discipline rather than by technical capability. The technology solves a genuine pain point: runbooks decay as systems change (static documentation has a half-life measured in weeks in rapidly deploying environments), and post-mortems are routinely delayed or incomplete because documentation competes with recovery work for engineers' attention.

Named deployments show measurable returns: GreatCTO achieved a 94.1% reduction in median detection time across 47 P0 incidents via persisted incident memory; incident.io customers report a 37% MTTR reduction and $29,700 in annual savings; SolarWinds measured a 17.8% cut in incident resolution time across 2,000+ ITSM deployments; Cutover deployments show a 60% MTTR reduction and 50% fewer disruptions via AI-assisted runbook execution, plus a further 25–40% MTTR reduction via autonomous execution with continuous learning; and a financial services deployment (Danske Bank) reports 300% resilience efficiency gains. A standard workflow is also emerging: AI Advisory Board rates runbooks as a high-confidence AI use case in which AI drafts from alert definitions and on-call engineers refine the draft within the first week, and it documents a SaaS firm scaling runbook coverage from 40% to 85% in 14 days.

The vendor ecosystem has solidified. Arvo AI's May 2026 neutral taxonomy defines postmortem generation as a distinct, mature capability axis across 15+ vendors (BMC HelixGPT, PagerDuty Advance, incident.io, Rootly, ServiceNow, Datadog Bits AI, AlertOps). June 2026 updates include AlertOps Chronicle achieving 80% time savings in automated postmortem drafting; PagerDuty Scribe Agent reaching GA with real-time transcription and enriched postmortem summaries; Datadog embedding postmortem lifecycle management (Draft/In Review/Completed) as tier-1 platform infrastructure; and Lightrun AI SRE generating evidence-based postmortems in regulated (SOC 2/HIPAA) deployments.

Yet the adoption gap persists. Hallucination has become the critical blocker: Seekr research documents production hallucination rates of 33–86% in agentic multi-step workflows, against sub-1% rates on benchmarks, directly contradicting vendor marketing. In a high-profile failure, KPMG, a Big Four consulting firm, withdrew an AI-generated report in June 2026 after verification identified 40 of 45 citations as hallucinated or misleading, with the claims contradicted by the organisations named (UBS, NHS, Swiss Railways, Transport for London). The case is evidence that verification frameworks are essential for professional-grade documentation.

June 2026 data shows that organisational barriers dominate: 58% of enterprise CTOs name governance as the #1 blocker on AI agent projects. The binding constraints are accountability structures (a named owner, escalation paths, audit trails, change management discipline), verification checkpoints, and accuracy assurance rather than model capability. Accuracy risks remain acute: 2026 hallucination benchmarks show error rates of 3.3–60%, and June failures in professional documentation (Sullivan & Cromwell court filings, the KPMG report, Deloitte audit reports, EY cybersecurity analyses) signal the real cost of absent governance checkpoints. AI-generated incident reports also face evidentiary, privacy, and compliance gaps.

Operational documentation for AI systems reveals new structural gaps: runbooks written by engineers with broad access cannot be executed by on-call SREs with restricted credentials, and postmortems must now document agentic failure modes (silent degradation, policy drift, tool ambiguity) that differ from those of deterministic systems. The practice is vendor-ready and proven in select deployments, but the organisational prerequisites are steep, and most teams have not established them: on-call discipline, runbook testing procedures, governance frameworks covering prompt and model changes, evidence capture at the agentic execution layer, verification checkpoints for AI-generated content, and mature incident-reporting cultures.

CURRENT LANDSCAPE

The vendor ecosystem has crystallised into mature, GA-ready offerings. BMC HelixGPT (26.1), PagerDuty Advance, ServiceNow's automated post-incident review agents, Rootly, Datadog Bits AI, incident.io, and AlertOps all ship production-ready features for runbook automation and postmortem generation. June 2026 updates: PagerDuty Scribe Agent is now GA with real-time Zoom/Teams transcription and enriched postmortem summaries; Datadog announced postmortem lifecycle management (Draft/In Review/Completed status) at DASH 2026 as embedded tier-1 infrastructure; Lightrun AI SRE generates evidence-based postmortems with validated reasoning chains in SOC 2/HIPAA deployments; and AlertOps Chronicle auto-drafts complete incident reviews from alert data with 80% time savings. An ecosystem maturity signal from incident.io shows 9+ vendors shipping postmortem generation (Rootly, incident.io, Datadog, Opsrift, Arvo AI, ilert, DrDroid, PagerDuty, Atlassian).

Real deployments deliver quantified returns in financial services and IT operations. Danske Bank achieved 300% resilience efficiency gains in runbook automation; SolarWinds measured a 17.8% reduction in incident resolution time across 2,000+ ITSM systems; incident.io customers report a 37% MTTR reduction and $29,700 in annual savings; and Cutover deployments show a 60% MTTR reduction and 50% fewer disruptions via AI-assisted runbook execution with human-in-the-loop governance, plus a further 25–40% MTTR reduction via continuous learning from incidents. A deployment pattern is emerging: AI Advisory Board documents runbooks as a high-confidence AI use case in which AI drafts from alert definitions and incident history and on-call engineers edit within the first week, with one SaaS firm scaling runbook coverage from 40% to 85% in 14 days. Real-world deployments also reveal acute failure modes: Runcycles documented 20+ AI agent incidents with costs ranging from $1.40 to $12,400 in direct spend and business impact reaching $50K or more, exactly the failures runbooks should prevent.

Hallucination risk has intensified as the critical blocker. Seekr research documents production hallucination rates of 33–86% in agentic, multi-step reasoning workflows, against sub-1% rates on benchmarks, directly contradicting vendor marketing claims. In a high-profile failure, KPMG, a Big Four consulting firm, withdrew an AI-generated report in June 2026 after verification identified 40 of 45 citations as hallucinated or misleading, with the claims contradicted by the organisations named (UBS, NHS, Swiss Railways, Transport for London). The case signals that verification frameworks and governance checkpoints are essential rather than optional for enterprise-scale documentation generation.

Structural operational gaps are surfacing: AI runbooks written by engineers with broad access cannot be executed by on-call SREs who lack those credentials, and runbooks for agentic systems must now capture decision artifacts (workflow IDs, policy gate results, tool-call traces, side-effect ledgers) beyond what deterministic systems require. Governance frameworks are crystallising around tiered autonomy models: confidence scores below 0.60 require manual selection, scores of 0.60–0.84 require human approval, and scores of 0.85 or above execute autonomously, with NIST AI Risk Management alignment and 95% accuracy targets.

Operator discipline remains weak. April 2026 evidence shows operational toil increased 30% despite AI investment because teams deployed agents without runbook discipline; 69% of AI-powered decisions still require human verification, creating a "messy middle" where the automation layer was added but the manual layer was not removed. Post-mortem quality is also systemically weak: most AI incident postmortems miss the root cause by focusing on model hallucination when the real cause is credential misconfiguration. Large-firm AI adoption in IT operations has stalled at 12%, and only 14% of enterprises have successfully scaled pilots to production.

The binding constraints are organisational. Incident-reporting systems remain underused due to blame culture and reporting friction, starving AI models of training data. Most AI deployments lack the telemetry infrastructure (model versions, prompt logs, retrieval context, embedding versions) needed for effective forensic postmortems. Governance frameworks (terminology control, human review workflows, audit trails, verification checkpoints) are emerging as essential; without them, AI-generated reports cannot be audited or defended when disputes occur. Runbook discipline requires operational governance: access federation via OpenTelemetry, authoring discipline that enforces execution-persona validation, and agentic-specific controls (blast radius definition, autonomy classification, rollback procedures). Successful deployments cluster where blameless postmortem cultures and strong incident-data hygiene already exist: the AI amplifies mature practices rather than compensating for absent ones.
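
The tiered autonomy model described above can be sketched as a small routing function. This is a minimal illustration, not any vendor's implementation: the names `AutonomyTier` and `classify_autonomy` are hypothetical, and only the 0.60 and 0.85 thresholds come from the text. The source leaves scores between 0.84 and 0.85 unspecified; this sketch assumes they fall under human approval.

```python
from enum import Enum


class AutonomyTier(Enum):
    """Hypothetical labels for the three tiers named in the text."""
    MANUAL_SELECTION = "manual_selection"  # confidence below 0.60
    HUMAN_APPROVAL = "human_approval"      # confidence 0.60 up to (not including) 0.85
    AUTONOMOUS = "autonomous"              # confidence 0.85 or above


# Thresholds quoted in the text; the constant names are illustrative.
APPROVAL_THRESHOLD = 0.60
AUTONOMOUS_THRESHOLD = 0.85


def classify_autonomy(confidence: float) -> AutonomyTier:
    """Map a runbook-selection confidence score to an autonomy tier."""
    # Reject out-of-range scores (and NaN, which fails both comparisons).
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence!r}")
    if confidence >= AUTONOMOUS_THRESHOLD:
        return AutonomyTier.AUTONOMOUS
    if confidence >= APPROVAL_THRESHOLD:
        return AutonomyTier.HUMAN_APPROVAL
    return AutonomyTier.MANUAL_SELECTION


if __name__ == "__main__":
    for score in (0.42, 0.60, 0.84, 0.85, 0.97):
        print(f"{score:.2f} -> {classify_autonomy(score).value}")
```

Failing closed on invalid scores is a design choice of this sketch: a malformed confidence value raises an error rather than silently routing an action to autonomous execution.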

TIER HISTORY

Research: Jun-2023 → Jul-2023
Bleeding Edge: Jul-2023 → Apr-2025
Leading Edge: Apr-2025 → present

EVIDENCE (117)

— Cutover demonstrates 25–40% MTTR reduction via autonomous runbook execution with continuous learning from incidents, real-time command center dashboards, and human validation checkpoints for high-risk changes.

— Critical analysis of production hallucination rates in agentic workflows: 33–86% on reasoning tasks vs. sub-1% benchmarks. Documents gap between marketing claims and real operational deployment risk in multi-step documentation generation.

— Runbooks categorized as high-confidence AI use case with proven deployment pattern: AI drafts from alert definitions and incident history; on-call engineers edit within first week. SaaS case study: AI runbook generation increased coverage from 40% to 85% in 14 days.

— Big Four firm withdrew its AI-generated report after verification identified 40 of 45 citations as hallucinated or misleading, contradicted by named organizations. Critical negative evidence: professional documentation generation at enterprise scale lacks adequate verification frameworks.

— Tiered autonomy framework (confidence thresholds 0.60–0.84 require human approval; ≥0.85 autonomous) with NIST AI Risk Management governance; SentienGuard achieves 95% runbook selection accuracy with safety practices including separate AI decision/execution layers.

— Cutover deployment achieving 60% MTTR reduction and 50% fewer disruptions via AI-assisted runbook execution, real-time audit trails, and post-incident learning with human-in-the-loop governance checkpoints.

— AlertOps Chronicle GA feature auto-drafts complete incident reviews from alert data with 80% time savings, automated timeline assembly, and pattern surfacing across recurring incidents.

— Production platform generating evidence-based postmortems with timeline, RCA, and resolution strategies from runtime evidence; validated reasoning chains against live behavior with MTTR improvement claims and SOC 2/HIPAA alignment in regulated deployments.

HISTORY

  • 2023-H1: Cloud vendors publishing runbook best practices; specialized incident automation tools showing 65-80% time savings in post-incident documentation; broad IT ops AI adoption (85%) but infrastructure unpreparedness (42%) and high failure rates in AI projects (95% failure rate cited) reveal significant maturity barriers.
  • 2023-H2: Major vendors releasing GA AI capabilities for operational documentation (Atlassian virtual agent in Jira Service Management); early production deployments observed (Domino's Pizza automating post-incident report summarization); persistent cost and scalability challenges limit broader adoption.
  • 2024-Q1: Datadog releases GA Bits AI for Incident Management; internal engineering case studies reveal LLM-assisted postmortem generation with technical challenges (hallucinations, non-determinism). Data quality emerges as critical constraint—research shows severe under-reporting in incident systems limiting AI model effectiveness. Enterprise genAI adoption accelerating (65% of U.S. enterprises) but ROI disappointing due to data quality and overestimated productivity gains.
  • 2024-Q2: PagerDuty announces generative AI capabilities for postmortem drafting and automation job authoring; AI-generated runbooks feature released. Research documents systemic incident reporting barriers in healthcare and broader AI project abandonment trends (48% paused/rolled back). Vendor ecosystem maturing but adoption constrained by data quality, integration complexity, and economics.
  • 2024-Q3: PagerDuty Advance achieves general availability with embedded genAI for full incident lifecycle including postmortem drafting. Datadog publishes engineering deep-dives on production LLM-assisted postmortem generation addressing cost and hallucination challenges. Cross-domain real-world deployments emerge (police incident reporting from bodycam audio). Gartner forecasts 30% abandonment of genAI projects by end of 2025 due to data quality and business value challenges. Regulatory gaps in AI incident reporting documentation identified.
  • 2024-Q4: PagerDuty releases self-hosted runbook automation GA with cost/efficiency claims; independent enterprise surveys show 30% of companies running genAI in production with IT ops as top use case. Real deployments accelerate (police departments using AI for incident reports achieve 60% time savings but surface accuracy and legal concerns). Critical OpenAI research documents fundamental AI reliability risks—systems systematically overstate knowledge, creating misinformation hazards in operational documentation. Experts warn AI-generated reports face evidentiary and privacy challenges, highlighting tension between efficiency gains and accuracy requirements.
  • 2025-Q1: Major vendors and independent companies deploy AI agents for post-mortem automation (DataDome's DomeScribe using AWS Bedrock); Atlassian survey shows 79% of incident teams exploring AI but 74% blocking expansion due to security concerns. Healthcare governance research identifies AI governance as #2 safety threat, with only 16% of hospital executives having systemwide policies—revealing critical adoption barriers. Practitioner frameworks emphasize safety controls and governance checkpoints for AI product launches. Independent assessments document widespread pilot abandonment with organizational (not technical) scaling barriers. Technology maturity confirmed but adoption constrained by governance gaps and organizational readiness challenges.
  • 2025-Q2: Enterprise AI agent adoption accelerates—51% of companies deployed agents with 94% expecting faster agentic adoption than GenAI (PagerDuty survey, April). Datadog publishes LLM optimization details for postmortem generation (100+ hours tuning, 12-minute to <1-minute improvement). Academic validation emerges (aviation post-accident analysis using AI/NLP on NTSB data). However, critical independent assessment documents high-stakes deployment failures in police report generation: hallucinations, false officer attribution, evidence warping, constitutional risks—signaling maturity gap between vendor readiness and safe deployment practices.
  • 2025-Q3: Vendor ecosystem expansion continues: ServiceNow releases an automated post-incident review agentic workflow (July); PagerDuty ships GA Rundeck integration for auto-remediation (September). Yet enterprise adoption momentum visibly slows: large-firm AI adoption declines from 14% to 12%, and projects pause due to unmet ROI. Critical assessments proliferate: FERZ documents fundamental AI determinism gaps for compliance-critical contexts; ReasonVoyager/MIT analysis confirms 95% of GenAI projects deliver no measurable P&L impact; practitioners advocate production-first pilots. Post-incident review practices are themselves validated (an SRE deployment in retail improves MTTD/MTTR), but confidence in AI automation weakens. High-stakes deployment risks resurface as police AI reports continue to hallucinate. The practice reaches technology maturity but faces persistent governance and ROI barriers to broader adoption.
  • 2025-Q4: Vendor GA maturity solidifies—Rootly automated postmortem generation (October); PagerDuty Logz.io AI RCA integration (November). Measurable production metrics emerge: SolarWinds analysis shows 17.8% incident resolution time reduction (4.87 hours/incident), but Rootly real-world data reveals 20% actual savings vs. 80% expected—highlighting persistent gap between vendor claims and outcomes. Adoption momentum remains flat; large-firm AI adoption stays at 12%. Organizational barriers dominate: governance gaps, data quality constraints, and unclear ROI measurement constrain adoption more than capability immaturity. Post-mortem practices gain renewed focus—blameless, learning-centered cultures increasingly recognized as foundational. High-stakes deployment accuracy risks persist (police AI hallucinations continue). Practice reaches mature vendor ecosystem stage but adoption remains selective, gated by organizational readiness and governance, not technology.
  • 2026-Jan: Vendor feature expansion continues: PagerDuty expands Scribe agent to Microsoft Teams for meeting transcription and postmortem drafting. Industry guidance solidifies on runbook automation (OneUptime) and blameless postmortem practices, emphasizing executable workflows with preserved human judgment and learning-focused incident reviews. Critical assessments surface accountability gaps in AI-generated incident documentation—missing decision trails and explanations create legal and operational risks when disputes occur. Adoption momentum remains constrained by evidence failures and governance requirements, not technology maturity.
  • 2026-Feb: incident.io publishes ROI analysis showing 37% MTTR reduction and $29,700 annual savings for automated post-mortem software, providing concrete quantification of deployment benefits. However, practitioner critical analysis (Devrim Ozcay) documents reliability gaps in single-model AI postmortems and advocates multi-model architectures for improved evidence verification. Vendor ecosystem maturity continues; adoption remains selective pending resolution of accuracy and governance challenges.
  • 2026-Mar/Apr: incident.io GA launch of AI-native post-mortems with one-click draft generation, accuracy review, and collaborative editing (March 17); BMC HelixGPT 26.1 GA post-mortem analyzer (March 31); Cutover documents real-world runbook automation ROI in banking—Danske Bank 300% efficiency gain, major U.S. bank 24-hour failover testing, investment firm 53% efficiency gain. Opsrift documents automated postmortem generation completing in under 60 seconds with 1-2 hours time savings per incident. Critical evidence surfaces accountability gap: Harper Foley analysis documents 10 production incidents from AI coding agents over 16 months with zero vendor postmortems published. Microsoft published formal guidance that AI incident response requires fundamentally restructured playbooks and runbooks due to non-determinism, speed, novel harm types, and cross-functional complexity; practitioner analysis shows operational toil rose 30% in 2025 despite AI investment because teams deployed agents without runbook discipline, with a three-tier autonomy model and guardrail architecture (identity/access, blast-radius checks, circuit breakers, audit trails) emerging as the production standard. Survey of 650 enterprise leaders finds 78% have AI pilots but only 14% achieved production scale, with runbook and monitoring gaps as the dominant root cause. Industry-average hallucination rate of ~20% confirmed as systemic barrier to reliable AI-generated postmortems in high-stakes contexts; Runcycles documented 20+ AI agent incidents costing up to $50K+ as the class of failure that structured runbooks should prevent. Governance frameworks emerge as essential: AnalystEngine identifies majority of AI deployments lacking telemetry infrastructure for forensic postmortems; TextUnited outlines control pillars (terminology, review workflows, audit trails) for safe AI-generated documentation in regulated contexts. 
    Vendor ecosystem maturity confirmed; adoption remains constrained by organizational readiness (governance, data quality, forensic infrastructure, runbook discipline), not technology capability.
  • 2026-Late Apr/May: Vendor GA consolidation accelerates: PagerDuty Post-Incident Reviews GA (April 27, 2026) replaces legacy postmortems by October 2026; SRE Agent GA for autonomous investigation and incident triage; incident.io releases rapid feature iterations (post-mortem editor Dec 2025, SharePoint export, list views) signaling sustained product demand. Real-world deployments continue validating operational value while exposing architectural gaps: incident.io case study shows autonomous AI investigation and auto-drafted post-incident documentation from Slack/Zoom context (production-ready); Amazon internal outages Dec 2025-Mar 2026 reveal critical runbook discipline gap where agentic systems made production changes without documented playbooks, prompting mandatory senior review and guardrail implementation. Peer-reviewed Amazon Science research on human-in-the-loop runbook improvement with agentic support automation (published May 2026) provides academic-level validation that agentic runbook automation is a production-mature practice. Practitioner frameworks crystallize foundational gaps: Tian Pan's incident response playbook documents why traditional runbooks fail for non-deterministic AI systems (hallucination, silent model updates, routing errors, data corruption) and proposes revised triage tree with explicit model-versioning checks and 15-50% hallucination monitoring; systematic catalogs of AI agent failure modes now inform structured runbook design for containment and rollback. Architectural consensus emerges: runbook automation and post-mortem generation require AI-specific instrumentation (prompt version tagging, session IDs for stateful rollback, telemetry infrastructure), not generic documentation templates. 
    A five-component AI runbook framework (blast radius, autonomy classification, triage, rollback, escalation) gains practitioner traction as the operational standard for AI agent incidents; Lightrun survey (2026) finds 43% of AI-generated code changes require production debugging, validating the need for structured runbooks specifically designed for AI agent failure modes. Cisco Talos IR published field lessons on AI-generated reporting, while independent CIO/CISO readiness assessments identify on-call rotations, runbook currency, and change management discipline as the foundational operating model requirements separating sustainable scaled deployments from key-person dependencies. AI postmortem generation confirmed as vendor-standard differentiator across 15+ platforms (2026 comprehensive ITSM evaluation). Technology maturity confirmed across deployment models; organizational prerequisites (runbook discipline, forensic infrastructure, governance frameworks) remain binding constraints on broader adoption.
  • 2026-June: Vendor GA maturity continues: PagerDuty Scribe Agent GA with real-time Zoom/Teams transcription for automated incident documentation capture and enriched postmortem summaries (June 3); Datadog DASH 2026 announces postmortem lifecycle management (Draft/In Review/Completed status) as embedded tier-1 infrastructure (June 9); Lightrun AI SRE generating evidence-based postmortems with validated reasoning chains from runtime evidence in SOC 2/HIPAA regulated deployments (June 10); AlertOps Chronicle launches AI-generated postmortem automation with 80% time savings (June 12); Cutover documents 25–40% additional MTTR reduction via autonomous runbook execution with continuous learning and real-time command center dashboards, with human validation checkpoints for high-risk changes (June 24). Proven deployment pattern validated: AI Advisory Board confirms runbooks as high-confidence AI use case where AI drafts from alert definitions, on-call engineers edit within first week, with a SaaS firm scaling runbook coverage from 40% to 85% in 14 days; tiered autonomy model (confidence ≥0.85 autonomous, 0.60–0.84 human-approved, <0.60 manual) with NIST AI RMF alignment documented as the production standard. Critical hallucination failures intensify as the dominant risk signal: Seekr documents production hallucination rates of 33–86% in agentic multi-step workflows versus sub-1% benchmarks; KPMG withdrew its AI-generated report after 40 of 45 citations were identified as hallucinated or contradicted by named organizations, confirming that verification checkpoints are prerequisites, not enhancements, for enterprise documentation. 
    Adoption barriers remain organizational: governance (named the #1 blocker by 58% of enterprise CTOs), telemetry infrastructure for forensics, on-call discipline, runbook testing, and mature incident cultures are prerequisites that are not yet widespread; verification checkpoints for AI-generated content become essential rather than optional as accuracy risks manifest in high-stakes professional contexts.
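
The AI-specific instrumentation that the history entries call for (prompt version tagging, session IDs, model and embedding versions, tool-call traces, policy gate results) can be pictured as one evidence record per agent action. The sketch below is illustrative only: `AgentEvidenceRecord` and `missing_evidence` are hypothetical names, and the field list is assembled from items mentioned in this index rather than from any vendor schema.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class AgentEvidenceRecord:
    """Hypothetical per-action evidence captured for forensic postmortems."""
    workflow_id: str = ""
    session_id: str = ""          # supports stateful rollback
    model_version: str = ""
    prompt_version: str = ""      # prompt version tag
    embedding_version: str = ""
    retrieval_context_ids: List[str] = field(default_factory=list)
    policy_gate_results: List[str] = field(default_factory=list)
    tool_call_trace: List[str] = field(default_factory=list)


def missing_evidence(record: AgentEvidenceRecord) -> List[str]:
    """Return the names of fields left empty, in declaration order."""
    # Simplification: any empty string or empty list counts as missing.
    return [f.name for f in fields(record) if not getattr(record, f.name)]


if __name__ == "__main__":
    record = AgentEvidenceRecord(
        workflow_id="wf-example",      # placeholder values, not real data
        session_id="sess-example",
        model_version="model-example",
    )
    print("Missing before postmortem:", missing_evidence(record))
```

Treating every empty field as missing is a deliberate simplification. A real check would need to distinguish "no tools were called" from "tool calls were not logged", which this sketch does not attempt.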