Operational documentation — runbooks & post-incident reports — IT Operations & Security

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

[Figure: AI Maturity by Domain. Each dot marks the weighted maturity of practices within a domain, plotted on an axis running from Bleeding Edge to Established.]

LEADING EDGE

TRAJECTORY: Stalled

AI that generates and maintains operational runbooks and produces post-incident review reports. Includes automated playbook creation and blameless post-mortem drafting; distinct from incident response automation, which executes actions rather than documenting them.

OVERVIEW

AI-generated runbooks and post-incident reports have crossed from experimental to vendor-ready, but organisational adoption remains selective and constrained by maturity gaps. The technology addresses a genuine pain point: manual runbooks decay as systems change, and post-mortems are routinely delayed or incomplete because documentation competes with recovery for engineers' time. Forward-leaning organisations are getting measurable value: incident.io reports a 37% MTTR reduction and $29,700 in annual savings, SolarWinds data across 2,000+ ITSM systems shows a 17.8% cut in resolution time, and financial services deployments (Danske Bank) report 300% efficiency improvements in runbook automation. The vendor ecosystem has solidified: BMC HelixGPT, PagerDuty Advance, incident.io, Rootly, ServiceNow, and others all ship production-ready features for automation and postmortem drafting.

Yet April 2026 evidence reveals a darker reality: operational toil actually increased 30% in 2025 despite AI investment, teams deployed AI agents without runbook discipline, and post-mortem practices are plagued by systemic root-cause analysis failures. Most AI incident postmortems blame model hallucination when the real cause is permission misconfiguration. Adoption stalls at organisational boundaries, not technical ones: governance gaps, poor incident-data quality, and unclear ROI measurement constrain deployment more than capability shortfalls do. Accuracy risks are acute. Industry-average hallucination rates of ~20% make AI postmortems unreliable for high-stakes contexts, single-model AI postmortems hallucinate details, forensic readiness gaps prevent effective incident investigation, and high-stakes deployments (police report generation) have surfaced false attributions and evidence distortions.

The practice is vendor-mature and proven in select deployments, but the organisational prerequisites are steep (mature incident-reporting cultures, strong data hygiene, runbook discipline, and governance frameworks), and most teams haven't cleared them.

CURRENT LANDSCAPE

The vendor ecosystem has crystallised into mature, GA-ready offerings. BMC HelixGPT (26.1), PagerDuty Advance, ServiceNow automated post-incident review agents, Rootly, Datadog Bits AI, and incident.io all ship production-ready features for runbook automation and postmortem generation. incident.io's March 2026 GA launch includes one-click AI draft generation from Slack/Teams, timeline reconstruction from event data, and AI accuracy validation that flags missing or contradictory details. PagerDuty's Scribe agent drafts structured post-mortem summaries in Teams; Datadog generates postmortems in under a minute (after 100+ hours of tuning); BMC's Helix performs 5-Why analysis and root cause extraction.

Real deployments deliver quantified returns in financial services and IT operations: Danske Bank achieved 300% efficiency gains in runbook automation; SolarWinds measured a 17.8% reduction in incident resolution time across 2,000+ ITSM systems; incident.io customers report a 37% MTTR reduction and $29,700 in annual savings. Real-world deployments also reveal acute failure modes: Runcycles documented 20+ AI agent incidents with costs ranging from $1.40 to $12,400 in direct spend and up to $50K+ in business impact, exactly the class of failure that runbooks should prevent. Operator discipline is collapsing: April 2026 evidence shows operational toil increased 30% despite AI investment because teams deployed agents without runbook discipline; 69% of AI-powered decisions still require human verification, creating a "messy middle" where the automation layer was added but the manual layer wasn't removed. Post-mortem quality is systemically broken: most AI incident postmortems miss root causes by focusing on model hallucination when the real cause is credential misconfiguration, a systematic failure pattern in how teams analyse incidents. Large-firm AI adoption in IT operations has stalled at 12%, with only 14% of enterprises successfully scaling pilots to production.

Hallucination remains endemic: an industry-average rate of ~20% (1 error per 5 queries) and legal AI research showing accuracy rates of 43–65% highlight the accuracy risks in high-stakes operational documentation. The binding constraints are organisational. Incident-reporting systems remain underused due to blame culture and reporting friction, starving AI models of training data. Most AI deployments lack the telemetry infrastructure (model versions, prompt logs, retrieval context, embedding versions) needed for effective forensic postmortems. In regulated and high-stakes contexts, accuracy risks are acute: police AI report generation has produced hallucinated officer attributions and evidence distortions, and compliance-critical environments face a fundamental tension between probabilistic AI outputs and deterministic documentation requirements. Governance frameworks (terminology control, human review workflows, audit trails) are emerging as essential, not optional. Runbook discipline, meaning structured, testable, maintainable procedures, is foundational. Successful deployments cluster where blameless postmortem cultures and strong incident-data hygiene already exist: the AI amplifies mature practices rather than compensating for absent ones.
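
The telemetry fields named above (model versions, prompt logs, retrieval context, embedding versions) can be pictured as one record captured per AI generation. The following is a minimal Python sketch under stated assumptions: the `GenerationRecord` schema, its field names, `REQUIRED_FIELDS`, and both helper functions are hypothetical illustrations, not any vendor's format.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json


@dataclass(frozen=True)
class GenerationRecord:
    """One AI generation event, kept for later forensic review (hypothetical schema)."""
    incident_id: str
    model_version: str
    prompt_version: str
    embedding_version: str
    prompt: str
    retrieved_context: tuple  # IDs of the documents retrieved for this generation
    output: str
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


# Fields assumed (for this sketch) to be mandatory for a usable forensic trail.
REQUIRED_FIELDS = ("model_version", "prompt_version", "embedding_version", "prompt")


def missing_fields(record: GenerationRecord) -> list:
    """Return the names of required forensic fields that are empty on this record."""
    data = asdict(record)
    return [name for name in REQUIRED_FIELDS if not data[name]]


def to_log_line(record: GenerationRecord) -> str:
    """Serialise the record as one JSON line, suitable for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

A reviewer could call `missing_fields` before accepting an AI-drafted postmortem and reject any draft whose record cannot say which model and prompt version produced it.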

TIER HISTORY

Research: Jun-2023 → Jul-2023

Bleeding Edge: Jul-2023 → Apr-2025

Leading Edge: Apr-2025 → present

EVIDENCE (92)

Human-in-the-loop runbook improvement with agentic support automation | Research Papers | 2026-05-14

Peer-reviewed Amazon Science research on AI agents improving runbooks in operational incident resolution systems; validates agentic runbook automation at academic credibility level and demonstrates core practice maturity in research publications.

Top 10 Model Incident Management Tools: Features, Pros, Cons & Comparison | Product Launches | 2026-05-14

Comparison of 10 production AI incident management platforms documenting ecosystem maturity; catalogs AI-specific runbook and postmortem capabilities (drift monitoring, hallucination detection, prompt tracing, governance) as standard platform features.

AI Incident Response Runbook: RCA for LLM Failures (2026) | Tutorials | 2026-05-12

Comprehensive runbook design guide for LLM incident response covering severity classification, detection signals, containment primitives, and RCA templates adapted for AI failure classes; foundational operational documentation for AI system reliability.

Postmortem: Our AI-Powered Chatbot Hallucinated Sensitive Data – Root Cause and Fix | Case Studies | 2026-05-08

Detailed production postmortem documenting AI incident RCA with four root causes, 3-layer guardrail remediation, and validation metrics (0 incidents over 4.2M requests); demonstrates mature post-incident documentation practices adapted for AI system failures.

12 Ways AI Agents Fail in Production | Opinion | 2026-05-07

Systematic catalog of 12 AI agent failure modes from incident analysis with detection signals and operational containment procedures; directly informs runbook design for identifying and mitigating production failures.

AI for Production Engineering — Who Gets to Build It | Opinion | 2026-05-07

Independent analysis of vendor positioning and postmortem/runbook data as training-signal moat; identifies incident meeting transcripts as emerging corpus shift and highlights verification gap in AI-generated incident documentation.

AI Agent Disaster Postmortems: The 3 Structural Guardrails | Case Studies | 2026-05-03

Analysis of two named AI agent incidents (PocketOS database wipe, auth system rewrite) with root cause analysis and three operational controls (snapshots, least-privilege, mandatory checkpoints) that runbooks must enforce to prevent catastrophic failures.

Designing Rollbacks for AI Automation: Building LLM Workflows That Can Be Corrected | Opinion | 2026-05-02

Technical framework for designing reversible effects and rollback procedures in AI automation workflows; core architectural pattern for incident recovery documentation and runbook design in AI systems.

HISTORY

2023-H1: Cloud vendors publishing runbook best practices; specialized incident automation tools showing 65-80% time savings in post-incident documentation; broad IT ops AI adoption (85%) but infrastructure unpreparedness (42%) and high failure rates in AI projects (95% failure rate cited) reveal significant maturity barriers.
2023-H2: Major vendors releasing GA AI capabilities for operational documentation (Atlassian virtual agent in Jira Service Management); early production deployments observed (Domino's Pizza automating post-incident report summarization); persistent cost and scalability challenges limit broader adoption.
2024-Q1: Datadog releases GA Bits AI for Incident Management; internal engineering case studies reveal LLM-assisted postmortem generation with technical challenges (hallucinations, non-determinism). Data quality emerges as critical constraint—research shows severe under-reporting in incident systems limiting AI model effectiveness. Enterprise genAI adoption accelerating (65% of U.S. enterprises) but ROI disappointing due to data quality and overestimated productivity gains.
2024-Q2: PagerDuty announces generative AI capabilities for postmortem drafting and automation job authoring; AI-generated runbooks feature released. Research documents systemic incident reporting barriers in healthcare and broader AI project abandonment trends (48% paused/rolled back). Vendor ecosystem maturing but adoption constrained by data quality, integration complexity, and economics.
2024-Q3: PagerDuty Advance achieves general availability with embedded genAI for full incident lifecycle including postmortem drafting. Datadog publishes engineering deep-dives on production LLM-assisted postmortem generation addressing cost and hallucination challenges. Cross-domain real-world deployments emerge (police incident reporting from bodycam audio). Gartner forecasts 30% abandonment of genAI projects by end of 2025 due to data quality and business value challenges. Regulatory gaps in AI incident reporting documentation identified.
2024-Q4: PagerDuty releases self-hosted runbook automation GA with cost/efficiency claims; independent enterprise surveys show 30% of companies running genAI in production with IT ops as top use case. Real deployments accelerate (police departments using AI for incident reports achieve 60% time savings but surface accuracy and legal concerns). Critical OpenAI research documents fundamental AI reliability risks—systems systematically overstate knowledge, creating misinformation hazards in operational documentation. Experts warn AI-generated reports face evidentiary and privacy challenges, highlighting tension between efficiency gains and accuracy requirements.
2025-Q1: Major vendors and independent companies deploy AI agents for post-mortem automation (DataDome's DomeScribe using AWS Bedrock); Atlassian survey shows 79% of incident teams exploring AI but 74% blocking expansion due to security concerns. Healthcare governance research identifies AI governance as #2 safety threat, with only 16% of hospital executives having systemwide policies—revealing critical adoption barriers. Practitioner frameworks emphasize safety controls and governance checkpoints for AI product launches. Independent assessments document widespread pilot abandonment with organizational (not technical) scaling barriers. Technology maturity confirmed but adoption constrained by governance gaps and organizational readiness challenges.
2025-Q2: Enterprise AI agent adoption accelerates—51% of companies deployed agents with 94% expecting faster agentic adoption than GenAI (PagerDuty survey, April). Datadog publishes LLM optimization details for postmortem generation (100+ hours tuning, 12-minute to <1-minute improvement). Academic validation emerges (aviation post-accident analysis using AI/NLP on NTSB data). However, critical independent assessment documents high-stakes deployment failures in police report generation: hallucinations, false officer attribution, evidence warping, constitutional risks—signaling maturity gap between vendor readiness and safe deployment practices.
2025-Q3: Vendor ecosystem expansion continues: ServiceNow releases an automated post-incident review agentic workflow (July); PagerDuty ships a GA Rundeck integration for auto-remediation (September). Yet enterprise adoption momentum visibly slows: large-firm AI adoption declines from 14% to 12%, and projects pause due to unmet ROI. Critical assessments proliferate: FERZ documents fundamental AI determinism gaps for compliance-critical contexts; ReasonVoyager/MIT analysis confirms 95% of GenAI projects deliver no measurable P&L impact; practitioners advocate production-first pilots. Post-incident review practices themselves are validated (an SRE deployment in retail improves MTTD/MTTR), but confidence in AI automation weakens. High-stakes deployment risks resurface: police AI reports continue hallucinating. The practice reaches technology maturity but faces persistent governance and ROI barriers to broader adoption.
2025-Q4: Vendor GA maturity solidifies—Rootly automated postmortem generation (October); PagerDuty Logz.io AI RCA integration (November). Measurable production metrics emerge: SolarWinds analysis shows 17.8% incident resolution time reduction (4.87 hours/incident), but Rootly real-world data reveals 20% actual savings vs. 80% expected—highlighting persistent gap between vendor claims and outcomes. Adoption momentum remains flat; large-firm AI adoption stays at 12%. Organizational barriers dominate: governance gaps, data quality constraints, and unclear ROI measurement constrain adoption more than capability immaturity. Post-mortem practices gain renewed focus—blameless, learning-centered cultures increasingly recognized as foundational. High-stakes deployment accuracy risks persist (police AI hallucinations continue). Practice reaches mature vendor ecosystem stage but adoption remains selective, gated by organizational readiness and governance, not technology.
2026-Jan: Vendor feature expansion continues: PagerDuty expands Scribe agent to Microsoft Teams for meeting transcription and postmortem drafting. Industry guidance solidifies on runbook automation (OneUptime) and blameless postmortem practices, emphasizing executable workflows with preserved human judgment and learning-focused incident reviews. Critical assessments surface accountability gaps in AI-generated incident documentation—missing decision trails and explanations create legal and operational risks when disputes occur. Adoption momentum remains constrained by evidence failures and governance requirements, not technology maturity.
2026-Feb: incident.io publishes ROI analysis showing 37% MTTR reduction and $29,700 annual savings for automated post-mortem software, providing concrete quantification of deployment benefits. However, practitioner critical analysis (Devrim Ozcay) documents reliability gaps in single-model AI postmortems and advocates multi-model architectures for improved evidence verification. Vendor ecosystem maturity continues; adoption remains selective pending resolution of accuracy and governance challenges.
2026-Mar/Apr: incident.io GA launch of AI-native post-mortems with one-click draft generation, accuracy review, and collaborative editing (March 17); BMC HelixGPT 26.1 GA post-mortem analyzer (March 31); Cutover documents real-world runbook automation ROI in banking: Danske Bank 300% efficiency gain, major U.S. bank 24-hour failover testing, investment firm 53% efficiency gain. Opsrift documents automated postmortem generation completing in under 60 seconds with 1-2 hours time savings per incident. Critical evidence surfaces accountability gap: Harper Foley analysis documents 10 production incidents from AI coding agents over 16 months with zero vendor postmortems published. Microsoft published formal guidance that AI incident response requires fundamentally restructured playbooks and runbooks due to non-determinism, speed, novel harm types, and cross-functional complexity; practitioner analysis shows operational toil rose 30% in 2025 despite AI investment because teams deployed agents without runbook discipline, with a three-tier autonomy model and guardrail architecture (identity/access, blast-radius checks, circuit breakers, audit trails) emerging as the production standard. Survey of 650 enterprise leaders finds 78% have AI pilots but only 14% achieved production scale, with runbook and monitoring gaps as the dominant root cause. Industry-average hallucination rate of ~20% confirmed as systemic barrier to reliable AI-generated postmortems in high-stakes contexts; Runcycles documented 20+ AI agent incidents costing up to $50K+ as the class of failure that structured runbooks should prevent. Governance frameworks emerge as essential: AnalystEngine identifies majority of AI deployments lacking telemetry infrastructure for forensic postmortems; TextUnited outlines control pillars (terminology, review workflows, audit trails) for safe AI-generated documentation in regulated contexts. Vendor ecosystem maturity confirmed; adoption remains constrained by organizational readiness (governance, data quality, forensic infrastructure, runbook discipline), not technology capability.
2026-Late Apr/May: Vendor GA consolidation accelerates: PagerDuty Post-Incident Reviews GA (April 27, 2026) replaces legacy postmortems by October 2026; SRE Agent GA for autonomous investigation and incident triage; incident.io releases rapid feature iterations (post-mortem editor Dec 2025, SharePoint export, list views) signaling sustained product demand. Real-world deployments continue validating operational value while exposing architectural gaps: incident.io case study shows autonomous AI investigation and auto-drafted post-incident documentation from Slack/Zoom context (production-ready); Amazon internal outages Dec 2025-Mar 2026 reveal critical runbook discipline gap where agentic systems made production changes without documented playbooks, prompting mandatory senior review and guardrail implementation. Peer-reviewed Amazon Science research on human-in-the-loop runbook improvement with agentic support automation (published May 2026) provides academic-level validation that agentic runbook automation is a production-mature practice. Practitioner frameworks crystallize foundational gaps: Tian Pan's incident response playbook documents why traditional runbooks fail for non-deterministic AI systems (hallucination, silent model updates, routing errors, data corruption) and proposes revised triage tree with explicit model-versioning checks and 15-50% hallucination monitoring; systematic catalogs of AI agent failure modes now inform structured runbook design for containment and rollback. Architectural consensus emerges: runbook automation and post-mortem generation require AI-specific instrumentation (prompt version tagging, session IDs for stateful rollback, telemetry infrastructure), not generic documentation templates. Technology maturity confirmed across deployment models; organizational prerequisites (runbook discipline, forensic infrastructure, governance frameworks) remain binding constraints on broader adoption.
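
The guardrail architecture named in the 2026-Mar/Apr entry (identity/access, blast-radius checks, circuit breakers, audit trails), combined with a three-tier autonomy model, can be sketched as a single pre-execution check. This is an illustrative Python sketch only. The source does not define the three tiers, so the tier names, the `Action` and `Guardrail` classes, and the default thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Autonomy(IntEnum):
    """Hypothetical tier names; the source mentions three tiers but does not name them."""
    SUGGEST = 1      # agent proposes, a human executes
    APPROVE = 2      # agent executes only after human approval
    AUTONOMOUS = 3   # agent executes unattended


@dataclass
class Action:
    name: str
    actor: str
    required_permission: str
    hosts_affected: int
    tier: Autonomy


@dataclass
class Guardrail:
    permissions: dict            # actor -> set of granted permissions
    max_hosts: int = 5           # blast-radius limit (assumed value)
    max_failures: int = 3        # circuit-breaker threshold (assumed value)
    failures: int = 0
    audit_log: list = field(default_factory=list)

    def check(self, action: Action, human_approved: bool = False) -> bool:
        """Return True if the action may run; every decision is appended to the audit log."""
        if self.failures >= self.max_failures:
            reason = "circuit breaker open"
        elif action.required_permission not in self.permissions.get(action.actor, set()):
            reason = "permission denied"
        elif action.hosts_affected > self.max_hosts:
            reason = "blast radius exceeded"
        elif action.tier == Autonomy.SUGGEST:
            reason = "suggest-only tier"
        elif action.tier == Autonomy.APPROVE and not human_approved:
            reason = "approval required"
        else:
            reason = "allowed"
        self.audit_log.append((action.actor, action.name, reason))
        return reason == "allowed"

    def record_failure(self) -> None:
        """Count a failed action; enough failures open the circuit breaker."""
        self.failures += 1
```

Because refusals are logged alongside approvals, the audit trail doubles as raw material for a later postmortem timeline.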