The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.
A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.
AI that generates product requirements documents, user stories, and acceptance criteria from research, feedback, and stakeholder input. Includes PRD drafting and story decomposition; distinct from feature prioritisation which ranks rather than defines requirements.
AI-powered requirements generation can produce credible first drafts of PRDs, user stories, and acceptance criteria in minutes rather than days. It cannot yet be trusted to do so autonomously. That gap — between impressive artifact acceleration and reliable production deployment — defines this practice’s stalled position at the leading edge.
The tooling ecosystem has consolidated into two categories: general-purpose LLM prompting (ChatGPT, Claude, Gemini) and specialized platforms (ChatPRD, Sanciti RGEN, spec-generator, Copilot Specs). By June 2026, both paths show production maturity: second-generation Claude Code skills (174.7k installs on the official marketplace), enterprise platforms that derive specs from codebase behavior (Sanciti AI RGEN), and integrated platform support (Atlassian Intelligence, Linear AI) all indicate that specification tooling has moved beyond proof-of-concept. However, the practice is shaped by two offsetting forces. First, organizational constraints now dominate tool constraints: peer-reviewed production studies document that tool integration, not tool capability, is the binding constraint, and organizations with explicit specification discipline (Stripe, Microsoft, Notion, Figma, Coinbase, Ramp) embed AI-assisted PRDs with 25–50% time savings. Second, hallucination rates across foundation models (22–94% per the Stanford AI Index benchmark) make comprehensive specifications a mandatory control point: clear, detailed requirements prevent AI from solving the wrong problem. Institutions with explicit conventions, fallback patterns, and eval-driven acceptance criteria ship AI-generated requirements successfully. Institutions without these guards experience architectural drift, consistency problems, and 43% production failure rates with 2–4× cleanup overhead. A concrete failure case shows that hallucination is not a theoretical concern: the South African government withdrew its official National AI Policy in May 2026 after discovering fabricated citations in AI-generated documentation.
This signals the practice has bifurcated: organizations with requirements discipline can leverage AI effectively to accelerate specification work; organizations treating AI as a solution to weak specification practices amplify their existing problems while compounding hallucination risk.
Production deployments have stabilized into two adoption tiers with distinct outcomes: organizations with pre-existing specification discipline (Stripe, Microsoft, Notion, Figma, Coinbase, Ramp) routinely embed AI-generated PRDs into workflows with 25–50% time savings, while organizations without it experience 43% production failure rates and 2–4× cleanup overhead after adopting AI. June 2026 evidence from peer-reviewed production deployment studies (XITASO) documents both tiers concretely and identifies the constraint: tool integration, not tool capability, is binding. Where integration is in place (SaaS platforms, IDE plugins, workflow automation), time savings are dramatic; where it is missing, teams fall back on manual workarounds. The successful deployment pattern is visible across vendors: second-generation Claude Code skills on the official marketplace (174.7k installs as of June 3, 2026), the Sanciti AI RGEN platform generating requirements from codebase behavior, and ChatPRD’s 71+ case studies from major companies all show that AI excels at artifact acceleration when requirements are explicit before generation starts. Amplitude’s Moda tool generates PRDs from single-sentence prompts by encoding implicit product context into a team knowledge base; the mechanism is specification discipline, not AI capability. Ramp scaled GitHub Copilot to 300+ engineers (30% productivity gains) by explicitly framing code as a byproduct of good specs. A recent case study (Boldare, May 2026) confirmed this pattern: embedding Claude Code into a 6-person engineering team produced Architecture Decision Records as a natural byproduct of development (solving documentation debt), increased test coverage from 85% to 95%, and delivered 31% sprint velocity gains. The enabling factor was explicit discipline in requirements documentation, not tool capability alone.
The liability has also become more sharply defined: hallucination is structural to generative AI, not a temporary quality issue. The Stanford AI Index benchmark (June 2026) of 26 foundation models reports hallucination rates of 22–94%, and 74% of surveyed companies cite AI inaccuracy as their top AI risk (up 14 percentage points year-over-year). A concrete failure case shows that hallucination threatens production documentation: the South African government withdrew its official National AI Policy draft in May 2026 after discovering fabricated citations. This establishes comprehensive specifications as mandatory control points: clear, detailed requirements prevent AI from solving the wrong problem. A product leader’s analysis documents how AI-generated code passed all tests and staging but failed in production (43% of AI-generated code requires manual debugging post-QA, per Lightrun 2026). The root cause is structural: when AI writes both code and tests, the tests only prove internal consistency, not correctness against requirements. O’Reilly’s maxim captures the gap: “AI is really good at writing correct code that does the wrong thing.” This is why clear PRDs, requirements, and user stories matter: they bridge the specification gap between vague instructions and actual intended behavior. A technical architect’s analysis identified three gaps in how design intent is communicated to AI: (1) implicit conventions (naming rules, preferred patterns, security invariants) stay implicit and require 2–4× review cycles to enforce; (2) specs describe WHAT, not HOW or WHY, so architectural decisions and constraints are not written down and AI generates plausible code that violates unstated assumptions; (3) there is no feedback loop: a single human review does not update the AI’s context for the next task, so volume (for example, 50 AI-generated files) creates architectural consistency problems faster than review can catch them.
The solution evident in deployments: persistent context files (CLAUDE.md pattern) encoding institutional knowledge—naming conventions, preferred patterns, security invariants, accepted tradeoffs—that survives across conversations.
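The gap between internal consistency and correctness against requirements can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the discount requirement, the function names, and the checks are invented for illustration and do not come from any of the analyses cited above. The generated function and its generated test agree with each other, yet both miss a constraint that only the written requirement contains.

```python
# Hypothetical requirement (assumed for illustration only):
# "Orders of 100.00 or more get 10% off; the discount is capped at 50.00."


def apply_discount(total: float) -> float:
    """Plausible generated code: implements the 10% rule but omits the cap."""
    if total >= 100.0:
        return round(total * 0.90, 2)
    return total


def generated_test() -> bool:
    """A test derived from the code's own behavior, so it shares the same blind spot."""
    return apply_discount(200.0) == 180.0 and apply_discount(50.0) == 50.0


def requirement_check() -> bool:
    """An acceptance check written from the requirement, including the cap."""
    # 10% of 1000.00 would be 100.00, but the cap limits the discount to 50.00.
    return apply_discount(1000.0) == 950.0


if __name__ == "__main__":
    print("generated test passes:", generated_test())
    print("requirement check passes:", requirement_check())
```

In this sketch the generated test passes while the requirement-derived check fails, which is the pattern the maxim describes: the code is correct relative to itself and wrong relative to the specification.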
Specialized guidance for AI feature PRDs has emerged to address a critical practice gap. An AI engineering founder documented why deterministic PRD templates fail for stochastic AI systems: they assume that inputs map to known outputs via an enumerable function, that QA can test ‘same input = same output’, and that acceptance criteria are binary checklists. None of these assumptions hold for probabilistic AI. The fix requires four sections baked into the PRD before any code is written: (1) Behavior Matrix: rows are input classes (e.g., ‘ticket with screenshots’, ‘non-English ticket’, ‘out-of-domain topic’) and columns are failure modes (target, acceptable degradation, forbidden). (2) Eval-Set Callout: a named, versioned dataset, a metric function (exact match, BLEU, or rubric), and a threshold with a rollback score. (3) Fallback Specification: what the user sees on refusal, timeout, or tool failure. (4) Prompt Ownership: who can change behavior post-ship, with an audit trail. These sections are absent from traditional PRD templates; their presence signals organizational readiness for AI feature deployment. The vendor ecosystem (ChatPRD with 50,000+ PM users, Sanciti RGEN, ReqSpell, Atlassian Intelligence, Linear AI) and tooling (spec-generator with validation gates, Copilot Specs with traceability, GitHub Spec Kit, Kiro) all now expect this structure. Yet adoption remains bifurcated: organizations with specification discipline deploy successfully; others amplify existing weak practices through automation and compound hallucination risk. Practitioner guidance increasingly emphasizes verifiable completion conditions (Claude Code /goal command), treating PRDs as active directing documents rather than overhead. Enterprise adoption research (Marlabs, June 2026) reports that 79% face production challenges when the alignment/specification phase fails; organizations that define clear requirements, success metrics, and stakeholder buy-in before development show dramatically better outcomes.
Conversely, Gartner identifies “unclear business value” and “inadequate risk controls” as top reasons GenAI pilots are abandoned—both failure modes map directly to insufficient requirements discipline. The industry response—platforms adding rubric-based success criteria, eval-set callouts with versioned datasets, multi-layer validation gates, and explicit traceability—signals recognition that requirement quality and organizational discipline (not tool maturity) determine whether AI-assisted specification accelerates or amplifies existing problems.
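The four PRD sections described above (Behavior Matrix, Eval-Set Callout, Fallback Specification, Prompt Ownership) can be sketched as a small data structure. This is only an illustration under stated assumptions: the class and field names are hypothetical, and reading "threshold with rollback score" as a ship threshold plus a separate rollback score is an interpretation, not something the source spells out.

```python
from dataclasses import dataclass, field

# All names below are hypothetical; the source names the four sections
# but does not prescribe a schema for them.
REQUIRED_FALLBACKS = ("refusal", "timeout", "tool_failure")


@dataclass
class BehaviorRow:
    """One Behavior Matrix row: an input class and its three outcome tiers."""
    input_class: str
    target: str
    acceptable_degradation: str
    forbidden: str


@dataclass
class EvalSetCallout:
    """Named, versioned dataset plus the metric and scores that gate release."""
    dataset: str
    version: str
    metric: str            # e.g. "exact_match", "bleu", "rubric"
    ship_threshold: float  # assumed: minimum score required to ship
    rollback_score: float  # assumed: score at or below which to roll back


@dataclass
class AIFeaturePRD:
    behavior_matrix: list[BehaviorRow]
    eval_set: EvalSetCallout
    fallbacks: dict[str, str]   # what the user sees, keyed by failure kind
    prompt_owners: list[str]    # who may change behavior after ship
    audit_log: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """List the gaps that would make this PRD incomplete for an AI feature."""
        gaps = []
        if not self.behavior_matrix:
            gaps.append("behavior_matrix")
        gaps += [f"fallback: {k}" for k in REQUIRED_FALLBACKS if k not in self.fallbacks]
        if not self.prompt_owners:
            gaps.append("prompt_owners")
        return gaps

    def release_decision(self, score: float) -> str:
        """Map an eval score to ship, rollback, or hold."""
        if score >= self.eval_set.ship_threshold:
            return "ship"
        if score <= self.eval_set.rollback_score:
            return "rollback"
        return "hold"


if __name__ == "__main__":
    prd = AIFeaturePRD(
        behavior_matrix=[BehaviorRow(
            input_class="non-English ticket",
            target="reply in the ticket's language",
            acceptable_degradation="reply in English with an apology",
            forbidden="silently drop the ticket",
        )],
        eval_set=EvalSetCallout("support-tickets", "v3", "rubric", 0.85, 0.70),
        fallbacks={"refusal": "Hand off to a human agent",
                   "tool_failure": "Show a retry message"},
        prompt_owners=["pm-support"],
    )
    print(prd.missing_sections())
    print(prd.release_decision(0.80))
```

Keeping the sections in a structured form like this lets a completeness check run before any prompt is written, which is one way to make the "eval-driven acceptance criteria" idea mechanical rather than a review-time convention.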
— Theory of Constraints analysis: code generation cost dropped, bottleneck shifted to requirements and ideation. Vague specs now carry higher cost when rework is free; precision in PRDs now directly impacts ROI of AI-assisted development.
— Anthropic production data shows Claude authoring 80%+ of merged code; productivity gains require spec writing as primary deliverable and verification infrastructure. Validates that specification quality is the binding constraint in AI-assisted code productivity.
— Expert PRD tool founder tests Claude Fable 5, revealing specific quality limitations: verbose dense paragraphs difficult to parse, conservative scope decisions. Documents maturity ceiling in latest frontier model for practical PRD/spec generation work.
— Product GA of AI Creator Lab with embedded PRD generator for hardware/manufacturing domain. Natural-language-to-structured-requirements pipeline deployed at scale in regulated, production-critical context.
— Concrete implementation guide for PRD-as-scope-contract in AI code generation: lock scope before prompting, plan vertical slices, gate every compile. Shows evolved PRD role preventing hallucination-driven scope creep in agentic workflows.
— Foundational analysis: hallucination is inherent to probabilistic token generation; deterministic architectural solutions (retrieval, operator selection) required where requirements accuracy matters. Identifies categories where training cannot fix hallucination.
— Practitioner guide on specification-driven agent work; documents cognitive shift from vague prompting to explicit acceptance criteria and completion conditions. Shows evolution in how PRDs and requirements are adapted for AI-assisted workflows.
— Claude Code skill for PRD generation shipped May 16, 2026; 174.7k installs as of June 3, 2026 demonstrate production deployment and active adoption of AI-assisted PRD generation in mainstream IDE ecosystem.
2023-H1: Early tooling and proof-of-concept. GPT-4 adoption in individual scrum teams; open-source repo traction (ai-driven-userstories). Academic research identified gaps in RE for AI systems. Vendor launches (ClickUp AI) signaled product ecosystem activity but lacked independent validation. Negative signal: 95% AI project failure rate; widespread hallucination and quality issues in AI-generated content.
2023-H2: Vendor ecosystem expansion and quality challenges emerge. New PRD-specific tools (WriteMyPrd, PMAI) launched as specialized offerings, signaling niche market opportunity. However, practitioner and industry analysis reinforced fundamental limitations: LLMs remained prone to hallucinations and "blatant" factual inaccuracy, making requirements generation risky without heavy human review. Real-world deployments remained small-scale and team-level, constrained by verification burden and quality gaps.
2024-Q1: Specialized vendor tooling launches and ecosystem maturation. ProVibe released an AI PRD Generator claiming 95% accuracy and 90% time savings, indicating vendor confidence in the segment. Deployments remained team-scale, primarily using GPT-4 for BDD scenario generation and acceptance criteria. Quality and verification challenges persisted as the core adoption barrier—no independent validation of vendor claims and no established workflow for verifying AI-generated requirements reliably.
2024-Q2: Continued ecosystem growth with mixed academic validation. Multiple vendors (Boggl.ai, WriteMyPRD) launched specialized products; practitioner deployments expanded (ArcTouch case study, Makemytrip/OROLabs adoption). Academic research confirmed early-stage maturity: tertiary study of 28 secondary RE studies documented rising LLM adoption but persistent data/evaluation gaps; empirical study showed AI potential for requirement classification but consistent misinterpretation risks; systematic review of user story generation identified insufficient quality guidelines. Verification burden and hallucination risk remained central barriers to autonomous deployment at scale.
2024-Q3: Specialized tooling expansion alongside critical assessment of deployment barriers. Klariti launched custom User Story GPT; ClickUp integrated AI-powered requirement generation into core product. Survey data showed 1M+ GitHub Copilot paying customers, indicating widespread adoption of AI-assisted development. However, converging research highlighted structural barriers: systematic mapping study of 126 RE studies documented persistent challenges in specification, explainability, and engineer-user gaps; IEEE peer-reviewed analysis identified seven primary reasons AI projects fail; Gartner forecast 30% of GenAI projects abandoned post-PoC by end of 2025 due to data quality, cost, and ROI concerns. Evidence indicated the practice remained experimental despite tooling maturity—organizations used AI-generated requirements as drafts for heavy human review rather than autonomous outputs.
2024-Q4: Continued vendor ecosystem expansion with persistent production-readiness barriers. New AI-native tooling emerged (Musely, RapidPRD, Rock-n-Roll) focusing on specialized PRD and user story generation, alongside practical methodology guides (ChatPRD). Vendor claims included 500+ PRDs generated and 94% completeness metrics. However, large-scale adoption data revealed fundamental deployment challenges: Economist Impact survey of 1,100 executives found 85% of enterprises testing GenAI but only 37% confident in production-readiness of GenAI applications (29% among practitioners), with 60% of UK enterprises admitting GenAI use cases had not reached production. Critical analysis documented that only 44% of businesses had an AI strategy despite 76% fearing competitive disadvantage, with specific cases of tool abandonment due to poor performance. The practice remained characterized by high experimentation and vendor innovation alongside widespread organizational barriers to reliable autonomous deployment.
2025-Q1: Continued vendor ecosystem expansion paired with reinforced research findings on quality challenges. New PRD generators (MakePRD, AIPRD) launched with efficiency claims; OpenAI's Product Lead published practical guidance on scaling AI-powered products emphasizing human-AI collaboration; independent developers released open-source agentic systems for requirements automation. However, two parallel systematic reviews of 105 papers each (published Q1 2025) documented persistent challenges: interpretability (61.9%), hallucination (44.8%), reproducibility (52.4%), controllability (47.6%). Carnegie Mellon SEI (February 2025) reaffirmed explainability as critical for mission-critical applications. The research consensus confirmed the practice remained experimental—tooling availability grew, but organizations continued treating AI-generated requirements as drafts requiring heavy human verification rather than autonomous outputs.
2025-Q2: Accelerated real-world deployment and practitioner adoption with persistent quality assurance concerns. Three independent case studies documented production-scale deployments: Leanware launched PRD Agent (OpenAI backend, achieving minute-level turnaround vs. hours); Tietoevry deployed Findwise I3-LLM solution for automotive requirements with hundreds of requirements processed; Ryan Lewis published detailed PoC for MCP server PRD, showing AI-identified architectural gaps. Open-source ecosystem matured: WillBooster's gen-pr tool demonstrated measurable real-world impact (24% of edits auto-generated across 91 PRs). Academic review (105 studies, ICEIS 2025) confirmed AI techniques are the most frequent solution for user story quality issues. However, critical practitioner assessment (Dean Peters, April 2025) documented specific failure modes: AI stories parrot templates, fail to probe context, sprawl rather than split, and generate "emotionally intelligent as drywall" acceptance criteria. Deployment remained constrained by verification burden and quality assurance—organizations increasingly used AI as drafting tool but required heavy human review before production handoff.
2025-Q3: Measurable progress in artifact quality offset by discovered deployment gaps. An SBES 2025 peer-reviewed study demonstrated 87.5% of AI-generated user stories meeting quality criteria using the structured US-Prompt technique (457 stories, 24 participants). Independent case studies showed specific improvements: Thoughtworks test generation from user stories achieved 87% correctness and 98.67% acceptance criteria coverage. An industry survey (Jellyfish, 600+ engineers) documented 90% adoption of AI tools, with 62% reporting productivity gains of 25% or more. However, measured and self-reported productivity diverged sharply: practitioners reported a 20% speed gain, but measured data showed a 19% net productivity loss due to AI over-engineering and integration failures. A Bain & Company report revealed modest actual productivity gains of 10–15% despite broad deployment, and a METR study indicated that AI tools made developers slower through the burden of error correction. Specific deployment failure: Babylon Health's AI-generated requirements for a triage chatbot led to a 23% abandonment rate. Q3 2025 consensus: organizations successfully accelerated artifact generation (user stories, acceptance criteria) but remained dependent on human verification for integration and strategy decisions. The practice demonstrated incremental progress in structured workflows but unresolved barriers to autonomous deployment and strategic decision-making.
2025-Q4: Vendor ecosystem maturation and specification-driven deployment at scale. New platform launches (ReqSpell with automated traceability, consistency enforcement, and requirement-to-code linkage) signaled continued niche innovation. Ramp deployed GitHub Copilot to 300+ engineers with 30% productivity gains, explicitly framing development as specification-driven: code as a byproduct of good specs. Industry data showed 70% of high-performing companies using AI for backlog grooming with 38% overhead reduction. Open-source ecosystem expansion: new MIT-licensed PRD generation packages and lightweight MCP servers demonstrated sustained developer momentum. Practitioner guidance (end-of-year tutorials) repositioned AI as collaborative "thinking partner" for clarifying intent and generating acceptance criteria within human-led refinement workflows. Q4 consensus remained consistent: tooling maturity increased, deployment scale increased (300+ engineers at Ramp), but the fundamental constraint persisted—organizations could reliably automate artifact generation (stories, criteria, traceability) using structured techniques, yet could not automate the strategic judgment required to determine what to build and whether specifications would deliver product value. The practice demonstrated continued incremental progress in structured artifact workflows but remained dependent on human expertise for strategic decisions.
2026-Jan: Organizational adoption barriers overshadow tooling maturity. Amplitude deployed its internal Moda tool, which generates PRDs from single-sentence prompts within a single meeting (vs. weeks), demonstrating achievable productivity gains within controlled enterprise contexts. New vendor products launched (TinyPRD for IDE integration), signaling continued niche market expansion. However, industry data became increasingly sobering: RAND, Gartner, and Deloitte reports documented that 80%+ of AI projects never reach production, with 40% projected to be abandoned post-PoC by 2027. Critical practitioner analysis revealed persistent production-readiness barriers: AI systems drift, lose consistency, contradict earlier outputs, and require extensive guardrails. ClickUp and other platforms added AI-driven automation capabilities, yet adoption remained contingent on human verification and integration with existing signals (customer feedback, issue trackers). Early 2026 consensus: the practice showed measurable improvements in tooling usability and some adoption in specification-driven contexts (Amplitude, 70% of high performers), but persistent deployment failures and integration complexity prevented ecosystem-wide advancement. Organizations successfully accelerated artifact generation in controlled workflows but remained unable to achieve autonomous requirements engineering at scale due to production-readiness and organizational integration barriers.
2026-Feb: Practical workflows mature amid trust and reliability concerns. Vendor ecosystem stabilized with established platforms (ClickUp, ReqSpell, Lane, Kuse.ai, TinyPRD) offering AI-powered PRD/user story generation; structured prompting frameworks adopted as standard practice ("AI as thinking partner"). Industry data documented persistent adoption barriers: 80.3% overall AI project failure rate and 95% GenAI pilot-to-production failure across 2,400+ enterprise initiatives; developer trust in AI dropped to 29% despite 84% adoption, with hallucinations and reliability cited as production barriers. Product management evolution confirmed: 59% of executives prioritized business strategy over AI fluency, signaling AI's role as drafting accelerator rather than autonomous decision-maker. Real-world deployments continued in controlled contexts (Amplitude, Ramp) but scaled adoption remained contingent on organizational readiness and verification workflows. February 2026 consensus: ecosystem achieved practical maturity in artifact generation with established vendor offerings and structured workflows, yet trust gaps and organizational barriers remained the core constraint on autonomous deployment.
2026-Mar: Framework evolution and enterprise-scale platform integration. Practitioner assessments identified fundamental gap in requirements generation: generic LLM prompting achieves only 80% completeness, requiring structural discipline for organizational-scale deployment. New frameworks emphasize machine-readable requirements: 13-section PRD structure optimized for AI code generation tools (Cursor, Claude Code, Bolt); Gherkin Given-When-Then format for test automation; explicit behavioral constraints and fallback behaviors for AI-feature requirements. OpenAI/Anthropic trained practitioners documented that AI-generated PRDs for AI features require eval thresholds, failure mode specification, and behavioral contracts—fundamentally different from human-focused specifications. Major platform vendors released native AI for requirements: Atlassian Intelligence (Jira GA) for user story breakdown and content generation across Standard/Premium/Enterprise tiers; Linear AI for issue creation from natural language and work planning. Enterprise deployments expanded: LaunchDarkly deployed PRD-as-code with background agents (Devin) integrated via MCPs; financial institution achieved 30% documentation time reduction using AI copilots for AML/KYC workflows. March 2026 evidence demonstrates ecosystem progression toward requirements engineering as code, with structured frameworks addressing the gap between artifact acceleration and autonomous requirements engineering. Constraint remains organizational: tooling maturity has increased, but verification workflows and requirements discipline (not tool availability) determine autonomous deployment viability.
2026-Apr: AI-native PRD standards and platform evolution accelerate. A 25-year practitioner (Microsoft/Accenture) declared traditional PRDs "dangerously inadequate" for AI products—AI-enhanced through autonomous-agent products require specification of probabilistic outputs, silent failure modes, eval-driven acceptance criteria, and rented-intelligence risks from model provider updates. Planning tools (Kiro, GitHub Spec Kit) and dedicated planning modes across major agents (Claude Code, Cursor, Windsurf) elevated specification to a first-class step in AI coding workflows. Atlassian responded architecturally with Rovo semantic layer and Teamwork Graph (100B+ objects) enabling Claude/Cursor/Gemini agents to reason about requirements natively. Startup CPO survey confirmed Claude as dominant PRD generation tool (50% share) with quality concerns cited by 62.5% as primary barrier. Critical governance signal: Pendo analysis (80% of shipped features rarely used, USD 29.5B wasted R&D) and Jira Rovo auto-ticket generation from meetings raised concern that AI-accelerated output volume without human curation risks institutionalising feature factories. Production deployments validated specification-driven workflows: CloudZero deployed Claude for PRD generation and prototyping, compressing PM-engineer alignment cycles from clarification loops to architectural problem-solving, with the resulting telemetry collector now live in production. Toucan and Olvy practitioners documented Spec-it workflows converting PM pitches to structured specs and acceptance criteria stored as codebase markdown (shared source-of-truth with brand constitution enforcement). A peer-reviewed empirical study (arXiv, April 2026) confirmed the practice's persistent human dependency: AI achieves consistent syntactic and structural validation against INCOSE criteria but human judgment remains essential for contextual interpretation and strategic trade-off reasoning.
2026-May: Bifurcation in adoption outcomes becomes concrete. Copilot Specs (VS Code GA) and spec-generator (Claude Code skill with 7-phase pipeline and validation gates) evidence that specification tooling has moved beyond proof-of-concept into GA-level vendor support; the "Write A Prd" Claude Code skill reached 14.3k installs and 85.4k GitHub stars, confirming PRD generation is now mainstream IDE-embedded tooling. ChatPRD's 71+ case studies from senior technologists at Stripe, Microsoft, Notion, Figma, Coinbase document mainstream adoption in specification-augmented workflows at scale. Boldare's Claude Code deployment (6-person team) produced Architecture Decision Records as a natural byproduct of development—solving documentation debt while achieving 31% sprint velocity gains and test coverage from 85% to 95%—demonstrating that requirements discipline and AI tooling together become an engineering scaling lever. Parallel evidence documents the failure case: Lightrun's 2026 survey shows 43% of AI-generated code requires manual debugging in production post-QA; product leaders document that when AI writes both code and tests, tests only prove internal consistency, not correctness against requirements. The "PRD Compiler Method" emerged as a practitioner response: decomposing requirements into bounded task packets with explicit acceptance criteria and risk notes to prevent agentic imprecision. Critical gap identified: technical architect analysis documents three structural communication gaps (implicit conventions, WHAT vs HOW/WHY specs, no feedback loop) and proposes persistent context files (CLAUDE.md pattern) as infrastructure. AI engineering founder publishes specific template gap for AI features: deterministic PRD templates fail for stochastic systems; four sections required (Behavior Matrix, Eval-Set Callout, Fallback Spec, Prompt Ownership). 
Practice has bifurcated: organizations with specification discipline (Stripe, Ramp at 300+ engineers, Amplitude Moda) successfully embed AI-assisted PRDs with 25–50% time savings; organizations without specification discipline experience 43% production failures and 2–4× review cleanup overhead. The distinction is organizational capability (requirements discipline), not tool maturity.
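The task-packet idea mentioned for May 2026 (bounded packets with explicit acceptance criteria and risk notes) can be sketched as follows. The source does not describe the internals of the "PRD Compiler Method", so this is a guess at its general shape: the TaskPacket fields, the compile_packets function, and the rule that a requirement without acceptance criteria is rejected are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskPacket:
    """A bounded unit of work handed to a coding agent (hypothetical schema)."""
    title: str
    scope: list[str]                # assumed: files or modules the agent may touch
    acceptance_criteria: list[str]  # explicit, checkable completion conditions
    risk_notes: list[str]


def compile_packets(requirements: list[dict]) -> list[TaskPacket]:
    """Turn requirement entries into task packets.

    Assumed rule: a requirement with no acceptance criteria is rejected
    rather than passed to an agent with an open-ended goal.
    """
    packets = []
    for req in requirements:
        criteria = req.get("acceptance_criteria", [])
        if not criteria:
            raise ValueError(f"requirement {req['title']!r} has no acceptance criteria")
        packets.append(TaskPacket(
            title=req["title"],
            scope=req.get("scope", []),
            acceptance_criteria=criteria,
            risk_notes=req.get("risk_notes", []),
        ))
    return packets


if __name__ == "__main__":
    packets = compile_packets([{
        "title": "Export invoices as CSV",
        "scope": ["billing/export.py"],
        "acceptance_criteria": ["CSV has one row per invoice",
                                "Amounts use two decimal places"],
        "risk_notes": ["Large accounts may exceed memory if loaded at once"],
    }])
    for packet in packets:
        print(packet.title, "->", len(packet.acceptance_criteria), "criteria")
```

Rejecting under-specified requirements at this step is the design choice that matters: it moves the failure from production debugging back to the point where the specification is written.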
2026-Jun: Specification quality emerges as the binding constraint on AI coding ROI. Anthropic production data shows Claude authoring 80%+ of merged code—validating that spec writing, not code writing, is the primary PM/tech-lead deliverable. Stack Overflow's Theory of Constraints analysis documents the mechanism: code generation cost dropped toward zero, shifting the bottleneck to requirements and ideation; vague specs now carry higher cost precisely because rework is free. Ecosystem ships production tooling: second-generation PRD skill ("To Prd," 174.7k installs on Claude Code Marketplace) and XITASO peer-reviewed study of 15 RE use cases both confirm that tool integration, not tool capability, is the binding constraint. Stanford AI Index benchmarking 26 models reveals hallucination rates from 22–94%; 74% of companies cite AI inaccuracy as top risk (up 14pp YoY), and the South Africa government AI Policy withdrawal (fabricated citations) confirms hallucination is a production documentation risk, not a theoretical one. Practitioners testing frontier models (Claude Fable 5 for PRD work) document quality ceilings: verbose dense output and conservative scope decisions remain maturity gaps. The bifurcation holds: organizations with specification discipline see 25–50% time savings; those without amplify existing weak practices and face 43% production failure rates.