Perly Consulting │ Beck Eco

The State of Play

A living index of AI adoption across industries — where established practice meets the bleeding edge
UPDATED DAILY

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

The Daily Dispatch

A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.

AI Maturity by Domain

Each dot marks the weighted maturity of practices within a domain.

DOMAIN | BLEEDING EDGE → ESTABLISHED

Video generation — long-form narrative & explainer

BLEEDING EDGE

TRAJECTORY

Stalled

AI generation of longer narrative videos, explainers, and educational content with coherent storylines. Includes multi-scene generation and narrative consistency; distinct from short-form which produces clips rather than structured narratives.

OVERVIEW

AI-generated long-form narrative video (explainers, educational content, structured short films) shows a widening gap between hype and production reality, and June 2026 evidence makes the split concrete. On one side, 88% of organizations had deployed AI in some function by year-end 2025, 73% of the Fortune 500 had adopted AI video tools, and Coca-Cola generated 70,000 video variations in under a month. On the other, Sora's collapse (from 1M to 500k active users; $1M/day compute costs against $1.4M lifetime revenue) demonstrates how unsustainable consumer-facing video generation products are. Diagnostic research (DirectorBench) quantifies the core bottleneck: scene-to-scene transition quality averages 0.256 (0.356 for the best workflow), while prompt fulfillment reaches 0.71, which points to a structural continuity deficit rather than a model-scaling gap. Memory-capability benchmarks (MBench) confirm that entity consistency, environment persistence, and causal reasoning remain systemically limited across all production-grade tools.

The tools that work do so within narrow constraints: Meituan's LongCat-Video (13.6B parameters, 3,974 GitHub stars) generates minute-long content; JoyAI-Echo demonstrates 5-minute coherent generation with human-validated audio quality (81.7% preference); and educational platforms (Renderforest 34M users, ZSky 105K creators) show real adoption in pedagogical and training contexts. Yet these successes depend on heavy human editorial guidance. The practice remains bleeding-edge: credible deployments exist in explainer and educational verticals, real investment is flowing in, and production pipelines increasingly integrate AI for ideation and asset generation. But autonomous long-form narrative generation, where AI independently maintains character, causality, and emotional arc across 5+ minutes, remains an unsolved engineering problem. Practitioner assessments consistently report that generative video still "creates more work, not less," producing isolated clips rather than connected narratives that need only minimal human intervention.

CURRENT LANDSCAPE

The June 2026 landscape is harder-edged and more realistic. Sora's documented collapse (April 2026) makes the market position plain: installs fell from a 3.3M peak to 1.13M (a 66% drop), compute ran at $1M/day against $1.4M total lifetime revenue, and novelty wore off within weeks. Consumer-facing long-form video generation lacks sustainable unit economics and holds limited consumer appeal beyond initial curiosity. OpenAI redirected the model toward robotics and world simulation rather than pursuing product viability. The market consolidated around Runway, Kling, and Veo, with new entrants such as BACH (Video Rebirth) entering enterprise pilots but not consumer channels.

Enterprise adoption is real but narrowly scoped: 73% of the Fortune 500 have integrated AI video tools; Coca-Cola generated 70,000 variations in under a month; production cost fell 91% (from $4,500/min to $400/min). Yet deployment remains confined to short-form clips (10-25 seconds), concept ideation, and B-roll replacement. Multi-scene narrative coherence is the binding constraint: DirectorBench shows scene-transition quality averaging 0.256, with even the top workflow reaching only 0.356. Character consistency remains a production barrier: 30-second multi-shot films still show inconsistencies that require manual intervention.
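The percentages quoted above follow from simple before/after arithmetic. As a quick check, the Python sketch below recomputes them from the raw figures in the text. The helper name `pct_drop` is illustrative and does not come from any cited source.

```python
def pct_drop(before: float, after: float) -> float:
    """Return the percentage decline from `before` to `after`."""
    return (before - after) / before * 100.0


# Figures as quoted in the text above.
install_drop = pct_drop(3_300_000, 1_130_000)  # Sora peak installs vs. later installs
cost_drop = pct_drop(4_500, 400)               # production cost in $/min

# Days of compute that lifetime revenue would cover at the quoted burn rate.
days_covered = 1_400_000 / 1_000_000

print(f"Install decline: {install_drop:.1f}%")
print(f"Cost reduction:  {cost_drop:.1f}%")
print(f"Lifetime revenue covers {days_covered:.1f} days of compute")
```

The results round to the 66% and 91% cited, and the last line shows why the unit economics are described as unsustainable: at the quoted rates, lifetime revenue covers under two days of compute.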

Educational and explainer verticals show genuine deployment success. Renderforest reports 34M creators using its platform for institutional education; ZSky AI documents 105,000+ educators across 39 countries generating synchronized instructional content; Golpo AI catalogs 25 real-world whiteboard explainer deployments (exam prep, training, policy). JoyAI-Echo (JD/Joy Future Academy) demonstrates 5-minute coherent generation with 81.7% preference on audio quality and 63.6% on visual aesthetics, which makes it viable for structured pedagogical use. Production ROI tracking confirms that hybrid workflows achieve a 93.3% time reduction for 60-second product explainers, but always with human review at segment boundaries.

Yet fundamental barriers persist. Feature-length work (100,000-150,000 frames) remains beyond the reach of current models because of context window limitations. Research (MBench, May 2026) documents "critical systemic limitations" in entity consistency, environment persistence, and causal reasoning across long horizons. In a March 2026 practitioner study, four production tools (Runway, Kling, Seedance, Pika) were tested on a 60-second single-character explainer; clothing and appearance drift forced 15-20 regenerations per scene, and the project ultimately abandoned AI generation. Consumer trust barriers compound adoption friction: 36% report that AI video lowers brand perception, 67% cite robotic gestures, and 55% flag unnatural voices. For now, every narrative attempt above 2-3 minutes depends on human editorial guidance to sustain what models cannot.
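To put the 100,000-150,000 frame range in context, the sketch below converts frame counts to running time. The 24 fps frame rate is an assumption (a common cinema rate); the text does not state one.

```python
FPS = 24  # assumed frame rate; not stated in the source


def frames_to_minutes(frames: int, fps: int = FPS) -> float:
    """Convert a frame count to running time in minutes."""
    return frames / fps / 60


low, high = 100_000, 150_000
print(f"{low:,} frames = {frames_to_minutes(low):.1f} min")
print(f"{high:,} frames = {frames_to_minutes(high):.1f} min")

# A 60-second explainer at the same assumed rate, for comparison.
explainer_frames = 60 * FPS
ratio = low / explainer_frames
print(f"60 s explainer = {explainer_frames:,} frames; "
      f"the low end of feature length is about {ratio:.0f}x longer")
```

Under that assumption the range corresponds to roughly 69 to 104 minutes, so the coherence horizon needed for feature-length work is on the order of 70 times that of the 60-second explainers where tools already struggle.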

TIER HISTORY

Research: Jun-2024 → Apr-2025
Bleeding Edge: Apr-2025 → present

EVIDENCE (107)

— Practitioner analysis documenting real creator workflows: realistic long-form requires extensive multi-clip composition and regeneration (20+ attempts for 5 usable clips). Reveals hidden cost is failed attempts, not subscription.

— Peer-reviewed framework for minute-level video synthesis with error accumulation mitigation and identity consistency via causal attention, establishing benchmark for coherent long-form synthesis.

— Comprehensive buyer's guide with critical constraint: Sora API scheduled to shut down Sept 24 2026. Identifies Veo 3.1 and Seedance 2.0 as primary long-form contenders; documents market consolidation.

— Independent tested evaluation: Sora produces 60+ second clips maintaining coherence; Kling 3.0 achieves 3-minute consistency; Runway limited to 10 seconds requiring stitching.

— Meituan's LongCat GA: generates 60+ second coherent videos in single pass at 4K/60fps with character consistency and storytelling. Represents significant capability milestone for autonomous long-form generation.

— Vendor assessment identifies core storytelling requirements (character/environment consistency, narrative logic, shot sequencing) and documents the gap: models generate pieces of stories but lack production-scale narrative management layer.

— 15 real commercial deployments generating 38K RMB revenue despite 10-16 second per-clip limit via multi-clip composition. Runway Gen-3 controllability and stability superior for commercial work requiring assembly.

— Medical education deployment: LLM-script-to-HeyGen-avatar pipeline achieved 70-80% production time reduction with lightweight QA at output stage; demonstrates viable long-form narrative automation.
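
The evidence above notes that the hidden cost is failed attempts rather than the subscription. The sketch below is a rough model of that point, not anyone's actual pricing: it takes the reported yield (5 usable clips from 20 attempts) and estimates the generations needed for a target runtime. The per-attempt price and the 10-second clip length are hypothetical placeholders.

```python
import math

ATTEMPTS = 20            # reported attempts (the source says "20+")
USABLE = 5               # usable clips obtained from those attempts
COST_PER_ATTEMPT = 0.50  # hypothetical price per generation, in dollars
CLIP_SECONDS = 10        # assumed clip length; the evidence cites 10-16 s limits


def expected_attempts(target_seconds: int) -> int:
    """Expected generations needed to fill `target_seconds` with usable clips."""
    clips_needed = math.ceil(target_seconds / CLIP_SECONDS)
    attempts_per_usable = ATTEMPTS / USABLE
    return math.ceil(clips_needed * attempts_per_usable)


for seconds in (60, 300):
    n = expected_attempts(seconds)
    print(f"{seconds:>3} s video: ~{n} attempts, ~${n * COST_PER_ATTEMPT:.2f}")
```

At a 25% yield, a 60-second piece needs about 24 generations and a 5-minute piece about 120, before any editing time is counted. Swapping in a real per-generation price or a different yield changes the totals but not the shape of the problem.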

HISTORY

  • 2024-Q2: Research papers and evaluations dominate the landscape. Academic benchmarks reveal AI's struggles with long-form narrative comprehension and temporal reasoning. Product announcements (Runway Gen 3, Open-Sora) focus on short-form generation. Practitioner assessments and critical analyses emphasize generation time, consistency, and cost barriers preventing production deployment. No evidence of commercial adoption for full-length narrative production.

  • 2024-Q3: Technical coherence research accelerates (narrative consistency frameworks, follow-on shot limitations). Industry reports confirm major strides in video generation quality overall, but practitioner and critical analyses deepen understanding of long-form-specific barriers: diffusion models cannot reliably generate follow-on shots without breaking narrative logic; production economics remain prohibitive (300:1+ generation ratios). Early commercial attempts (brand films) remain short-form rather than long-form narratives. Viewer sentiment shows cautious adoption (75% receptive but 90% concerned about accuracy/authenticity). No advancement in long-form commercial production.

  • 2024-Q4: Product maturation accelerates (Veo 2, Sora Turbo, Amazon Nova Reel, open-source Hunyuan) with expanded access and improved quality in short-form generation. However, research analysis of long-form comprehension (HourVideo dataset) reveals AI models at 25-37% accuracy vs. 85% human baseline, indicating fundamental gaps in sustained attention and temporal sequencing. Practitioner assessments document specific narrative failures (semantic misinterpretation, character identity breaks) and identify 20-second clip length as practical ceiling. Media and entertainment industry remains cautiously hesitant despite tool proliferation; no evidence of production-critical long-form narrative generation deployment.

  • 2025-Q1: Academic research intensifies around narrative coherence solutions (Meta's OneStory, StoryAgent multi-agent framework, VideoStudio LLM-guided synthesis), demonstrating continued innovation in character consistency and multi-scene generation. However, real-world production case studies reveal persistent practical barriers: creative agencies report AI's inability to handle realistic human motion and physics, while high-profile deployments (Coca-Cola's campaign) require thousands of iterations with visible continuity issues. Production workflows shift toward multi-agent orchestration and human-in-the-loop curation rather than autonomous generation. Technology remains constrained by 20-second clip ceiling, low-yield generation ratios, and authenticity concerns; no evidence of autonomous long-form narrative production in professional media workflows.

  • 2025-Q2: Major commercial investment accelerates (Runway $300M Series D funding Runway Studios for long-form AI film production with Gen-4 character consistency features). Research advances ~60-second narrative generation with character consistency, extending prior 20-second ceiling. Industry analysts identify character consistency as "the holy grail" problem for long-form adoption. Critical assessments document deployment barriers: strategic oversight requirements, compliance validation gaps, production economics still prohibitive despite cost savings in early-stage ideation. Media studios remain cautious. Practitioner analysis emphasizes 70% cost benefits offset by emotional depth gaps and authenticity concerns. No evidence of autonomous long-form narrative deployment in professional production; hybrid human-AI workflows remain dominant.

  • 2025-Q3: Product releases accelerate narrative capability focus: Runway Gen-3 Alpha introduces cinematic storytelling controls; Sora 2 (end-Q3) improves physical world simulation. Research advances character consistency evaluation frameworks and multi-stage narrative pipelines with explicit stability metrics. Market projections reach $10B by 2027. However, production deployment remains constrained: character consistency improvements apply to controlled scenarios (animation, stylized content) but fail in realistic multi-character narratives. Practitioner reviews of Gen-4 and Sora 2 confirm incremental capability gains but note substantial iteration still required. No evidence of autonomous long-form narrative production; human-AI hybrid workflows with multi-agent orchestration remain industry standard. Technical coherence at scale and production economics continue to block adoption.

  • 2025-Q4: Product maturity accelerates: Sora 2 (Oct 2025) delivers synchronized audio and improved physics; Runway Gen-3 Alpha prioritizes character consistency for cinematic narratives; third-party integrations (SJinn) chain Sora 2 and Veo 3 to break the sub-10-second barrier, enabling minute-long character-consistent storytelling. Kling AI 2.0 reaches 22M users, signaling mass-market adoption of advanced video generation. Practitioner workflow guides document production patterns: shot planning, multi-take generation, QA gates for narrative content. However, deployment evidence confirms character consistency stability remains limited to stylized scenarios; realistic multi-character narratives with complex physics still require intensive iteration and human curation. Technical limitations (temporal coherence, semantic understanding, hand consistency) persist. Production economics remain prohibitive for autonomous long-form generation. Deployment has shifted from research prototypes toward hybrid human-AI production workflows, particularly in animation, education, and ideation-phase applications. Bleeding-edge capability present; mainstream production adoption constrained by technical barriers and cost-benefit economics.

  • 2026-Jan: Major commercial deployment acceleration: Vidu Q3 launches as first long-form AI video model with native audio-video generation (16s synchronized output); achieves 40M creator adoption with 500M+ videos generated (70% commercial). CraftStory releases 5-minute image-to-video capability for long-form narratives with human actors and lip-sync alignment. Disney-OpenAI partnership ($1B licensing) and WPP-Google partnership ($400M) signal production-scale adoption by major media and advertising conglomerates. Agentic research frameworks emerge (ScripterAgent, DirectorAgent) for dialogue-to-cinematic generation. However, foundational technical barriers persist: MIT-IBM Watson benchmark analysis documents coherence degradation after ~8 seconds in all major models due to fixed-length context windows. Character consistency remains limited to stylized scenarios. Professional production output remains capped at 60 seconds, far short of feature length; deployment assessments confirm 5-10 minute outputs still require multi-model orchestration and intensive human curation. Temporal coherence and semantic narrative understanding remain substantially unsolved. Production workflows continue hybrid human-AI patterns. Technical limitations at scale and production economics continue to constrain autonomous long-form generation deployment.

  • 2026-Feb: Market expansion signals growth (AI video generation market reached $1.8B with 45%+ CAGR). Enterprise adoption metrics document 42% of Fortune 500 marketing departments using tools, 65% of marketing teams (vs. 12% in 2024), 40% of e-commerce brands, 80%+ of social creators under 30. However, critical production deployment barriers emerge across evidence: Sora 2 assessed as "not reliable enough for final ad output" with "long-form and multi-scene control still fragile"; consumer adoption declined sharply (iOS downloads dropped 45% by January 2026); practitioner assessments note generative models "create more work, not less," generating "isolated clips with no narrative continuity." Professional use remains constrained to 10-25 second B-roll replacement. Consumer trust barriers persist: 36% report AI video lowers brand perception, 67% cite robotic gestures, 55% unnatural voices. Deployment evidence confirms market awareness and enterprise adoption acceleration, but production deployment barriers—narrative coherence gaps, consumer trust concerns, poor output reliability—continue to block autonomous long-form narrative generation. Hybrid workflows and ideation-phase use remain dominant applications.

  • 2026-Mar: Major market consolidation, with evidence of both scaled deployment success and sustained technical barriers. OpenAI shuts down Sora March 25 due to unsustainable $15M/day operating costs vs. $2.1M lifetime revenue, signaling structural failure of standalone video generation products; consolidation accelerates around Runway, Kling, and Veo. Real-world production evidence shows contrasting patterns: Higgsfield AI competition attracts 8,752 film submissions from 139 countries (broadest adoption signal to date), with China's micro-drama sector showing 90% cost reduction and 41% AI content penetration. Educational deployment case study (peer-reviewed NIH publication) documents successful long-form narrative video deployment for medical training with 72-94% cost reduction and improved learning outcomes. However, practitioner case studies confirm character consistency remains unsolved: a real client project testing four major tools (Runway, Kling, Seedance, Pika) on a 60-second explainer required 15-20 regenerations per scene because of character drift and ultimately abandoned AI generation. A PhD researcher quantifies the feature-length barrier: AI cannot sustain coherence across the 100,000-150,000 frames of feature-length content. One production ROI case shows a hybrid Sora+Runway workflow achieving a 93.3% time reduction (6 days to 8 hours) for 60-second explainers with 340% ROI. Technical barriers (character consistency, narrative coherence, frame length) remain fundamentally unsolved despite market breadth signals. Deployment remains constrained to educational, explainer, and B-roll replacement use cases with intensive human editorial oversight.

  • 2026-Apr: Sora's shutdown confirmed the structural failure of standalone video generation as a product category, with Futurum Group documenting enterprise adoption risks and vendor durability concerns as direct consequences; the market consolidated around Runway, Kling, and Veo. Research frontier advanced on multiple fronts: OmniScript addressed multi-scene audio-visual coherence; Stable Video Infinity (ICLR 2026 oral) introduced error-recycling for infinite-length generation; the MuSS benchmark established formal evaluation of multi-shot narrative logic and character consistency barriers; and the Mixture of Contexts paper (ICLR 2026) demonstrated 7x attention-routing speedup enabling minute-long multi-shot generation with maintained subject consistency. Vendor benchmarking confirmed API ecosystem maturity reaching Tier 5 "Cinematic Director" capability — multi-shot, physics-aware, audio-sync, scene-graph APIs now production-ready across Vidu Q3, Kling 3.0, and Veo 3.1. Sora 2 remained available via API until September 2026 with physics-aware rendering and world-state persistence. Against persistent character consistency barriers, a Japanese production firm documented Veo 3.1 deployment in an insurance company explainer campaign (one-third cost compression, half the production time, 20% higher view completion), while a critical analysis found 68% of buyers report vendor homogenization and viewers sense the absence of human storytelling judgment — signalling that trust, not just technical coherence, now constrains adoption for narrative content requiring emotional resonance.

  • 2026-May: Research advances continued on coherence and consistency: A²RD (agentic autoregressive diffusion) benchmarked 30% consistency gains and 20% narrative coherence improvements across 1-10 minute videos; ReCA (recursive context allocation) introduced MSVE-Bench for the 3-5 minute multi-shot regime and showed 8-16% narrative consistency improvement over baselines; while the EduStory framework addressed pedagogical consistency for multi-shot STEM instructional content. Educational verticals showed real adoption at scale: VideoTutor reached 50M TikTok views, $11M seed funding, and 1,000+ enterprise API inquiries; ZSky AI reported 105,000+ creators across 39 countries generating synchronized audio-video instructional content; a hybrid AI workflow case study demonstrated 5-minute corporate training videos produced in 45-60 minutes (down from 5-6 hours). Marketer adoption reached 78% (up from 41% in 2024) with 70M Gemini-generated assets in Q4 2025, but platform enforcement hardened simultaneously: YouTube's January 2026 action removed 4.7 billion views from 16 channels that had built content economies on full-AI narratives, establishing a concrete distribution ceiling for autonomous long-form generation. New industrial-grade tooling emerged with BACH (Video Rebirth), a Tier 5 Cinematic Director engine ranking #6 on Artificial Analysis benchmarks for character consistency in 30-second multi-shot films, entering enterprise pilots. Fundamental barriers remained documented: technical analysis identified three unresolved structural constraints — VRAM wall (O(n²) attention cost saturating H200 GPUs at 10-second clips), temporal drift, and causal consistency constraints — and expert consensus placed feature-length autonomous generation 3-5 years away, confirming that coherent autonomous long-form generation above 2-3 minutes remains an unsolved engineering problem, not a tuning gap.

  • 2026-Jun: Two diagnostic benchmarks crystallised the core technical bottleneck: DirectorBench measured scene-transition quality averaging 0.256 (best workflow 0.356) against prompt fulfilment of 0.71, pinpointing scene-to-scene continuity rather than single-shot quality as the binding constraint; MBench confirmed entity consistency, environment persistence, and causal reasoning show "critical systemic limitations" across all production-grade models. Against this backdrop, Sora's economic collapse was fully documented ($1M/day compute costs, $1.4M lifetime revenue, 66% user-base decline from peak), confirming that standalone consumer-facing long-form video lacks viable unit economics; the Sora API is scheduled to shut down September 24, 2026, concentrating the market around Veo 3.1 and Seedance 2.0 as primary long-form contenders alongside Kling 3.0 (documented 3-minute consistency) and Runway Gen-3 (limited to 10 seconds, requiring assembly). The fal.ai State of Generative Media report (88% organisational AI deployment, 73% Fortune 500) and Golpo AI's 25 whiteboard explainer case studies (exam prep, training, policy) represent the realistic upside: viable deployment within narrow, highly structured verticals with human editorial oversight, not autonomous multi-scene narrative generation. Meituan's open-source LongCat-Video (13.6B parameters, 3,974 GitHub stars, 4K/60fps, 62.11% VBench 2.0) demonstrated GA-tier minute-long generation capability, medical education deployments (LLM-to-HeyGen pipeline) documented 70-80% production time reduction, and API-level character consistency reached 95% visual continuity for episodic content — but practitioner workflows confirmed the regeneration economics remain harsh, with real creators reporting 20+ attempts to yield 5 usable clips per scene.
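
The "VRAM wall" noted in the 2026-May entry comes from the quadratic cost of full self-attention: if token count grows linearly with clip length, the attention score matrix grows with its square. The sketch below shows only that relative scaling. It assumes naive full attention and a fixed number of tokens per second of video; real systems use sparse, windowed, or routed attention, so the multipliers are an upper-bound illustration rather than a measurement of any model.

```python
def relative_attention_cost(seconds: float, baseline_seconds: float = 10.0) -> float:
    """Cost of full self-attention relative to a baseline clip length.

    Assumes tokens scale linearly with duration, so cost scales as O(n^2).
    """
    return (seconds / baseline_seconds) ** 2


for seconds in (10, 60, 180, 300):
    factor = relative_attention_cost(seconds)
    print(f"{seconds:>3} s clip: {factor:,.0f}x the attention cost of a 10 s clip")
```

Under these assumptions a 60-second clip costs 36 times as much attention compute as a 10-second clip, and a 5-minute clip 900 times as much, which is consistent with the history's framing of long-form coherence as a structural constraint rather than a tuning gap.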