Perly Consulting │ Beck Eco

The State of Play

A living index of AI adoption across industries — where established practice meets the bleeding edge
UPDATED DAILY

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

The Daily Dispatch

A daily newsletter distilling the past two weeks of movement in one or two domains — delivered to your inbox while the index updates in the background.

AI Maturity by Domain

Each dot marks the weighted maturity of practices within a domain — hover for a brief summary, click for more detail

DOMAIN
BLEEDING EDGE ←→ ESTABLISHED

User research & feedback synthesis

GOOD PRACTICE

TRAJECTORY

Stalled

AI that synthesises user research transcripts, survey data, and customer feedback into themes and feature signals. Includes automated affinity mapping and sentiment-driven feature prioritisation; distinct from product analytics, which analyses behavioural data rather than qualitative feedback.

OVERVIEW

AI-powered synthesis of qualitative user feedback (interviews, surveys, open-ended responses) is a proven capability with mature tooling and documented enterprise ROI. The question is no longer whether it works but why it has stalled. Over two-thirds of UX researchers now use AI for synthesis, and vendor-commissioned studies report ROI figures from 236% to 665%. Yet adoption remains confined to large enterprises with established research operations, and the category shows clear signs of a maturity plateau. The binding constraints are organisational, not technical: integration complexity favours vendor-led deployments over internal builds, practitioners hide AI tool use from colleagues even while reporting productivity gains, and hallucination risks demand human oversight that erodes the speed advantage. Consensus has settled on AI as an efficiency multiplier for theme extraction and summarisation, compressing weeks of affinity mapping into hours, rather than a replacement for interpretive research judgment. The result is a good-practice capability whose rollout challenge is less about tooling maturity and more about embedding AI-assisted synthesis into research workflows that organisations actually trust.
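
To make the theme-extraction step concrete, here is a minimal sketch of the clustering that underlies automated affinity mapping. It is a hypothetical illustration using TF-IDF features and k-means from scikit-learn; commercial platforms use proprietary embedding models and LLM-generated theme labels, but the shape of the pipeline (vectorise snippets, cluster, group) is the same.

```python
# Minimal affinity-mapping sketch: cluster feedback snippets into themes.
# Hypothetical illustration only -- real platforms use proprietary
# embeddings and LLM-generated labels rather than TF-IDF + k-means.
from collections import defaultdict

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

snippets = [
    "Checkout froze when I applied the discount code",
    "The coupon field crashed the payment page",
    "Love the new dark mode, much easier on the eyes",
    "Dark theme is great but contrast is low in settings",
    "Search never finds older invoices",
    "Couldn't locate last year's receipts via search",
]

# Vectorise each snippet, then cluster into a small number of themes.
vectors = TfidfVectorizer(stop_words="english").fit_transform(snippets)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)

# Group snippets by cluster label -- the raw material of an affinity map.
themes = defaultdict(list)
for snippet, label in zip(snippets, labels):
    themes[label].append(snippet)

for label, members in sorted(themes.items()):
    print(f"Theme {label}: {members}")
```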

CURRENT LANDSCAPE

Three platforms dominate the vendor landscape: UserTesting, Dovetail, and Thematic, each with GA AI features and named enterprise customers including Amazon, Canva, Meta, and Mayo Clinic. UserTesting's latest Forrester TEI study documents 665% ROI with measurable business outcomes: 60% conversion improvement and 140% lift in customer spend. Dovetail has pushed furthest on workflow integration, with its Fall 2025 and April 2026 releases shipping AI Agents, Dashboards, and a Figma integration that embeds synthesised insights directly into design workflows; quantified outcomes show product managers cutting workload from 100 to 10 hours per week and teams saving 38+ hours weekly through AI-native synthesis. In April 2026, UserTesting launched its own Figma plugin (GA), auto-generating test plans and embedding analysis results directly into design tools, with named early-adopter outcomes (CarMax refined form flows; AJ Bell streamlined workflows before development). Thematic continues to demonstrate strong enterprise adoption, with a documented 92% time reduction in feedback analysis and $4.8M in incremental revenue. Outset, a new entrant, launched a visual intelligence suite (April 2026) that extends synthesis beyond text to multi-modal analysis (facial cues, physical interaction) and integrated with Dovetail to send AI-moderated interview transcripts directly into synthesis platforms. A peer-reviewed deployment (Muse, ACL 2026) validates that LLM-based synthesis achieves inter-rater reliability of κ=0.71 with human researchers on structured theme identification. Market analysis projects the text analytics segment reaching $18 billion by 2028, driven by demand for AI-powered open-ended response analysis.
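
For readers unfamiliar with the κ statistic cited above: Cohen's kappa measures agreement between two coders corrected for chance agreement, so κ=0.71 means the LLM and the human researchers agreed substantially more often than random label assignment would predict. A minimal sketch with hypothetical theme codes:

```python
# Cohen's kappa: chance-corrected agreement between two raters.
# The theme codes below are hypothetical, for illustration only.
from sklearn.metrics import cohen_kappa_score

human_codes = ["pricing", "bug", "bug", "ux", "pricing", "ux", "bug", "ux", "pricing", "bug"]
llm_codes   = ["pricing", "bug", "ux",  "ux", "pricing", "ux", "bug", "ux", "bug",     "bug"]

kappa = cohen_kappa_score(human_codes, llm_codes)
print(f"kappa = {kappa:.2f}")  # ~0.70 for this toy data
```

Values above roughly 0.6 are conventionally read as substantial agreement, which is why κ=0.71 on structured theme identification is treated as validation of the deployment.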

Beneath the vendor momentum, adoption data tells a more complicated story. Adoption among researchers has stabilised: 69% now use AI in at least some synthesis projects (April 2026 survey), though 83% limit it to summarisation rather than deeper thematic analysis. Ecosystem analysis reveals a market-wide shift toward AI-first platforms (Looppanel, Condens, Marvin) characterised by auto-tagging and low-friction workflows, with AI-assisted synthesis now a table-stakes vendor capability rather than a differentiator (April 2026 industry report). Yet a critical risk surfaces in deployment: 47% of enterprise AI users have made major business decisions based on hallucinated synthesis content, with documented examples of entire features built on fabricated user-preference findings. Practitioner assessments find that AI works reliably for structured synthesis tasks (transcription, ideation, desk research) but requires professional review for complex synthesis, where summarisation often introduces bias, incorrect details, and fabricated metrics, and fails outright on interpretation and pattern discovery. Independent rigour assessment (Nielsen Norman Group, March 2026) reveals critical limitations: AI tools hallucinate findings, fail to identify meaningful patterns in qualitative data, and cannot adequately handle nuanced research questions; they excel at semantic pattern finding in pre-coded datasets but cannot replace trained human researchers. Comprehensive hallucination benchmarking shows Gemini 2.0-Flash at 0.7% on summarisation tasks but 18.7% on legal questions and 15.6% on medical queries; Claude models range from 4.4% to 10.1%; newer reasoning models paradoxically show higher hallucination rates (o3 at 33% on person-based questions). Peer-reviewed evidence from healthcare research shows LLMs perform similarly to humans on deductive coding with predefined codebooks but fail on inductive theme generation and hallucinate themes without evidentiary support. MIT research has documented systematic bias against lower-literacy and non-Western users, with refusal rates reaching 11%, a direct threat to synthesis accuracy for diverse feedback pools. Heavy AI users report three times more hallucinations than casual users and spend ten times longer on verification. Expert practitioners face a distinct risk: fluency-induced trust and velocity pressure let hallucinations slip through specialist workflows even among teams with stated AI literacy. New governance frameworks (April 2026 business school guidance) emphasise source verification, construct validity checks, and enterprise-grade data security as prerequisites for responsible synthesis deployment, positioning synthesis as "prediction not verification."
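
Part of the verification burden described above is mechanisable. One common guardrail, shown here as a hypothetical sketch rather than any vendor's actual pipeline, is an evidence-grounding check: every quote a synthesised theme cites must be found verbatim in its source transcript before the theme is surfaced, which catches the fabricated-quote failure mode directly.

```python
# Evidence-grounding check: every quote a synthesised theme cites must
# appear verbatim in the source transcript, or the theme is flagged.
# Data structures are hypothetical; real pipelines match on normalised spans.

transcripts = {
    "p01": "The export button is hidden. I gave up after five minutes.",
    "p02": "Exporting works, but the CSV loses my column ordering.",
}

synthesised_themes = [
    {"theme": "Export discoverability", "quotes": [("p01", "I gave up after five minutes")]},
    {"theme": "Pricing frustration", "quotes": [("p02", "the plan is far too expensive")]},  # fabricated
]

def verify(themes: list[dict], sources: dict[str, str]) -> list[str]:
    """Return the names of themes citing quotes not found in their source."""
    flagged = []
    for t in themes:
        for participant, quote in t["quotes"]:
            if quote.lower() not in sources.get(participant, "").lower():
                flagged.append(t["theme"])
                break
    return flagged

print(verify(synthesised_themes, transcripts))  # ['Pricing frustration']
```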

These reliability concerns reinforce the organisational friction that defines the category's stall. Critical research (April 2026) documents that AI-based research methodologies fail at adoption scale: systems designed from users' stated preferences achieve only 57.7% accuracy, underperforming naive baselines, and deployment variance is extreme (bottom-quartile teams reach 12-18% daily active users vs. top-quartile 82-88% within 90 days). The differentiator is understanding the problem before building, a research-design issue rather than a technology one. Vendor-led implementations succeed at roughly twice the rate of internal builds, and a 42-day average project cycle suggests the bottleneck is process and research methodology, not processing power. Emerging hallucination mitigation strategies, such as multi-model orchestration in which frontier LLMs cross-examine each other's output for quality (sketched below), show promise but add deployment complexity. Practitioners increasingly argue the real gap is not categorisation, where AI reaches about 80% accuracy, but prioritisation and action, the stages where human judgment remains irreplaceable. Industry consensus has shifted from AI-as-replacement toward responsible AI augmentation: vendors explicitly position AI as effective for accelerating interpretation and automating synthesis while humans own meaning, impact, and decisions. This maturation signals that the category has settled into a sustainable but bounded equilibrium: proven value for large enterprises with mature research operations and research discipline; persistent structural barriers, rooted in adoption methodology and organisational readiness rather than tooling capability, that prevent expansion into mid-market segments; and training incentives within foundation models that reward confident guessing over truthfulness. These are challenges that tooling improvements alone cannot resolve.
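
The multi-model orchestration pattern can be sketched abstractly. In the hypothetical sketch below, the two `ask_*` callables stand in for whatever LLM clients a team uses (no real vendor API is assumed): one model drafts the synthesis, a second cross-examines the draft against the transcript, and any flagged disagreement routes the output to a human.

```python
# Multi-model cross-examination sketch: a reviewer model critiques the
# synthesiser model's output before a human sees it. The `ask_*`
# callables are hypothetical stand-ins for real LLM client calls.
from typing import Callable

def cross_examine(
    transcript: str,
    ask_synthesiser: Callable[[str], str],
    ask_reviewer: Callable[[str], str],
) -> dict:
    # First pass: one model drafts themes with supporting quotes.
    draft = ask_synthesiser(
        f"Extract the main themes, each with a verbatim supporting quote:\n{transcript}"
    )
    # Second pass: a different model checks the draft against the source.
    critique = ask_reviewer(
        "Check that every quote below appears verbatim in the transcript and "
        "every theme is supported. List any unsupported claims.\n\n"
        f"TRANSCRIPT:\n{transcript}\n\nDRAFT THEMES:\n{draft}"
    )
    # Naive routing heuristic: flagged drafts go to a human reviewer.
    return {
        "draft": draft,
        "critique": critique,
        "needs_human_review": "unsupported" in critique.lower(),
    }

# Example wiring with whatever clients a team actually uses:
# result = cross_examine(transcript, ask_synthesiser=call_model_a, ask_reviewer=call_model_b)
```

The deliberate design choice is that the reviewer sees only the transcript and the draft, never the synthesiser's reasoning, which reduces the chance that the two models fail in the same way.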

TIER HISTORY

Research: Jan-2020 → Jan-2020
Bleeding Edge: Jan-2020 → Jan-2023
Leading Edge: Jan-2023 → Apr-2024
Good Practice: Apr-2024 → present

EVIDENCE (107)

— Critical risk assessment: qualitative synthesis especially vulnerable to hallucinations because research summaries lack verification numbers; 66% of employees trust LLM outputs without verification; documents persistent adoption barrier rooted in quality validation requirements.

— Independent analysis of AI research transformation: synthesis time collapsed from 16-26 days to 4 hours, async AI moderation dominates (80% of new studies), sample size scaling 50-100x, with Greenbook GRIT tracking AI as #1 emerging method for 3rd consecutive year.

— Production adoption analysis of 412+ enterprises shows 68% running AI interviews in production (up from 31%), synthesis time dropped from 11 to 4 days, continuous discovery emerging as dominant use case (28%), demonstrating operational maturation beyond pilot stage.

— Enterprise market adoption snapshot: 51% of enterprises running AI research agents in production, synthesis timelines compressed from 6-8 weeks to 24-48 hours, continuous discovery moving from theoretical to practical, positioning AI-native research operationalization as mainstream.

— SAGE peer-reviewed edited volume with 10 chapters on AI-assisted QDA covering hybrid human-AI workflows, five-level QDA method adaptation, hallucination/bias/ethics treatment—scholarly validation of AI synthesis as mature field with established practice and pedagogy.

— Business school faculty implementation framework: GenAI compresses research timelines but requires verification of sources, construct validity checks, and enterprise-grade data security; synthesis outputs are predictions, not verifications—demand interrogation over acceptance.

— UserTesting Figma plugin GA (Apr 2026) auto-generates test plans and embeds synthesis results directly into design tool; named deployments (CarMax, AJ Bell) show ecosystem integration reducing research-design iteration delays.

— Market analysis: ecosystem shifted from manual taxonomy-driven repositories to AI-first platforms with auto-tagging and low-friction workflows; 10-tool comparison shows AI-assisted synthesis now table-stakes; segmentation by use case (speed, cross-functional, scale) signals ecosystem maturity.

HISTORY

  • 2020: User research synthesis tools gain investment and vendor momentum; UserTesting, Dovetail, and Thematic all demonstrate platform expansion and active customer bases; Dovetail's $4M seed round validates market opportunity; accuracy and synthesis quality become key vendor messaging.
  • 2021: Production adoption accelerates; UserTesting releases AI-powered Human Insight Platform (August); Thematic and Dovetail report real-world deployment at scale (130k+ reviews, daily production use); human-in-the-loop synthesis emerges as best practice to balance speed and accuracy; early evidence of cross-functional adoption beyond research teams.
  • 2022-H1: Vendor market growth continues (UserTesting Q1 revenue +47% YoY) but adoption barriers emerge; Dovetail launches AI Agents for workflow automation; industry research highlights systemic challenges including synthesis trust, data quality gaps, and broader enterprise AI adoption plateau at 26%; pilot-to-scale failures in health tech signal organizational readiness constraints.
  • 2022-H2: Feature expansion continues as vendors invest in automation depth; Thematic launches feedback categorization (November) for automatic classification of Questions, Issues, and Requests; synthesis tooling stabilizes but organizational barriers to adoption persist at scale.
  • 2023-H1: Vendor market consolidates with no new entrants; Thematic demonstrates 543% ROI metrics for transactional NPS workflows with enterprise deployments; Dovetail strategically narrows scope to synthesis-summarization phase for continuous research workflows in 100+ person tech companies; category shifts from innovation to incremental optimization; structural adoption barriers persist despite stable product-market fit and vendor growth.
  • 2023-H2: Market consolidation confirmed with no new entrants or exits; UserTesting releases AI Insights Summary beta (October) leveraging proprietary models trained on domain research data; Thematic publishes detailed Forrester TEI study showing 543% ROI and $1.8M revenue lift at enterprise scale; Dovetail maintains 3,800+ paying customers and confirms deployments at Canva; vendors acknowledge critical generative AI limitations (cost, accuracy) and invest in proprietary models and human-in-the-loop approaches; practitioner adoption surveys show 95% of UX professionals using AI tools, but deployment remains concentrated in large enterprises with mature research operations; structural barriers (integration complexity, organizational readiness) persist without resolution.
  • 2024-Q1: Vendor market shows continued stability with no new entrants; Dovetail publishes Forrester TEI study (January) documenting 236% ROI and 36,000 research-hours saved, validating enterprise-scale deployment economics alongside Thematic's earlier TEI evidence; Thematic launches Theme Discovery and Theme Summarizer features (March); Thematic releases Answers AI feature (February, GA) achieving adoption at LinkedIn, Instacart, and DoorDash; Dovetail highlights enterprise customers Atlassian, Okta, GitLab, Porsche, Dyson; third-party assessment reveals persistent deployment barriers including high cost, poor AI integration, and maintenance burden (Looppanel, January); academic research supports screening automation with caution on analysis phases (HCI position paper, March). Deployment remains concentrated in Fortune 500 and Series C+ companies with no visible market expansion to mid-market segments.
  • 2024-Q2: Practitioner adoption accelerates with 56% of researchers now using AI for synthesis (up from 20% in late 2023); UserTesting launches Feedback Engine with AI-powered surveys (April) enabling theme summarization at scale (100-1,000 respondents); Thematic's Answers feature reaches GA with visible customer traction; peer-reviewed medical research validates ChatGPT-assisted thematic analysis with human oversight requirements; customer reviews of UserTesting highlight sentiment tagging and transcription value but reveal friction with analysis automation and quality variability; Gartner and McKinsey data surfaces persistent AI project barriers (53% and 36% deployment rates respectively), explaining continued concentration of deployments in enterprise-scale organizations with mature operations.
  • 2024-Q3: Market matures with consolidation continuing; new startups (Outset, HeyMarvin, Looppanel) emerge with AI-core positioning. Critical practitioner assessments (September 2024) argue AI cannot reliably conduct evaluative research, perform thematic analysis, or analyze video without human supervision; UserTesting faces financial stress with revenue declining to $147.4M and persistent losses (-$50.7M); 87% of CX leaders prioritize generative AI but 27% struggle to quantify ROI; EFM platform market projects $6B+ by 2025 at 15% CAGR driven by AI/ML adoption. Structural barriers persist: capability limitations, cost, integration complexity, and organizational readiness gaps prevent scaling beyond enterprise segment.
  • 2024-Q4: Vendor feature expansion continues with Dovetail 3.0 GA (October) delivering AI Customer Insights Hub with named enterprise customers (Amazon, Canva, Meta, Notion, Mayo Clinic) and 38+ hours/week productivity gains; UserTesting (October) launches Insights Hub and QXscore, advancing synthesis at enterprise scale post-UserZoom merger. However, independent analyst assessment (Forrester Wave, Q4) reveals critical adoption headwinds: genAI expectations exceed reality, nearly half of customer references underuse analytics tools, and 85% of global enterprises using/testing GenAI but only 22% confident IT architecture supports deployment (Databricks/Economist, November). Peer-reviewed research (December, IJHCI) confirms GPT-4 limitations in usability research: AI-generated follow-up questions elicit feedback but rarely surface deeper insights, with only 13% of UX professionals using AI frequently. Structural tension persists: vendor GA momentum masks persistent capability limitations, organizational readiness gaps, and production-readiness challenges affecting category-wide adoption.
  • 2025-Q1: Vendor momentum continues with Dovetail's Amazon Bedrock GenAI integration (January) delivering 80% efficiency improvements and 10+ hours weekly time savings in customer deployments. UserTesting's refreshed Forrester TEI study (January) documents 415% ROI with quantified benefits (7.2% conversion improvement, 10% retention gain, 50% researcher time savings), validating continued enterprise value. However, emerging evidence reveals persistent research quality limitations: non-inclusive UX research practices and biased methodologies continue to drive product failures despite available synthesis tooling, signaling that adoption barriers remain rooted in research design discipline rather than purely in tooling capability. Market remains concentrated in Fortune 500 and Series C+ companies; no new entrants or mid-market expansion visible.
  • 2025-Q2: Practitioner adoption continues accelerating with 54.7% of 300+ researchers using AI assistance in synthesis workflows (June 2025), though 82.9% concentrate AI use on summary generation rather than deeper analysis. UserTesting adds documented deployment wins (Betway, Athletic Greens, GSK) with quantified business outcomes (600% download increase, 5% checkout improvement). Independent practitioner debate emerges on AI capabilities: vendors claim 90-95% synthesis accuracy but researchers express persistent concerns about bias (77% cite bias risk), synthetic users lacking human insight nuance, and limitations of hallucination-prone models. Productivity gains documented (80-hour synthesis reduced to 14 hours) position AI as augmentation rather than replacement. Critical assessment (MIT Sloan, June 2025) reinforces AI hallucination and bias risks in feedback analysis. Market concentration persists in enterprise segment; organizational readiness, cost, and integration complexity remain the binding constraints on broader adoption.
  • 2025-Q3: Vendor ROI benchmarks reach new highs with UserTesting's Forrester TEI documenting 665% ROI and quantified business impact (60% conversion improvement, 140% lift in customer spend); Dovetail case study shows production deployment at Canva with AI-powered insights. However, critical assessments underscore persistent technical limitations: Harvard Kennedy School (August) publishes peer-reviewed framework documenting AI hallucination risks in feedback synthesis; benchmarks show hallucination rates 0.6–2.0% for GPT-4 but 17–45% for general models (Fuel Cycle, September). Independent research finds AI inaccuracies rare in structured tasks but interpretive differences common, confirming synthesis quality depends on task specificity. Broader market signals moderation: HBR documents 95% of gen AI investments yield zero returns, signaling ROI challenges affecting enterprise adoption. Market expansion remains constrained to large enterprises; organizational barriers and cost persist.
  • 2025-Q4: Vendor feature expansion accelerates with Dovetail Fall 2025 launch (AI Agents, Dashboards, Docs, GA AI Chat); Amplitude case study reveals tool selection maturity and cost optimization (switched from $60k comprehensive to $12k specialized analysis tool, 80% savings). However, critical findings surface integration and organizational readiness barriers as the binding constraints: MIT analysis of 300+ AI deployments shows 95% deliver no measurable business value, with vendor-led implementations succeeding 67% vs. internal builds 33%; Anthropic study of AI interviewing (1,250 participants) reveals 86% report time savings but 69% hide AI use from colleagues, indicating hidden organizational skepticism despite productivity claims. Independent hallucination testing (SHIFT ASIA) identifies Perplexity fabricating citations, Gemini and Claude showing inaccuracies—escalating synthesis accuracy concerns. Expert consensus (November 2025) positions AI as "efficiency multiplier, not replacement," emphasizing mandatory human oversight. Market remains concentrated in Fortune 500 and Series C+ enterprises with no mid-market expansion; maturity plateau evident as category becomes bound by organizational factors rather than technical capability.
  • 2026-Jan: Vendor feature expansion accelerates with Dovetail's January launches (AI Agents for autonomous monitoring, AI Dashboards for custom visualizations) signaling continued platform maturity. Independent research documents emotion detection automation in usability tests achieving 86% expert agreement. Industry analysis reveals ecosystem consolidation with widespread AI integration but variable quality; UX tool selection maturity evident (Amplitude case: 80% cost savings through specialized platform selection). However, user frustrations persist: Dovetail's AI features rated as "shallow" by advanced teams; research synthesis remains constrained by organizational factors (42-day average project cycle) rather than pure technical capability. Practitioner consensus (January 2026) positions AI as efficiency multiplier requiring human-in-the-loop validation; synthetic users viable for validation but inadequate for emotional discovery.
  • 2026-Feb: Ecosystem expands with UserTesting adding physical product testing (February) using smartphone video and AI analysis, broadening research methods beyond digital interfaces. Market growth projects text analytics segment reaching $18B by 2028, driven by AI feedback analysis demand. However, critical risks surface: MIT research (February) documents systematic AI bias against lower-literacy and non-Western users, with refusal rates up to 11%, signaling synthesis accuracy barriers for diverse feedback. Adoption metrics reveal hidden constraints: Rev survey of 1,000+ users shows heavy AI users experience 3x more hallucinations than casual users, requiring 10x longer verification time. Practitioner critical assessment (February) argues AI automation solves wrong problem—categorization remains 80% accurate but real bottleneck is prioritization and action; founder analysis shows 30-68% churn reduction without AI automation. Dovetail's Figma integration (February) advances research-to-design workflows, demonstrating continued ecosystem maturation.
  • 2026-Mar: Dovetail launches Explore GA, a visual search interface for AI-synthesized customer feedback grounded in underlying evidence, advancing the core discovery workflow. The UX research software market is measured at $470.3M (2025) growing at 11.6% annually, reflecting steady ecosystem expansion. Hallucination risk remains the defining quality challenge: NNg independent testing confirms AI tools hallucinate findings and fail on qualitative pattern recognition; benchmarks across 70+ models show hallucination rates from 1.8% to 23%, with reasoning models paradoxically worse; expert workflows face compounded risk from fluency trust and velocity pressure—one journalist study found 15 of 53 AI-summarized posts contained fabricated quotes.
  • 2026-Apr: Ecosystem expansion accelerates with new vendors and integrations: Outset launches visual intelligence suite (April 2) with multi-modal analysis (facial cues, physical interaction) expanding synthesis beyond text; Outset integrates with Dovetail (April 7) to send AI-moderated interview transcripts and synthesized insights directly into synthesis platforms, advancing ecosystem consolidation. UserTesting's Figma plugin reaches GA (April 2026), auto-generating test plans and embedding analysis results into design tools with named early-adopter outcomes (CarMax, AJ Bell). Adoption metrics from Maze 2026 survey show 69% of researchers use AI in synthesis (up 19pp YoY), with 88% identifying synthesis as top trend and 63% reporting faster turnaround times. Thematic documents 92% time reduction in feedback analysis with enterprise deployment outcomes ($4.8M incremental revenue). Product management ecosystem shows insight synthesis as #1 most-wanted AI capability (Product-Led Alliance). Market analysis confirms AI-assisted synthesis is now table-stakes vendor capability, with an ecosystem of AI-first platforms (Looppanel, Condens, Marvin) competing on auto-tagging and low-friction workflows. However, critical risk surfaces: 47% of enterprise AI users report making major decisions on hallucinated synthesis content, exemplified by feature development based on fabricated user preferences. Peer-reviewed healthcare research confirms LLMs fail on inductive theme generation and hallucinate unsupported themes; practitioner QA guides document critical failure modes (fabricated quotes, incorrect counts, lost context) requiring evidence-backed validation for every synthesized theme. Governance guidance (business school framework, April 2026) positions synthesis outputs as "prediction not verification," emphasising source verification and construct validity checks as prerequisites. Ecosystem consolidation around synthesis as central insight hub continues; organizational and quality risks emerge as binding constraints on expansion beyond enterprise segment.
  • 2026-May: Operational maturation metrics sharpen the picture: 412+ enterprise deployments show 68% running AI interviews in production (up from 31%), with synthesis time compressed from 11 days to 4 hours and continuous discovery emerging as the dominant use case. Async AI moderation now accounts for 80% of new studies, with sample sizes scaling 50-100x over traditional methods. Hallucination risk remains the defining quality constraint: qualitative synthesis is especially vulnerable because research summaries lack verification numbers, and 66% of employees trust AI outputs without verification — sustaining the case for mandatory human oversight even as throughput gains become dramatic.

TOOLS