Perly Consulting │ Beck Eco

The State of Play

A living index of AI adoption across industries — where established practice meets the bleeding edge
UPDATED DAILY

The AI landscape doesn't move in one direction — it lurches. Some techniques leap from experiment to table stakes in a single quarter; others stall against regulatory walls, technical ceilings, or organisational inertia that no amount of hype can dislodge. Knowing which is which is the hard part. The State of Play cuts through the noise with a rigorously maintained index of AI techniques across every major business domain — classified by maturity, evidenced by real-world adoption, and updated daily so you always know where you stand relative to the field. Stop guessing. Start knowing.

The Daily Dispatch

A daily newsletter distilling the past two weeks of movement in a domain or two — delivered to your inbox while the index updates in the background.

AI Maturity by Domain

Each dot marks the weighted maturity of practices within a domain.

[Chart: DOMAIN plotted on a maturity axis running from BLEEDING EDGE to ESTABLISHED]

Candidate assessment — structured scoring support

LEADING EDGE

TRAJECTORY

Stalled

AI that helps interviewers evaluate candidates consistently by structuring scoring rubrics and flagging evaluation biases. Includes calibration support and rubric enforcement; distinct from resume screening, which evaluates documents rather than interview performance.

OVERVIEW

AI-assisted structured scoring has achieved operational maturity in high-volume hiring but remains trapped behind cascading validity, fairness, and regulatory barriers that are now crystallizing into systemic deployment risk. Multinational enterprises and large-scale recruiters (HireVue with 800+ clients, Metaview with 3,000+ customers, Curatal, Interviewer.AI) sustain deployments with documented efficiency gains: 27–71% time-to-hire reductions, £3,000/month CV screening savings, and 20-point improvements in final interview pass rates. The underlying methodology is theoretically robust: decades of research validate structured interviews as 2.2× more predictive of job performance than unstructured alternatives (Sackett et al. 2022), and 85% of systems designed with explicit fairness guardrails meet bias thresholds.

Yet adoption has stalled, and new structural validity threats emerged in June 2026. Five reinforcing barriers now constrain expansion: (1) LLM self-preference bias: peer-reviewed research shows AI screeners prefer their own stylistic outputs 67–82% of the time regardless of actual quality, a structural property unfixable by prompt engineering; (2) audit methodology failures: vendor bias audits aggregate across jobs, masking job-level disparities that emerge under EEOC scrutiny (Stanford's 3.4M-application study found vendor audits gave "clean" results while job-by-job analysis revealed 26% of Black and 15% of Asian applicants faced adverse impact); (3) GenAI cheating and response gaming: 39% of applicants use AI to optimize answers, and LLM-scored assessments show 18–23% scoring bias toward AI-generated text; (4) candidate trust collapse: fairness confidence stands at 26% and offer acceptance has fallen from 74% (2023) to 51% (2026), despite positive user experience in live interactions; (5) regulatory fragmentation: the federal vacuum after EEOC guidance removal (Jan 2025) created a patchwork of state standards (CA, IL, CO, TX) alongside the EU AI Act's high-risk classification (enforcement from August 2, 2026, with penalties up to 7% of global revenue), with Mobley v. Workday class certification (~1.1B applications) establishing vendor liability precedent.

The result is deepening bifurcation: enterprises with compliance infrastructure and high-volume hiring needs sustain deployment despite validity and fairness risks; mid-market and risk-averse organisations remain blocked by unresolved structural threats and implementation costs. The practice is production-grade for compliant enterprise use but not yet safe to deploy at mass-market scale.

CURRENT LANDSCAPE

The vendor ecosystem continues scaling despite mounting validity and fairness concerns. HireVue serves 800+ enterprise clients, including Emirates, Unilever, Philips, and Nestlé, reporting $500k to £1M annual savings per deployment. Metaview's 3,000+ customers cite time savings of 30 minutes per interview and a 30% reduction in interviews per hire. Interviewer.AI reports 66% of hires closing within one week. UK SME adoption jumped to 54% (from 35% in 2025), with a documented 71% cost-per-hire reduction and £3,000/month CV screening savings. Production case studies show measurable outcomes: LNER cut hiring from 7 weeks to 3 weeks (reported as a 71% reduction); William Hill compressed time-to-interview from 15 days to 1.8 days (88% reduction); a TTS Talent client reduced first-year turnover by 48% and achieved a 5% performance uplift through structured assessment. These deployment outcomes remain credible across sectors and geographies, validating operational ROI in controlled settings.
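The percentage reductions quoted in these case studies follow from (before − after) / before. A minimal Python sketch of that arithmetic, using only the before/after endpoints stated above; the function name `percent_reduction` is illustrative and not taken from any vendor material.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, expressed as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# William Hill: time-to-interview from 15 days to 1.8 days
william_hill = percent_reduction(15, 1.8)

# LNER: hiring time from 7 weeks to 3 weeks
lner = percent_reduction(7, 3)

print(f"William Hill: {william_hill:.0f}%")
print(f"LNER: {lner:.0f}%")
```

On the endpoints as stated, the William Hill figure works out to 88%, matching the quoted reduction. The 7-to-3-week endpoints work out to roughly 57%, so the 71% quoted for LNER presumably rests on endpoints or a metric not given here.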

However, June 2026 evidence reveals critical structural validity threats beneath the deployment metrics. Stanford HAI's analysis of 3.4M applications across 156 employers found that while individual vendors had published clean bias audits, job-level disaggregation (required by EEOC) exposed algorithmic adverse impact: 26% of Black and 15% of Asian applicants faced disproportionate rejection. This is a classic audit-methodology failure: aggregating results across jobs masks job-level disparities. LLM-based assessment systems exhibit self-preference bias (preferring own-model outputs 67–82% of the time) regardless of actual quality, a structural property unfixable by prompt engineering. Assessments scored by LLMs show an 18–23% bias toward AI-generated text, disadvantaging candidates who write in their own voice. On the calibration side, 69.6% of organizations use structured interviews but only 47% conduct calibration, and the vast majority (78.7%) retain human final authority, indicating tool adoption without methodology adoption. Candidate transparency gaps compound fairness concerns: 70% of candidates are never informed upfront that AI is involved, and 38% abandon hiring processes due to AI assessment. Only 26% of candidates trust AI fairness despite positive user experience, and offer acceptance rates remain depressed at 51% (down from 74% in 2023). GenAI cheating remains prevalent: 39% of applicants use AI in responses; 49.6% optimize answers against known LLM assessments; and RPO research documents candidates successfully using GenAI to pass video interviews with inflated ratings. Fairness metrics remain vendor-dependent and inconsistent (40% variance between implementations), yet audit frameworks that enforce structured rubrics (vs. black-box embeddings) demonstrate measurable bias reduction.
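The aggregation problem described above can be illustrated with a small sketch. It assumes the four-fifths rule (the conventional screening threshold, under which a group's selection rate below 80% of the highest group's rate indicates adverse impact) as the metric; the text does not say which metric the Stanford study or the vendor audits used. The records, job names, and group labels are synthetic and purely illustrative.

```python
from collections import defaultdict

FOUR_FIFTHS = 0.8  # conventional adverse-impact screening threshold (assumed metric)

# Synthetic records: (job, group, applicants, selected). Not real data.
RECORDS = [
    ("job_a", "group_x", 100, 50),
    ("job_a", "group_y", 100, 30),
    ("job_b", "group_x", 100, 45),
    ("job_b", "group_y", 100, 50),
]


def impact_ratio(counts):
    """Lowest group selection rate divided by the highest group selection rate."""
    rates = [selected / applicants for applicants, selected in counts.values()]
    return min(rates) / max(rates)


def pooled(records):
    """Sum applicants and selections per group, ignoring which job they applied to."""
    totals = defaultdict(lambda: [0, 0])
    for _job, group, applicants, selected in records:
        totals[group][0] += applicants
        totals[group][1] += selected
    return totals


def by_job(records):
    """Keep per-group applicants and selections separate for each job."""
    jobs = defaultdict(dict)
    for job, group, applicants, selected in records:
        jobs[job][group] = (applicants, selected)
    return jobs


aggregate_ratio = impact_ratio(pooled(RECORDS))
job_ratios = {job: impact_ratio(counts) for job, counts in by_job(RECORDS).items()}
flagged_jobs = sorted(job for job, ratio in job_ratios.items() if ratio < FOUR_FIFTHS)

print(f"aggregate ratio: {aggregate_ratio:.2f}")
for job, ratio in sorted(job_ratios.items()):
    print(f"{job}: {ratio:.2f}")
print("flagged:", flagged_jobs)
```

In this synthetic data the pooled ratio clears the 0.8 threshold, so an aggregate audit would read as clean, while `job_a` on its own falls well below it. Real audits involve more groups, small-sample handling, and statistical testing that this sketch leaves out.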

The regulatory environment has hardened into a compliance crisis. The EEOC removed AI hiring guidance in January 2025, creating a federal vacuum filled by conflicting state standards: California (FEHA), Illinois (HB 3773, prohibiting discrimination regardless of intent), Colorado (SB 24-205, with annual impact assessments), and Texas (TRAIGA). Regimes outside the US add further requirements: Ontario (AI disclosure mandate), and Germany and the UK (EU AI Act and Data Act reforms to automated decision-making). The EU AI Act classifies recruitment as high-risk, with mandatory agent inventories, automated logging, human oversight, and transparency; enforcement begins August 2, 2026, with penalties up to 7% of global annual revenue. Vendor liability is now established: Mobley v. Workday class certification (nationwide, ~1.1B applications) treats vendors as direct agents liable for disparate impact; Kistler v. Eightfold exposes FCRA liability for opaque scoring and missing dispute rights; HireVue faces ACLU/EEOC complaints on accessibility failures. This fragmented compliance burden favours caution and concentrates adoption among enterprises with compliance infrastructure. The practice has achieved operational maturity but faces a compliance and validity infrastructure gap that blocks mass-market expansion.

TIER HISTORY

Research: Jan-2022 → Jan-2022
Bleeding Edge: Jan-2022 → Jul-2022
Leading Edge: Jul-2022 → present

EVIDENCE (139)

  • U.S. District Judge Rita Lin denies Workday's motion to dismiss, establishing vendor liability as "agent" performing FEHA-regulated activities; first ruling expanding direct vendor liability for algorithmic outcomes.
  • Morgan Lewis confirms candidate assessment and selection are Annex III high-risk systems under EU AI Act (enforcement Aug 2, 2026); mandates risk management, data governance, human oversight, technical documentation, and regulatory registration.
  • GESI analysis documents proxy bias (names, career gaps, language patterns) and failure of human-in-the-loop safeguard (only 41% detect deliberate bias); provides negative signal on structural limitations and oversight gaps.
  • Enterprise AI interview platform with 10M+ interviews conducted, 9.1/10 candidate satisfaction, 1.6k hours saved monthly; demonstrates production-scale maturity with explainable scoring, fairness metrics, and independent audit availability.
  • UK ICO audited 30+ employers, issued compliance letters to 16; enforcement establishes test for meaningful human involvement and mandates bias/fairness testing; signals shift from best-practice to regulated-mainstream governance.
  • Class action alleges Eightfold scored 1B+ worker profiles 0-5 without disclosure/consent, violating FCRA; establishes FCRA liability theory distinct from discrimination claims, covering opaque scoring without candidate access/dispute rights.
  • Named global SaaS company raised offer acceptance by 12% after six months of disciplined calibration; demonstrates specific, measurable business outcome from structured scoring and panel alignment.
  • Large-scale empirical study (3.4M applications, 156 employers) found 26% of Black and 15% of Asian applicants faced adverse algorithmic impact; position-level analysis reveals job-by-job disparities invisible in aggregate vendor audits.

HISTORY

  • 2022-H1: Structured interview assessment moved from research into production deployments. HireVue and Metaview platforms showed real-world adoption (Metaview with Catawiki; HireVue with 86%+ employer adoption), but regulatory action (BIPA lawsuit) and documented bias concerns raised questions about sustainable deployment.
  • 2022-H2: Ecosystem matured with academic validation (ETS research on Evidence-Centered Design methodology) and government sector adoption (Canada's Department of National Defence deployed structured scoring templates). However, candidate skepticism emerged from psychological research and regulatory risk crystallized: New York City's bias audit law (effective January 2023) and mounting legal liability concerns constrained adoption momentum among risk-averse enterprises.
  • 2023-H1: Adoption accelerated despite regulatory headwinds. HireVue documented major enterprise ROI (Sitel saving $408k, Emirates $500k, Flutter achieving 50% time-to-hire reduction), and new platforms like Sapia.ai reached scale (12M structured interview questions, named clients including Qantas and Woolworths). Metaview deployments at growth-stage companies (Replit, Pleo, Localyze) showed 20+ hours/week time savings. However, legal challenges persisted: CVS faced class-action lawsuits over HireVue's facial analysis features. Public sentiment remained mixed (Pew Research: 71% opposed AI making final decisions, but 47% believed AI assessed candidates more fairly), signaling ongoing tension between efficiency gains and fairness concerns.
  • 2023-H2: Adoption continued among growth-stage and enterprise adopters (81% of talent leaders exploring AI, Metaview clients reporting 20+ hours/week savings, Brex saving 1,000 hours/year). However, regulatory and readiness barriers intensified: the EEOC reached its first AI discrimination settlement ($365,000 with iTutorGroup for age bias), Illinois and other states enacted video interview regulation laws, and workforce sentiment turned negative (65% of UK professionals concerned about job automation, only 25% felt prepared, 48% feared AI bias in recruitment). Academic critique of autonomy and fairness assumptions in AI assessment design emerged, and platform vendors faced increasing scrutiny, with EPIC filing complaints about HireVue's "black-box" scoring methodology. The practice remained bifurcated: proven ROI in adopting companies versus growing caution among risk-averse enterprises and regulated sectors.
  • 2024-Q1: Adoption dynamics shifted with legal escalation. Psychometric validation continued (peer-reviewed research confirming AI scoring reliability and fairness), and platforms sustained momentum with growth-stage customers (Metaview, LetzInterview). However, two federal court rulings in Q1 2024 expanded legal liability: a Massachusetts ruling allowed a class action against CVS/HireVue for a lie-detector law violation (February), and an Illinois BIPA ruling extended biometric privacy protections to facial expression analysis in AI assessments (March). These precedents signaled that courts were interpreting AI behavioral scoring more expansively, treating it as analogous to polygraph or biometric systems and creating new compliance uncertainty. The bifurcation deepened: enterprises with compliance infrastructure continued deployment, while regulated industries and risk-averse firms faced mounting barriers to entry.
  • 2024-Q2: Ecosystem continued maturing amid regulatory tension. New vendor products emerged (Aspect's AI-powered interview template generator) signaling sustained investment in structured assessment tooling. Peer-reviewed research reinforced scientific legitimacy of AI competency assessments (Journal of Applied Psychology validation study), underpinning vendor claims. However, industry discourse increasingly focused on the diversity-validity dilemma: balancing AI's demonstrated accuracy and consistency benefits against fairness and demographic parity concerns. Courts' expansionist interpretation of biometric privacy law (established in Q1) created uncertain liability for behavioral scoring, constraining adoption outside growth-stage and multinational enterprises with legal resources.
  • 2024-Q3: Deployment and validity challenges intensified. JetBlue's HireVue deployment achieved strong candidate satisfaction (93% CSAT), affirming practical adoption value for high-volume hiring. However, validity threats emerged: Sapia.ai's analysis of 573,500+ candidate responses revealed widespread AI-generated content (cheating), challenging the reliability of text-based structured assessments. Recruiter adoption metrics remained positive (a survey of 380+ recruiters showed 92% adoption for productivity and 25% more candidates handled weekly), but the CVS settlement in July (resolving the HireVue facial analysis lawsuit) crystallized regulatory and reputational risks. The market bifurcation deepened: enterprises with compliance infrastructure and high-volume hiring needs continued deployment despite risks; regulated sectors and risk-averse firms remained constrained.
  • 2024-Q4: Evidence of structured assessment maturity and persistent validity risks. Stanford RCT (n=37,000) validated AI-assisted structured interviewing, showing 20 percentage point improvement in final interview pass rates—confirming efficacy for high-volume hiring at scale. Enterprise adoption continued (Abeam Consulting deployed HireVue globally). However, cheating and fairness risks crystallized: peer-reviewed research documented widespread applicant use of GenAI in unproctored assessments, challenging validity of text-based scoring systems; psychometric vendors reported 75% of workers using GenAI with 70% reporting productivity gains, but raised alarms about detecting AI-assisted responses in live interviews. The practice remained bifurcated: organizations with high-volume structured hiring and compliance infrastructure sustained deployment despite validity and regulatory risks; organizations dependent on asynchronous assessment faced mounting barriers.
  • 2025-Q1: Adoption accelerated amid a validity and regulatory crisis. AI usage in hiring surged to 72% weekly adoption with 31% deploying AI for assessments, signaling a transition from experimentation to operational integration. Peer-reviewed validation of AI structured interview scoring continued. However, three critical barriers emerged: (1) validity threats from GenAI cheating on unproctored assessments affecting 92% of organizations using pre-employment testing; (2) regulatory escalation with an ACLU discrimination complaint against HireVue/Intuit (March 2025) alleging bias and accessibility failures, paralleling Workday litigation establishing employer liability; (3) state-level AI regulation expansion (Colorado AI Consumer Protection Act). Market bifurcation deepened: high-volume tech hiring and multinational enterprises with compliance infrastructure sustained deployment; regulated sectors, risk-averse mid-market, and text-based assessment users faced compounding legal and validity barriers.
  • 2025-Q2: Enterprise deployment scale proved sustainable but tier progression constrained by implementation and regulatory realities. Major deployments continued: HireVue at 800+ enterprise clients (Emirates, Unilever, Philips, Nestlé) showing $500k-£1M annual savings; Metaview's 3,000+ customers delivering 30+ min per-interview productivity gains and 92% hiring confidence improvement. New ecosystem entrants (Criteria Corp, BarRaiser) launched AI scoring and real-time guidance tools. However, expert assessment shifted toward skepticism: industry leaders argued AI scoring "is not yet reliable enough" due to transcription errors and missing validation data; ACLU complaint pattern evidence (accessibility failures, bias against deaf/Indigenous candidates) established regulatory precedent; implementation reality showed privacy friction and narrow scope limitations. Market bifurcation persisted: multinational enterprises and high-volume tech hiring sustained deployment with legal buffer; mid-market and regulated sectors remained blocked by unresolved validity concerns and regulatory exposure.
  • 2025-Q3: Implementation gaps, candidate distrust, and validity threats intensified adoption barriers. While 96% of recruiting teams deployed AI broadly, only 53% used scoring rubrics and 47% conducted interview calibration—indicating adoption of generic AI tools without structured assessment discipline. Candidate trust collapsed: Gartner research showed only 26% trusted AI evaluation fairness, with offer acceptance rates falling from 74% to 51%; research across 13,000 participants found candidates alter self-presentation under AI assessment, downplaying empathy and creativity. Fairness capability remained vendor-dependent: Warden AI's 1M+ sample audit found 85% of systems met fairness thresholds and AI delivered 39-45% better treatment for women and minorities vs. humans, yet bias metrics varied 40% between vendors. Validity threats emerged: RPO research documented candidates successfully using GenAI to pass online tests and video interviews with higher ratings, undermining confidence in scoring reliability. Multiple HireVue lawsuits (EPIC complaint, Deyerler BIPA class action, D.K. EEOC discrimination complaint) established pattern precedent of accessibility failures and algorithmic opacity. Practice remained bifurcated: high-volume tech and multinational enterprises sustained deployment; broader adoption constrained by validity risks, legal exposure, and candidate resistance.
  • 2025-Q4: Maturity and barriers crystallized in final quarter. Enterprise deployment sustained at scale (Interviewer.AI production data: 66% hires in 1 week, 20% evaluation variance reduction; Metaview 3,000+ customers; HireVue 800+ enterprise clients), validating operational ROI in high-volume hiring. However, implementation gap persisted: 78.7% retained human final hiring authority despite AI deployment, and Willo survey showed only 69.6% using structured interviews and 47% with calibration—indicating tool adoption without methodology adoption. University of Washington research revealed humans mirror AI biases 90% of the time without intervention; HireVue responded with Multi-Penalty Optimization bias mitigation technique. Regulatory escalation continued: Workday class action certified as nationwide, HireVue faced ACLU discrimination and EEOC complaints, establishing pattern evidence for legal liability. Candidate distrust remained structural: 26% trust fairness, 39% use GenAI in applications, offer acceptance rates at 51%. Validity threats unresolved: vendor fairness metrics varied 40%, GenAI cheating remained prevalent, and assessment reliability under threat. Practice achieved operational maturity but faced fundamental barriers (candidate resistance, implementation gaps, fairness inconsistency, legal exposure) limiting expansion beyond current multinational and high-volume segments.
  • 2026-Jan: Hybrid AI-human screening research validated complementary deployment models (a 70,000-applicant field study showed a 7% offer increase and a 24% separation reduction). However, critical expert assessment argued AI had made hiring worse overall, and the legal landscape intensified with the pending Mobley v. Workday and Harper v. Sirius XM lawsuits establishing precedent for discrimination claims. Practice remained bifurcated: enterprises with high-volume hiring and compliance resources sustained deployment despite regulatory risks; broader adoption was blocked by candidate distrust (26% fairness confidence), validity threats from GenAI cheating, and persistent fairness inconsistency (40% bias metric variance across vendors).
  • 2026-Feb: Professional standards matured and vendor products evolved amid persistent adoption barriers. SIOP released formal recommendations for validating AI-based assessments, signaling mainstream governance readiness; HireVue released Assessment Builder with claimed efficiency gains (60% less screening time, $667k annual savings). However, critical assessment revealed systematic AI bias patterns: research showed 85% selection for White names vs. near-zero for Black names, and humans mirroring AI bias 90% of the time without intervention. Legal analysis documented six compliance risks (including bias, transparency, data privacy, psychological inference validity, and candidate fairness perception). A Sapia.ai case study (Holland & Barrett: 89% turnover drop in 3 months, 47% hiring acceleration) demonstrated real-world deployment impact but remained an exception. Market bifurcation persisted: multinational enterprises and high-volume hiring operations sustained deployment; broader adoption remained blocked by validity threats from GenAI cheating, candidate distrust, regulatory escalation (Illinois HB 3773, Colorado AI Act), and persistent bias metrics variance (40% between vendors).
  • 2026-Mar: Regulatory vacuum escalated adoption friction: EEOC removal of AI hiring guidance in January 2025 created a patchwork of four conflicting state laws (CA, IL, TX, CO), and Mobley v. Workday class certification expanded vendor liability exposure. Countervailing evidence emerged from a 70,000-applicant field study showing AI-conducted interviews achieved 12% more job offers, 18% higher job starts, and 78% candidate preference for AI over humans — yet fairness audit data (Warden AI, 150+ systems, 1M+ samples) confirmed 85% of systems meet thresholds only when designed with explicit guardrails, leaving the bifurcation between compliant enterprise deployments and the broader market unchanged.
  • 2026-Apr: Regulatory pressure intensified with the EU AI Act's August 2, 2026 enforcement deadline (penalties up to 7% of global revenue) requiring agent inventories, automated logging, and human oversight for high-risk assessment systems. UK SME adoption reached 54% (up from 35% in 2025) with documented 71% cost-per-hire reductions, while enterprise deployments (LNER: 71% time-to-hire reduction; William Hill: 88% compression) confirmed production-scale outcomes. The Mobley v. Workday class certification — covering ~1.1B applications — cemented vendor liability as a structural risk, shifting compliance burden squarely onto assessment tool providers.
  • 2026-May: Eightfold AI's BABL audit (29M assessments) achieved PASS ratings on disparate-impact thresholds, while Oracle-Eightfold's GA integration compressed time-to-hire from 42 days to 5 — but a Mercuri Urval Research Institute audit found AI selection tools broadly non-compliant with SIOP/ITC/ISO standards (7 structural bias types, no job analysis or validation steps), and the Kistler v. Eightfold class action added FCRA liability exposure (opaque Match Scores, no candidate dispute rights, customers including Microsoft/PayPal/Morgan Stanley) alongside the expanding Mobley v. Workday nationwide class (~1.1B applications). The candidate trust gap hardened: 63% have faced AI interviews (up 13 points in six months), 70% were never informed upfront, and 38% abandoned processes — with CMU/AJL research documenting AI-generated text scoring 18–23% higher and over-55/immigrant-English candidates 27–31% lower, surfacing explainability as an architectural requirement ahead of the August 2, 2026 EU AI Act enforcement deadline.
  • 2026-Jun: Structural bias evidence intensified: peer-reviewed research (Yang et al. 2026, 20 models) found that a model's capability is uncorrelated with how fairly it scores its own outputs, with synthetic training data leaking preference bias invisibly; a separate multisite study found LLM screeners prefer their own stylistic output 67–82% of the time regardless of quality, a structural property unfixable by prompt engineering. A Stanford HAI study of 3.4M applications across 156 employers found 26% of Black and 15% of Asian applicants faced adverse algorithmic impact, with vendor audits masking job-level disparities by aggregating results. A durable-skills survey of 500 HR leaders showed structured scenario-based assessment underutilised despite appearing in 76% of job postings. On the deployment side, Knockri scaled to 3M assessments annually via governed AI agents, and Sapia.ai reached 10M+ interviews across 77 countries, while EU AI Act compliance infrastructure is driving market growth ($3.47B by 2031). Regulatory enforcement hardened: Morgan Lewis confirmed candidate assessment as an Annex III high-risk system with August 2, 2026 enforcement; the UK ICO audited 30+ employers and issued compliance letters to 16, shifting hiring AI governance from best practice to regulated-mainstream. A federal judge denied Workday's motion to dismiss in Mobley v. Workday, establishing vendors as "agents" directly liable for algorithmic outcomes under FEHA, the first ruling of its kind. The national regulatory landscape (Colorado, Illinois, NYC, Texas) produced a compliance patchwork, with the DOJ challenging state laws while the August 2, 2026 EU AI Act enforcement date approaches. Market bifurcation deepened: validated-framework deployments sustain enterprise use, while systemic bias evidence, audit-methodology failures, and expanding vendor liability continue blocking mid-market expansion.