Natural language to code for non-developers

LEADING EDGE

TRAJECTORY: Stalled

Tools enabling non-technical users to generate functional code or automations from plain language descriptions. Includes no-code/low-code AI builders and spreadsheet-to-app tools; distinct from chat-based code assistance which targets developers.

OVERVIEW

Natural language to code for non-developers is an emerging category of tools that enable business users, citizen developers, and non-technical roles to create functional applications, automations, and workflows through conversational interfaces rather than manual coding. Unlike chat-based code assistance which targets professional developers, these tools aim to democratize application development by accepting plain-English requirements and generating executable workflows or applications—a key promise of low-code and no-code platforms as they integrate generative AI capabilities. The core tension is between capability and correctness: while foundational research shows LLMs can meaningfully improve reasoning and code understanding when trained on code data, early real-world evaluations reveal significant gaps in functional correctness and robustness that limit production readiness. Enterprise adoption momentum exists, driven by cost pressures and developer shortages, but implementation risks around data quality, intellectual property, and security remain largely unresolved.

CURRENT LANDSCAPE

By April 2026, natural language to code for non-developers had solidified as operational mainstream for departmental automation and internal tools, with accelerating platform consolidation and a sharper feature focus. Microsoft Power Platform continued its ecosystem-wide expansion: April 2026 announcements introduced the Canvas Apps MCP Authoring Plugin (enabling non-developers to use GitHub Copilot, Claude Code, or any MCP-compatible agent to describe and build apps through natural language conversation) and brought external tool support for generative pages to GA across all public clouds. Bubble reported an ongoing rollout of its AI Agent with improved error detection (JSON validation, issue explanation) and compound editing (simultaneous UI, workflow, and database modifications), signaling a shift from "generate complete app" toward "interactive scaffolding and refinement" workflows. Retool reached a 50% non-developer user base (up 40% YoY) with a reported 66% of companies implementing AI productivity mandates, while confirming realistic limitations: AppGen creates functional drafts in minutes but requires significant visual builder refinement for production polish. Market adoption metrics remained robust: 80% of low-code users from outside IT, a $52B market in 2026 (up from $26B in 2025), citizen developers outnumbering professional developers 4:1, and 70% of new enterprise apps using low-code/no-code. However, deployment constraints from late 2025 persisted unchanged. Real-world adoption remained concentrated in bounded, non-critical use cases: only 9% of Bubble builders deployed AI-coded solutions for business-critical applications, and 25-30% of no-code projects required rewriting in custom code within 2 years due to performance limits, customization ceilings, and vendor lock-in. Compliance barriers continued: regulated organizations kept their Copilot rollouts paused.
Case studies from Q1 2026 (BluBinder $150K savings, My AskAI 40K+ users, Byword 100K+ articles, Faceless 850K+ users) documented speed and cost advantages in bounded contexts while reinforcing the maturity ceiling: non-developers achieved faster MVP cycles and departmental automation but remained dependent on developer involvement for scaling, compliance, and production durability.

By May 2026, new evidence had crystallized the category's maturity ceiling and production readiness barriers. Microsoft Power Platform 2026 Wave 1 introduced MCP Apps (general availability), enabling non-developers to generate rich UI experiences in Copilot chat with Power Apps agents rendering forms and tables, demonstrating ecosystem-wide integration of LLM-powered code generation. Retool's enterprise case study confirmed 50% non-developer penetration and $300K-$1M annual savings per organization, while acknowledging that AI-generated drafts require substantial visual refinement for production polish. However, independent security research revealed systematic production barriers: VibeEval's uniform scan of 1,514 live applications across Lovable, Bolt.new, Cursor, Replit, and V0 found 81% shipped with at least one critical or high-severity issue, with per-platform critical rates ranging from 24% (V0) to 58% (Lovable) and a median of 7 findings per app. ByteIota's developer survey (2,847 respondents) documented a verification bottleneck: developers now spend 11.4 hours/week reviewing AI code vs. 9.8 hours writing it, with 43% of AI-generated changes requiring production debugging despite passing QA. Veracode's empirical analysis of 100+ LLMs across 80 real-world tasks showed 45% introduce OWASP Top 10 vulnerabilities, with no improvement trend despite model advances. The ACM Technology Policy Council formally acknowledged vibe coding's mainstream adoption while documenting systematic security risks and the need for strong governance, formal validation, and human oversight before production deployment. Market consolidation accelerated: Gartner forecast a $44.5B market in 2026 (19% CAGR) with 75% of new enterprise apps built via low-code by 2026, signaling mainstream institutional adoption.
The category remained operationally mainstream for rapid MVP cycles and departmental automation with documented ROI (50-70% cost savings, multi-week acceleration), but systematic security vulnerabilities, production failure rates, verification costs, and technical debt generation established hard barriers preventing strategic expansion into mission-critical or regulated deployment.

TIER HISTORY

Research: Mar-2023 → Mar-2023

Bleeding Edge: Mar-2023 → Jan-2025

Leading Edge: Jan-2025 → present

EVIDENCE (100)

AI-Generated Code Poses Major Security Risks in Nearly Half of All Development Tasks, Veracode Research Reveals (Industry Reports, 2026-05-06)

— Veracode analysis of 100+ LLMs across 80 real-world coding tasks: 45% introduce OWASP Top 10 vulnerabilities; when choosing between secure/insecure methods, LLMs select insecure 45% of the time; no improvement trend despite model advances.

Building Intelligent Apps with Confidence and Control on Power Platform (Product Launches, 2026-05-04)

— Microsoft Power Platform 2026 Wave 1 GA: generative pages now support external AI code generation tools (GitHub Copilot, Claude Code) for code-first NL-to-app workflows, enabling non-developers to generate UI and logic via natural language.

How Secure Is an AI-Generated App? 2026 Benchmark (Adoption Metrics, 2026-05-01)

— 1,514 deployed apps across 5 major platforms (Lovable, Bolt.new, Cursor, Replit, V0) scanned; 81% shipped with critical/high security issues, with median 7 findings per app and 47% containing critical vulnerabilities.

Vibe coding is making developers faster. ACM says it is also making software less secure. (Industry Reports, 2026-05-01)

— ACM Technology Policy Council TechBrief on vibe coding: mainstream institutional recognition documenting speed gains coupled to security vulnerabilities, technical debt, and agentic-specific risks requiring strong governance and human oversight.

Developers Spend 11.4 Hours Reviewing AI Code | byteiota (Adoption Metrics, 2026-04-29)

— Survey of 2,847 developers: developers now spend 11.4 hrs/week reviewing AI code vs. 9.8 hrs writing; 43% of AI code requires debugging in production; systematic gaps in error handling, idempotency, retry logic, and observability.

12 Best Low-Code Platforms in 2026 - CatDoes (Industry Reports, 2026-04-29)

— Market analysis: low-code market projected $44.5B by 2026 (19% CAGR); Gartner forecasts 75% of new apps built with low-code by 2026; platform consolidation shows gap between AI-bolted legacy tools and AI-native architectures.

MCP Apps now available in Copilot chat (Product Launches, 2026-04-28)

— Microsoft 365 Copilot announces GA MCP Apps: agents generate rich UI experiences in chat including Power Apps with form and table visualizations, enabling non-developers to request and create data-driven apps conversationally.

Enterprise Application Development Tools for Scale - Retool (Case Studies, 2026-04-28)

— Retool enterprise case study: 50% non-developer user base (+40% YoY); 66% of companies implementing AI productivity mandates; AppGen creates functional app drafts in minutes but requires significant visual builder refinement for production.

HISTORY

2023-H1: Microsoft shipped Copilot AI across Power Platform (Automate, Apps, Pages, Virtual Agents) enabling non-developers to generate workflows and apps via natural language; research revealed both capability gains (12x code performance improvement with code-aware pre-training) and significant limitations (19-28% of LLM-generated code fails rigorous testing); 50% of enterprise IT leaders signaled intent to adopt low-code/no-code by 2025 despite known implementation risks.
2023-H2: Bubble launched Bubble AI with native mobile support and natural language capabilities, expanding no-code NL-to-code beyond Power Platform ecosystem; Microsoft's Ignite 2023 announcements broadened Copilot reach but community feedback revealed implementation gaps (promised editing capabilities unavailable); research catalogued systematic weaknesses in neural code generation and comprehensive effectiveness assessments confirmed correctness deficits remained primary barrier to autonomous non-developer code generation at scale.
2024-Q1: Citizen developers deployed custom Copilots for real-world tasks (data querying in ticketing systems), and Maker Copilot expanded to Asia-Pacific regions, signaling geographic adoption; however, ICLR 2024 research revealed critical trustworthiness gap (LLM self-consistency failures), NoviCode benchmark showed significant difficulty in translating novice language to complex code, and analyst reports flagged AI investment bubble with only 20% executive confidence, constraining enterprise momentum despite vendor feature velocity.
2024-Q2: Microsoft Power Apps Copilot reached general availability with 25M+ monthly users achieving 88% fewer clicks and 60% faster app builds, confirming production-scale deployment; however, Q2 research exposed critical gaps: embedding-based evaluation metrics showed weak correlation (0.16) with actual code correctness, and NaturalCodeBench revealed GPT-4 achieving only 53% pass rate on real-world queries versus synthetic benchmarks. Practitioner experience showed 56% of tech professionals using AI coding tools daily with reported productivity gains, but governance challenges and organizational readiness barriers (63% viewing Power Platform as Excel-like) constrained full-scale enterprise adoption.
2024-Q3: Forrester TEI analysis quantified Power Platform ROI at 224% with 35% development acceleration, providing empirical validation of enterprise value capture at scale; Bubble and no-code community surveys showed 64% confidence in category dominance by 2030. Yet Q3 research exposed critical unresolved limitations: ACL papers revealed benchmark contamination inflating performance metrics, and security research showed safety guardrails fail 80%+ of the time when natural language inputs are converted to code. Case research documented adoption challenges in multinational deployments. The category remained in the bleeding-edge tier, with measurable production evidence balanced against material gaps in safety, evaluation reliability, and real-world correctness requiring continued governance.
2024-Q4: Microsoft continued feature iteration with Power Apps natural language editing capabilities entering preview. The broader no-code ecosystem (Quickbase, Appian, Pega, Tray.ai, Kissflow) integrated NL-to-code for integrations and developer experience improvements. Academic research in Q4 2024 addressed practical deployment challenges: EMNLP papers introduced methods for bootstrapping NL-to-code with minimal labeled data (85% supervised performance with 1 example) and improved benchmarking methodology. Critical limitations persisted: vendor lock-in, security/compliance constraints, and scalability concerns remained barriers to enterprise adoption. Enterprise governance and transparency requirements increasingly recognized as prerequisites for production deployment.
2025-Q1: Enterprise adoption momentum accelerated: Bubble Q1 survey reported 100% of customers achieved lower development costs, 96% faster time-to-market, 88% achieving 3x+ faster development, with 85% saving $300K–$1M annually. Gartner projections targeted 70% of new applications using low-code/no-code by end of 2025, with 80% of users outside IT by 2026. Microsoft Power Apps Copilot expanded to model-driven apps (GA) and canvas app building (preview) with multi-region support. Enterprise adoption reached 65% across organizations. However, independent critical analysis highlighted persistent risks: vendor lock-in, security vulnerabilities, scalability constraints, and hidden operational costs. Gap between "no skills needed" marketing and real-world implementation remained significant—troubleshooting and scaling beyond departmental scope typically required developer involvement. Category transitioned to operational mainstream for greenfield departmental applications while governance and portability barriers constrained strategic replacement of professional development.
2025-Q2: Developer adoption of codegen tools broadened (71% Copilot, 26.5% v0) while critical compliance barriers emerged: 73% of regulated organizations paused Copilot rollouts due to security/compliance concerns, with only 16% in production use. Developers reported realistic quality feedback (surface-level output, extensive refinement needed, 1-in-5 suggestions containing errors). Builder.ai collapse highlighted vendor lock-in risks. Mid-market analysis confirmed 50-70% cost savings for departmental use cases but emphasized persistent constraints: customization limits, performance/scalability ceilings, and vendor dependency prevented strategic adoption. Category remained leading-edge with documented production ROI, but growing evidence of compliance and correctness barriers limited expansion beyond risk-tolerant, non-critical applications.
2025-Q3: Vibe coding emerged as distinct market category with exponential growth in the multi-billion-dollar no-code AI market segment. Bubble reported 80% automation of mobile app builds via NL-to-code, but final 20% (environment setup, deployment) remained manual bottleneck. Research revealed non-developers struggle to assess AI-generated code correctness, limiting autonomous workflows. Practitioner feedback acknowledged real productivity gains but persistent tool immaturity (clunky, frequent failures). Enterprise compliance barriers from Q2 remained unresolved: 73% of regulated orgs still in paused rollout status. Category evidence crystallized around narrow, high-value niches (departmental automation, rapid prototyping) with clear cost savings but hard technical and governance limits preventing mission-critical adoption.
2025-Q4: Non-developer application building consolidated as mainstream operational practice: Bubble reached 4.69 million deployed apps globally, and Retool's Q4 2025 survey confirmed ops managers and business leaders actively shipping dashboards and tools via NL-to-code. Microsoft Power Apps Copilot was GA across model-driven and canvas apps; the Bubble AI Agent launch expanded production-grade NL-to-app capabilities. A critical deployment ceiling was documented: only 9% of Bubble builders use AI coding for business-critical applications, revealing durability and correctness constraints. Compliance barriers persisted (73% of regulated orgs in paused rollout). Research syntheses documented ongoing code quality gaps. The category transitioned from "bleeding-edge" to operational mainstream for departmental automation with clear ROI, but hard technical limits (scale, compliance, correctness) and governance requirements prevented strategic replacement of professional development teams.
2026-Jan: Momentum continued into 2026 with platform consolidation and refined market segmentation. Bubble's AI tooling reached GA with production-grade app generation from NL; Microsoft Power Apps Copilot remained broadly deployed; Retool and independent analysts categorized emerging "vibe coding" sub-segment (NL-to-code for non-technical users) as distinct from developer-facing codegen. MIT Technology Review named Generative Coding a 2026 breakthrough technology, citing 30% AI-written code at Microsoft and 25% at Google as validation of adoption scale in tech industry. However, January research from SANER 2026 and industry surveys highlighted continued tensions: prompt quality and natural language proficiency strongly influenced code correctness; developer-side adoption paradox persisted (ubiquitous use coupled with profound skepticism about reliability and security). Non-developers still faced a maturity ceiling—platforms delivered speed but not autonomy for mission-critical work. The category remained operationally mainstream for departmental automation and rapid prototyping while fundamental correctness and governance barriers persisted.
2026-Feb: Platform momentum accelerated with Bubble announcing enhanced AI Agent capabilities and mobile plugin builder rollout for Q2 2026, while real-world deployment evidence solidified: Retool survey of 817 customers confirmed 35% had replaced SaaS tools with NL-to-code builds and 51% had production deployments with significant cost savings (ClickUp, Harmonic case studies); Bubble documented production apps at scale (My AskAI 40k+ users, Seagate 5x time savings, City of Atlanta procurement). However, critical maturity ceiling reasserted: industry analysis documented 92% developer adoption paired with trust decline to 60%, AI-generated code carrying 1.7x more major issues, and 45% containing OWASP vulnerabilities; case studies of platform failures (Enrichlead collapse, Lovable data leaks) and practitioner findings (Bubble AI UI generation requiring backend wiring via additional AI agent) revealed that full automation remained elusive even as speed and cost benefits persisted in bounded, non-critical use cases.
2026-Mar: Enterprise adoption continued with Retool reaching 50% non-developer user base (+40% YoY growth) and shipping Assist improvements (20% faster generation, 40-50% token efficiency gains); Microsoft launched vibe.powerapps.com public preview for NL-to-app generation; Bubble upgraded to Claude Sonnet 4.6 with 2x faster scaffolding and improved multi-step editing. Real-world deployment expanded: Gartner forecasted 75% of 2026 enterprise apps built via low-code (80% non-IT users), with named cases (Bendigo Bank 25 apps/18 months, US Air Force $83M savings). Non-developer platforms (Lovable 8M users, Bolt.new 5M, v0 4M) drove mainstream adoption. Yet critical barriers intensified: security research documented 45% of AI code containing exploitable vulnerabilities vs 31% manual code; organizations reported 30-41% technical debt increase within 6 months of AI adoption; Gartner projected $1.5T in technical debt by 2027 as 60% of new software became AI-generated. Vendor lock-in and organizational risk perception constrained adoption momentum despite documented productivity gains (50-70% cost savings, 2-8 week MVP cycles).
2026-Apr: Platform momentum continued with Microsoft M365 Copilot reaching GA in model-driven Power Apps (April 15) alongside new app skills (data entry, visualization, summarization); Bubble refined AI Agent with JSON validation and compound editing. Deployment scale reached inflection: Y Combinator W25 cohort data showed 25% of startups operating with 95%+ AI-generated codebases; vendor CEO statements confirmed 20-30% of code at Microsoft/Google now AI-written; Stack Overflow survey of 65K developers showed 62% use AI daily, 46% on Copilot-enabled files. Non-developer adoption specifically: 63% of vibe coding users are non-developers (Hostinger 1M users building real products: websites, ecommerce, SaaS); Gartner forecast 40% of enterprise software via vibe coding by 2028. However, production reliability barriers persisted: Lightrun survey of 200 enterprise SRE/DevOps leaders found 43% of AI-generated code required debugging in production despite passing QA; documented incident showed e-commerce platform lost 6.3M orders from AI code error with industry defect rates 2.3x baseline; security analysis (Veracode/Invicti) confirmed AI-generated code at 2.74x higher vulnerability rate with 45% introducing CWEs, and Fortune 50 enterprises saw 10x spike in security findings post-AI adoption. Category confirmed as operational mainstream for departmental automation and rapid MVP cycles with documented cost savings (3-10x faster shipping, $300K-$1M annually per org) but material production reliability and security constraints preventing expansion into mission-critical or regulated deployment—only 9% of Bubble builders deploy AI-coded solutions for business-critical applications, 25-30% of projects require rewriting within 2 years due to performance/scalability ceilings and vendor lock-in.
2026-May: Security evidence for AI-generated code crystallized at institutional scale. VibeEval's scan of 1,514 live apps across five major platforms (Lovable, Bolt.new, Cursor, Replit, V0) found 81% shipped with at least one critical or high-severity vulnerability, with a median of 7 findings per app. Veracode's analysis of 100+ LLMs across 80 real-world tasks showed 45% introduce OWASP Top 10 vulnerabilities with no improvement trend across model generations; the ACM Technology Policy Council formally acknowledged vibe coding's mainstream adoption while documenting systematic security risks requiring governance and human oversight. Platform expansion continued: Microsoft Power Platform 2026 Wave 1 shipped MCP Apps GA, enabling non-developers to generate rich Power Apps UI through Copilot chat. A developer survey of 2,847 respondents found developers now spend 11.4 hours per week reviewing AI code versus 9.8 hours writing it, with 43% of AI-generated changes requiring production debugging. Market projections reached $44.5B (Gartner, 19% CAGR) with 75% of new enterprise apps built via low-code, but the security and verification overhead evidence established that productivity gains at MVP stage are partially offset by production maintenance costs, preventing expansion into regulated or mission-critical deployment contexts.

TOOLS

Microsoft Power Apps Copilot, Bubble AI, Retool