
Vibe Coding Is Not Enough: What Happens After the Demo

Codse Tech
March 1, 2026

Andrej Karpathy coined "vibe coding" in February 2025. Describe what you want, accept the AI's output, move on. Collins Dictionary named it Word of the Year. Lovable hit $200M ARR. Cursor's parent company hit a $9.9B valuation. Real money, real adoption.

A year later, Karpathy moved on from his own term. He calls the professional default "agentic engineering" now, meaning AI-first but with actual oversight. His original tweet was, in his words, "a shower of thoughts throwaway tweet."

That tracks. You can go from idea to working prototype in a few hours with these tools. Non-technical founders can build functional apps. Developers can move at speeds that would have seemed absurd three years ago.

But there's a gap between "it works on my screen" and "it works for 10,000 paying users." That gap is where most vibe-coded projects go to die.

What vibe coding gets right

These tools deserve credit. They've changed what one person can build in a weekend.

Cursor ($9.9B valuation) made IDE-level AI editing feel natural. Multi-file changes, codebase awareness, structured code that mostly works the first time.

Claude Code is a terminal-based coding agent. Point it at a refactoring task and it reads files, makes changes, runs tests. For experienced developers, it's a genuine multiplier.

Lovable ($6.6B valuation, $200M ARR) lets non-technical people describe an app and get something usable back. Full-stack apps from prompts. Landing pages in minutes.

Bolt.new does the same in a browser. Quick demos, fast validation, zero setup.

v0 from Vercel generates React components with shadcn/ui. Best tool we've used for UI prototyping.

Replit Agent handles conversation-to-app flows with built-in hosting.

Each tool works well at the right phase of a project. The problem starts when the prototype phase ends and nobody switches gears.

The numbers nobody mentions

The hype cycle around vibe coding skipped over some findings that matter if you're building software for actual users.

CodeRabbit (December 2025) analyzed 470 GitHub pull requests, 320 AI-co-authored and 150 human-only. AI-generated code produced 1.7x more issues overall. On security: 2.74x more XSS vulnerabilities, 1.88x more improper password handling, and 1.91x more insecure direct object references. Logic and correctness errors were 75% higher in AI-authored code. This wasn't comparing AI to senior engineers at Google. It was comparing AI to regular development teams.

METR study (July 2025) ran a randomized controlled trial with 16 experienced open-source developers across 246 tasks. Before the study, developers predicted AI tools would make them 24% faster. After the study, they still believed they were 20% faster. Actual result: 19% slower. That's a 39-percentage-point gap between how fast developers think they are with AI and how fast they actually are.

Lovable security incident (2025): Security researcher Matt Palmer found that Lovable's generated Supabase backends had misconfigured row-level security. A scan of 1,645 projects from Lovable's showcase turned up 303 vulnerable endpoints across 170 apps, exposing emails, phone numbers, payment details, and API keys. Lovable shipped a "security scan" feature in response, but it only checked whether RLS existed, not whether it was correctly configured.

None of this means AI coding tools are bad. It means accepting AI output without review is a real risk, and that risk gets worse the more users you have.

What breaks between demo and production

A prototype works because the happy path works. Production software works because everything else is handled too.

Security

AI-generated code skips input validation, uses insecure defaults, hardcodes secrets, and writes auth flows that look right but break under scrutiny. The CodeRabbit data says this isn't an edge case. It's the default.

We've seen all of these in vibe-coded apps clients brought to us:

  • API keys sitting in client-side code
  • SQL queries built by string concatenation
  • Auth tokens in localStorage with no expiration
  • No rate limiting on public endpoints
  • CORS set to allow everything

Any security scanner would catch these. But AI tools don't run security scanners before handing you the code.
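
The string-concatenation item is the easiest one to show. A minimal TypeScript sketch (the helper names are ours, for illustration; the `{ text, values }` shape matches what drivers like node-postgres accept):

```typescript
// Vulnerable: user input is spliced into the SQL string, so input can rewrite the query.
function findUserUnsafe(email: string): string {
  return `SELECT * FROM users WHERE email = '${email}'`;
}

// Safer: SQL and data travel separately; the driver binds $1 server-side.
function findUserSafe(email: string): { text: string; values: string[] } {
  return { text: "SELECT * FROM users WHERE email = $1", values: [email] };
}

const payload = "x' OR '1'='1";
console.log(findUserUnsafe(payload)); // the injected OR clause is now part of the query
console.log(findUserSafe(payload));   // the payload stays inert inside the values array
```

Every mainstream database driver supports the second form. AI tools know this too; they just don't always pick it unless you ask.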

Error handling

Vibe-coded apps handle the success case. When an API call fails, when a database connection drops, when a user submits garbage, the app crashes or shows a blank screen or fails silently. Production needs error boundaries, retry logic, fallback states, and error messages that actually say something.
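
What "retry logic" means in practice fits in a few lines. The `withRetry` helper below is a sketch of ours, not from any framework:

```typescript
// Retry a flaky async operation with exponential backoff before surfacing the error.
async function withRetry<T>(op: () => Promise<T>, attempts = 3, baseMs = 100): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      // Back off 100ms, 200ms, 400ms… between attempts; don't wait after the last one.
      if (i < attempts - 1) await new Promise((r) => setTimeout(r, baseMs * 2 ** i));
    }
  }
  throw lastError;
}

// Simulate an API that fails twice, then succeeds.
let calls = 0;
const flaky = async () => {
  calls += 1;
  if (calls < 3) throw new Error("503 Service Unavailable");
  return "ok";
};

withRetry(flaky, 3, 10).then((result) => console.log(result, calls)); // "ok" 3
```

Wrap external calls in something like this, pair it with a visible fallback state in the UI, and a transient 503 stops being a blank screen.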

Testing

Most AI-generated codebases have zero tests. Not "not enough." Zero. The AI builds the feature, you accept it, nobody writes the test. Next week you change something and break it. Without tests, every change is a coin flip.

Data validation

AI tools trust user input. They generate forms that accept anything, APIs that process whatever comes in, database queries that assume data is clean. Data is never clean. Validation at every boundary is the difference between a working app and a corrupted database.
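
"Validation at every boundary" looks like this. A hand-rolled sketch with made-up field rules; in a real codebase a schema library such as zod would do the same job declaratively:

```typescript
type SignupInput = { email: string; age: number };

// Parse untrusted input at the API boundary before it touches the database.
function parseSignup(raw: unknown): SignupInput {
  if (typeof raw !== "object" || raw === null) throw new Error("body must be an object");
  const { email, age } = raw as Record<string, unknown>;
  if (typeof email !== "string" || !/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(email)) {
    throw new Error("invalid email");
  }
  const parsedAge = Number(age);
  if (!Number.isInteger(parsedAge) || parsedAge < 13 || parsedAge > 120) {
    throw new Error("invalid age");
  }
  // Normalize on the way in so the database only ever sees one shape.
  return { email: email.trim().toLowerCase(), age: parsedAge };
}

console.log(parseSignup({ email: "Ada@Example.com", age: "36" }));
```

Everything past this function can then assume clean data, which is the whole point: one choke point instead of defensive checks scattered through the codebase.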

Performance

Vibe-coded apps work fine with 10 users. At 1,000 they start lagging. At 10,000 they fall over. The reasons are always the same: unoptimized queries, missing indexes, no pagination, entire datasets loaded into memory, component trees re-rendering on every state change.
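
Pagination is the cheapest of those fixes. A keyset-pagination sketch (table and column names are illustrative):

```typescript
// Keyset pagination: the database seeks straight past the cursor instead of
// scanning and discarding OFFSET rows, so page 200 costs about the same as page 1.
function nextPageQuery(afterId: number, pageSize = 50): { text: string; values: number[] } {
  return {
    text: "SELECT id, name FROM users WHERE id > $1 ORDER BY id LIMIT $2",
    values: [afterId, pageSize],
  };
}

// The same access pattern against an in-memory dataset, to show the shape:
const rows = Array.from({ length: 120 }, (_, i) => ({ id: i + 1 }));
function page(afterId: number, size = 50) {
  return rows.filter((r) => r.id > afterId).slice(0, size);
}

console.log(page(0).length);             // 50 — never the whole table
console.log(page(100).map((r) => r.id)); // the final partial page, ids 101-120
```

Add an index on the cursor column and this stays fast at any table size. The vibe-coded default, `SELECT *` into memory, does not.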

Observability

Something breaks in production. A user says "it's broken." You have no logs, no error tracking, no performance monitoring. You can't reproduce it locally because you don't have their data or environment. You're flying blind.
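
The fix is cheap if it's there from day one: structured, contextual logs that a tool like Sentry or a log search can index. A minimal sketch; the field names are our convention, not a standard:

```typescript
// Structured logs: machine-parseable JSON with enough context to reproduce a report.
function logEvent(level: "info" | "error", msg: string, ctx: Record<string, unknown>): string {
  const entry = JSON.stringify({ ts: new Date().toISOString(), level, msg, ...ctx });
  console.log(entry);
  return entry;
}

// "It's broken" becomes a searchable record: which user, which request, which error.
// IDs below are hypothetical, for illustration.
logEvent("error", "payment webhook failed", {
  userId: "u_123",
  requestId: "req_9f2",
  provider: "stripe",
  err: "signature mismatch",
});
```

With a request ID threaded through every log line, "a user says it's broken" becomes a one-query lookup instead of a guessing game.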

Edge cases

AI writes code for the expected flow. Then a user has an apostrophe in their name. A timezone offset crosses a date boundary. Someone uploads a 500MB file instead of 5MB. A payment fails halfway through. These don't show up in the demo. They show up when real people use your product.
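
The timezone one is concrete enough to demonstrate in four lines (the `en-CA` locale is used here only because it formats dates ISO-style):

```typescript
// One instant, two calendar dates: late on March 1 in UTC is already March 2 in Tokyo.
const ts = new Date("2026-03-01T23:30:00Z");
const utcDay = ts.toISOString().slice(0, 10);
const tokyoDay = ts.toLocaleDateString("en-CA", { timeZone: "Asia/Tokyo" });
console.log(utcDay, tokyoDay); // the two dates differ

// Date math keyed to the server's timezone silently shifts billing cycles,
// deadlines, and "today's orders" for users on the other side of the boundary.
```

None of this is exotic. It's just the part of the work that only shows up after the demo.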

The open source backlash

This isn't just a product problem. Unreviewed AI code is hitting open source too.

Daniel Stenberg shut down cURL's bug bounty after AI-generated submissions hit 20% of reports. They looked plausible but were fabricated. Maintainer time wasted on bugs that didn't exist.

Mitchell Hashimoto banned AI-generated code from Ghostty entirely.

Steve Ruiz closed all external PRs to tldraw because the quality of AI-generated contributions had dropped below useful.

A January 2026 paper titled "Vibe Coding Kills Open Source" laid out the pattern: AI tools lower the barrier to contributing, but they also lower the quality floor. More PRs, more noise, more work for maintainers.

If the same quality issues frustrate experienced open source maintainers, they'll frustrate your users too.

The middle ground

We're not anti-AI. We use Claude Code, Cursor, and v0 every day. They've changed how fast we deliver.

But the approach that actually works is: vibe code the prototype, engineer the product.

Phase 1: Prototype (1-3 days)

Use Lovable or Bolt to validate the idea with a clickable prototype. Show it to potential users. Get feedback before writing production code. If the idea doesn't hold up, you've lost days, not months.

Phase 2: Build (1-2 weeks)

Switch to Cursor and Claude Code for the real codebase. Review every file. Run a security scanner. Write tests for critical paths. Set up error tracking (Sentry, LogRocket). Add structured logging. Validate inputs at every boundary.

This is where AI tools save the most time. Not by replacing engineering, but by speeding it up. A developer with Claude Code builds in 2 weeks what used to take 6. The output still needs review, but the throughput is real.

Phase 3: Ship (1-3 days)

App Store assets, metadata, deployment pipeline, monitoring. Mostly mechanical, but skip any step and you get a rejected submission or a blind spot in production.

What it costs: US agency vs us

The pricing gap has widened because of AI tools, not despite them.

Scope                                           US agency    Codse (AI-augmented)
Vibe-coded prototype + validation               $5-10K       $2-4K
Production MVP (auth, payments, core features)  $25-50K      $8-15K
Full production app + App Store submission      $50-100K     $18-35K
Monthly retainer (maintenance + iteration)      $8-15K/mo    $3-6K/mo

Two reasons for the gap. AI tools like Claude Code and Cursor have compressed dev timelines by 50-70%, and we use them on everything. Our operating costs are also lower than a US or EU shop, so those time savings go directly to the client.

Same stack. Same tools. Same quality bar. Fewer hours billed and lower overhead.

What we use at Codse

We've settled on a stack for the kind of work we do: production apps for founders and busy builders who need to ship.

  • Claude Code for complex refactoring, multi-file changes, autonomous tasks. Primary coding agent.
  • Cursor for IDE-level editing and structured changes in large codebases. Daily driver.
  • v0 for quick UI prototyping with React and shadcn/ui.
  • Direct Claude API for production integrations where we need full control over prompts, tool use, and outputs.

What we don't do: accept AI output without review. Every PR gets a security check, a logic review, and tests on critical paths. The AI writes fast. We make sure it writes safe.

When to vibe code and when to hire help

Vibe code it yourself if you're validating an idea or building an internal tool, if you can review what the AI generates, or if the stakes are low (no payments, no user data, no compliance).

Hire help if real users will depend on the software, you're handling payments or sensitive data, you need App Store or Play Store approval, you can't review the code yourself, or your nights and weekends are worth more than the cost of hiring someone.

Two years ago there was no middle ground between "do everything yourself with AI tools" and "pay a US agency $80K." There is now.

AI integration services

We embed AI into your existing product with production-grade security, testing, and monitoring.


AI agent development

From prototype to production agent with tool use, guardrails, and measurable reliability.


FAQ: Vibe coding in production

What is vibe coding?

Vibe coding is a term coined by Andrej Karpathy in February 2025. It describes a workflow where you tell an AI what you want in natural language, accept the generated code without deep review, and iterate by describing changes. Tools like Cursor, Claude Code, Lovable, and Bolt enable this workflow.

Is vibe coding safe for production apps?

On its own, no. CodeRabbit found AI-generated code has 1.7x more issues than human code. Vibe coding works well for prototyping and validation. Production apps need a security review, tests, error handling, and monitoring on top of what the AI generates.

Which vibe coding tool should I use?

Lovable or Bolt for rapid prototypes. Cursor or Claude Code for building production code with AI assistance. v0 for React UI components. Most projects benefit from using two or three tools at different stages rather than picking one.

How much does it cost to turn a vibe-coded prototype into a production app?

A US agency charges $25-50K for a production MVP. We deliver the same scope for $8-15K because AI tools compress the timeline and our operating costs are lower.

Can non-technical founders ship production apps with vibe coding?

You can build a solid prototype with Lovable. Shipping a production app that handles payments, user data, and real-world edge cases still needs engineering. The practical path: validate the idea yourself, then bring in a team for the production build.

References

State of AI vs Human Code Generation — CodeRabbit (Dec 2025)

AI-experienced open-source developer study — METR (Jul 2025)

Lovable vulnerability disclosure — Semafor

Not all AI-assisted programming is vibe coding — Simon Willison
