Vibe Engineering: How to Ship Production Code with AI
Andrej Karpathy coined “vibe coding” in early 2025: let the AI write your code, don’t worry too much about understanding it, just see if it works. It went viral. Everyone was vibe coding.
Most of them shouldn’t have been.
There’s a distinction the hype erased, and it’s the difference between AI that makes you faster and AI that makes you dangerous. This guide is about the right side of it: how to use AI to ship code that survives production, who on your team should be doing it, and how to tell whether it’s actually working.
Vibe coding vs vibe engineering
Vibe coding treats the AI as a black box: prompt in, solution out, don’t look too closely. That’s genuinely fine for a one-off script that nobody will maintain. The problem is it became shorthand for all AI-assisted development, including the code that runs your business.
Vibe engineering is the discipline that scales. You’re still using AI to do the typing. You’re just not trusting it. We draw the full distinction in vibe engineering vs vibe coding:
| Vibe coding | Vibe engineering |
|---|---|
| Trust the AI | Supervise the AI |
| Accept outputs | Question outputs |
| Debug by retrying | Debug by understanding |
| Works until it doesn’t | Catches problems before shipping |
| Treats AI as an expert | Treats AI as fast but fallible |
The AI does the typing. You do the thinking. The moment you can’t explain what a piece of AI-generated code does and why, you’ve crossed from vibe engineering back into vibe coding, and the last line of defence has just stopped checking.
Who should actually use AI tools
Here’s the counter-intuitive part most rollouts get backwards. The instinct is to hand AI tools to juniors and “free up” the seniors. It’s exactly wrong.
AI tools amplify what you already know. Seniors spot when the model is wrong, redirect it, and know when good enough really is good enough. Juniors accept everything, because they can’t yet see the mistakes. There’s a predictable spectrum: juniors say “give me everything,” mid-levels resist out of craft-pride, and seniors use it as a tool. The teams getting real gains give the tools to their skeptical seniors first and let them stay skeptical, because the skill was never prompting. It’s judgement.
The new scarce skill
When the model can build almost anything, the bottleneck stops being implementation and becomes judgement. “How” is cheap now. “What” and “why” are not. We argue this in first principles are the new scarce skill: cheap building punishes vague thinking, because the friction that used to kill bad ideas in the planning meeting is gone. Feed the model confusion and it returns a beautifully engineered version of your confusion, faster than ever.
So the most valuable engineering skill in an AI world isn’t typing speed. It’s the ability to decompose a problem to bedrock and decide what’s actually worth building, before a single line exists.
Measure it honestly, or you’re guessing
The fastest way to fool yourself is to grade AI work on a green test suite. A passing suite proves the code did what it attempted, not that the attempt was worth making. The metric that matters is mergeability: would a senior engineer take this as-is into a codebase they own? Judge it with a held-out reviewer, never the agent that wrote it.
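The gap between a green suite and a mergeable change is easy to make visible. Here is a minimal sketch in Python, assuming you log one record per AI-generated change with its test result and a reviewer's merge-as-is verdict. The `Change` record, its field names, and the sample data are all hypothetical, not a standard schema:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One AI-generated change under evaluation (hypothetical record)."""
    change_id: str
    author: str        # the agent that wrote the change
    reviewer: str      # whoever judged it
    tests_passed: bool
    merge_as_is: bool  # reviewer verdict: would they merge it unedited?


def pass_rate(changes: list[Change]) -> float:
    """Share of changes with a green test suite."""
    return sum(c.tests_passed for c in changes) / len(changes)


def mergeability(changes: list[Change]) -> float:
    """Share of held-out-reviewed changes judged mergeable as-is."""
    # Discard self-reviews: the agent that wrote a change never grades it.
    held_out = [c for c in changes if c.reviewer != c.author]
    if not held_out:
        raise ValueError("no held-out reviews to score")
    return sum(c.merge_as_is for c in held_out) / len(held_out)


# Illustrative data only, not real measurements.
changes = [
    Change("c1", "agent-a", "senior-1", tests_passed=True, merge_as_is=True),
    Change("c2", "agent-a", "senior-2", tests_passed=True, merge_as_is=False),
    Change("c3", "agent-a", "senior-1", tests_passed=True, merge_as_is=False),
    Change("c4", "agent-a", "agent-a", tests_passed=True, merge_as_is=True),
]

print(f"pass rate:    {pass_rate(changes):.0%}")
print(f"mergeability: {mergeability(changes):.0%}")
```

On this made-up data every change is green, yet only one of the three held-out reviews says merge. The self-reviewed change is excluded from the score entirely.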
And separate the feeling of fast from actual fast. A 2025 randomized controlled trial by METR found experienced open-source developers believed AI tools had made them about 20% faster, while they actually took roughly 19% longer to finish their tasks. If your seniors can’t feel the slowdown from the inside, your dashboards certainly won’t show it. Measure the work, not the vibe.
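The arithmetic is worth doing once. This is a minimal sketch, assuming both percentages are read as changes in task completion time against a no-AI baseline. The 100-minute baseline and the variable names are illustrative, not taken from the study:

```python
def time_change(baseline_minutes: float, observed_minutes: float) -> float:
    """Relative change in completion time; positive means slower."""
    return (observed_minutes - baseline_minutes) / baseline_minutes


# Illustrative numbers chosen to mirror the percentages quoted above.
baseline = 100.0  # minutes the task takes without AI
measured = 119.0  # clock time with AI
felt = 80.0       # what the developer believes it took with AI

actual = time_change(baseline, measured)
perceived = time_change(baseline, felt)
gap = actual - perceived  # how far the feeling is from the clock

print(f"actual change:    {actual:+.0%}")
print(f"perceived change: {perceived:+.0%}")
print(f"perception gap:   {gap:.0%} of baseline")
```

The point of the sketch is the third number: feeling and clock disagree by well over a third of the baseline, which is why self-reported speed can't stand in for timed work.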
Make it compound
Done once, vibe engineering makes you faster. Done as a system, it compounds. Capture every solved problem as a reusable skill, then loop those skills toward a goal: that’s how you go from AI that helps you type to an agent that runs a process. Wrap it in the compounding-engineering loop (Plan, Delegate, Assess, Codify) and each cycle makes the next one cheaper.
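One way to picture the loop is as a function that consults a skill library before it plans and writes back to it after a review. This is a rough sketch under loud assumptions: the `run_cycle` function, the dictionary used as a skill library, and the stub agent and reviewer are all hypothetical stand-ins, not a real agent framework:

```python
from typing import Callable


def run_cycle(
    problem: str,
    skills: dict[str, str],
    delegate: Callable[[str], str],
    assess: Callable[[str], bool],
) -> str:
    """Run one Plan, Delegate, Assess, Codify cycle; return how it was planned."""
    # Plan: reuse a codified skill when one exists, else start from scratch.
    if problem in skills:
        plan, source = skills[problem], "reused"
    else:
        plan, source = f"solve from scratch: {problem}", "from scratch"
    # Delegate: hand the plan to the agent (a stand-in callable here).
    result = delegate(plan)
    # Assess: a human or held-out reviewer accepts or rejects the result.
    if assess(result):
        # Codify: store what worked so the next cycle can skip the discovery.
        skills[problem] = result
    return source


def fake_agent(plan: str) -> str:
    """Hypothetical agent stub: pretends to turn a plan into a recipe."""
    return f"recipe for [{plan}]"


def approve_all(result: str) -> bool:
    """Hypothetical reviewer stub: accepts everything (a real one must not)."""
    return True


skills: dict[str, str] = {}
first = run_cycle("add retry to HTTP client", skills, fake_agent, approve_all)
second = run_cycle("add retry to HTTP client", skills, fake_agent, approve_all)
print(first, "->", second)
```

The first pass plans from scratch and the second reuses the codified result. Note that nothing is codified unless the Assess step accepts it, so a rejected result never becomes a skill.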
The tooling matters too. When the cost of a sharp internal tool collapses from a team-quarter to an afternoon, building your own rails becomes the non-negotiable choice. And because agents generate parallel change faster than Git can absorb it, the version-control workflow itself becomes a bottleneck, which is why we’ve been testing whether Jujutsu is a Git superpower for AI coding.
Where this is heading
Vibe engineering is the on-ramp to agentic development: once you can supervise AI writing code, you can supervise agents that run multi-step jobs, the subject of our guide to building AI agents. And the whole point of the discipline is the same as everywhere else on this site: getting from a demo that impresses to a system that’s actually production-ready.
AI tools are amplifiers. They make a senior engineer faster and a careless one dangerous. The tool is the easy part. Teaching the judgement is where the results come from. Let’s talk about building it into your team.
Frequently asked questions
- What is vibe engineering?
- Vibe engineering is using AI to write code while staying in control of it: you delegate the typing, not the thinking. You review every suggestion, understand what the code does, catch the mistakes the model makes, and redirect when it goes off track. It's how you get production code out of AI rather than a demo.
- What's the difference between vibe coding and vibe engineering?
- Vibe coding (Andrej Karpathy's term) means accepting whatever the AI produces without really reading it, which is fine for a throwaway script. Vibe engineering means using the same tools while staying suspicious: reviewing, understanding, and catching errors before they ship. Same tools, opposite stance. One produces demos; the other produces code worth keeping.
- Should juniors or seniors use AI coding tools?
- Counter-intuitively, seniors. AI tools amplify judgement you already have: seniors spot when the model is wrong and know when 'good enough' really is good enough. Juniors tend to accept everything because they can't yet see the mistakes. Roll AI out to your skeptical seniors first, not your interns.
- How do you measure whether AI-written code is good?
- Not by test-pass rate. A green suite only proves the code did what it attempted, not that the attempt was worth making. The honest metric is mergeability: would a senior engineer take this as-is into the codebase they own? Judge it with a held-out reviewer, never the agent that wrote it.
- Does AI actually make developers faster?
- Only with the discipline. A controlled METR study found experienced developers felt about 20% faster using AI tools while actually running roughly 19% slower. Speed that feels fast isn't speed. The gains are real, but they come from vibe engineering (judgement, review, measurement), not from accepting output on autopilot.