
Vibe coding with discipline: why AI-assisted dev isn't yolo coding

The 'vibe coding' meme makes AI look like the end of engineering rigour. The teams shipping fastest with Claude Code and Cursor have more tests, more types, more discipline. Not less.

01 May 2026 · 10 min read · Krypto Forge

"Vibe coding" became a meme in late 2024. The idea, half-joking, was that AI tools let you ship features by feel. Open Cursor, describe what you want, accept the diff, push to production. Don't read the code. Vibe it.

More than a year in, the joke has aged poorly. The teams genuinely shipping fast with AI-assisted development don't do less engineering. They do more. More tests, more types, more linting, more eval harnesses, more careful plan-then-execute separation. The AI didn't replace the rigour. It made the rigour cheaper, which is a different thing.

This is the engineering hygiene that makes AI-assisted dev trustworthy, written from the inside.

The vibe coding myth, fairly stated

The myth isn't entirely wrong. The first ten minutes of any AI-assisted dev session can feel like magic. You describe a function, the model writes it, it works, you move on. That experience exists. We're not denying it.

The myth fails when "the first ten minutes" becomes "the production deploy". The function that works on the happy path breaks on the boundary case the model didn't think to test. The diff that touches one file actually requires changes in three. The type that looks fine has a subtle mismatch with the schema two layers down.

Without the engineering layer underneath, AI-assisted code falls apart in the same ways as hand-written code, only sooner, because you're producing more of it per hour.

What the disciplined teams actually do

A working pattern, from the studio's own setup and from clients we've helped tune theirs.

Plan mode and execute mode are separate. Claude Code's planning workflow makes this explicit. You describe the goal, the model plans, you review the plan, then you execute. Skipping the plan-review step is the single most common cause of AI-assisted disasters. Read the plan. Argue with it. Then execute.

TDD becomes more important, not less. When the model writes code, writing the test first is the cheapest way to lock in what you actually meant. The model will happily write code that does the wrong thing convincingly. A test pins down the right thing.

// Test first, with a clear name and a clear assertion.
test("normaliseQuantity converts kg to meters when fabric width is given", () => {
  expect(normaliseQuantity({ qty: 1, unit: "kg", widthIn: 44 })).toBeCloseTo(...);
});

// Then ask the model. The test is the spec.

Type systems earn their place. TypeScript strict mode. Rust. Anything that fails at compile time instead of runtime. The model is excellent at producing code that the type system can verify. When the type system rejects the model's output, the feedback is immediate and the next attempt is better. Without types, the same bug reaches production.
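A minimal sketch of that compile-time feedback, assuming TypeScript strict mode. The Length type and the toMeters function are hypothetical, invented for illustration rather than taken from any of our projects.

```typescript
// Hypothetical domain type: a length tagged with its unit.
type Length =
  | { unit: "m"; value: number }
  | { unit: "cm"; value: number }
  | { unit: "in"; value: number };

// Exhaustiveness guard. If someone (or some model) adds a unit to Length and
// forgets a case below, the argument is no longer `never` and the build fails.
function assertNever(x: never): never {
  throw new Error(`Unhandled case: ${JSON.stringify(x)}`);
}

function toMeters(length: Length): number {
  switch (length.unit) {
    case "m":
      return length.value;
    case "cm":
      return length.value / 100;
    case "in":
      return length.value * 0.0254; // 1 inch is exactly 0.0254 m
    default:
      return assertNever(length);
  }
}

// Rejected by the compiler, so the mistake never reaches runtime:
// toMeters({ unit: "kg", value: 1 });
```

The point of the pattern is that a wrong or incomplete edit turns into a compiler error the model can read and fix on the next attempt, instead of a runtime surprise.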

Linting and formatting on save. Prettier, eslint, the equivalent. Make the code look the same regardless of who or what wrote it. This isn't aesthetics; it's grep-ability. Six months from now, you want to read the code without flinching at five different brace styles.

CI gates that actually fail builds. Unit tests, type checks, lint checks, build checks. All in CI. AI-assisted dev produces more code per hour, which means more chances to slip something past. The CI is the only consistent reviewer.
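The gate logic itself is small. Here is a sketch under stated assumptions: Gate, GateReport and runGates are hypothetical names, and each check is injected as a function so the example stays self-contained. In a real pipeline each run would shell out to the test runner, type checker, linter or build, and the process would exit with the reported code.

```typescript
// A gate is any check that can pass or fail.
interface Gate {
  name: string;
  run: () => boolean; // true means the check passed
}

interface GateReport {
  passed: string[];
  failed: string[];
  exitCode: 0 | 1;
}

// Runs every gate (no early exit, so the report lists all failures)
// and returns a non-zero exit code if any gate failed.
function runGates(gates: Gate[]): GateReport {
  const passed: string[] = [];
  const failed: string[] = [];
  for (const gate of gates) {
    let ok = false;
    try {
      ok = gate.run();
    } catch {
      ok = false; // a crashing check counts as a failure, never as a pass
    }
    (ok ? passed : failed).push(gate.name);
  }
  return { passed, failed, exitCode: failed.length === 0 ? 0 : 1 };
}
```

The design choice worth copying is the default: a gate that throws or returns nothing useful is treated as failed. A gate that can be skipped quietly is not a gate.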

Eval harnesses for AI features. If the code calls an LLM, you need an eval. We've covered this in the prompts-for-production post. Without an eval, you don't know whether a prompt change made things better or worse. With one, you know in 30 seconds.
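For a sense of scale, an eval harness can start this small. This is a sketch, not our production harness: EvalCase, Model, passRate and comparePrompts are hypothetical names, and the model is a plain synchronous function standing in for a real (asynchronous) LLM call with a given prompt.

```typescript
// One eval case: an input plus a predicate that decides whether the
// model's output is acceptable.
interface EvalCase {
  input: string;
  accept: (output: string) => boolean;
}

// Stand-in for "call the LLM with this prompt". Synchronous only to keep
// the sketch self-contained; a real call would be async.
type Model = (input: string) => string;

// Fraction of cases whose output is accepted, between 0 and 1.
function passRate(model: Model, cases: EvalCase[]): number {
  if (cases.length === 0) throw new Error("eval set is empty");
  let passed = 0;
  for (const c of cases) {
    if (c.accept(model(c.input))) passed += 1;
  }
  return passed / cases.length;
}

interface Comparison {
  before: number;
  after: number;
  regressed: boolean;
}

// Scores the current prompt and the candidate prompt on the same cases.
function comparePrompts(baseline: Model, candidate: Model, cases: EvalCase[]): Comparison {
  const before = passRate(baseline, cases);
  const after = passRate(candidate, cases);
  return { before, after, regressed: after < before };
}
```

Wired into CI, a regressed result can fail the build the same way a failing unit test does.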

The plan-then-execute discipline

Worth a longer note. Claude Code's plan-mode workflow has trained a lot of people into a useful habit.

The pattern:

State the goal.
The model produces a plan: list of files, list of changes, list of risks.
You read the plan. You push back. The model revises.
Only when the plan is good does anyone write code.

We use this for almost every non-trivial change. For trivial changes (rename a variable, fix a one-line bug), we skip it. For anything touching three or more files, or anything in a critical path, the plan-review step is non-negotiable.
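The hard rule above is simple enough to write down. A sketch, with assumptions flagged: the Plan shape mirrors the "files, changes, risks" list, while the critical path prefixes and function names are hypothetical. It only encodes the non-negotiable cases; whether a smaller change counts as trivial is still a human call.

```typescript
// Hypothetical shape of a plan: what the model proposes before any code is written.
interface Plan {
  goal: string;
  files: string[];
  changes: string[];
  risks: string[];
}

// Hypothetical path prefixes treated as critical. Every team's list differs.
const CRITICAL_PREFIXES = ["db/migrations/", "src/billing/", "src/auth/"];

function touchesCriticalPath(plan: Plan, prefixes: string[] = CRITICAL_PREFIXES): boolean {
  // indexOf(...) === 0 is a prefix check that works on any compile target.
  return plan.files.some((file) => prefixes.some((prefix) => file.indexOf(prefix) === 0));
}

// Review is mandatory at three or more files, or for anything on a critical path.
function requiresPlanReview(plan: Plan): boolean {
  return plan.files.length >= 3 || touchesCriticalPath(plan);
}

// Execution is allowed only if review was not required or has actually happened.
function canExecute(plan: Plan, reviewed: boolean): boolean {
  return !requiresPlanReview(plan) || reviewed;
}
```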

The studio has been burned often enough by skipping it that it's now muscle memory. The cost is 2-3 minutes per task. The benefit is roughly zero "the AI just rewrote my schema" incidents per quarter.

The plan-mode pause is the single highest-leverage habit in AI-assisted development. It costs minutes. It saves days.

MCP as a safety layer

The Model Context Protocol shows up here too. When AI tools talk to your systems through MCP, you get a natural permissions boundary. The MCP server defines what's allowed. The AI tool can ask for whatever it wants; the server enforces.

In our setup:

Read-only access to the database is wide open. The model can SELECT anything to understand the schema.
Writes are narrow. The model can only mutate through specific tool calls that have their own validation.
Production data is never exposed to the IDE-side AI directly. Sample data, anonymised, lives in a dev DB.

This isn't AI-specific paranoia. It's standard least-privilege applied to a new kind of tool.
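A sketch of that boundary as plain TypeScript. Nothing here is the MCP SDK's API: ToolRequest, authorize, the update_order_status tool and its allowed statuses are all hypothetical. The SELECT check is deliberately crude; in practice the real guarantee should come from a read-only database role, with a check like this as a second line of defence.

```typescript
// Hypothetical request shapes for two tools exposed by the server.
type ToolRequest =
  | { tool: "query"; sql: string }
  | { tool: "update_order_status"; orderId: string; status: string };

type Decision = { allowed: true } | { allowed: false; reason: string };

// Hypothetical set of statuses the write tool may set.
const ALLOWED_STATUSES = ["pending", "shipped", "cancelled"];

// Crude guard: exactly one statement, and it must start with SELECT.
// Conservative on purpose: any inner semicolon is rejected, even inside a string.
function isReadOnlySelect(sql: string): boolean {
  const trimmed = sql.trim().replace(/;\s*$/, "");
  return /^select\b/i.test(trimmed) && trimmed.indexOf(";") === -1;
}

// The client can ask for anything; the server decides.
function authorize(req: ToolRequest): Decision {
  switch (req.tool) {
    case "query":
      return isReadOnlySelect(req.sql)
        ? { allowed: true }
        : { allowed: false, reason: "only single SELECT statements are permitted" };
    case "update_order_status":
      if (!/^[A-Za-z0-9_-]+$/.test(req.orderId)) {
        return { allowed: false, reason: "invalid orderId" };
      }
      return ALLOWED_STATUSES.indexOf(req.status) !== -1
        ? { allowed: true }
        : { allowed: false, reason: `status must be one of: ${ALLOWED_STATUSES.join(", ")}` };
  }
}
```

Reads go through one wide door, writes go through narrow, validated ones, and there is no generic "run this SQL" write path for the model to find.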

The pair-with-AI pattern

The mental model that actually works for us: treat the AI like a fast, fluent junior who knows everything and understands nothing.

A junior who knows everything will:

Recall the API of every library you've ever used.
Write 200 lines of scaffolding in 30 seconds.
Produce reasonable-looking code on almost any topic.

A junior who understands nothing will:

Confidently use APIs that don't exist.
Write code that looks right and does the wrong thing.
Miss the implication of the change two files away.

You wouldn't merge a junior's PR without reading it. Don't merge the AI's either. The same review discipline applies. AI-assisted dev is faster because the code is faster to write, not because it's safer to skip review.

What we shipped because of this discipline

Concrete: every project in our portfolio (Schiffli ERP, Paraslace, TaskBolt, WhatSender, our MCP servers) was built with heavy AI assistance. Every one of them has:

A typed schema and a typed API layer.
Unit tests for the non-trivial logic.
Integration tests for the critical flows.
A CI pipeline that fails the build on type errors, lint errors, and test failures.
An eval harness for the LLM-touching parts.
A plan-then-execute habit for every non-trivial change.

This is not slower than not having it. It's faster, measured in time-to-production-stability. The "speed" of skipping discipline is a debt that comes due in week three.

The honest limits

A few places we're still figuring out.

Refactors across large codebases. Even the best AI tools struggle when the change touches dozens of files with subtle interactions. Plan mode helps; pre-existing tests help more. We still do these mostly by hand with AI assistance, not the other way around.

Architecture decisions. The model can summarise tradeoffs and write ADRs. It cannot, yet, replace the judgement call. We make those, write the ADR, then use the model to execute.

Debugging hard concurrency bugs. The model is a great rubber duck and a terrible primary debugger for race conditions, deadlocks, and other things that require holding the full execution context in your head. We use it for hypothesis generation, then we do the actual debugging.

The takeaway

AI-assisted development is not yolo coding. The fastest teams are not the most casual; they are the most disciplined. Tests, types, linting, CI, plan-then-execute, MCP-bounded tool access. None of that is glamorous. All of it is the difference between shipping confidently and shipping anxiously.

The "vibe" is the marketing. The discipline is the work.

Engineering didn't get less important. It got more leveraged. Same people, more output, only if the floor was solid to begin with.