Vibe Coding Meets Spec Engineering – What Building With AI Agents Really Looks Like

One is great for exploration, the other for accumulation. Here’s what happens when you use both.

Vibe coding is having a moment. Andrej Karpathy coined the term, and suddenly everyone’s doing it: you open a chat, describe what you want in plain English, let the AI generate code, and just go with the vibes. Accept all, run it, see what happens, iterate.

It’s exhilarating. You can go from zero to a working prototype in an afternoon. You can explore ideas you’d never have spent a week building by hand. You can try three architectures before lunch and pick the one that feels right.

But here’s what I’ve noticed: vibe coding is incredible for exploration and terrible for accumulation. Every session starts from zero. The AI doesn’t remember what worked last time, what broke, or what conventions you settled on. You rediscover the same bugs. You re-explain the same constraints. The vibes are great until they’re not.

That’s where spec engineering comes in. And in my experience, using the two together is where the real power lies.

What Is Vibe Coding, Really?

Vibe coding is coding by intent rather than implementation. You describe your end product, not how to build it. The AI generates the code, you run it, and you steer through feedback loops:

  • “Build me a REST API for managing grants”
  • “That 500 error — fix it”
  • “Add pagination”
  • “Make it faster”

It’s conversational, intuitive, and fast. You don’t need to know the exact syntax, the right library, or the framework conventions. You just need a clear idea of the outcome and a willingness to iterate.
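An “add pagination” request in a vibe session often lands on something like this minimal helper. The function name and response shape are hypothetical, a sketch of the kind of code these prompts produce rather than anything from a specific codebase:

```python
def paginate(items, page=1, per_page=20):
    """Return one page of items plus simple paging metadata."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    start = (page - 1) * per_page
    return {
        "items": items[start:start + per_page],
        "page": page,
        "per_page": per_page,
        "total": len(items),
    }
```

In a real session you would then iterate on it conversationally: “return a `next_page` link”, “make it cursor-based”, and so on.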

This is genuinely powerful for:

  • Rapid prototyping — testing an idea before committing to it
  • Exploring unfamiliar territory — trying a new framework or API you’ve never used
  • Quick discovery — figuring out what’s possible before figuring out what’s right
  • Low-stakes experiments — throwaway scripts, one-off tools, proof-of-concepts

Vibe coding lowers the activation energy for building. That matters enormously.

Where Vibe Coding Hits a Wall.

The problem shows up when the prototype needs to become a product. Or when you come back to the project next week. Or when the codebase has almost a hundred files instead of two.

Vibe coding alone struggles with:

  • Consistency — the AI doesn’t remember that your project uses uv, not pip, or that secrets go through Key Vault, never hardcoded
  • Accumulated knowledge — that psycopg.pool doesn’t exist in psycopg 3 (it’s psycopg_pool), or that PostgreSQL is at hostname pgvector inside your devcontainer
  • Conventions at scale — how agents should be structured, how tests should be written, how new features should be wired in, how your AI governance policies should be implemented
  • Avoiding past mistakes — bugs you already fixed last Tuesday will reappear on Wednesday

You end up spending half your time re-correcting the agent. “No, use uv.” “No, don’t hardcode that key.” “No, the import is wrong.” Each correction is a prompt you’ve written before. The vibes evaporate fast when you’re debugging the same issues for the third time.

What Spec Engineering Adds.

Spec engineering is the practice of writing structured markdown files in the repo that the AI agent reads automatically, every time, every session. No prompting required.

Here’s what that looks like in practice:

  • AGENTS.md — the full repo map, build commands, coding conventions, security mandates, and step-by-step checklists for common tasks
  • LESSONS.md — accumulated pitfalls, bugs you’ve hit before, and decisions you’ve made (with rationale)
  • BACKLOGS.md — what needs building, with user stories and acceptance criteria

These aren’t documentation for humans (though humans can read them). They’re persistent instructions for AI agents. They’re the institutional memory that vibe coding lacks.

When the agent reads AGENTS.md, it already knows the rules:

## Golden Rules (Non-Negotiable)
- **uv only**: use `uv pip install …` and `uv run …` (no raw `pip` commands).
- **No secrets in code**: production secrets must flow through Azure Key Vault.
- **No module-level network calls**: create clients lazily via `get_*()` singletons.
- **Agents never call other agents**: the Supervisor mediates all multi-step flows.
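The “no module-level network calls” rule above is typically satisfied with lazily created singletons. A minimal sketch, where `SearchClient` is a stand-in for whatever real client would open a connection:

```python
from functools import lru_cache

class SearchClient:
    """Stand-in for a client that would dial out over the network."""
    def __init__(self, endpoint: str):
        self.endpoint = endpoint  # real code would open a connection here

@lru_cache(maxsize=1)
def get_search_client() -> SearchClient:
    # Created on the first call and reused afterwards --
    # nothing runs at import time, so importing the module stays side-effect free.
    return SearchClient("https://example.invalid/search")
```

`lru_cache(maxsize=1)` is one idiomatic way to get the `get_*()` singleton pattern; a module-level `_client = None` guard works just as well.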

When it reads LESSONS.md, it knows what’s gone wrong before:

### PostgreSQL / pgvector
- **psycopg 3 ships the connection pool as a separate package** `psycopg_pool`
— use `from psycopg_pool import ConnectionPool`, NOT `import psycopg.pool`.
- Inside the devcontainer, pgvector is at hostname `pgvector`, NOT `localhost`.

That bug you hit on Tuesday? It’s in LESSONS.md now. You’ll never hit it again. Neither will the AI.

When it reads BACKLOGS.md, it knows when something is done:

### 10. Vertical POCs (Comply / Legal / Grant)
**Acceptance Criteria:**
- [x] At least one vertical POC built with domain-specific agents
- [x] Vertical agents reuse the shared foundation
- [x] Demo-ready within one sprint

No ambiguity. The spec is the prompt you never have to write.

The Intersection: Where It Gets Powerful.

Here’s the insight that changed how I build: vibe coding and spec engineering aren’t opposites, they’re phases.

Vibe coding is how you explore. You start with an open-ended question: “Could we build a grant search engine?” You spin up a quick prototype, try different approaches, see what the data looks like, discover which API is actually usable. You’re in discovery mode, fast, loose, creative.

Spec engineering is how you compound. Once you know what works, you write it down. The conventions that emerged during exploration become rules in AGENTS.md. The bugs you hit become entries in LESSONS.md. The feature you prototyped becomes a user story with acceptance criteria in BACKLOGS.md.

The next time you (or the AI) touch that area of the codebase, you don’t rediscover, you build on top.

The workflow looks like this:

  1. Vibe — explore an idea conversationally. Try things. Break things. Let the AI generate freely.
  2. Capture — when something works (or breaks in an instructive way), write it into your spec files.
  3. Build — now switch to spec-driven development. The AI reads the specs and builds with full context, conventions, and guardrails.
  4. Repeat — the next exploration starts from a higher baseline because the specs grew.
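The capture step can be as lightweight as appending a dated entry to LESSONS.md. A hypothetical helper (the entry format is an assumption, mirroring the headings shown earlier):

```python
from datetime import date
from pathlib import Path

def format_lesson(title: str, detail: str, day=None) -> str:
    """Render one LESSONS.md entry; pass `day` to make the output deterministic."""
    stamp = (day or date.today()).isoformat()
    return f"\n### {title} ({stamp})\n- {detail}\n"

def capture_lesson(path: Path, title: str, detail: str) -> None:
    """Append a dated lesson entry to a LESSONS.md-style file."""
    with path.open("a", encoding="utf-8") as f:
        f.write(format_lesson(title, detail))
```

Whether you script this or just type the entry by hand matters less than doing it at the moment the lesson is fresh.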

Each cycle raises the floor. Vibe coding gives you breadth; spec engineering gives you depth. Together, they compound.

A Practical Example.

I wanted to build a Grant Intelligence vertical, an agent that searches federal grants using hybrid vector + SQL search.
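Hybrid vector + SQL search here means combining a pgvector distance ranking with ordinary WHERE filters in a single query: the filters narrow the candidate set, the vector distance orders it. A sketch of the query shape, with hypothetical table and column names (real code would also register a pgvector adapter or pass the vector as a string literal):

```python
def hybrid_grant_query(query_vector: list, filters: dict, limit: int = 10):
    """Build a parameterized hybrid query: SQL filters narrow, <=> distance ranks.
    Column names come from trusted code; all values stay as %s parameters."""
    clauses, params = [], [query_vector]
    for column, value in filters.items():
        clauses.append(f"{column} = %s")
        params.append(value)
    where_sql = " AND ".join(clauses) if clauses else "TRUE"
    sql = (
        "SELECT id, title, embedding <=> %s::vector AS distance "
        f"FROM grants WHERE {where_sql} "
        "ORDER BY distance LIMIT %s"
    )
    params.append(limit)
    return sql, params
```

The `<=>` operator is pgvector’s distance operator; everything else is plain PostgreSQL, which is what makes the hybrid approach easy to bolt onto an existing schema.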

The vibe phase: I explored freely. “What does the grants.gov API look like?” “Try building a search endpoint.” “What if we use pgvector for semantic matching?” I went through three different approaches in two hours. Most of the code was throwaway, but I learned what worked.

The capture phase: I wrote down what I learned: the agent structure that worked, the import gotchas, the API quirks, the testing patterns. These went into AGENTS.md, LESSONS.md, and BACKLOGS.md.

The build phase: Now I told the AI: “Build the Grant Intelligence vertical.” One sentence. The specs handled the rest. The agent wrote the code, the tests, the pipeline, and the wiring all following the conventions from the spec files. I debugged four bugs (all documented in LESSONS.md now), and by the end of the afternoon, I had a working grant search engine over 250 real federal grants.

Without the vibe phase, I wouldn’t have known what to build. Without the spec phase, I’d have spent the afternoon re-explaining constraints instead of building.

The Compounding Effect.

Here’s the thing that surprised me: this dual approach gets dramatically better over time.

Every session, LESSONS.md grows. Every bug becomes a future bug prevented. Every architectural decision becomes a pattern the agent follows without being told.

After a few weeks, the agent effectively has institutional memory. It knows the codebase like a team member who’s been on the project since day one. Except it never takes a day off, never forgets, and reads the docs every single time.

And you’re still free to vibe. The specs don’t constrain exploration; they give it a foundation. You can go off-script, try wild ideas, let the AI riff, but the baseline keeps rising. Your worst session next month is better than your best session today, because the specs kept compounding while you were vibing.

Vibes enable free-flowing exploration, while specs introduce the guardrails and scaffolding that make the gains compound.

The Honest Part.

I wouldn’t say this has been all upside; it has come with its share of friction.

Spec engineering has a proliferation problem. You start with three markdown files. Then you add a .github/copilot-instructions.md because your agent reads that too, and it enables automatic ingestion of the instructions and guardrails. Then a project-memory.md for current-state snapshots. Then vertical-specific docs. Then skill files. Before you know it, you’ve got a dozen markdown files, and you’re spending real time maintaining the docs that maintain the AI.

There’s an irony there. You started spec-engineering to avoid repeating yourself, and now you’re wondering if the specs themselves need a spec.

And then there’s the control question. Do these files actually give you as much control as you think? I’ve watched agents read a perfectly clear AGENTS.md rule (“no raw pip commands”) and then casually drop a pip install into the terminal anyway. The spec said it. The agent read it. The agent ignored it. Why? Maybe the context window pushed it out. Maybe the instruction conflicted with something in the system prompt. Maybe the model just… vibed past it.

This is the thing nobody talks about yet: spec files are influence, not enforcement. They’re closer to a strong team norm than a compiler check. Most of the time the agent follows them. Sometimes it doesn’t. And when it doesn’t, you’re back to prompting “No, read AGENTS.md, the rule is right there.”

I don’t have a clean answer for this. What I’ve found is:

  • Keep specs lean. If a file gets too long, the agent skims just like a human would. Shorter, sharper rules get followed more reliably than sprawling docs.
  • Consolidate aggressively. Resist the urge to create a new markdown file for every concern. Three well-maintained files beat ten neglected ones.
  • Accept the tradeoff. Specs reduce repeated corrections by maybe 80%. They don’t eliminate them. The remaining 20% is still better than starting from zero every time.

This whole space is evolving fast. The tools are getting better at reading and respecting context. But right now, spec engineering is a practice that requires upkeep and honesty about where it falls short.

How to Start.

You don’t need to choose one over the other. Start vibing and start capturing what you learn.

Phase 1: Vibe freely. Use AI coding agents conversationally. Prototype fast. Don’t worry about structure.

Phase 2: Create your first spec file. After your first session, create an AGENTS.md:

  • Repo structure with one-line descriptions
  • Build and run commands
  • Non-negotiable rules (package manager, security, naming conventions)
  • Common task checklists (“Adding a new feature”, “Adding a test”)
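Those four bullets map onto a skeleton you can adapt. Everything below (paths, commands, rules) is an example to replace with your own repo’s reality:

```markdown
# AGENTS.md

## Repo Map
- `src/agents/` — agent implementations, one module per agent
- `tests/` — pytest suite, mirrors the `src/` layout

## Build & Run
- Install: `uv pip install -r requirements.txt`
- Test: `uv run pytest`

## Golden Rules (Non-Negotiable)
- **uv only**: no raw `pip` commands.
- **No secrets in code**: secrets flow through Key Vault.

## Checklist: Adding a Feature
1. Add the user story to BACKLOGS.md.
2. Implement it following the conventions above.
3. Add tests; run `uv run pytest` before committing.
```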

Phase 3: Start a LESSONS.md. After your first bug:

  • What went wrong and how you fixed it
  • Tricky imports, config quirks, env var gotchas
  • Decisions you made and why

Phase 4: Add a BACKLOGS.md when you have multiple things to build:

  • User stories with acceptance criteria
  • Status tracking (Not Started → In Progress → Done)
  • Priority and size estimates

Put them in docs/ or at the repo root. Point your AI agent config at them (most agents auto-read .github/copilot-instructions.md, which can in turn point to the crucial markdown files).
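The pointer file itself can stay tiny. A sketch of a delegating .github/copilot-instructions.md (the paths are examples):

```markdown
# Copilot Instructions

Before making changes, read these files in order:

1. `AGENTS.md` — repo map, commands, non-negotiable rules
2. `LESSONS.md` — known pitfalls; do not reintroduce these bugs
3. `BACKLOGS.md` — current work items and acceptance criteria
```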

That’s it. Keep vibing. Keep capturing. Three markdown files and the discipline to update them after each session.

The Meta Point.

Vibe coding showed that natural language is a programming interface. Spec engineering suggests that persistent, structured natural language might be a more durable one. Neither is the whole picture on its own.

I’m still figuring this out. The tools change every few months, the best practices from January feel dated by March, and the honest answer is that nobody has this fully solved yet. But the pattern I keep coming back to — explore loosely, capture what works, build on it next time — has held up better than anything else I’ve tried.

Maybe that’s the real takeaway: not a methodology, but a rhythm. Vibe when you need to discover. Spec when you need to remember. Stay honest about where both fall short.

This approach was developed while building ATLAS, a multi-agent AI platform using LangGraph, Chainlit, Azure Cosmos DB and Azure AI Foundry.


Chinny Chukwudozie, Cloud Solutions.

Passion for all things Cloud Technology.