Building software has a hundred small ways to go wrong. A free-form chat with an AI hides them until it's too late. This workflow surfaces them early, keeps a human in charge, and makes every step observable. You don't need to write code to drive it.
Seven phases. You participate directly in three of them — Planning, Human-testing, and (only if there's feedback) PR-responding. The rest run on their own.
You describe what you want in plain English. The AI explores the codebase, asks clarifying questions, then writes a plain-language plan — goal, steps, what's out of scope, and how you'll know it's done. You edit or approve. Nothing is built until you say go.
One continuous stretch of work with four silent sub-steps. You'll see the phase change; you rarely need to intervene.
a. Coding — the only step that edits files. Every fix is traceable to a plan step or a review finding.
b. Auto-testing — compiles, applies any database changes, and runs the existing test suite. Work is saved only once it's green.
c. Self-reviewing — the AI re-reads its own change with fresh eyes against the plan, looking for ordinary bugs.
d. Agent-reviewing — two independent reviewers second-guess the work (below).
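The four sub-steps above can be read as a loop that exits only when every check comes back clean. The following is a minimal Python sketch of that idea, not the workflow's actual implementation: `run_build_phase`, the callables passed into it, and the `max_rounds` limit are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BuildResult:
    done: bool    # True once tests are green and both reviews are clean
    rounds: int   # how many coding rounds were needed


def run_build_phase(
    code: Callable[[List[str]], None],
    auto_test: Callable[[], bool],
    save: Callable[[], None],
    self_review: Callable[[], List[str]],
    agent_review: Callable[[], List[str]],
    max_rounds: int = 5,  # assumed safety limit, not stated in the text
) -> BuildResult:
    findings: List[str] = []  # what the next coding round must address
    for round_no in range(1, max_rounds + 1):
        code(findings)                 # a. Coding: the only step that edits files
        if not auto_test():            # b. Auto-testing
            findings = ["tests failing"]
            continue
        save()                         # work is saved only once it's green
        findings = self_review()       # c. Self-reviewing
        if findings:
            continue
        findings = agent_review()      # d. Agent-reviewing
        if not findings:
            return BuildResult(done=True, rounds=round_no)
    return BuildResult(done=False, rounds=max_rounds)
```

Each coding round receives the findings that sent it back, which is one way to keep every edit traceable to a plan step or a review finding.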
The AI hands you something to try and asks you to just use it — click around the way you actually would. If it works, you confirm the acceptance checklist. If it doesn't, you describe the problem in plain words and it routes straight back to coding.
The AI pushes the work, opens a pull request linked to the original issue, and hands off to a human reviewer. It then tidies up its working area and returns you to a clean starting state. The AI is never allowed to merge; that's deliberate.
A human reviewer reads the change and either merges it or requests changes. This is the gate that keeps the whole thing trustworthy: no AI-authored change reaches your codebase without a person signing off.
If the reviewer leaves comments, the AI replies to each one, makes the fixes, re-runs the checks, and pushes an update — one round at a time. If the reviewer simply approved, it closes the feature out cleanly.
Once merged, the change is promoted to production on whatever cadence suits your team. Depending on the project, this is automated on merge or run on a schedule.
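One way to picture the seven phases is as a small state machine. The sketch below is a simplified illustration under stated assumptions: only the names Planning, Human-testing, and PR-responding come from the text, while `BUILDING`, `PR_CREATING`, `PR_REVIEWING`, `DEPLOYING`, and every event name are labels invented here.

```python
from enum import Enum


class Phase(Enum):
    PLANNING = "planning"
    BUILDING = "building"            # assumed name for the silent stretch of work
    HUMAN_TESTING = "human-testing"
    PR_CREATING = "pr-creating"      # assumed name
    PR_REVIEWING = "pr-reviewing"    # assumed name; done by a human reviewer
    PR_RESPONDING = "pr-responding"
    DEPLOYING = "deploying"          # assumed name


# Phases the text says you participate in directly.
NEEDS_YOU = {Phase.PLANNING, Phase.HUMAN_TESTING, Phase.PR_RESPONDING}

# (current phase, event) -> next phase. Event names are hypothetical.
TRANSITIONS = {
    (Phase.PLANNING, "approved"): Phase.BUILDING,
    (Phase.BUILDING, "reviews_passed"): Phase.HUMAN_TESTING,
    (Phase.HUMAN_TESTING, "accepted"): Phase.PR_CREATING,
    (Phase.HUMAN_TESTING, "problem_reported"): Phase.BUILDING,
    (Phase.PR_CREATING, "pr_opened"): Phase.PR_REVIEWING,
    (Phase.PR_REVIEWING, "changes_requested"): Phase.PR_RESPONDING,
    (Phase.PR_REVIEWING, "merged"): Phase.DEPLOYING,
    (Phase.PR_RESPONDING, "update_pushed"): Phase.PR_REVIEWING,
}


def advance(phase: Phase, event: str) -> Phase:
    """Return the next phase, or raise if the move is not allowed."""
    try:
        return TRANSITIONS[(phase, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed during {phase.value}") from None
```

In this sketch the only route to `DEPLOYING` runs through `PR_REVIEWING`, which mirrors the rule that a person, never the AI, merges the change.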
The most valuable step happens before you ever see the work. The AI spawns two independent reviewers with deliberately partial information, so neither can rubber-stamp the other:
Reviewer 1 reads the change with no knowledge of the plan and describes what it actually does, plus any bugs it spots. It can't be biased by what the work was supposed to do.
Reviewer 2 compares Reviewer 1's description against the original plan and flags mismatches, missing pieces, or extras. It never sees the code, only intent versus reality.
The AI then weighs both sets of findings: critical issues go straight back to coding, trivial ones pass, and genuine judgment calls are surfaced to you in plain language. Bugs that were already in the code before this change are reported separately and never silently swept into the work.
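The partial-information setup and the triage that follows can be sketched in a few lines of Python. This is an illustration under assumptions, not the real implementation: `Finding`, `agent_review`, `triage`, the severity labels, and the route names are all hypothetical, and the two reviewers are passed in as plain functions standing in for the AI reviewers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Finding:
    summary: str
    severity: str               # "critical", "trivial", or "judgment" (assumed labels)
    pre_existing: bool = False  # bug was already in the code before this change


def triage(findings: List[Finding]) -> Dict[str, List[str]]:
    """Route each finding the way the text describes."""
    routes: Dict[str, List[str]] = {
        "back_to_coding": [],
        "pass": [],
        "ask_human": [],
        "reported_separately": [],
    }
    for f in findings:
        if f.pre_existing:
            routes["reported_separately"].append(f.summary)
        elif f.severity == "critical":
            routes["back_to_coding"].append(f.summary)
        elif f.severity == "trivial":
            routes["pass"].append(f.summary)
        else:
            routes["ask_human"].append(f.summary)
    return routes


def agent_review(
    diff: str,
    plan: str,
    reviewer_1: Callable[[str], Tuple[str, List[Finding]]],
    reviewer_2: Callable[[str, str], List[Finding]],
) -> Dict[str, List[str]]:
    # Reviewer 1 receives the diff only, never the plan.
    description, bugs = reviewer_1(diff)
    # Reviewer 2 receives the description and the plan only, never the diff.
    mismatches = reviewer_2(description, plan)
    return triage(bugs + mismatches)
```

The function signatures carry the design choice: neither reviewer is handed the input that would let it simply agree with the other.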
Every step the AI takes is logged as it goes, so you (and the AI itself) can look back over exactly what it did and why, catch a wrong turn early, and adjust the instructions that guide it. Over time the workflow adapts to how your team works instead of repeating the same mistakes.
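A step log like the one described above could be as simple as one JSON object per line, appended as the work proceeds. The sketch below assumes that format; `StepLog`, `read_log`, and the field names are hypothetical and are not taken from the actual workflow.

```python
import io
import json
from datetime import datetime, timezone
from typing import List, TextIO


class StepLog:
    """Append-only log: one JSON object per line, written as each step happens."""

    def __init__(self, sink: TextIO) -> None:
        self._sink = sink

    def record(self, phase: str, action: str, reason: str) -> dict:
        entry = {
            "at": datetime.now(timezone.utc).isoformat(),
            "phase": phase,
            "action": action,   # what the AI did
            "reason": reason,   # why it did it (plan step or review finding)
        }
        self._sink.write(json.dumps(entry) + "\n")
        return entry


def read_log(text: str) -> List[dict]:
    """Parse the log back into entries for later review."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Usage: an in-memory sink stands in for a real log file.
sink = io.StringIO()
log = StepLog(sink)
log.record("coding", "edited the export module", "plan step 2")
log.record("auto-testing", "ran the test suite", "gate before saving")
entries = read_log(sink.getvalue())
```

Recording the reason alongside the action is what makes it possible to spot a wrong turn later and trace it back to the instruction that caused it.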
You can just describe what you want — but short commands make it quicker. You're never more than one word away from redirecting, pausing, or bailing out.
| Command | What it does |
| --- | --- |
| /plan | Start planning a new feature. (Or just describe what you want.) |
| /replan | Rewrite the plan for the current feature: keep the work so far, or scrap it. |
| /pause | Park the current feature safely so you can switch to something else. |
| /continue | Resume a paused feature: pick from the list or jump straight to one. |
| /pr-respond | Handle whatever the reviewer did: address comments or close out on approval. |
| /abandon | Throw out the current feature entirely. Asks for confirmation first. |
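
The commands above could be routed with a small parser along the following lines. This is a sketch under assumptions: `parse_input` and `needs_confirmation` are hypothetical helpers, and treating free-form text as a planning request is one reading of "Or just describe what you want."

```python
from typing import Tuple

COMMANDS = {"/plan", "/replan", "/pause", "/continue", "/pr-respond", "/abandon"}


def parse_input(text: str) -> Tuple[str, str]:
    """Split user input into (command, argument)."""
    stripped = text.strip()
    first, _, rest = stripped.partition(" ")
    if first in COMMANDS:
        return first, rest.strip()
    if first.startswith("/"):
        raise ValueError(f"unknown command: {first}")
    # Plain English with no command is treated as a request to start planning.
    return "/plan", stripped


def needs_confirmation(command: str) -> bool:
    # Only /abandon is described as asking for confirmation first.
    return command == "/abandon"
```

For example, `parse_input("/continue checkout-page")` yields the command `/continue` with the argument `checkout-page`, where `checkout-page` is a made-up feature name.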
The same "AI does the work, a human stays in charge" principle powers a few other flows, each tuned to a different kind of job. They share the phase markers, the guardrails, and the human-approval gate.
A lighter sister to the feature flow, for adding or updating end-to-end (browser) tests. The AI drafts the test from a plain-English description of the behavior, runs it to prove it actually passes, and opens a PR — so your safety net grows alongside the app instead of lagging behind it.
When work outgrows the main codebase — a customer portal, a mobile scanner, an integration — the AI spins up a fresh project from a hardened template: modern stack, CI/CD from day one, and a clean contract for talking back to your core system over APIs.
Every flow keeps you oriented with a visible phase marker on each step, and pings you the moment it needs a decision — so long stretches of silent work never leave you wondering what's happening.
I set the whole workflow up on your system, tune it to your stack, and get your team shipping reviewed features without anyone on it needing to write code.