Review Workflow
Flow-Next uses review gates before and after implementation.
Cross-model review is not a Flow-Next invention; people have paired one model against another as a reviewer for a while. What Flow-Next was one of the first to ship is the wiring: an autonomous adversarial loop in which a different model challenges every plan and implementation automatically, at each handover and inside the autonomous modes (with or without Ralph).
flowchart LR
  Spec["Spec"] --> PlanReview["Plan review"]
  PlanReview --> Work["Work"]
  Work --> ImplReview["Implementation review"]
  ImplReview -->|needs work| Work
  ImplReview -->|ship| Completion["Completion review"]
  Completion --> PR["make-pr"]
  Completion -.optional.-> QA["Live-app QA"]
  QA -.-> PR
The static gates above (plan / impl / completion review) read code and specs. The optional live-app QA stage drives the running app; slot it in after work, before or around make-pr.
Review backends
Every gate runs through a configurable review backend: a different model than the one that wrote the artefact. Pick the one your team already runs; the verdict grammar, receipts, fix loop, and optional --deep / --validate passes are identical across all of them.
| Backend | Driver | Reviewer models | Shape |
|---|---|---|---|
| RepoPrompt (rp) | RepoPrompt app + rp-cli | chosen in the RepoPrompt window / session | macOS GUI; its Builder auto-discovers the surrounding context the diff alone would miss |
| OpenAI Codex (codex) | codex CLI | GPT-5.x family | headless, cross-platform |
| GitHub Copilot (copilot) | copilot CLI | Claude 4.x + GPT-5.x families | headless, cross-platform |
| Cursor (cursor) | cursor-agent CLI | gpt-5.5-high (default), gpt-5.3-codex family, composer-2.5, Opus 4.8 thinking | headless; reviews billed against your existing Cursor subscription |
Cursor (cursor-agent) runs the same headless contract and verdict grammar as the others, with reviews billed against your Cursor subscription instead of a separate API key. It is resume-only (the first review persists Cursor’s session_id; re-reviews resume it) and folds reasoning effort into the model name (Cursor convention), so a spec is cursor:<model> with no :effort rung.
Set it once with /flow-next:setup, or override per run:
# persist the default (.flow/config.json)
flowctl config set review.backend codex
# override for a single run
/flow-next:impl-review fn-1 --review=rp|codex|copilot|cursor|none
# full spec form: backend:model:effort
FLOW_REVIEW_BACKEND=codex:gpt-5.5:high
none is an explicit opt-out (skip review). The :model:effort suffix is optional and backend-specific. RepoPrompt picks its model in-app, so it takes no suffix; Codex and Copilot accept a :model:effort suffix (e.g. copilot:claude-opus-4.5:high); Cursor takes a model only (e.g. cursor:gpt-5.5-high) since effort is baked into the model name. The chosen backend is recorded as the mode field on every review receipt.
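The suffix rules above can be sketched as a small parser. This is an illustrative sketch, not flowctl's actual code: `ReviewSpec` and `parse_review_spec` are hypothetical names, and rejecting an unsupported suffix (rather than ignoring it) is an assumption made here for clarity.

```python
from dataclasses import dataclass
from typing import Optional

# Suffix rules taken from the text above; all names here are hypothetical.
KNOWN_BACKENDS = {"rp", "codex", "copilot", "cursor", "none"}
ACCEPTS_MODEL = {"codex", "copilot", "cursor"}
ACCEPTS_EFFORT = {"codex", "copilot"}  # cursor folds effort into the model name


@dataclass(frozen=True)
class ReviewSpec:
    backend: str
    model: Optional[str] = None
    effort: Optional[str] = None


def parse_review_spec(spec: str) -> ReviewSpec:
    """Split 'backend[:model[:effort]]' and reject suffixes the backend does not take."""
    parts = spec.strip().split(":")
    backend, rest = parts[0], parts[1:]
    if backend not in KNOWN_BACKENDS:
        raise ValueError(f"unknown review backend: {backend!r}")
    if len(rest) > 2 or any(part == "" for part in rest):
        raise ValueError(f"malformed review spec: {spec!r}")
    model = rest[0] if len(rest) >= 1 else None
    effort = rest[1] if len(rest) == 2 else None
    # Assumption: an unsupported suffix is an error rather than silently dropped.
    if model is not None and backend not in ACCEPTS_MODEL:
        raise ValueError(f"{backend} takes no model suffix")
    if effort is not None and backend not in ACCEPTS_EFFORT:
        raise ValueError(f"{backend} takes no effort suffix")
    return ReviewSpec(backend, model, effort)
```

Under these assumptions, `parse_review_spec("codex:gpt-5.5:high")` yields a backend, model, and effort, while `parse_review_spec("cursor:gpt-5.5-high")` yields a backend and model with no effort.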
A per-task review: (or per-spec default_review) override routes end-to-end — it wins over the project default and env/config, so a task set to review: cursor:... under a codex project default actually reviews with cursor. Implementation reviews also carry an always-on code-smell baseline (Fowler Refactoring — Feature Envy, Data Clumps, Primitive Obsession, …) across every backend.
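The override precedence can be sketched as a first-match lookup. The text only states that a per-task or per-spec override beats the project default and env/config; the relative order of task versus spec, and of env versus config, is an assumption in this sketch, and `resolve_review_backend` is a hypothetical name.

```python
from typing import Optional


def resolve_review_backend(
    task_review: Optional[str],
    spec_default_review: Optional[str],
    env_value: Optional[str],
    config_value: Optional[str],
) -> Optional[str]:
    """Return the first review spec that is set, most specific first."""
    # Assumed order: task, then spec, then env, then config.
    # The source only guarantees that task/spec overrides beat env/config.
    for candidate in (task_review, spec_default_review, env_value, config_value):
        if candidate:
            return candidate
    return None  # nothing configured; the caller decides what that means
```

With a codex project default and a task set to a cursor spec, the task value wins, matching the example in the paragraph above.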
Plan review
/flow-next:plan-review fn-1
Checks whether the spec and plan are complete enough before work begins.
Use it when the work is high risk, cross-module, product-facing, or likely to be delegated. A plan review should catch missing requirements and bad decomposition while the fix is still cheap.
Implementation review
/flow-next:impl-review fn-1
Runs a second model over the diff. Only introduced findings count toward blocking verdicts.
Use a different model or backend than the implementation model when possible. The point is adversarial pressure, not another pass from the same context. The workflow is a loop: review finds introduced issues, /flow-next:work fixes them, review runs again, and the handoff continues only once the verdict is shippable.
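The review and fix loop can be sketched as follows. This is a minimal illustration under stated assumptions: the verdict strings reuse the diagram labels ("ship", "needs work") rather than the real verdict grammar, the `max_rounds` cap is invented for the sketch, and `Finding`, `verdict`, and `review_loop` are hypothetical names. The `review` and `fix` callables stand in for the reviewer backend and /flow-next:work.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    summary: str
    introduced: bool  # True if the diff under review introduced the issue


def verdict(findings: List[Finding]) -> str:
    """Only introduced findings block; pre-existing ones are reported but do not."""
    # "ship" / "needs work" mirror the diagram labels, not the real verdict grammar.
    return "needs work" if any(f.introduced for f in findings) else "ship"


def review_loop(
    review: Callable[[], List[Finding]],
    fix: Callable[[List[Finding]], None],
    max_rounds: int = 5,  # assumption: the source does not state a round limit
) -> str:
    """Alternate review and fix until the verdict is shippable or rounds run out."""
    for _ in range(max_rounds):
        findings = review()
        if verdict(findings) == "ship":
            return "ship"
        # Hand only the introduced findings to the fix step.
        fix([f for f in findings if f.introduced])
    return "needs work"
```

The handoff continues only when `review_loop` returns "ship"; a pre-existing issue on its own never blocks in this sketch.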
Completion review
/flow-next:spec-completion-review fn-1
Checks the combined implementation against the whole spec after all tasks are done.
This is different from implementation review. Implementation review checks a diff. Completion review checks whether the full spec is satisfied after all tasks, merges, and fix loops.
Live-app QA (optional)
/flow-next:qa fn-1
Every gate above is static: it reads code or specs. QA is the live-app gate: it drives the running app like a real user, derives scenarios straight from the spec (AC, R-IDs, boundaries), files P0/P1/P2 findings with evidence, and emits a YES/NO qa_verdict receipt that can feed completion review.
QA is opt-in: it needs a live deploy and a driver (Flow-Next Drive). Without them it surfaces a BLOCKED verdict rather than failing, and it adds nothing to the base flow when unused. It is forbidden from marking PASS by reading source.
PR review
/flow-next:make-pr fn-1
/flow-next:resolve-pr 123
The PR body summarizes acceptance coverage, critical files, decisions, memory, deferred findings, and review focus.
With the opt-in HTML artifact mode (2.0.0+), make-pr also emits a PR render lens — a self-contained, diff-derived HTML review instrument with a churn map grouped by review intent, an R-ID → evidence table verified against the spec export, and a where-to-look checklist. Read-only by design: PR feedback stays in review threads.
Review escalation
| Signal | Response |
|---|---|
| Plan review finds unclear product behavior | Rerun /flow-next:interview --scope=business |
| Plan review finds technical gaps | Rerun /flow-next:interview --scope=technical |
| Impl review finds introduced bug | Rerun /flow-next:work on affected task |
| Impl review flags architectural mismatch | Revisit spec decision context |
| Completion review finds uncovered acceptance criteria | Add or repair task coverage |
| Live-app QA files a P0/P1 (or a BLOCKED/NA verdict) | File the finding to the bug track, add a fix task, or supply the missing deploy/driver |
| Human reviewer is confused | Improve task summaries or regenerate PR body |
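The rows of the table that map to a rerunnable command can be sketched as a lookup. The `(gate, signal)` keys are hypothetical labels paraphrased from the table, and `next_command` is an invented helper; rows whose response is a manual step are deliberately left out and return `None`.

```python
from typing import Dict, Optional, Tuple

# Keys are hypothetical (gate, signal) labels paraphrased from the table above.
ESCALATION: Dict[Tuple[str, str], str] = {
    ("plan", "unclear product behavior"): "/flow-next:interview --scope=business",
    ("plan", "technical gaps"): "/flow-next:interview --scope=technical",
    ("impl", "introduced bug"): "/flow-next:work",
}


def next_command(gate: str, signal: str) -> Optional[str]:
    """Return the command to rerun, or None when the response is a manual step."""
    return ESCALATION.get((gate, signal))
```

A caller would fall back to human judgment whenever `next_command` returns `None`, for example when completion review finds uncovered acceptance criteria.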
Review is part of the workflow, not an afterthought at the end.