QA
/flow-next:qa runs a live-app, real-user QA pass: it drives the running app like an unforgiving customer, derives its test scenarios directly from the spec, files structured findings with evidence, and ends with a YES/NO ship verdict.
It is the one review surface that isn’t static. The others (impl-review, spec-completion-review, quality-auditor, and code-review) all read code or specs; /flow-next:qa exercises the deployed app. It is forbidden from marking PASS by reading source: a scenario passes only on captured evidence (a screenshot, console output, or URL), never on inspection.
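A minimal sketch of the evidence rule, assuming a hypothetical Evidence record and scenario_status helper (neither is part of Flow-Next's documented interface). A scenario can only be marked PASS when the live run succeeded and left at least one screenshot, console capture, or URL behind. The UNVERIFIED status is an illustrative placeholder, not a documented outcome.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Evidence:
    """Hypothetical capture record: screenshots, console output, and the URL visited."""
    screenshots: list[str] = field(default_factory=list)
    console: list[str] = field(default_factory=list)
    url: str | None = None

    def is_empty(self) -> bool:
        return not (self.screenshots or self.console or self.url)


def scenario_status(observed_ok: bool, evidence: Evidence) -> str:
    """PASS requires a successful live run AND captured evidence (hypothetical helper)."""
    if evidence.is_empty():
        # Reading source produces no evidence, so this path can never yield PASS.
        return "UNVERIFIED"
    return "PASS" if observed_ok else "FAIL"
```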
The spec-as-intent advantage
Spec-less QA tools have to reconstruct what the app is supposed to do from READMEs and landing pages. Flow-Next already has the spec, so QA derives scenarios straight from it:
- Acceptance criteria → test scenarios.
- R-IDs → a coverage table, the same traceability make-pr renders.
- Boundaries → what NOT to test, which suppresses false bugs.
- Decision context → expected behavior.
The result is bidirectional traceability: spec AC ↔ scenario ↔ finding ↔ R-ID.
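The AC → scenario → R-ID mapping can be sketched in a few lines. This is an illustration under assumptions: the Criterion and Scenario types, the driveable flag, and the pass/fail/uncovered labels are hypothetical, not Flow-Next's actual data model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Criterion:
    r_id: str        # requirement ID from the spec, e.g. "R1" (format assumed)
    text: str        # acceptance-criterion wording
    driveable: bool  # True if the AC describes user-visible behavior


@dataclass
class Scenario:
    r_id: str                   # back-link to the AC it came from
    description: str
    passed: bool | None = None  # None until the scenario has been driven


def derive_scenarios(criteria: list[Criterion]) -> list[Scenario]:
    # One scenario per driveable AC; the R-ID travels with it for traceability.
    return [Scenario(c.r_id, f"Verify: {c.text}") for c in criteria if c.driveable]


def coverage(criteria: list[Criterion], scenarios: list[Scenario]) -> dict[str, str]:
    # R-ID -> "pass" / "fail" / "uncovered" for every driveable AC.
    table: dict[str, str] = {}
    for c in criteria:
        if not c.driveable:
            continue
        results = [s.passed for s in scenarios if s.r_id == c.r_id]
        if not results or None in results:
            table[c.r_id] = "uncovered"
        else:
            table[c.r_id] = "pass" if all(results) else "fail"
    return table
```

A criterion such as "passwords are hashed at rest" would be flagged non-driveable here and left out of the live pass entirely.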
How it drives the app
QA never re-implements driving; it consumes Flow-Next Drive’s surface-aware driver ladder (agent-browser → chrome-devtools-mcp → Playwright → cursor-ide-browser → manual, with Computer Use for native surfaces). Whatever rung Flow-Next Drive resolves for the surface is what QA inherits.
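To make the ladder concrete, here is a sketch of first-available-rung resolution, the behavior QA inherits rather than implements. The rung order comes from the ladder above; the set-of-available-drivers input, the lowercase identifiers, and the separate native ladder (Computer Use, then manual) are assumptions.

```python
from __future__ import annotations

# Rung order from the Flow-Next Drive ladder; identifiers are illustrative.
WEB_LADDER = [
    "agent-browser",
    "chrome-devtools-mcp",
    "playwright",
    "cursor-ide-browser",
    "manual",
]
# Assumption: native surfaces try Computer Use first, then fall back to manual.
NATIVE_LADDER = ["computer-use", "manual"]


def resolve_driver(available: set[str], native_surface: bool = False) -> str | None:
    """Return the first rung present in `available`, or None if no rung is usable."""
    ladder = NATIVE_LADDER if native_surface else WEB_LADDER
    for rung in ladder:
        if rung in available:
            return rung
    return None  # no driver at all: QA would report BLOCKED rather than fail
```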
Findings
Failures are filed immediately as structured P0/P1/P2 reports, each with the persona, steps to reproduce, expected vs. actual behavior, and evidence (console output, screenshots, full URL). Findings feed the bug memory track (track: bug) with overlap dedup, and can be promoted to flow specs or tasks for the fix.
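A sketch of the finding shape and an overlap check before filing to the bug track. The field names mirror the report contents described above; the Finding class, the new_findings helper, and the dedup key (same page ignoring the query string, same observed behavior) are hypothetical, since the actual overlap heuristic is not documented here.

```python
from __future__ import annotations

from dataclasses import dataclass, field

SEVERITIES = {"P0", "P1", "P2"}


@dataclass
class Finding:
    severity: str                 # "P0", "P1" or "P2"
    persona: str                  # who was driving, e.g. "first-time buyer"
    steps: list[str]              # steps to reproduce
    expected: str
    actual: str
    url: str                      # full URL where the failure was observed
    console: list[str] = field(default_factory=list)
    screenshots: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")


def overlap_key(f: Finding) -> tuple[str, str]:
    # Hypothetical dedup key: same page (query string dropped) + same observed behavior.
    return (f.url.split("?", 1)[0], f.actual.strip().lower())


def new_findings(existing: list[Finding], incoming: list[Finding]) -> list[Finding]:
    """Keep only incoming findings that do not overlap an existing bug-track entry."""
    seen = {overlap_key(f) for f in existing}
    fresh: list[Finding] = []
    for f in incoming:
        key = overlap_key(f)
        if key not in seen:
            seen.add(key)
            fresh.append(f)
    return fresh
```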
The verdict
The pass ends with a verdict carried as a proof-of-work receipt (type: qa_verdict), with four outcomes:
| qa_outcome | Meaning | Ship? |
|---|---|---|
| SHIP | All scenarios pass, zero open P0/P1, R-ID coverage complete | Yes |
| NEEDS_WORK | Any open P0/P1, or incomplete coverage | No |
| BLOCKED | No live deploy or no driver; could not verify | No |
| NA | Spec has no driveable user-visible AC | n/a |
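The table's rules can be read as a small decision function. This sketch is hypothetical: the decide name and its argument names are illustrative, the check order (NA, then BLOCKED, then NEEDS_WORK) is an assumption, and it treats any failing scenario as NEEDS_WORK even without an open P0/P1, a case the table leaves implicit.

```python
from enum import Enum


class QaOutcome(str, Enum):
    SHIP = "SHIP"
    NEEDS_WORK = "NEEDS_WORK"
    BLOCKED = "BLOCKED"
    NA = "NA"


def decide(*, driveable_ac: bool, live_deploy: bool, driver: bool,
           all_passed: bool, open_p0_p1: int, coverage_complete: bool) -> QaOutcome:
    """Hypothetical reading of the verdict table; the check order is an assumption."""
    if not driveable_ac:
        return QaOutcome.NA              # nothing user-visible to drive
    if not (live_deploy and driver):
        return QaOutcome.BLOCKED         # could not verify
    if open_p0_p1 > 0 or not coverage_complete or not all_passed:
        return QaOutcome.NEEDS_WORK
    return QaOutcome.SHIP
```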
The receipt’s verdict field projects onto the existing review-receipt enum (BLOCKED → NEEDS_WORK, NA → SHIP), so the verdict can feed Spec Completion Review and help answer “does the live app satisfy the AC, not just the code?”
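A minimal sketch of that projection, assuming a receipt is a plain dict with type, qa_outcome, and verdict keys (the real receipt likely carries more fields). The qa_receipt helper is hypothetical, and SHIP and NEEDS_WORK are assumed to pass through unchanged.

```python
# qa_outcome -> review-receipt verdict. BLOCKED and NA follow the documented
# projection; SHIP and NEEDS_WORK are assumed to map to themselves.
PROJECTION = {
    "SHIP": "SHIP",
    "NEEDS_WORK": "NEEDS_WORK",
    "BLOCKED": "NEEDS_WORK",  # unverified, so never counted as shippable
    "NA": "SHIP",             # nothing driveable, so QA does not hold the spec back
}


def qa_receipt(qa_outcome: str) -> dict[str, str]:
    """Minimal hypothetical receipt; a real one likely carries more fields."""
    return {"type": "qa_verdict", "qa_outcome": qa_outcome, "verdict": PROJECTION[qa_outcome]}
```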
Lifecycle position
QA runs after /flow-next:work, around or before make-pr. It can run interactively or autonomously (autonomous runs need a target URL and test accounts configured), and it is not a hard Ralph-block. When the tracker bridge is configured, the verdict can post to the linked issue (opt-in via tracker.perEvent.qa).
Requirements
QA needs a live deploy and a driver (Flow-Next Drive). If either is missing, it surfaces a BLOCKED verdict rather than failing; a spec with no driveable UI yields a clean NA. QA is opt-in and adds nothing to the base flow when unused.
Credit
The QA discipline (the P0/P1/P2 taxonomy, evidence rules, and session-hygiene practices) is a lean borrow from Ray Fernando’s running-bug-review-board (Apache-2.0). Thank you, Ray.
Next step
/flow-next:make-pr <spec-id>