
QA

/flow-next:qa runs a live-app, real-user QA pass: it drives the running app like an unforgiving customer, derives its test scenarios directly from the spec, files structured findings with evidence, and ends with a YES/NO ship verdict.

It is the one review surface that isn’t static. impl-review, spec-completion-review, quality-auditor, and code-review all read code or specs; /flow-next:qa exercises the deployed app. It is forbidden from marking PASS by reading source — a scenario passes only on captured evidence (screenshot / console / URL), never on inspection.

Spec-less QA tools have to reconstruct what the app is supposed to do from READMEs and landing pages. Flow-Next already has the spec, so QA derives scenarios straight from it:

  • Acceptance criteria → test scenarios.
  • R-IDs → a coverage table — the same traceability make-pr renders.
  • Boundaries → what NOT to test — suppresses false bugs.
  • Decision context → expected behavior.

The result is bidirectional traceability: spec AC ↔ scenario ↔ finding ↔ R-ID.
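
The traceability chain can be pictured as a small data model. This is a minimal sketch, not Flow-Next's actual schema: the `Scenario` class, its field names, and the `coverage_table` and `uncovered` helpers are all hypothetical, chosen only to show how an R-ID coverage table falls out of spec-derived scenarios.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Scenario:
    """Hypothetical record linking one acceptance criterion to one R-ID."""
    scenario_id: str
    r_id: str
    acceptance_criterion: str
    finding_ids: List[str] = field(default_factory=list)  # findings filed against it


def coverage_table(r_ids: List[str], scenarios: List[Scenario]) -> Dict[str, List[str]]:
    """Map every R-ID to the scenarios that exercise it; an empty list means uncovered."""
    table: Dict[str, List[str]] = {r_id: [] for r_id in r_ids}
    for scenario in scenarios:
        table.setdefault(scenario.r_id, []).append(scenario.scenario_id)
    return table


def uncovered(table: Dict[str, List[str]]) -> List[str]:
    """Return the R-IDs that no scenario covers."""
    return [r_id for r_id, ids in table.items() if not ids]
```

Under these assumptions, an R-ID with an empty scenario list is what "incomplete coverage" would look like in the verdict.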

QA never re-implements driving — it consumes Flow-Next Drive’s surface-aware driver ladder (agent-browser → chrome-devtools-mcp → Playwright → cursor-ide-browser → manual, with Computer Use for native surfaces). Whatever rung Flow-Next Drive resolves for the surface is what QA inherits.
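
The ladder reads as an ordered preference list. The sketch below assumes the first available rung wins and that native surfaces go to Computer Use; the `resolve_driver` function, the `surface` labels, and the `available` set are hypothetical and are not Flow-Next Drive's API.

```python
from typing import Optional, Set

# Rung names are taken from the ladder described above, in the same order.
BROWSER_LADDER = [
    "agent-browser",
    "chrome-devtools-mcp",
    "Playwright",
    "cursor-ide-browser",
    "manual",
]


def resolve_driver(surface: str, available: Set[str]) -> Optional[str]:
    """Pick a driver for a surface, or None if nothing usable is available.

    Assumption: "native" surfaces use Computer Use; everything else walks
    the browser ladder top to bottom and takes the first available rung.
    """
    if surface == "native":
        return "Computer Use" if "Computer Use" in available else None
    for rung in BROWSER_LADDER:
        if rung in available:
            return rung
    return None
```

A `None` result here corresponds to the "no driver" case, which QA reports as BLOCKED rather than as a failure.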

Failures are filed immediately as structured P0 / P1 / P2 reports — persona, steps to reproduce, expected vs actual, and evidence (console, screenshots, full URL). Findings feed the bug memory track (track: bug) with overlap dedup, and can be promoted to flow specs or tasks for the fix.
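
A structured finding might be shaped roughly like this. The field names mirror the report parts listed above, but the `Finding` and `Evidence` classes and the `problems` check are illustrative assumptions, including the rule that a finding needs the full URL plus at least one of console output or a screenshot.

```python
from dataclasses import dataclass, field
from typing import List

SEVERITIES = ("P0", "P1", "P2")


@dataclass
class Evidence:
    url: str  # full URL where the failure was observed
    console: List[str] = field(default_factory=list)
    screenshots: List[str] = field(default_factory=list)


@dataclass
class Finding:
    severity: str
    persona: str
    steps: List[str]  # steps to reproduce
    expected: str
    actual: str
    evidence: Evidence
    track: str = "bug"  # the memory track findings feed

    def problems(self) -> List[str]:
        """Return the reasons this finding is not yet fileable (empty if fine)."""
        issues = []
        if self.severity not in SEVERITIES:
            issues.append("severity must be P0, P1, or P2")
        if not self.steps:
            issues.append("steps to reproduce are missing")
        if not self.evidence.url:
            issues.append("evidence needs the full URL")
        if not (self.evidence.console or self.evidence.screenshots):
            issues.append("evidence needs console output or a screenshot")
        return issues
```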

The pass ends with a verdict carried as a proof-of-work receipt (type: qa_verdict), with four outcomes:

| qa_outcome | Meaning | Ship? |
| --- | --- | --- |
| SHIP | All scenarios pass, zero open P0/P1, R-ID coverage complete | Yes |
| NEEDS_WORK | Any open P0/P1, or incomplete coverage | No |
| BLOCKED | No live deploy or no driver (could not verify) | No |
| NA | Spec has no driveable user-visible AC | n/a |
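
The four outcomes can be read as a decision function. The sketch below is one possible reading: the `decide_outcome` name, its parameters, the precedence of NA over BLOCKED, and the choice to treat any non-SHIP verifiable run as NEEDS_WORK are all assumptions.

```python
def decide_outcome(
    driveable_ac_count: int,
    has_live_deploy: bool,
    has_driver: bool,
    all_scenarios_pass: bool,
    open_p0_p1: int,
    coverage_complete: bool,
) -> str:
    """Return one of SHIP, NEEDS_WORK, BLOCKED, NA for a QA pass."""
    if driveable_ac_count == 0:
        return "NA"  # nothing user-visible to drive
    if not (has_live_deploy and has_driver):
        return "BLOCKED"  # could not verify
    if all_scenarios_pass and open_p0_p1 == 0 and coverage_complete:
        return "SHIP"
    return "NEEDS_WORK"
```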

The receipt’s verdict field projects onto the existing review-receipt enum (BLOCKED → NEEDS_WORK, NA → SHIP), so the verdict can feed Spec Completion Review — “does the live app satisfy the AC, not just the code?”
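
The projection amounts to a four-entry lookup. In this sketch the keys `type`, `qa_outcome`, and `verdict` come from the description above; the `build_receipt` helper and the bare dictionary shape are assumptions, and a real receipt likely carries more fields.

```python
# qa_outcome -> value of the existing review-receipt verdict enum
VERDICT_PROJECTION = {
    "SHIP": "SHIP",
    "NEEDS_WORK": "NEEDS_WORK",
    "BLOCKED": "NEEDS_WORK",  # could not verify, so not shippable
    "NA": "SHIP",  # nothing driveable, so QA does not hold the spec back
}


def build_receipt(qa_outcome: str) -> dict:
    """Build a minimal qa_verdict receipt; raises KeyError on an unknown outcome."""
    return {
        "type": "qa_verdict",
        "qa_outcome": qa_outcome,
        "verdict": VERDICT_PROJECTION[qa_outcome],
    }
```

Keeping both fields means a consumer that only understands the review-receipt enum still gets a usable verdict, while the original four-way outcome stays available.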

QA runs after /flow-next:work, before or alongside make-pr. It can run interactively or autonomously (autonomous runs need a configured target URL and test accounts), and it is not a hard Ralph-block. When the tracker bridge is configured, the verdict can post to the linked issue (opt-in via tracker.perEvent.qa).

QA needs both a live deploy and a driver (Flow-Next Drive). If either is missing, it surfaces a BLOCKED verdict rather than failing; a spec with no driveable UI yields a clean NA. QA is opt-in: it adds nothing to the base flow when unused.

The QA discipline — the P0/P1/P2 taxonomy, evidence rules, and session-hygiene practices — is a lean borrow from Ray Fernando’s running-bug-review-board (Apache-2.0). Thank you, Ray.

/flow-next:make-pr <spec-id>