Changelog
Human-readable highlights — one line per release, expand for detail. The repository changelog is canonical.
Unreleased
Latest
2.8.0 — Orchestration in your repo: usage.md steering recipes + setup routing scaffold
The orchestration story now ships into the repo you work in: every installed .flow/usage.md carries dogfooded headless-bridge recipes (codex exec, cursor-agent, and the reverse claude -p — every direction works), and /flow-next:setup optionally scaffolds an opinionated model-routing table into CLAUDE.md/AGENTS.md.
Detail
Closes the use-time discoverability gap from 2.7.2’s orchestration doc: agents read .flow/usage.md and instruction files, not the plugin’s doc tree.
- usage.md ## Orchestration & model steering (unconditional, ships in every project): codex exec recipes with the real gotchas baked in (read-only default sandbox, -o output capture, the </dev/null stdin-hang guard), cursor-agent (-p, --force to apply, volatile model IDs via --list-models), the claude -p reverse bridge (prompt before the variadic --allowedTools), flow-next shortcuts (delegate:codex, review.backend, per-task review:), and prompted-orchestration examples. Every recipe was verified against the live CLIs before shipping.
- Optional setup ceremony (setup): scaffolds the cost/intelligence/taste scores table + routing rules + flow-next wiring into your instruction file — probe-annotated for the CLIs actually installed (<!-- probe:codex/cursor --> sentinel lines, deterministic composition), shown in full before writing, marker-fenced for idempotent re-runs (probe drift counts as drift), platform-correct invocation syntax per target file. Delegation opt-in sets work.delegate but never pre-sets the consent gate.
- /flow-next:uninstall removes the scaffold via a deterministic damaged-marker algorithm; 23 new tests + smoke prose contracts pin the template shape, four-state probe composition, and removal.
- Codex installs: the mirror’s usage.md now renders commands as $flow-next-<cmd> (generator rewrite + regression guard).
Defaults stay pre-tuned and unchanged — steering remains a capability, not a prerequisite; headless/Ralph setups skip the new question silently.
2.7.2 — Orchestration & model routing doc
Given the trend toward frontier-model orchestration — Fable 5 conducting while implementation, reviews, and bulk reads route to cheaper/faster models — a new Orchestration & Model Routing page maps every routing dial flow-next already ships. Two composable methodologies: deterministic parameters (review-backend grammar + precedence, delegate:codex offload, subagent tiers) and prompted orchestration — the host’s own intelligence routing per item by complexity, escalating conditionally, even prompting capabilities into existence that no parameter encodes. Plus a copy-paste CLAUDE.md model-routing table and the pilot+land loop-chaining recipe. The frame throughout: the defaults are pre-tuned to work well out of the box — steering is a capability, not a prerequisite.
2.7.1 — Codex hooks.json parse fix
The installed Codex ~/.codex/hooks.json carried a top-level description key that Codex’s hooks parser (stable since 0.142.x) rejects — a warning on every invocation and the Ralph guard hooks silently disabled; the generated mirror no longer emits the key. Reproduced and verified clean against Codex CLI 0.142.5. Codex users: re-run scripts/install-codex.sh to replace the broken file. Thanks to @TechupBusiness for the report and root-cause analysis (#198).
2.7.0 — Fleet-wide capability & efficiency review
Two adversarial-review passes over the whole skill/agent fleet — one new feature (make-pr --update), broad correctness and autonomy-safety fixes, progressive-disclosure efficiency, and seven A/B-verified opus→sonnet model downgrades — with every judgment call fable-reviewed before it shipped.
Detail
The engine was adversarial review, not prose-squeezing: six-reviewer fable audits found the gaps, a fable judge verified each fix, and every model-tier change was proven head-to-head. Across ~46 improvements in 25 skills and agents:
- New capability — make-pr --update refreshes a stale PR body after review/land fix rounds; /prime gained a real evidence + scoring contract (a failed scout no longer silently drops or fabricates a pillar, verification is mandatory before a “runnable” pass, inapplicable criteria no longer deflate the score) plus create-or-augment CLAUDE.md/AGENTS.md handling; completion-review scope-creep detection, plan real-anchor derivation, prospect’s genuinely-isolated critique, qa evidence enforcement, and /deps surfacing deadlocks it used to hide.
- Autonomy safety — land stops auto-merging a PR whose QA verdict is NEEDS_WORK; pilot’s strike limit survives a tracker re-projection that had it re-dispatching a failing spec forever; quality-auditor fails loudly instead of reporting a false-clean audit over an empty diff.
- Model tiers — plan-sync, flow-gap-analyst, and the five retrieval scouts moved opus→sonnet, each A/B-verified; opus is now used by a single agent. Cheaper per call, quality held.
- Efficiency — progressive-disclosure splits (interview, make-pr, impl-review, capture) and common-path short-circuits (tracker-sync, audit) take ~5k–17k tokens off the paths that run most.
- Correctness spine — a review-diff base...HEAD fix (13 sites) for a bug that had a fast-moving base branch showing its own commits as false reversions, and a signal refresh so /prime recognizes 2025-era stacks (uv, bun, compose.yaml, mise, monorepo layouts, GitLab, goreleaser, Biome).
No breaking changes. Verified by a 1425-test suite green across Linux/macOS/Windows.
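The review-diff fix above relies on Git's three-dot range, which diffs HEAD against its merge base with the base branch instead of against the base tip. A minimal sketch of building such a command follows; the helper name and the default base are hypothetical and are not taken from flow-next's code.

```python
def review_diff_args(base: str = "main") -> list[str]:
    """Build a `git diff` invocation that compares HEAD with its merge base.

    Hypothetical helper for illustration only. With three dots, commits that
    landed on `base` after the branch point are left out of the comparison,
    so they cannot appear as apparent reversions in the review diff.
    """
    return ["git", "diff", f"{base}...HEAD"]
```

With a two-dot range (or a plain `git diff base`), a base branch that keeps moving would contribute its own new commits to the diff as if the feature branch had undone them.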
2.6.3 — Single-call worker anchor + plan-sync gate shelved
The /flow-next:work worker now re-anchors in one flowctl anchor call instead of ~8 separate reads (proven zero information loss), plus a CROSS_SPEC caller bug-fix — while the other half of the work, a deterministic plan-sync skip-gate, was proven non-viable by cross-repo eval and deliberately shelved rather than shipped.
Detail
flowctl anchor <task-id> assembles the worker’s Phase-1 re-anchor from the verbatim stdout of the same production commands it already runs — byte-for-byte superset test plus a comprehension-equivalence eval (bundle 7/7 = status-quo on frozen real tasks) prove no information is lost. The plan-sync skip-gate (a deterministic probe to skip the post-task drift check) was built, eval’d against the real plan-sync agent across three external repos, and killed by its own evidence: a genuine false skip from semantic drift no path/token probe can see, plus a 6.7% skip-rate against a ≥50% bar. It is shelved with a decision record, not shipped — plan-sync still runs after every task, and the gate machinery was removed from the CLI. Also fixed: a Windows encoding bug in the anchor render, surfaced by wiring the anchor guardrail test into CI.
2.6.2 — ready honors spec-level deps
flowctl ready --spec now honors spec-level dependencies — a spec blocked by unfinished depends_on_epics no longer reports its tasks as ready. It returns empty lists plus blocked_by_specs (legacy alias epic_blocked_by), matching the gate next and ready --all already applied. Latent in the default workflow; hit by external consumers calling ready per-spec. Thanks to Mike Bannister (#95).
2.6.1 — Codex hooks config fix
Setup and install-codex.sh could leave a Codex config.toml with a duplicate hooks key (invalid TOML — Codex silently stops loading hooks) or the deprecated codex_hooks spelling (a warning on every run); both paths now converge through one idempotent, dedup-safe normalizer that guarantees exactly one hooks = true under [features]. Regression-tested (10 cases including the both-keys scenario); everything outside [features] is byte-preserved and re-running is a no-op.
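One way such a normalizer could work is sketched below. This is an illustration under stated assumptions, not the shipped implementation: the function name is hypothetical, the parsing is line-based rather than a full TOML parse, and table headers with trailing comments or array-of-tables headers are not handled.

```python
import re

# Line-based matching; a simplification compared with a real TOML parser.
HEADER = re.compile(r"^\s*\[([^\]]+)\]\s*$")
HOOKS_KEY = re.compile(r"^\s*(hooks|codex_hooks)\s*=")


def normalize_hooks(toml_text: str) -> str:
    """Return config text with exactly one `hooks = true` under [features].

    Hypothetical sketch: lines outside [features] are copied through
    unchanged, and both spellings of the key are dropped inside [features]
    before a single canonical line is written right after the header.
    """
    out = []
    in_features = False
    seen_features = False
    for line in toml_text.splitlines(keepends=True):
        header = HEADER.match(line)
        if header:
            in_features = header.group(1).strip() == "features"
            out.append(line)
            if in_features and not seen_features:
                seen_features = True
                if not line.endswith("\n"):
                    out.append("\n")
                out.append("hooks = true\n")
            continue
        if in_features and HOOKS_KEY.match(line):
            continue  # drop duplicates and the deprecated spelling
        out.append(line)
    if not seen_features:
        # No [features] table yet: append one at the end of the file.
        if out and not out[-1].endswith("\n"):
            out.append("\n")
        out.append("[features]\nhooks = true\n")
    return "".join(out)
```

Because the canonical line is always written in the same place and every other spelling is removed, running the function on its own output returns the same text.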
2.6.0 — Skill efficiency: single-emission writes + prompt diet
Two paired specs cut the token cost of every skill run with zero quality loss — fn-81 eliminates runtime re-emission (spec bodies, review prompts, and responses materialized once instead of two-or-three times, plus 13 redundant CLI round-trips removed across 12 skills), and fn-82 trims the always-loaded prompt weight the hot-path skills carry on every invocation (−10.7k tokens across 11 skills). Skill-markdown only — no flowctl behavior change, no new commands, read-backs stay mandatory and user-authoritative.
Detail
Follows a fleet survey of all 28 skills (2026-07-02) and lands behind a full behavioral regression pass (gate matrix, two eval-suite re-runs at full score, smoke 138/138, pytest 1393 passed).
- Runtime plumbing (fn-81). The drafted spec body is materialized exactly once via the Write tool (the Write render is the user-visible read-back), revised via Edit deltas, and consumed by spec set-plan --file <path> — no Phase-5 heredoc re-authoring (capture, interview). RP review prompts are built by deterministic file composition (rp prompt-get > file, quoted-heredoc criteria, flowctl show >> file) — every [PASTE …] content-retype placeholder is gone and untrusted reviewer/spec content never transits a shell var (the injection surface is closed). RP review responses enter context exactly once (redirect → single Read). Round-trips removed: single LEAF= config read per tracker gate (7 sites), plan drops a post-write show + cat, deps runs one per-spec loop instead of two, make-pr’s §4.6b live gh pr view fires only on the local-assertion miss, tracker-sync reconcile passes the on-disk spec to set-merge-base --flow-file. Guards hardened: the fix-loop cap (MAX_REVIEW_ITERATIONS, default 3) now bounds all review backends, and both RP fix loops replace git add -A with snapshot-scoped staging (pre-existing dirty paths are never swept in).
- Prompt diet (fn-82). Default-OFF machinery moved behind a forcing-sentinel gate into references/*.md — zero tokens until Read (Anthropic Agent Skills 3-level loading): work’s tracker touchpoints and pilot’s QA-stage freshness probe. Each gate emits an imperative the agent must act on (GATE ACTIVE — STOP. Read <ref> …), fails open on probe/parse error, and no-ops silently on the default path; the safety nets (work’s Phase-5 sync check + four-state summary, pilot’s QA routing) stay inline. Duplicated explanatory blocks collapse to one authoritative site (the review pair now resolves the backend once — killing a double review-backend round-trip); build-time fn-N provenance and flowctl.py line-refs are stripped from always-loaded prose; make-pr folds its per-phase Done-when checklists inline (body eval held 5/5, −4.5k tok/run) and capture single-sources its biz-routing table at the consumer (suite held 15/15).
2.5.4 — Section-write hardening + rp-gate completion
flowctl task-section writes are now normalization-hardened (the H2-layering bug caught in fn-78’s own autonomous dogfood is fixed, with self-heal for already-damaged files), and the fn-78 RepoPrompt eligibility gate now covers all four review skills — impl-review and spec-completion-review stop steering toward rp on hosts where it can’t run.
Detail
- Task-section normalization (fn-79). Agents routinely pass section content that starts with its own ## Acceptance Criteria … H2; task create --acceptance-file embedded it as a rogue sibling section and every later set-acceptance layered a new block above the old one. All task-section write sites now normalize through one helper: a leading H2 is stripped only when it matches the section’s known-title-variant grammar (## Acceptance Tests is content — demoted, never stripped), remaining H2s demote to H3 outside code fences, writes are byte-idempotent, and an on-write self-heal folds contiguous rogue sections from already-damaged files (a byte-exact duplicate ## Acceptance still raises). Fence-awareness extended end-to-end via one shared tracker (patch_task_section, get_task_section, heading validation, set-spec scaffold check).
- RP_ELIGIBLE gate completed (fn-80). The 2.5.3 gate covered plan/plan-review; now impl-review and spec-completion-review compute the same guard locally in every gated file and, when ineligible (non-macOS, no rp-cli), steer only to codex/copilot/cursor (+none). Explicit --review=rp / env / config / per-task overrides still resolve; eligible hosts render byte-for-byte as before.
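The normalization rules in the first bullet (strip a matching leading H2, demote other H2s outside code fences, stay idempotent) can be sketched roughly as follows. The title set, function name, and the simple fence toggle are assumptions made for illustration; the real known-title grammar and the self-heal step are not shown in this changelog and are not modeled here.

```python
# Hypothetical title variants for the acceptance section.
KNOWN_TITLES = {"acceptance", "acceptance criteria"}

# Built with repetition so the markers do not read as a fence in this listing.
FENCE_MARKERS = ("`" * 3, "~" * 3)


def normalize_section(body: str, known_titles=KNOWN_TITLES) -> str:
    """Normalize section content before it is written under its own H2."""
    lines = body.splitlines()
    while lines and not lines[0].strip():
        lines.pop(0)
    # Strip a leading H2 only when it repeats the section's own title.
    if lines and lines[0].startswith("## "):
        if lines[0][3:].strip().lower() in known_titles:
            lines.pop(0)
            while lines and not lines[0].strip():
                lines.pop(0)
    out = []
    in_fence = False
    for line in lines:
        if line.lstrip().startswith(FENCE_MARKERS):
            in_fence = not in_fence  # simplified: ignores fence length
        elif not in_fence and line.startswith("## "):
            line = "#" + line  # demote H2 to H3
        out.append(line)
    return "\n".join(out)
```

A second pass finds no leading title and no remaining H2 outside fences, so the output is unchanged, which is the idempotence property the bullet describes.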
2.5.3 — RepoPrompt proposal gate + review-call hardening
/flow-next:plan and /flow-next:plan-review no longer offer the RepoPrompt path on hosts where it can’t run (non-macOS with no rp-cli on PATH) — explicit --review=rp / config still resolves as before — and the review skills now pin an explicit Foreground rule so agents never background a review CLI call and idle on a finished verdict.
Detail
- RepoPrompt eligibility gate (fn-78). RepoPrompt is a macOS-only GUI app, yet both skills proactively dangled the rp option in their interactive setup on every host — on Linux/Windows without rp-cli, picking it was a guaranteed runtime failure. Both now compute one POSIX guard — RP_ELIGIBLE ⟺ uname == "Darwin" OR rp-cli on PATH — and, when ineligible, drop every RepoPrompt proposal (plan’s research question defaults silently to repo-scout; plan-review steers only to the runnable codex/copilot/cursor + none). Suppression is not a ban: explicit --research=rp / --review=rp / FLOW_REVIEW_BACKEND=rp / review.backend=rp still resolve; eligible hosts render byte-for-byte as before.
- Foreground rule for review CLI calls. Found in fn-78’s own autonomous dogfood: a worker subagent backgrounded its cursor impl-review and idled on the already-finished verdict (background completion doesn’t reliably resume a subagent). The backend CLI was flawless — 8/8 verdicts — so impl-review / plan-review / spec-completion-review and the worker agent now pin the calling discipline: one blocking foreground call, generous timeout, never background + monitor.
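The eligibility guard in the first bullet is a simple disjunction. The shipped version is a POSIX shell guard inside the skills; the Python below is only an equivalent sketch, and the function names and injectable parameters are hypothetical.

```python
import platform
import shutil


def rp_eligible(system=None, which=shutil.which) -> bool:
    """RP_ELIGIBLE: the host is macOS, or rp-cli is found on PATH."""
    system = platform.system() if system is None else system
    return system == "Darwin" or which("rp-cli") is not None


def proposed_review_backends(eligible: bool) -> list[str]:
    """Backends offered in interactive setup; rp is dropped when ineligible.

    This only models what is proposed. An explicit rp choice via flag, env,
    or config is described as still resolving regardless of this list.
    """
    backends = ["codex", "copilot", "cursor", "none"]
    return (["rp"] + backends) if eligible else backends
```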
2.5.2 — Scout models tiered by task
The scout subagents move off a frozen claude-sonnet-4-6 pin to family aliases matched to each task: the 8 pure config-scanners drop to fast, cheap haiku (Haiku 4.5 — which out-scores the gpt-5.4-mini the Codex mirror already runs them on), while only the 3 judgment scouts (spec-scout, claude-md-scout, docs-gap-scout) stay on sonnet; heavy agents keep opus, worker/pr-comment-resolver inherit. No version pins, cheaper + faster scouts, and Claude finally matches the FAST/INTELLIGENT tiering the Codex mirror already encoded.
2.5.1 — Windows python3 Store-stub fix
flowctl now just works on Windows when python3 resolves to the Microsoft Store App Execution Alias stub — a 0-byte reparse point that’s on PATH but exits 9009 — by probing interpreter functionality (<cand> -c "import sys") instead of presence, across every invocation context (Git Bash / WSL, cmd.exe / PowerShell, Claude Desktop, native Codex / Cursor), with a companion flowctl.cmd launcher and no mac/linux regression.
Detail
- Probe over presence. A shared resolver (scripts/lib/pick-python.sh) and the self-contained launchers probe interpreter functionality in order $PYTHON_BIN → py -3 → python3 → python; the 9009 stub is skipped even though it’s on PATH, while a machine with a working python3 (and no py launcher) still picks python3 first. The old launchers hardcoded exec python3 and the prior pick_python helper tested command -v (presence, not function) — both selected the broken stub.
- Dual launcher. A flowctl.cmd batch shim ships alongside the extensionless bash flowctl, running the same probe under cmd.exe / PowerShell where the bash shebang is never honored (py -3 preferred). CRLF/LF pinned so Git Bash doesn’t regress.
- init self-heal. flowctl init re-stamps both .flow/bin/flowctl and .flow/bin/flowctl.cmd, so an existing (pre-fix) install refreshes on the next init — no full re-setup. A broken bash launcher is reached via the new .cmd, a plugin auto-update, or py -3 .flow/bin/flowctl.py init.
- Swept everywhere + covered. Ralph hooks, watch-filter.py, and the qa/prospect agent heredocs all resolve a working interpreter (Ralph mode requires Git Bash on Windows). A fake-9009-stub regression harness plus a real windows-latest CI job (proper .exe / .cmd stub) exercise both launchers against the stub; docs/troubleshooting.md + docs/platforms.md document the fix, the probe order, and both recovery paths (re-stamp via init, or disable the App Execution Aliases).
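The difference between presence and function in the first bullet can be illustrated in Python. The shipped resolver is a shell script and a batch shim; the sketch below only mirrors the probe order described above, and the function names and timeout are assumptions.

```python
import os
import subprocess


def works(cmd: list[str], timeout: float = 10.0) -> bool:
    """Probe function, not presence: the candidate must run `import sys`."""
    try:
        result = subprocess.run(
            cmd + ["-c", "import sys"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # missing, not executable, or hung
    return result.returncode == 0


def pick_python(env=None, probe=works):
    """Return the first working candidate in the documented order, or None."""
    env = os.environ if env is None else env
    candidates = []
    if env.get("PYTHON_BIN"):
        candidates.append([env["PYTHON_BIN"]])
    candidates += [["py", "-3"], ["python3"], ["python"]]
    for cand in candidates:
        if probe(cand):
            return cand
    return None
```

A stub that is on PATH but exits non-zero fails the probe and is skipped, which a `command -v` style presence check would not catch.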
2.5.0 — Cursor backend + sharper reviews
A fourth cross-model review backend — cursor (Cursor-billed cursor-agent CLI) — joins rp / codex / copilot; all agentic backends now read changed files from disk instead of embedding them (smaller, cheaper prompts); the review rubric itself gets eval-validated tuning — an always-on code-smell baseline lifts impl detection 7 → 10/10 at ~27% fewer prompt tokens, and plan reviews gain a spec-quality checklist (8.0 → 9.7); and per-task / per-spec review: overrides now route correctly instead of silently falling back to the project default.
Detail
Cursor review backend. A parity port of the copilot backend — no new review features, same Carmack-level criteria, same receipt schema, same verdict grammar, same --deep / --validate passes — wired through /flow-next:impl-review, /flow-next:plan-review, /flow-next:spec-completion-review, and /flow-next:setup. Select it the usual ways: flowctl config set review.backend cursor, FLOW_REVIEW_BACKEND=cursor, --review=cursor, or a per-task/spec cursor:<model>.
- Cursor-billed, no extra key. Runs cursor-agent -p --output-format json --trust --mode ask against the workspace (read-only Q&A — it never mutates the tree). Reaches reviewer models the others can’t in one place: gpt-5.5-high (1M ctx, the default), the gpt-5.3-codex family, composer-2.5, Opus 4.8 thinking. Auth is your stored cursor-agent login or CURSOR_API_KEY.
- Resume-only sessions. The first review omits --resume and persists Cursor’s generated session_id; a re-review resumes it (only when the prior receipt’s mode == "cursor" — a cross-backend receipt starts fresh).
- Effort folds into the model name (Cursor convention), so a spec is cursor:<model> with no :effort rung — cursor:gpt-5.5-high, not cursor:gpt-5.5:high.
- Triage judge unchanged. The opt-in LLM triage judge (FLOW_TRIAGE_LLM=1, default off) stays codex|copilot; with it off cursor reviews use the deterministic trivial-diff whitelist, zero extra dependency.
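The cursor:<model> shape from the third bullet can be checked with a few lines of parsing. This is a hypothetical sketch: the function name is made up, and whether flowctl rejects or coerces a spec with an extra :effort rung is not stated here, so raising an error is an assumption.

```python
def parse_cursor_spec(spec: str):
    """Split a `cursor[:<model>]` review spec into (backend, model).

    Hypothetical parser for illustration. Effort is part of the model name
    for this backend, so a second colon is treated as malformed input.
    """
    backend, sep, model = spec.partition(":")
    if backend != "cursor":
        raise ValueError(f"not a cursor spec: {spec!r}")
    if not sep or not model:
        return ("cursor", None)  # no model given: caller applies its default
    if ":" in model:
        raise ValueError(f"effort folds into the model name: {spec!r}")
    return ("cursor", model)
```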
Review backends read changed files from disk. The agentic backends — codex, copilot, cursor — no longer embed changed-file contents (previously up to ~500 KB) into the reviewer prompt. They read from disk the way rp’s Builder already did (codex sandbox, copilot --add-dir, cursor --mode ask), so prompts are smaller and cheaper and cursor no longer trips its argv limit on non-trivial diffs. Verified equivalent on a ground-truth planted-bug test (codex’s own audit: QUALITY=PRESERVED). The per-backend FLOW_*_EMBED_MAX_BYTES budget knobs are removed.
Sharper, leaner review prompts. The Carmack rubric gains an always-on code-smell baseline (Fowler Refactoring ch.3 — Feature Envy, Data Clumps, Primitive Obsession, Long Method, Duplicated Code, …) on impl + standalone reviews, with its rubric blocks tightened and every machine-parsed marker preserved. Applied to every backend — codex/copilot/cursor and RepoPrompt. Eval-validated on a ground-truth corpus (correctness bugs + planted smells): detection rose 7 → 10/10 (the old rubric reliably missed Feature Envy / Data Clumps / Primitive Obsession) while the prompt shrank ~27% (−950 tokens), correctness detection held at 5/5, and clean code was not over-flagged — confirmed on both codex (GPT-5.5-high) and RepoPrompt. Plan reviews additionally gain a targeted spec-quality checklist (a stated test strategy, observability for async/batch work, each task sized-for-one-iteration and correctly dependency-ordered, non-functional requirements) — eval-validated 8.0 → 9.7/10 for +74 tokens, no over-flagging of good specs.
Per-task / per-spec review-backend overrides route correctly. A task’s review: <backend>:... (or a spec’s default_review) is now honored end-to-end: flowctl review-backend resolves the per-task/epic override above env/config (canonicalizing short/tracker handles first), and every review skill + /flow-next:work’s per-task worker passes it — so a task set to review: cursor:... under a codex project default actually reviews with cursor. Every backend command also defensively coerces a foreign stored spec to its own default, so an explicit --review=<backend> / flowctl <backend> always wins over a stored cross-backend spec instead of shelling a foreign model.
Copilot CLI 1.0.65 compatibility. The default copilot model moves gpt-5.2 → gpt-5.5, and gpt-5.2 / gpt-5.2-codex are dropped from the accepted set (1.0.65 rejects them), so copilot:gpt-5.2 is now rejected. Session creation is fixed for the CLI’s resume-only --resume change — the first call now uses --session-id (marker-tracked) and re-reviews resume it.
2.4.0 — GitLab + Jira tracker adapters
Tracker-sync gains GitLab and Jira — the 3rd and 4th trackers behind the same transport-blind adapter interface as Linear and GitHub — so teams on the dominant self-managed (GitLab) and enterprise (Jira) trackers can mirror Flow-Next specs to their board with zero special setup.
Detail
Two new adapters slot in behind the normalized, transport-blind interface — reconcile, body merge, status who-wins, and comment dedup are reused unchanged. The supported-tracker set is now Linear, GitHub, GitLab, Jira.
GitLab (the 3rd tracker — a large share of self-managed and EU/regulated shops). Modelled on the GitHub adapter:
- Transport ladder. glab CLI primary → raw-REST /api/v4 token fallback (GITLAB_TOKEN / CI_JOB_TOKEN) → no-op rung (receipt note, never a crash). Self-managed hosts are honored via glab’s configured host or CI_SERVER_URL; the receipt records the rung as glab or rest. Zero special setup — it prefers the glab auth login session a developer already has.
- Reduced-fidelity status, like GitHub. Open/closed plus a configurable board label, not a rich workflow.
- License-gated dependency projection. depends_on_epics edges project as native is_blocked_by links on a Premium/Ultimate namespace; a Free or personal namespace (where the API returns 403 Blocked issues not available for current license) degrades to a directionless relates_to link plus a provenance-fenced <!-- flow:deps --> body block for direction.
Jira (the 4th tracker — the enterprise default). REST-only by design, the most adapter-specific weight of the four:
- REST Cloud + Data Center / Server. /rest/api/3 on Cloud (ADF bodies, email:API_TOKEN basic auth) or /rest/api/2 on DC/Server (wiki text, Bearer PAT) — one single token rung plus the no-op floor, the deployment shape detected once at the ceremony and persisted. Zero special setup — a standard Jira credential read from the environment, never an OAuth / Connect / Forge app.
- No MCP. The official Atlassian MCP is read-mostly — it can’t transition status, update fields, or set links — so the bridge uses the REST + token path directly, headless-native with the fewest moving parts.
- Workflow-aware status. A change goes through the transitions API against a configurable statusMap; an unmapped or unreachable transition defers with a receipt rather than forcing a lane. The fn-66 terminal invariant holds — a locally-done spec stays In Review until the PR is MERGED.
- Native blocks links + ADF. Dependencies project as native directional Blocks issue links on every tier (no degrade); the body round-trips through a markdown ↔ ADF translation that preserves unknown human-authored nodes verbatim. Backlog enumeration (listOpenIssues) runs via JQL.
The new adapter behavior is documented on Tracker Sync.
2.3.0 — Pilot backlog mode
/flow-next:pilot gains an opt-in backlog mode (pilot.autonomy=backlog, default off): instead of advancing one already-ready spec, pilot widens to a standing scheduler for the entire open backlog — enumerating flow specs + tracker issues, triaging the top dep-ordered item, and either advancing it one stage or surfacing a precise async question and parking it (ASKED). The consent boundary moves from before the loop to inside the loop, on block, while every safety boundary holds: it never authors a spec, never promotes, and never merges.
Detail
By default pilot’s consent boundary sits before the loop — it only picks from the already-ready queue. Backlog mode (flowctl config set pilot.autonomy backlog, or per-run --backlog / --auto) makes each tick enumerate everything open (flow specs via flowctl ready --all plus tracker issues at the promoted lane, unioned in from the tracker-sync adapter), select the top dep-ordered actionable item, triage it agentically, and advance it along the same plan → plan-review → work → [qa] → make-pr pipeline. When it can’t safely proceed it surfaces an async question into the spec’s ## Open Questions + a tracker comment and parks the item — “stuck” becomes a question a human answers async, not a stall.
- Same single-tick conductor, widened left. One /loop / /goal target, one verdict grammar (adds ASKED <id> (<n>), keeps NO_WORK / DEFERRED_TO_LAND verbatim), one mental model — not a new skill or command, and not a prospect-style idea generator (it manages the existing backlog).
- Boundaries hold. Never authors a spec (a thin/missing spec is a surfaced “run /flow-next:capture or /flow-next:interview” gap); never sets the ready flag (promotion is the human’s board act; un-promoted items are skipped silently); never merges (land stays human-gated). Readiness stays the human’s explicit signal, never an agent-inferred score.
- Substrate. A backlog-wide eligibility scan (flowctl ready --all → deterministic facts only), a per-tick decision log (flowctl pilot-log → the factory-efficiency readout), and a tracker-sync autonomy-parity fix + the async question-valve so a per-tick sync never hangs the loop. The agentic/deterministic line holds: flowctl enumerates + checks hard fields; the host agent judges and formulates the question.
Off by default — existing pilot/land/Ralph users are unaffected until they opt in.
2.2.0 — QA pipeline stage + Cua native driver
Two opt-in additions to the autonomous pipeline: /flow-next:qa becomes a config-gated (pipeline.qa, default off) pilot stage that live-tests the complete build before make-pr, and Flow-Next Drive’s native rung gains the Cua driver + sandbox for provider-agnostic, headless/CI computer-use. Both augment, never replace, existing tooling.
Optional QA pipeline stage
/flow-next:qa already did the hard part — derive scenarios from the spec, drive the live app, file P0/P1/P2 findings, emit a qa_verdict — but it lived outside the build loop. fn-72 wires it in as an opt-in pilot stage: flowctl config set pipeline.qa on inserts a qa stage at the all-tasks-done juncture, so the autonomous span becomes plan → plan-review → work → qa → make-pr. Default off — with the gate off, pilot’s stage set is byte-for-byte unchanged.
- Augments, never replaces. The app is already up on the dev’s machine during work, so this is the cheap first live pass that catches obvious runtime breakage before a human opens the PR. Like everything in Flow-Next it reduces human work agentically and surfaces problems to humans — it does not stand in for CI/staging QA or manual QA, which still happen downstream.
- Lean + agentic, evidence-aware. Net-new flowctl is a single pipeline.qa config-key default — no new subcommand, engine, or persisted artifact. The host derives scenarios in-context and drives the local running app, reusing the existing executor. It reads work’s recorded evidence first and subtracts only AC proven by a deterministic re-runnable check (a real test/lint/build command), always live-running every runtime/UI/integration AC even when work narrated it done.
- Surfaced, not loop-blocking. The stage is idempotent (a head_sha freshness gate runs it at most once per branch head) and the pilot gate routes on qa_outcome, not the Ralph-guard verdict projection: SHIP / NA / BLOCKED advance cleanly, and NEEDS_WORK still advances to the draft PR — make-pr surfaces the findings in a ## Live QA section, plus the bug-memory track and a tracker comment when the bridge is active. QA never hard-blocks the loop; merge stays the human’s + land’s decision.
- Principled reversal. Pilot’s “QA is never a stage” is reversed only under the gate; capture/interview/resolve-pr/merge/release stay forbidden for their distinct loop-ownership / consent reasons.
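The freshness gate and the outcome routing in the third bullet can be sketched as two small functions. Names, argument shapes, and the returned tuple are hypothetical; only the outcome values and the "always advance to make-pr" behaviour come from the text above.

```python
def qa_stage_due(head_sha: str, last_qa_head_sha) -> bool:
    """Freshness gate: run QA at most once per branch head."""
    return head_sha != last_qa_head_sha


def route_after_qa(qa_outcome: str):
    """Return (next_stage, surface_findings) for a finished QA stage.

    Hypothetical sketch. Every known outcome advances to make-pr, and
    NEEDS_WORK additionally asks make-pr to surface the findings.
    """
    if qa_outcome in {"SHIP", "NA", "BLOCKED"}:
        return ("make-pr", False)
    if qa_outcome == "NEEDS_WORK":
        return ("make-pr", True)
    raise ValueError(f"unknown qa_outcome: {qa_outcome!r}")
```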
Cua native driver rung
The native rung of the surface-aware driver ladder was served only by Computer Use (Codex CU / Anthropic Claude CU) — provider-locked, macOS/Windows-only, focus-stealing, and never reachable on a headless / CI / Linux path. trycua/cua (MIT) is added as a detected, opt-in driver with two surfaces, never a hard dependency:
- Cua Driver (cua-driver mcp) — background computer-use on the local machine over an MCP server: no focus steal, accessibility-tree-based (drives structured element_index elements, not pixels), and provider-agnostic. On macOS the load-bearing TCC permission split is documented — Accessibility unlocks driving, Screen Recording unlocks screenshots — so when Screen Recording is absent the rung surfaces “AX-only evidence, no screenshot” rather than emitting an empty one.
- Cua Sandbox — drives an app inside a disposable VM/container (any OS), the only native option on a headless/CI host with no display. Opt-in per run, torn down each run; local lume/QEMU/Docker is the default backend, the cua.ai cloud is explicit opt-in (bills + data-egress, never auto-selected).
Detect-and-instruct, never auto-install — the same consent rule /flow-next:map applies to clawpatch. The base install stays zero-dependency, agent-browser remains the only assumed-present driver, and flowctl never imports Cua. The default driving path (background cua-driver MCP) uses only MIT components; the optional cua-agent[omni] (ultralytics AGPL-3.0) / OmniParser (CC-BY-4.0) extras are documented and never auto-installed. A pass still completes with no Cua installed (fall to Computer Use → documented-limitation). No new skill or command — a rung, not a re-architecture; /flow-next:qa accepts cua-driver / cua-sandbox as evidence driver_rung values with no schema change.
2.1.3 — Resolve-pr keeps null-state threads in scope
/flow-next:resolve-pr now treats only literal true as resolved — GitHub/GraphQL can surface a newly-created unresolved inline thread as isResolved: null (not just false), and those Codex/Bugbot findings were being silently dropped; fetch observability (counts + previews across all three feedback surfaces) is now mandatory in full mode and watch loops.
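The "only literal true is resolved" rule amounts to an identity check rather than a truthiness or equality-with-false check. A minimal sketch, assuming threads arrive as dictionaries decoded from the GraphQL response (the function name and the `id` field are illustrative):

```python
def threads_in_scope(threads):
    """Keep every review thread that is not explicitly resolved.

    Only `isResolved is True` removes a thread. A value of None (JSON null),
    False, or a missing key all keep the thread in scope.
    """
    return [t for t in threads if t.get("isResolved") is not True]
```

A filter written as `t["isResolved"] == False` would drop the null-state threads, which is the failure mode this release describes.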
2.1.2 — Done means merged
Tracker-sync now reserves Linear Done for merge-confirmed PRs — an open PR maps to In Review, completion-review never completes the issue, and pilot never declares NO_WORK for an all-done spec that hasn’t shipped.
Detail
Done is a claim that the work shipped, so projecting it from local completion (all tasks done + completion-review SHIP) was a correctness bug — a spec with no PR could land on the board as Done and a human had to drag it back. The flow→tracker status map is now a function of (spec status, completion_review_status, **PR-merge-evidence**): terminal Done requires a GitHub MERGED probe result on every write path (automatic touchpoints and a manual reconcile, which can still recover Done once a merge exists). An open PR projects In Review on make-pr’s unconditional bridge-active link path; completion-review is now a verdict comment only; and land.merged — active by default when the bridge is active — is the sole Done driver. Pilot mirrors it: an all-done spec with no merged PR routes to make-pr, or reports the new DEFERRED_TO_LAND verdict when an open PR exists, instead of silently collapsing to NO_WORK. See Status lifecycle.
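The merge-evidence rule can be pictured as a pair of lookups keyed on the PR state. This is a loose sketch for a spec that is already locally done: the function names, the "OPEN"/"MERGED" strings as inputs, the "shipped" label, and the None fallback are assumptions, and the real status map also takes spec status and completion_review_status into account.

```python
from typing import Optional


def projected_status(pr_state: Optional[str]) -> Optional[str]:
    """Tracker status for a locally-done spec, given PR merge evidence."""
    if pr_state == "MERGED":
        return "Done"  # terminal status requires merge evidence
    if pr_state == "OPEN":
        return "In Review"
    # No PR evidence: what is projected here is not stated in this
    # changelog, so the sketch leaves the tracker status untouched.
    return None


def pilot_route(pr_state: Optional[str]) -> str:
    """Where pilot sends an all-done spec (labels partly hypothetical)."""
    if pr_state == "MERGED":
        return "shipped"  # hypothetical label: nothing left for this spec
    if pr_state == "OPEN":
        return "DEFERRED_TO_LAND"
    return "make-pr"
```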
2.1.1 — Land sees clean-review comments
Land’s silence merge signal now recognizes a review bot’s clean-pass comment (naming the reviewed commit), not just formal reviews — so a Codex-reviewed PR with no findings actually merges instead of stalling at NEEDS_HUMAN.
Detail
Codex (and bots like it) only file a formal review when they have findings; a clean pass is an issue comment — "Didn't find any major issues. Reviewed commit abc1234" — that never reaches the reviews API land reads. So a converged-clean PR could sit unmerged forever. Under silence, land now also scans PR comments: a comment from an automated reviewer matching land.cleanReviewCommentPattern and naming the current head SHA counts as a head-current review. It only ever adds evidence — never overrides a formal review, an open thread, or a red check — and the SHA must be the current head, so a land-authored fix push still forces a fresh clean comment before merge. Set land.cleanReviewCommentPattern to an empty string to disable the comment path. Found dogfooding the fn-64 land. See The merge gate, precisely.
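As a rough illustration of the comment check, assuming a comment body and the full head SHA are already in hand. The pattern below is a stand-in (the changelog does not show the real default of land.cleanReviewCommentPattern), and the automated-reviewer author check is left out for brevity:

```python
import re

# Hypothetical pattern, used only for this sketch.
CLEAN_PATTERN = r"Didn't find any major issues"


def is_head_current_clean_comment(body, head_sha, pattern=CLEAN_PATTERN):
    """True when the comment matches the pattern and names the current head."""
    if not pattern:  # an empty pattern disables the comment path
        return False
    if not re.search(pattern, body):
        return False
    # Any hex run of 7 to 40 characters that is a prefix of the head SHA counts.
    candidates = re.findall(r"\b[0-9a-f]{7,40}\b", body)
    return any(head_sha.startswith(c) for c in candidates)


comment = "Didn't find any major issues. Reviewed commit abc1234"
head = "abc1234" + "0" * 33
print(is_head_current_clean_comment(comment, head))      # True
print(is_head_current_clean_comment(comment, "f" * 40))  # False
```

The second call models a fix push after the clean comment: the comment still matches the pattern but names a stale commit, so it adds no evidence.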
2.1.0 — Dependency projection to the tracker
Tracker-sync now projects a spec’s depends_on_epics edges onto the board as blocked-by relations — on both Linear and GitHub — idempotently, provenance-tracked, and without ever clobbering a relation a human added by hand.
Detail
One transport-blind hook, two adapters. projectDepRelations (modelled on readiness projection) runs on push / reconcile: it resolves each depends_on_epics edge to the dependency’s linked issue and ensures a blocked-by relation exists. Linear uses native issue relations (issueRelationCreate type: blocks, MCP save_issue blockedBy on the MCP rung), deduped across relations + inverseRelations. GitHub uses native issue dependencies (GA Aug 2025) via the REST …/dependencies/blocked_by endpoints when available, else a provenance-fenced <!-- flow:deps --> body block of #N references. The skill never branches on tracker — only fidelity differs.
Provenance over diff-reconcile. Neither platform records who created a relation, so Flow tracks the edges it created in a per-spec depRelations ledger (native) or the fenced marker (GitHub fallback). A relation Flow can’t prove it created is never removed; a ledgered edge a tracker user deleted is deferred (queued receipt), never silently recreated. The projected flag keys off the directed tracker edge, so a relinked issue reads un-projected.
Safe by construction. A dependency with no linked issue surfaces a named warning and the sync proceeds; a done dependency keeps its relation visible but never re-gates ready=true; self-edges are skipped and cycles project as independent direct edges (no traversal); unreachable transport writes a noop receipt and never blocks. New flowctl sync list-dep-relations / set-dep-relation / clear-dep-relation own the deterministic ledger plumbing. See Dependency projection.
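The provenance rules above can be read as plain set arithmetic over three edge sets. The sketch below is an illustration under that reading; the function name, the tuple shape, and the returned dictionary are hypothetical and not the depRelations ledger format:

```python
def plan_relation_sync(desired, on_tracker, ledger):
    """Classify blocked-by edges using a ledger of edges Flow created.

    Each argument is a set of (blocked_issue, blocking_issue) pairs.
    """
    # Self-edges are skipped outright.
    desired = {edge for edge in desired if edge[0] != edge[1]}
    # Wanted, absent on the tracker, and never created by Flow: create it.
    create = desired - on_tracker - ledger
    # Flow created it but a tracker user deleted it: defer, never recreate.
    deferred = (desired & ledger) - on_tracker
    # Only edges Flow can prove it created are eligible for removal.
    remove = (ledger & on_tracker) - desired
    return {"create": create, "deferred": deferred, "remove": remove}


plan = plan_relation_sync(
    desired={("A", "B"), ("A", "C"), ("A", "A")},
    on_tracker={("A", "H")},  # added by hand, absent from the ledger
    ledger={("A", "C")},
)
print(plan)
```

The hand-added edge ("A", "H") appears in none of the three result sets, which is the "never clobber a human relation" guarantee in miniature.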
2.0.0 — HTML artifact mode & render lenses
Opt-in HTML artifact mode: capture, plan, and make-pr now also emit self-contained HTML render lenses — a spec visualizer for business and plan review, and a read-only PR review instrument — while markdown (and tracker-sync) stays 100% the source of truth.
Detail
Render lens, never record. One config key — flowctl config set artifacts.html.enabled true (OFF by default, offered once by /flow-next:setup) — switches the lifecycle skills into artifact mode. When active they load a shared disclosure reference carrying all generation rules plus an explicit anti-slop design contract (own instrument-panel house style, local-only fonts, zero external requests), and write self-contained single-file HTML to fixed paths under .flow/artifacts/<spec-id>/ — never timestamped, regenerated in place, never parsed back as state, each with a staleness stamp in the footer. Mode off (the default) means zero new steps, zero token cost, zero behavior change. See Visual Aids — specs.
The spec lens. One generation pathway, state-dependent rendering: /flow-next:capture renders the spec-only business-review view (thesis, acceptance criteria with source-tag provenance chips, boundaries, decision context); /flow-next:plan regenerates the same file with the plan layer — task dependency DAG with critical path and the R-ID → task coverage matrix. The spec markdown carries an idempotent artifact link line, replaced in place on every regeneration.
The PR lens. /flow-next:make-pr emits a read-only review instrument: diff-derived (never from commit messages), verified against the spec’s R-ID export — mismatches render as visibly flagged rows, warn-in-artifact, never blocking. It lands in one narrow chore(flow): pr artifact <spec-id> commit so the PR body’s SHA-pinned blob link resolves; --dry-run writes nothing, generation failure is non-fatal, and Ralph’s PR_URL= stdout contract is untouched.
Lavish annotation (optional). lavish-axi is detected on PATH and never required: spec artifacts open as browser annotation sessions, and feedback maps to edits of the markdown source followed by lens regeneration. Pull-only and session-spanning (annotations queue in ~/.lavish-axi/state.json and survive agent death). The PR lens never enters the annotate loop, and autonomous runs generate but never poll.
Breaking. The deprecated planSync.crossEpic config alias (1.x deprecation, readable through the 1.x line) is removed — use planSync.crossSpec.
1.14.0 — Land: the autonomous ship loop
New /flow-next:land skill — a cadence-driven, fully autonomous babysitter that takes the build loop’s draft PRs the rest of the way: CI kept green, automated reviews converged via resolve-pr, a gated explicit merge, spec close, and your project’s own release process — closing the lifecycle end to end.
Detail
The tick. Each /flow-next:land invocation discovers the open PRs the build loop authored (spec branch_name match and the make-pr breadcrumb — both signals required before any mutation; hand-opened PRs are never touched), walks each through a read-only gate tree — CI tri-state over all checks, a reviewer patience window anchored to the last push (land.patienceMinutes, default 30), unresolved threads, the review signal, mergeStateStatus — and takes at most one action class per PR: a bounded CI fix (land.ciFixBudget, default 3, with a durable flow-next:needs-human label on exhaustion), a /flow-next:resolve-pr dispatch, a mechanical rebase (any conflict hunk → honest BLOCKED), or the merge. Every tick ends with LAND_VERDICT=<MERGED|RELEASED|FIXING_CI|AWAITING_REVIEW|RESOLVING|BLOCKED|NEEDS_HUMAN|NO_WORK> prs=<n> pr=<url|-> reason="…" — worst severity across PRs, last line of output. Drive it on a cadence: /loop 30m /flow-next:land.
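Because the verdict is always the last line of output, a driver script can parse it with one regular expression. The following is a sketch built from the grammar quoted above; the parser name is hypothetical and the sample URL is made up:

```python
import re

VERDICT_RE = re.compile(
    r'^LAND_VERDICT=(?P<verdict>[A-Z_]+) prs=(?P<prs>\d+) '
    r'pr=(?P<pr>\S+) reason="(?P<reason>.*)"$'
)


def parse_land_verdict(output):
    """Parse the last line of a tick's output; None if it is not a verdict."""
    lines = output.strip().splitlines()
    if not lines:
        return None
    match = VERDICT_RE.match(lines[-1].strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["prs"] = int(fields["prs"])
    return fields


sample = (
    "checking CI...\n"
    "LAND_VERDICT=AWAITING_REVIEW prs=1 "
    'pr=https://github.com/example/repo/pull/7 reason="patience window open"\n'
)
parsed = parse_land_verdict(sample)
print(parsed["verdict"], parsed["prs"])  # AWAITING_REVIEW 1
```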
The merge gate. Land is the one confined exception to flow-next’s “no gh pr merge from skills” rule. It flips the draft to ready and merges explicitly — gh pr merge --squash --delete-branch --match-head-commit, never --auto — only after CI is green, threads are addressed, and land.reviewSignal is satisfied: silence (default — an automated review present + zero unresolved threads + the window elapsed; built for bot reviewers that never file formal APPROVEs), approve, or a named reviewer login. No automated review ever and no signal configured → it never merges unreviewed (NEEDS_HUMAN).
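The default silence signal reduces to a three-part conjunction, and the merge itself to one more. A minimal sketch with hypothetical function and parameter names, covering only the silence mode and not approve or a named reviewer:

```python
def silence_satisfied(automated_review_present, unresolved_threads,
                      minutes_since_last_push, patience_minutes=30):
    """Silence: a bot review exists, no open threads, the window has elapsed."""
    return (
        automated_review_present
        and unresolved_threads == 0
        and minutes_since_last_push >= patience_minutes
    )


def may_merge(ci_green, unresolved_threads, signal_ok):
    """All gates must hold; any single failure blocks the merge."""
    return ci_green and unresolved_threads == 0 and signal_ok


signal = silence_satisfied(True, 0, minutes_since_last_push=45)
print(may_merge(ci_green=True, unresolved_threads=0, signal_ok=signal))  # True

# No automated review ever arrived: the gate stays closed.
print(silence_satisfied(False, 0, minutes_since_last_push=600))  # False
```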
The tail. After merge: flowctl spec close (the build loop never re-selects merged work), the opt-in tracker.perEvent.land.merged touchpoint (issue → terminal state + verdict comment), then release-follow of your project’s own release docs (RELEASING.md et al.) with an idempotency probe — or stop at merge. A merged-but-unclosed spec re-enters idempotently. --dry-run reports the full gate classification with zero mutations.
Autonomous resolve-pr. /flow-next:resolve-pr now honors the mode:autonomous token (plus FLOW_AUTONOMOUS=1 env): needs-human cases become NEEDS_HUMAN: report lines instead of a blocking question, and the run ends with the machine-readable RESOLVE_PR_VERDICT=<RESOLVED|PENDING|NEEDS_HUMAN> threads=<n> fixed=<n> needs_human=<n> line land gates on. The 2 fix-verify cycle bound is unchanged, and the land.* config keys ship with seeded flowctl defaults.
Land was, fittingly, the first spec pilot drove end-to-end. See Going Autonomous for the three-loop picture.
1.13.0 — Pilot: host-driven autonomous loop
New /flow-next:pilot skill — a single-tick build-loop conductor that advances one ready spec by one pipeline stage (plan → plan-review → work → make-pr) per invocation and ends with a machine-greppable PILOT_VERDICT line, so your host’s /loop or /goal owns the iteration instead of an external shell script.
Detail
The tick. Each /flow-next:pilot invocation selects the first open + ready spec with satisfied dependencies and no other-actor claims, classifies its stage from flowctl state, dispatches exactly one existing stage skill autonomously, verifies advancement (flowctl review-status fields + status transitions; a gh-confirmed OPEN PR URL for make-pr), and prints the terminal verdict: PILOT_VERDICT=<ADVANCED|NO_WORK|BLOCKED|NEEDS_HUMAN> spec=<id> stage=<stage> reason="<one line>". /goal validators are transcript-blind, so the evidence is echoed into the conversation and stop conditions are phrased against the grammar — e.g. /goal keep running /flow-next:pilot until it prints PILOT_VERDICT=NO_WORK, or stop after 20 turns.
Drivers. Claude Code /goal (v2.1.139+), Claude Code /loop (v2.1.72+; loops expire after 7 days), and Codex /goal (opt-in [features] goals = true, CLI ≥ 0.128.0, plain-text objective — no $skill-in-goal syntax). Caps and budgets belong to the driver; a tick has no timeout machinery. For unattended runs the rp backend works headlessly while the Repo Prompt app is running on the same Mac; on machines without it use --review=codex, --review=copilot, or --review=none.
Autonomous sub-skills. plan, work, and make-pr now honor a mode:autonomous token (plus FLOW_AUTONOMOUS=1 env) that suppresses user questions and picks safe defaults — work branches deterministically, make-pr forces a draft PR and hard-errors instead of prompting. Deliberately distinct from FLOW_RALPH: no ralph-guard hooks, no receipt choreography.
Don’t-thrash. A spec that fails to advance on two healthy ticks is taken out of selection (flowctl spec unready) with the reason in the BLOCKED verdict; re-blessing via flowctl spec ready clears its strikes. Pilot and Ralph are alternative drivers, never nested — pilot refuses to run under FLOW_RALPH.
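The two-strike rule can be pictured as a small per-spec counter. This sketch is illustrative only: the class is hypothetical, and it assumes that strikes are consecutive and that an advancing tick clears them, neither of which the changelog states.

```python
class StrikeTracker:
    """Count non-advancing healthy ticks per spec."""

    def __init__(self, limit=2):
        self.limit = limit
        self.strikes = {}

    def record_tick(self, spec_id, advanced):
        """Return True when the spec should be taken out of selection."""
        if advanced:
            self.strikes.pop(spec_id, None)  # assumption: progress clears strikes
            return False
        self.strikes[spec_id] = self.strikes.get(spec_id, 0) + 1
        return self.strikes[spec_id] >= self.limit

    def reset(self, spec_id):
        """Models re-blessing via `flowctl spec ready`."""
        self.strikes.pop(spec_id, None)


tracker = StrikeTracker()
first = tracker.record_tick("fn-12", advanced=False)
second = tracker.record_tick("fn-12", advanced=False)
print(first, second)  # False True
```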
1.12.0 — Spec readiness signal
A spec now carries a human-owned ready flag — the “complete enough to hand to an agent” gate that autonomous loops will consume — set via flowctl spec ready, projected one-way from your tracker (tracker.readyState), and surfaced through adoption-gated prompts in capture/interview/plan; invisible until you opt in.
Detail
The flag. flowctl spec ready <id> / spec unready <id> toggle a ready boolean on the spec record (default false) — orthogonal to status (a ready spec stays open through planning and work), human-owned or tracker-projected, never agent-inferred. Both verbs are idempotent (no write, no updated_at bump when the flag already matches), and the on-disk flag is lazy — written only after a toggle actually changes state, so non-adopters’ sidecars stay byte-identical. Every JSON read surface (show, specs, list) emits an explicit "ready": <bool>, and ready specs carry a [ready] badge in listings (badge only when set — no draft-noise). See Before planning — the ready flag.
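The idempotent, lazily written flag can be sketched on a plain dictionary standing in for the spec record. Function names are hypothetical, and the real on-disk format is not shown in this changelog:

```python
def set_ready(record, value):
    """Toggle `ready`; return True only when a write is actually needed.

    A missing key reads as False, so `unready` on an untouched record
    changes nothing and the key is never written.
    """
    if record.get("ready", False) == value:
        return False  # idempotent: no write, no updated_at bump
    record["ready"] = value
    return True


def read_ready(record):
    """Read surfaces always emit an explicit boolean."""
    return {"ready": bool(record.get("ready", False))}


spec = {"id": "fn-7"}
print(set_ready(spec, False), "ready" in spec)  # False False
print(set_ready(spec, True), spec["ready"])     # True True
print(set_ready(spec, True))                    # False
print(read_ready({"id": "fn-8"}))               # {'ready': False}
```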
Tracker projection. For tracker-connected repos, the /flow-next:tracker-sync discovery ceremony asks one optional, skippable question: which tracker workflow state means “ready for work”? (Linear: a workflow-state name, matched case-insensitive — names, not state.type; GitHub: a label, pre-created idempotently — present ⇒ ready, absent ⇒ not ready.) Every pull-side sync then projects that state onto the local ready flag — one-way, tracker → local, tracker authoritative — with change-only event-tagged receipts and graceful stale-config degradation (warn + noop receipt + flag untouched + sync continues). See Readiness projection.
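A sketch of the two matching rules, assuming the issue has already been fetched into a small dictionary (the dictionary shapes and the function name are hypothetical). Returning None stands in for the stale-config case, where the flag is left untouched:

```python
def project_ready(tracker_type, ready_state, issue):
    """Return the projected ready flag, or None when no projection applies."""
    if not ready_state:
        return None
    if tracker_type == "linear":
        # Workflow-state names are compared case-insensitively.
        return issue.get("state", "").casefold() == ready_state.casefold()
    if tracker_type == "github":
        # Label present means ready; absent means not ready.
        return ready_state in issue.get("labels", [])
    return None


print(project_ready("linear", "Ready for Work", {"state": "ready for work"}))  # True
print(project_ready("github", "ready", {"labels": ["bug"]}))                   # False
print(project_ready("linear", "", {"state": "Todo"}))                          # None
```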
Adoption-gated prompting. One in-use gate (≥1 ready spec OR tracker.readyState configured) governs every new prompt — non-adopters see zero new questions anywhere. /flow-next:capture and /flow-next:interview offer an optional end-of-authoring “Mark ready?” consent (default keep-draft; gated off when the tracker is authoritative). /flow-next:plan soft-checks readiness before the scout fan-out — warn, never block, default proceed. capture --rewrite resets a previously-ready spec to draft (a full re-authoring re-opens the blessing); interview refinement never auto-resets.
New regression suite (test_spec_ready.py) wired into CI; Codex mirror regenerated with all three net-new prompts verified.
1.11.0 — Tracker-sync forcing + self-improving glossary
Tracker lifecycle touchpoints can no longer silently not fire — receipts are event-tagged and work/capture/make-pr end every run with a read-only sync check + one bounded retro-fire — and the glossary now compounds through normal work (prime seeds it, capture adds to it, plan/work/review actually read it).
Detail
Tracker-sync: observable + forcing. The bridge’s lifecycle hooks (claim → In-Progress, done → comment, PR → issue link) were prose obligations an agent could silently skip — PRs landed unlinked, issues never moved, nothing failed. Now: every dispatch’s receipt is event-tagged (flowctl sync receipt --event <perEvent-key>), and a new read-only flowctl sync check <spec-id> --events <csv> --since <iso> audits whether every touchpoint that triggered actually fired (receipt-backed). /flow-next:work, /flow-next:capture, and /flow-next:make-pr run it at end-of-skill — independently of the touchpoints, so a wholesale-skipped dispatch block is still caught — retro-fire any MISSING event exactly once, and end with a mandatory four-state Tracker sync: summary slot (OK | MISSING → retro-fired → OK | MISSING (retro-fire failed) | n/a (bridge inactive)). Tracker-agnostic (the shared receipt/lifecycle layer — Linear, GitHub, future adapters), zero overhead for non-tracker repos (bridge inactive → silent constant-time exit), and flowctl gains no tracker-mutation code — mutations stay agent-driven through the skill. make-pr also closes the execution-fidelity gap deterministically: post-create it verifies the Ref <identifier> line against the live PR body (gh pr view --json body) and repairs append-only via gh pr edit when absent. Recovery guidance for a failed retro-fire is in Tracker Sync. Also corrected linear-mcp.md: the claude.ai Linear MCP returns identifiers, never UUIDs, so first-link requires the GraphQL rung (LINEAR_API_KEY).
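The end-of-skill audit amounts to comparing the events that triggered against the events that have receipts, then retrying each gap once. A sketch of that logic, where the function name and the `retro_fire` callback (assumed to return True on success) are hypothetical:

```python
def tracker_sync_summary(triggered, receipt_events, retro_fire, bridge_active=True):
    """Return one of the four summary states described above."""
    if not bridge_active:
        return "n/a (bridge inactive)"
    fired = set(receipt_events)
    missing = [event for event in triggered if event not in fired]
    if not missing:
        return "OK"
    # Each missing event is retried exactly once; there is no loop.
    results = [retro_fire(event) for event in missing]
    if all(results):
        return "MISSING → retro-fired → OK"
    return "MISSING (retro-fire failed)"


retried = []


def fake_retro_fire(event):
    retried.append(event)
    return True


summary = tracker_sync_summary(["work.done", "makePr"], ["work.done"], fake_retro_fire)
print(summary)  # MISSING → retro-fired → OK
print(retried)  # ['makePr']
```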
Self-improving glossary. The same principle — the system gets better as you use it, never via a manual ceremony — applied to project vocabulary: /flow-next:prime seeds GLOSSARY.md from the repo when it’s absent or a husk (~10–20 evidence-backed terms, read-back gated, never rewrites a populated glossary), /flow-next:capture joins interview as a writer (new conversation-surfaced terms offered at read-back), and the read path widens to where wrong-concept errors get built: repo-scout / context-scout surface request-matched terms (max 5, budget-capped), the work worker’s re-anchor reads task-relevant terms, and impl-/plan-review prompts gain a Vocabulary criterion. Every gate is total_terms == 0 → silent skip. The compounding surfaces (memory, glossary, decisions, strategy) now have a dedicated Self-improving page, a STRATEGY.md track, and a “Self-improving” pillar in the redesigned six-pillar hero grid.
New regression suites (test_sync_check.py, --event coverage in test_tracker_receipts.py) wired into CI; Codex mirror regenerated.
1.10.2 — Homepage points at flow-next.dev
The plugin homepage (Claude marketplace + the Claude/Codex plugin manifests, and the Codex websiteURL) now points at https://flow-next.dev instead of the stale mickel.tech/apps/flow-next. .cursor-plugin was already correct; the rest were drift. The flow-next-tui package homepage and the README “Visual overview” doc row were aligned too. author.url / owner.url (personal site / GitHub) are unchanged.
1.10.1 — cp1252 / non-UTF-8 robustness
flowctl impl-review and console output no longer crash on a non-UTF-8 source subtree or a legacy console codepage (e.g. Windows cp1252).
Detail
Read side (#167). flowctl copilot impl-review could abort with UnicodeDecodeError on a repo containing a non-UTF-8 file anywhere in the tree. find_references() (behind gather_context_hints) ran git grep repo-wide and decoded hits with a hard encoding="utf-8" and no errors= — so a single legacy cp1252 file (e.g. a German C/C++ subtree carrying ü / ä / ö / ß) aborted context gathering even when every file you actively edited was UTF-8. The collector now captures git grep output as bytes and decodes with errors="replace". Reported with measured data by VGottselig (304 of ~5400 C/C++ files non-UTF-8 in a large Windows CAD codebase).
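The read-side fix comes down to decoding captured bytes leniently. A small demonstration using only the standard library; the helper name and the sample grep line are made up for illustration:

```python
def decode_grep_output(raw):
    """Decode captured bytes without raising on non-UTF-8 content."""
    return raw.decode("utf-8", errors="replace")


# "grün" encoded as cp1252 contains the byte 0xFC, which is not valid UTF-8.
raw = "src/ui.c:12:// grün\n".encode("cp1252")

try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

text = decode_grep_output(raw)
print(strict_failed)     # True
print("\ufffd" in text)  # True: the bad byte became U+FFFD
```

The replacement character loses the original letter, but for context gathering the file path and line number survive, which is what matters here.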
Write side (#167). flowctl now forces its own stdout/stderr to UTF-8 at startup, so non-ASCII output (→, umlauts) no longer aborts on a legacy console codepage (UnicodeEncodeError: 'charmap' codec can't encode character '→'). This removes the need for the PYTHONIOENCODING=utf-8 workaround.
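One way to get this effect in Python 3.7+ is `TextIOWrapper.reconfigure`. The sketch below is not flowctl's startup code; `force_utf8` is a hypothetical helper, demonstrated on an in-memory stream that starts out as cp1252:

```python
import io
import sys


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure()."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is not None:
        reconfigure(encoding="utf-8")
    return stream


# At startup, a tool would apply it to its own standard streams.
for std in (sys.stdout, sys.stderr):
    force_utf8(std)

# Demonstration: cp1252 cannot encode the arrow, UTF-8 can.
buffer = io.BytesIO()
legacy = io.TextIOWrapper(buffer, encoding="cp1252")
force_utf8(legacy)
legacy.write("→")
legacy.flush()
print(buffer.getvalue() == "→".encode("utf-8"))  # True
```

The `getattr` guard keeps the helper safe when a stream has been replaced by an object without `reconfigure`, such as a capture buffer in a test runner.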
/flow-next:work Verify-Completion recovery. When the host drops a long-running worker’s completion report, phase 3d no longer blocks waiting for a result that will never arrive: it diagnoses from ground truth (flowctl show + git log + git status) and classifies — already done → plan-sync; code present but unfinished → re-anchoring continuation worker; nothing landed → retry.
1.10.0 — Eval-optimized scout agents
8 read-only scout/analyst agents got a feature-preserving output budget — ~40–71% leaner output into the planner/work-loop context, with accuracy held (proven by per-target evals + an end-to-end smoke).
Detail
Rolled the external “autoresearch” eval loop (baseline → one mutation → keep-if-better ratchet) across the read-only agents whose free-form output flows into the planner and work-loop context. Each gained a hard output budget — the reductions are at runtime (the rendered output), not in prompt size:
repo-scout (83→100% on its eval set, ~40–50% leaner) · context-scout (60→93%, ~60–70%; dropped the prescribed Code-Signatures block) · flow-gap-analyst (~50–70%, 26/27 gaps held) · quality-auditor (~63%) · spec-scout (No-Relationship → count, scale-robust) · docs-scout (~48–69%) · github-scout (~71%, the biggest) · practice-scout (~52%).
Feature-preservation is the guarantee, not a hope. A mutation was kept only if a per-target coverage/accuracy eval held (the ratchet): grounding (context-scout’s cited paths test -f-verified against a real 442k-LOC app repo), findings (quality-auditor against a 7-planted-issue testbed — Major bug + all slop still caught, clean code stays ✅), gaps (per-input answer keys), and docs/APIs/gotchas (the “pointer-not-paste” rule: name the API inline, drop code blocks, the link carries depth). The leaner research scouts even surfaced extra real issues a verbose baseline missed (a current CVE; an extra security gotcha).
End-to-end verified. The optimized scouts → a planner produced a correct, ship-quality build plan for a deliberately hard, cross-cutting feature (org-scoped rate limiting) reading only the budgeted scout output — features preserved at the consumer level, not just at scout-output level. The method lives in agent_docs/optimizing-skills.md.
Also: /flow-next:make-pr shed stale build-scaffolding archaeology (render output byte-identical). /flow-next:capture is unchanged — a trim was tried and reverted (the ratchet caught a routing regression), and its no-silent-overwrite guard was verified intact. No flowctl logic touched; Codex mirror regenerated.
1.9.1 — Cursor setup detection + tracker merge-base
/flow-next:setup now detects Cursor instead of mis-treating it as Codex, and a comment-first tracker auto-link snapshots its merge-base so a later sync can’t fast-forward over tracker edits.
Detail
Cursor setup detection. Setup keyed platform off plugin-root env vars (DROID_PLUGIN_ROOT → Droid, CLAUDE_PLUGIN_ROOT → Claude Code, else → Codex). Cursor exposes neither, so it fell into the Codex branch — writing the $flow-next-* Codex command syntax + running .codex/ setup, while the installer advertises /flow-next:*. Setup now branches on CURSOR_AGENT + the .cursor-plugin/plugin.json manifest + no codex/ mirror dir → PLATFORM=cursor, applied at every platform-branch point (detection, docs-status template, the Docs question, and the write mapping): it writes the /flow-next:plan snippet into AGENTS.md (which Cursor reads), resolves flowctl via .flow/bin/flowctl, and skips the Codex-only .codex/ copy. The triple guard is hardening against CURSOR_AGENT being inherited by child processes (Codex launched from a Cursor shell) and against the shared repo source tree carrying all manifests — and the installers now --delete-excluded / Remove-Item excluded paths so the codex/-absence proof holds on re-install. Hardened across five rounds of automated cross-model review.
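The detection logic can be sketched as a pure function over an environment mapping and a plugin root. This is an illustration, not the setup skill's code: the function name is hypothetical, and checking the plugin-root variables before the Cursor guard is an assumed ordering.

```python
from pathlib import Path


def detect_platform(env, plugin_root):
    """Return 'droid', 'claude', 'cursor', or 'codex'."""
    if env.get("DROID_PLUGIN_ROOT"):
        return "droid"
    if env.get("CLAUDE_PLUGIN_ROOT"):
        return "claude"
    root = Path(plugin_root)
    # Triple guard: the env var alone is not trusted, because a child
    # process (Codex launched from a Cursor shell) can inherit it.
    is_cursor = (
        bool(env.get("CURSOR_AGENT"))
        and (root / ".cursor-plugin" / "plugin.json").is_file()
        and not (root / "codex").is_dir()
    )
    return "cursor" if is_cursor else "codex"
```

Called as `detect_platform(os.environ, install_dir)`, it falls back to Codex whenever any one of the three Cursor signals is missing, which is why the installers have to keep the codex/ directory out of the Cursor copy.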
Tracker merge-base snapshot. When the first lifecycle touchpoint for an unlinked spec was a comment op, create-if-unlinked attached the tracker id but didn’t snapshot the merge-base — leaving the issue base-less, so a later body sync could fast-forward and silently overwrite tracker-side edits. The auto-create path now set-merge-base (both halves) + set-last-synced at create time (the written issue body is the render, so the base is exact).
1.9.0 — Cursor install + tracker auto-link
Flow-Next now installs into Cursor via a one-shot local plugin (./scripts/install-cursor.sh), and a lifecycle event on an unlinked spec now creates + links the tracker issue first instead of silently no-opping.
Detail
Cursor support (macOS / Linux + Windows). Cursor has its own .cursor-plugin/ plugin namespace and does not auto-read Claude Code plugins the way Grok Build does, so Flow-Next ships a Cursor-native manifest plus a one-shot installer on both platform families — ./scripts/install-cursor.sh (rsync) and install-cursor.ps1 (robocopy). Both copy the plugin into ~/.cursor/plugins/local/flow-next as a real directory (Cursor’s loader rejects symlinks escaping ~/.cursor/), exclude the Codex mirror + tests, and are a re-runnable snapshot. Verified end-to-end, multi-agent flows included — a full /flow-next:plan fanned out the scouts in parallel and drove flowctl to create the spec + tasks; flowctl resolves via the project-local .flow/bin/flowctl. Caveats: no grouped plugin card and the slash autocomplete under-lists the commands (both cosmetic — they run when typed), and Ralph autonomous mode is unsupported (Cursor’s afterFileEdit / beforeShellExecution hooks don’t map to the Claude PreToolUse + Bash|Execute matchers the Ralph guard relies on). See Install → Cursor.
Tracker create-if-unlinked. Previously only capture flow-first-pushed an unlinked spec; every other lifecycle touchpoint (plan / interview / work / make-pr / resolve-pr / completion-review) no-op’d when the spec had no tracker id — so starting a spec with /flow-next:plan left it orphaned from Linear / GitHub (observed dogfooding in Cursor and Codex). A single create-if-unlinked rule now runs the flow-first link first (render spec → create issue → attach id) before reconcile/comment. unlink stays the only operation that no-ops on an unlinked spec; a touchpoint now no-ops only when no transport is reachable.
1.8.0 — Live-app QA pass (/flow-next:qa)
New /flow-next:qa drives the running app like an unforgiving real user — deriving its test scenarios straight from the spec, filing P0/P1/P2 findings with evidence, and ending with a YES/NO ship verdict.
Detail
Every other Flow-Next review is static — impl-review, spec-completion-review, quality-auditor, code-review read code or specs. /flow-next:qa is the live-app gate: it drives the deployed app via Flow-Next Drive’s surface-aware driver ladder (it never re-implements driving) and is forbidden from marking PASS by reading source — a scenario passes only on captured evidence (screenshot / console / URL).
The differentiator vs spec-less QA tools: scenarios derive directly from the spec — acceptance criteria → scenarios, R-IDs → a coverage table, boundaries → what not to test, decision context → expected behavior — so the host already encodes intent instead of reconstructing it. Findings feed the bug memory track (track: bug, with overlap dedup) and can be promoted to specs/tasks. The pass emits a qa_verdict receipt with four outcomes (SHIP / NEEDS_WORK / BLOCKED / NA) projected onto the review-receipt enum, so it can feed spec-completion-review — “does the live app satisfy the AC, not just the code?”. Runs interactively and autonomously; not a hard Ralph-block; opt-in tracker verdict-post via tracker.perEvent.qa. Requires a live deploy + a driver — with neither it surfaces a BLOCKED verdict rather than failing; a spec with no driveable UI yields a clean NA. The QA discipline (P0/P1/P2 taxonomy, evidence rules, session hygiene) is a lean, credited borrow from Ray Fernando’s running-bug-review-board (Apache-2.0).
1.7.1 — Codex delegation: cheaper on non-Claude hosts
The opt-in Codex delegation reference no longer loads into a Codex / Droid / OpenCode session — the host-platform check moved into the cheap value-check, so delegation short-circuits before the ~45k reference is ever read. Byte-identical for Claude Code users.
1.7.0 — Optional Codex delegation for /flow-next:work
/flow-next:work gains an opt-in delegate:codex mode that offloads code implementation to codex exec (gpt-5.5/medium) while Claude keeps orchestration, review, and all git — a second efficiency lever that offloads work, not prompt size.
Detail
When activated (delegate:codex arg or flowctl config set work.delegate codex), the Claude host stays the orchestrator — plan-reading, review, all git, and decisions — and delegates the token-heavy implementation to codex exec. Default model gpt-5.5, default effort medium, with proven per-batch risk escalation. It’s a different lever than prompt-trimming: it moves implementation tokens to a separate Codex budget.
Strictly opt-in and progressive-disclosure: with delegation off (the default) the work flow is byte-identical to before — one cheap config get, zero new steps. All mechanics live in a reference loaded only when active. The safety surface is the headline work: a one-time sandbox consent; a value-aware recursion guard (the flow-next CODEX_SANDBOX=auto review knob never trips it); mandatory --output-schema with MCP isolation (--ignore-user-config); a deterministic result classifier + sanitized scoped rollback that never touches .flow/; “Codex never touches git” enforced by a post-run HEAD-unchanged assertion (not just the prompt); and a host-owned 3-strike circuit breaker that always falls back to standard mode. The ralph-guard PreToolUse hook is rebuilt to a tokenized argv allowlist that admits only the full canonical delegation shape. Runs in interactive and Ralph mode (consent pre-granted in config for headless). Configure via the work.delegate* keys; see the flowctl reference.
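Of the safety mechanisms listed, the 3-strike circuit breaker is the easiest to picture in code. A minimal sketch with a hypothetical class; it assumes failures are cumulative and that a success does not reset the count, which the changelog does not specify:

```python
class DelegationBreaker:
    """Host-owned breaker: after `limit` failures, stay in standard mode."""

    def __init__(self, limit=3):
        self.limit = limit
        self.failures = 0

    def record(self, success):
        """Record the outcome of one delegated batch."""
        if not success:
            self.failures += 1

    def mode(self):
        """Return the mode the next batch should run in."""
        return "standard" if self.failures >= self.limit else "delegate"


breaker = DelegationBreaker()
modes = []
for outcome in (False, False, False, True):
    modes.append(breaker.mode())
    breaker.record(outcome)
modes.append(breaker.mode())
print(modes)  # ['delegate', 'delegate', 'delegate', 'standard', 'standard']
```

Keeping the counter on the host side means a misbehaving delegate cannot talk its way out of the fallback.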
1.6.0 — Tracker-sync is opt-out by default
Hooking up the tracker bridge via /flow-next:tracker-sync now activates the whole lifecycle pipeline by default — you opt out of events, not in.
Detail
Previously every tracker.perEvent.* touchpoint defaulted off, so after the discovery ceremony you had to opt each lifecycle event in by hand. That inverted the intent — connecting a tracker means you want it kept in sync. The discovery ceremony now activates every event on confirmation: capture / interview / plan → reconcile, work.firstClaim → push, work.done / makePr / resolvePr → comment, completionReview → reconcile. Exclude events at ceremony time, or turn any off later with flowctl config set tracker.perEvent.<event> off.
The accidental-enable guard is preserved: the config schema default for each leaf stays off, so a bare tracker.enabled=true set by hand or a script — without running the ceremony — fires no lifecycle-event sync (make-pr’s unconditional PR↔issue link is the one exception). Activation is ceremony-gated, not flag-gated. No config-schema change; docs updated across every surface.
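The difference between ceremony-gated and flag-gated activation can be shown with two small functions. The event-to-action table is taken from the paragraph above; the function names and the flat dictionary standing in for the tracker.perEvent config are assumptions:

```python
CEREMONY_DEFAULTS = {
    "capture": "reconcile",
    "interview": "reconcile",
    "plan": "reconcile",
    "work.firstClaim": "push",
    "work.done": "comment",
    "makePr": "comment",
    "resolvePr": "comment",
    "completionReview": "reconcile",
}


def run_ceremony(excluded=()):
    """Config the ceremony would write on confirmation, minus exclusions."""
    return {e: a for e, a in CEREMONY_DEFAULTS.items() if e not in excluded}


def effective_action(per_event_config, event):
    """The schema default for every leaf stays 'off'."""
    return per_event_config.get(event, "off")


hand_enabled = {}  # tracker.enabled=true set by hand, ceremony never run
after_ceremony = run_ceremony(excluded=("makePr",))

print(effective_action(hand_enabled, "plan"))      # off
print(effective_action(after_ceremony, "plan"))    # reconcile
print(effective_action(after_ceremony, "makePr"))  # off
```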
1.5.3 — Tracker-sync receipts auto-ignored
.flow/sync-runs/ (per-run tracker-sync receipts) is now auto-gitignored, and flowctl’s managed .flow/.gitignore self-upgrades so existing repos pick up new patterns on the next init.
1.5.2 — Tracker-sync projects the full spec
/flow-next:tracker-sync now guarantees the issue body mirrors the entire spec — a render guardrail stops the host agent from pushing a summarized body instead of the full projection.
1.5.1 — Setup docs fix + Windows de-flake
Fresh /flow-next:setup now ships the tracker-sync CLI reference it dropped in 1.5.0, a CI guard keeps the bundled template and the dogfood copy in lockstep, and a flaky Windows migration race is fixed.
Detail
- usage.md shipped without tracker-sync docs. 1.5.0 added the flowctl sync / --tracker-first command block to the repo’s lived-in .flow/usage.md but not to the bundled template /flow-next:setup actually copies — so every fresh setup documented the whole CLI except the tracker-sync bridge that shipped in the same release. The canonical template is now byte-synced (Codex mirror regenerated).
- Drift guard so it can’t recur. A new parity test hard-asserts .flow/usage.md ≡ its canonical setup template (and .flow/templates/spec.md ≡ templates/spec.md) across the ubuntu/macos/windows CI matrix. Edit the dogfood copy and forget the template → CI fails instead of consumers getting stale docs.
- Flaky Windows CI fixed. Parallel migrate-rename could leave a concurrent writability probe’s temp file (.rw-probe-*.tmp) visible to the backup copy’s directory scan, then have it vanish before the copy opened it (FileNotFoundError) — a TOCTOU that Windows’s slower unlink widened. The backup copy now skips those transients and tolerates any file that disappears mid-copy.
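The two parts of the race fix, skipping probe files by name and tolerating files that vanish mid-copy, can be sketched as follows. This is a simplified illustration that handles only a flat directory; `tolerant_copy` is a hypothetical name, not the migrate-rename implementation:

```python
import fnmatch
import shutil
from pathlib import Path


def tolerant_copy(src_dir, dst_dir, skip_pattern=".rw-probe-*.tmp"):
    """Copy regular files, skipping probe temp files and vanished files."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for entry in sorted(src.iterdir()):
        if fnmatch.fnmatch(entry.name, skip_pattern):
            continue  # transient writability probe, never part of the backup
        try:
            if entry.is_file():
                shutil.copy2(entry, dst / entry.name)
                copied.append(entry.name)
        except FileNotFoundError:
            # The file disappeared between the scan and the copy; ignore it.
            continue
    return copied
```

The name filter alone would not be enough, since any file can disappear between the directory scan and the open; the `except` clause covers that remaining window.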
1.5.0 — Tracker sync bridge
New /flow-next:tracker-sync projects a spec to a Linear or GitHub issue and reconciles body, status, and comments two-way — projection, not coordination — and a tracker key (wor-17) now resolves as a spec id everywhere fn-NN does.
Detail
/flow-next:tracker-sync mirrors a .flow/specs/<id>.md spec onto a tracker issue (Linear first, GitHub next). Projection, not coordination: the spec stays the single source of truth and the quality layer; the tracker is a co-editable mirror that never drives flow state or spawns agents (contrast OpenAI Symphony, where the board is the control plane). It is distinct from /flow-next:sync, which is plan-sync. New docs: Tracker Sync.
- Discovery ceremony (detect → surface → ask → never-assume) probes a Linear MCP / LINEAR_API_KEY / GitHub auth / a Jira host and writes tracker.* config only on confirmation (env > config > ask, the same ladder as flowctl review-backend). The bridge is off until explicitly enabled and active when tracker.enabled == true or tracker.type ∈ {linear, github}.
- Transport ladder per adapter. Linear: MCP → GraphQL → no-op; GitHub: gh (single rung, reduced-fidelity status) → no-op. Orchestration is transport-blind; when no transport is reachable the run is a noop plus a receipt note, never a crash.
- Hybrid id model: tracker-first specs are canonically wor-17-slug (tasks wor-17-slug.M; bare wor-17 resolves); flow-first specs keep fn-NN plus a resolvable WOR-17 display alias. show / work / plan wor-17 resolve case-insensitively, the native fn- scheme is reserved, one tracker team per repo, and ids never rename on link. See Spec & task ids.
- Seven lifecycle skills gain opt-in tracker-sync touchpoints: capture, interview, plan, work (first-claim + done), make-pr, resolve-pr, spec-completion-review. Each tracker.perEvent.* leaf defaults off; the no-tracker workflow is unchanged and remains the documented default.
- PRs are Diffs-ready. When the bridge is active, make-pr unconditionally links the new PR to its tracker issue (no makePr opt-in). For Linear it writes a non-closing Ref WOR-N (plus a rich attachmentLinkURL on the GraphQL transport) so the PR renders as a Linear Diff inside the issue; for GitHub it is a native Refs #N cross-link. Non-closing is deliberate: merge never auto-completes the issue, spec-completion-review owns Done.
- make-pr now creates the PR autonomously, with no confirm gate. Invoking the skill is the intent; the body is deterministic; the default is a reversible smart-draft. --dry-run prints the body without creating, --ready / --draft set the draft state.
- Setup proposes the bridge. /flow-next:setup never touches tracker config (keeping the zero-dep base clean) and now proposes running /flow-next:tracker-sync as an optional next step when it finishes.
- Ralph-safe: every run emits a receipt; genuine conflicts queue to the review deferred-findings sink rather than block. An always-ask tiebreak resolves to queue in autonomous mode.
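The transport ladder from the list above (try each rung in order, end in a noop plus a receipt note, never crash) might look roughly like this. Every name here is hypothetical: run_ladder, the receipt fields, and the two stand-in transports are illustrative only and say nothing about how the real adapters are written.

```python
from typing import Callable, Optional

# Each rung returns a result string, or None when that transport is unreachable.
Transport = Callable[[str], Optional[str]]


def run_ladder(rungs: list[tuple[str, Transport]], spec_id: str) -> dict:
    """Try each rung in order; fall through to a noop receipt, never raise."""
    for name, transport in rungs:
        try:
            result = transport(spec_id)
        except Exception:
            result = None  # a failing transport is treated like a missing rung
        if result is not None:
            return {"spec": spec_id, "transport": name,
                    "status": "synced", "note": result}
    return {"spec": spec_id, "transport": None,
            "status": "noop", "note": "no transport reachable"}


def mcp_unavailable(spec_id: str) -> Optional[str]:
    return None  # stand-in for "no Linear MCP configured"


def graphql_ok(spec_id: str) -> Optional[str]:
    return f"updated issue for {spec_id}"  # stand-in for a successful API call


linear_ladder = [("mcp", mcp_unavailable), ("graphql", graphql_ok)]
receipt = run_ladder(linear_ladder, "wor-17")
```

Because the caller only ever sees a receipt dictionary, the orchestration layer stays transport-blind: it does not need to know which rung answered.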
Sync-engine shape (discovery ceremony, per-item lastSyncedAt, surface-diffs-never-overwrite) adapted from Ray Fernando’s rayfernando-skills running-bug-review-board issue-trackers.md (Apache-2.0). Thank you, Ray.
1.4.0 — flow-next-drive surface-aware automation
The browser skill is renamed flow-next-drive and rebuilt as a surface-aware driver ladder: it detects the UI surface (web, Chromium-backed desktop, or true-native app) and picks the best available driver, degrading gracefully.
Detail
The skill is no longer hardwired to a single browser driver. It detects the target surface and branches: (a) web app → web ladder; (b) Chromium-backed desktop app (Electron / Windows WebView2) → the same web ladder, attaching over CDP to the app’s remote-debugging port (agent-browser --cdp <port> / --auto-connect; chrome-devtools-mcp --browser-url); (c) true-native / non-CDP surface (macOS AppKit/SwiftUI, or a webview exposing no CDP — e.g. macOS WKWebView / Tauri-on-macOS) → Computer Use. All surfaces share one universal flow (observe → snapshot → act on fresh refs → verify → capture → release); only the actuation differs.
The web ladder, in priority order: agent-browser (default rung, the only assumed-present driver, CDP-based + headless-safe) → chrome-devtools-mcp (auto-wait + attach-to-real-signed-in-Chrome) → Playwright → cursor-ide-browser MCP → manual screenshot relay. The native rung is Computer Use — driver-agnostic across Codex Computer Use and/or Anthropic “Claude” Computer Use (the API computer tool via its own harness); detected and optional, never a hard dependency, never on a headless path. When no Computer Use is present, a Chromium-backed app still drives via the web-ladder CDP attach; a genuinely native app documents the limitation rather than fails. The existing agent-browser references fold into the default-rung reference — no capability regression.
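As a rough model of the surface branching and rung order just described, the selection could be written as below. The surface labels ("web", "chromium-desktop", "native"), the driver strings, and pick_driver itself are assumptions made for this sketch; the skill is prose-driven and may decide differently.

```python
# Rung order taken from the paragraph above; the string labels are illustrative.
WEB_LADDER = [
    "agent-browser",
    "chrome-devtools-mcp",
    "playwright",
    "cursor-ide-browser",
    "manual-screenshot-relay",
]


def pick_driver(surface: str, available: set[str]) -> str:
    """Return the first usable driver for a surface, or a documented limitation."""
    if surface in ("web", "chromium-desktop"):
        # Chromium-backed desktop apps reuse the web ladder via a CDP attach.
        for driver in WEB_LADDER:
            if driver in available:
                return driver
        return "manual-screenshot-relay"  # last rung needs no installed driver
    if surface == "native":
        # Computer Use is optional: without it the limitation is reported.
        if "computer-use" in available:
            return "computer-use"
        return "unsupported: no Computer Use available"
    raise ValueError(f"unknown surface: {surface}")
```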
The driver ladder + universal-flow structure is adapted from Ray Fernando’s rayfernando-skills running-bug-review-board skill (Apache-2.0). Migration: /flow-next:browser is gone — the skill is now /flow-next:flow-next-drive (and flow-next-drive on the Codex mirror, fixing the prior agent-browser rename that collided with the user’s global agent-browser skill and Codex-native browser skills). An orphaned browser / agent-browser skill in a cached install auto-clears within ~7 days or immediately by deleting the stale cached marketplace directory under ~/.claude/plugins/cache/<marketplace>. See the Drive skill page.
1.3.4 — Review-output R-ID suffix fix
The review-output R-ID parser now preserves single-letter suffixes (R4a / R4b); they were being silently dropped from the coverage gate and fix-loop targeting.
Detail
parse_unaddressed_rids read R-IDs from a reviewer’s Unaddressed R-IDs: summary line (_extract_rids) and from the ## Requirements coverage table fallback using bare \bR(\d+)\b. fn-49.1 (1.2.1) taught the spec acceptance-criteria parser the R\d+[a-z]? suffix form but left this review-output path behind, so a reviewer reporting Unaddressed R-IDs: [R4a, R4b, R5] parsed to ['R5'], dropping the suffixed IDs from the R-ID coverage gate and fix-loop targeting. Both review-output regexes are now \bR(\d+[a-z]?)\b, in lockstep with the spec parser; multi-letter suffixes (R4ab) and separators (R-4) stay rejected. New test_unaddressed_rids_parser.py (10 cases) is wired into the ubuntu/macos/windows CI matrix. Surfaced by a live impl-review A/B: the current review prompt caught it (the experimental slop rubric being tested was shelved as unproven).
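The difference between the two patterns is easy to check in isolation. The snippet below uses only the two regexes quoted above; the helper extract_rids and the sample line are made up for illustration and are not the flowctl functions.

```python
import re

OLD_RID = re.compile(r"\bR(\d+)\b")        # pre-1.3.4 review-output pattern
NEW_RID = re.compile(r"\bR(\d+[a-z]?)\b")  # 1.3.4 pattern, same as the spec parser


def extract_rids(pattern: re.Pattern, text: str) -> list[str]:
    """Return R-IDs in order of appearance, re-attaching the leading R."""
    return ["R" + suffix for suffix in pattern.findall(text)]


line = "Unaddressed R-IDs: [R4a, R4b, R5, R4ab, R-4]"
old = extract_rids(OLD_RID, line)  # only the unsuffixed id survives
new = extract_rids(NEW_RID, line)  # single-letter suffixes are kept
```

With the old pattern the trailing \b cannot match between a digit and a letter, so R4a and R4b never match at all. The new pattern allows exactly one lowercase letter before the boundary, which is why R4ab is still rejected.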
Earlier releases
1.3.3 — Scout flowctl fallback
Scouts fall back to the bundled .flow/bin/flowctl so .clawpatch/ feature enrichment fires even when dispatched subagents don’t inherit the plugin-root env var.
Detail
When repo-scout / context-scout run as dispatched subagents they may not inherit CLAUDE_PLUGIN_ROOT / DROID_PLUGIN_ROOT, which left their Step 0 flowctl repo-map list --json call resolving to a broken /scripts/flowctl — the scout then silently grep-degraded and features_anchored never fired even with a populated .clawpatch/. Both scouts now fall back to the bundled .flow/bin/flowctl ([ -x "$FLOWCTL" ] || FLOWCTL=".flow/bin/flowctl"), so a /flow-next:setup-installed repo resolves regardless of subprocess env — across Claude Code, Factory Droid, and Codex. Also makes sync-codex.sh’s agent-body fallback injection idempotent (no duplicate line in the Codex mirror) and the scout-fallback contract test hermetic (runs in a throwaway git repo so a local dogfood .clawpatch/ can’t break it). Surfaced by full live end-to-end testing — mapping Flow-Next’s own repo via --source=agent (codex) produced 9 features, then the scout enrichment was exercised against them.
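For readers more at home in Python than in shell, the fallback reads roughly as follows. This is a translation for illustration only: the scouts use the shell one-liner quoted above, and the way the primary path is built here (plugin root plus /scripts/flowctl) is an assumption inferred from the broken /scripts/flowctl path mentioned in the entry.

```python
import os


def resolve_flowctl(env: dict, is_executable=None) -> str:
    """Python rendering of: [ -x "$FLOWCTL" ] || FLOWCTL=".flow/bin/flowctl"."""
    if is_executable is None:
        # Default check: an existing file with the execute bit set.
        is_executable = lambda p: os.path.isfile(p) and os.access(p, os.X_OK)
    root = env.get("CLAUDE_PLUGIN_ROOT") or env.get("DROID_PLUGIN_ROOT") or ""
    # Assumed layout; with no root this degrades to "/scripts/flowctl".
    candidate = f"{root}/scripts/flowctl"
    return candidate if is_executable(candidate) else ".flow/bin/flowctl"
```

The is_executable parameter exists only so the check can be exercised without touching the filesystem.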
1.3.2 — Heuristic-0-features hint
/flow-next:map now explains why the provider-free mapper found 0 features on an unconventional repo, and points at --source=auto|agent.
Detail
Live-testing on Flow-Next’s own repo showed clawpatch’s heuristic detectors target conventional app/framework layouts (npm bins, Next.js routes, Python packages, Rails / Laravel / Django, Go / Rust, JVM, .NET, SwiftPM, Phoenix); a plugin + markdown-skill + flowctl.py-CLI + bun-TUI repo matches none, so heuristic returns 0 features while clawpatch flags coverage as “weak.” The Phase 5 summary previously printed a silent “Mapped: 0 feature(s)” — it now explains the conventional-layout targeting and points at --source=auto (heuristic-first, provider only if weak) or --source=agent (always provider-backed; needs CLAWPATCH_PROVIDER + tokens). For reference, --source=agent via codex produced 9 well-scoped features for Flow-Next’s repo (Flowctl CLI Core, Ralph Guardrails, Flow Memory System, TUI shell/theme/integration, Plugin Packaging, Strategy/Docs). Also root-ignores .clawpatch/ in the dev repo so the local feature map never gets staged. See the skill page for the conventional-vs-unconventional note.
1.3.1 — PNPM_HOME hint reword
The /flow-next:map install hint is now conditional and pnpm-version-agnostic; it no longer presumes an install already happened.
Detail
Live-testing 1.3.0 (clawpatch never installed, pnpm 10) showed the hint presumed an install had already happened (“install succeeds but PATH unchanged”) and attributed the PATH wiring to pnpm v11 specifically — misleading for a first-time or pnpm-10 user whose global bin sits at ~/.local/share/pnpm. Now conditional and version-agnostic: “if you already ran pnpm add -g clawpatch and still see this, pnpm installs globals under $PNPM_HOME and needs a one-time pnpm setup.” Same correction in docs/troubleshooting.md + the skill page above. Logic unchanged.
1.3.0 — /flow-next:map skill
New opt-in /flow-next:map skill wraps clawpatch for a semantic feature index that scouts and /flow-next:prime can read; flowctl core stays zero-dep.
Detail
Wraps clawpatch map to produce a semantic feature index (~20 languages, persisted at .clawpatch/features/*.json, Zod-validated schemaVersion: 1). Opt-in convenience throughout — flowctl core never imports or requires clawpatch; the skill is the only flow-next surface that touches it. Default invocation is provider-free (--source heuristic, zero LLM calls, deterministic mapper); --source auto|agent flows through as passthrough. Missing binary → skill prints pnpm add -g clawpatch install instructions verbatim and exits cleanly (no auto-install); pnpm-installed-but-not-on-PATH → skill prints the PNPM_HOME bin/ hint. Single-source SUPPORTED_CLAWPATCH=">=0.4.0 <0.5.0" version pin lives in skill prose; outside-range → one-line stderr warning + degrade, never block. Ralph-block (decline-to-run, no receipt write) under FLOW_RALPH=1 / REVIEW_RECEIPT_PATH.
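A version pin of this shape can be checked with a few lines of comparison logic. The sketch below is an assumption-laden illustration: the real pin lives in skill prose rather than code, and parse_version and in_range are hypothetical helpers that handle only plain x.y.z versions and the >= and < operators used by the pin.

```python
import re

SUPPORTED_CLAWPATCH = ">=0.4.0 <0.5.0"  # the pin quoted above


def parse_version(text: str) -> tuple:
    """Turn 'x.y.z' into a tuple of ints so versions compare numerically."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if not match:
        raise ValueError(f"not a plain x.y.z version: {text!r}")
    return tuple(int(part) for part in match.groups())


def in_range(version: str, spec: str = SUPPORTED_CLAWPATCH) -> bool:
    """Check a version against space-separated >= / < clauses (all must hold)."""
    current = parse_version(version)
    for clause in spec.split():
        if clause.startswith(">="):
            ok = current >= parse_version(clause[2:])
        elif clause.startswith("<"):
            ok = current < parse_version(clause[1:])
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
        if not ok:
            return False
    return True
```

A caller following the entry's "warn and degrade, never block" rule would print a one-line warning to stderr when in_range returns False and then carry on.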
Companion flowctl repo-map list / show / since-ref reader subcommands parse the index directly; readers bypass ensure_flow_exists() and gate on .clawpatch/ presence instead so the prime detection branch works without special-casing. since-ref uses three-dot <ref>...HEAD semantics (fixed during PR review — two-dot <ref>..HEAD polluted overlap results with upstream-only advancement). repo-scout and context-scout call flowctl repo-map list --json as Step 0 when .clawpatch/ is present and emit an optional features_anchored: [...] field with a last_mapped staleness timestamp; scouts remain useful with the existing grep/glob path when .clawpatch/ is absent (fallback contract is load-bearing). /flow-next:prime adds a DE7 informational sub-criterion under Pillar 5 (Dev Environment) surfacing /flow-next:map in Top Recommendations when the index is missing — pillar count stays at 8, scored criteria stay at 48, total criteria become 48 → 49.
The feature index is local-only by design: .clawpatch/.gitignore skeleton is * + !.gitignore, so the index is regenerable-per-developer rather than committed (avoids PR review noise + merge conflicts on a pre-1.0 schema; full sharing-contract trade-off table on the skill page). Docs: platforms.md gains an “Optional skill requirements” section; troubleshooting.md gains a clawpatch-failure-modes section. GLOSSARY.md entries added for “feature map” and “features_anchored”. CI matrix wires test_repo_map.py (22 tests) + test_scout_fallback_contract.py (14 tests) + test_pnpm_home_hint_prose.py (5 tests) + map_smoke_test.sh (75 cases) on ubuntu / macos / windows using checked-in fixtures (no Node 22+ or clawpatch needed in CI).
1.2.1 — make-pr parser fixes
Two spec export-cognitive-aid parser bugs fixed so /flow-next:make-pr bodies stop silently dropping content.
Detail
(1) The R-ID parser regex R\d+ was extended to R\d+[a-z]? so sub-scoped sibling criteria like R4a / R4b (introduced by /flow-next:capture when revising specs in-flight) are no longer dropped from acceptance_criteria / uncovered_r_ids; the suffix form is now blessed in templates/spec.md as canonical. (2) The memory_during_spec time-window filter now has a deterministic null-safe fallback chain (spec.created_at → earliest task created_at → branch first-commit via git log {base}..{branch}) so memory entries surface correctly when spec.created_at is null. Both surfaced during fn-48’s make-pr where PR #146 carried workaround prose. Two further bugs in the branch-first-commit fallback (returning the branch tip not the root commit; walking inherited mainline history) were caught by chatgpt-codex-connector[bot] review on PR #147 and fixed with regression tests. Unit suite 624 → 646.
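The null-safe fallback chain in fix (2) amounts to "use the first source that has a value". A minimal sketch, assuming ISO 8601 timestamp strings and a hypothetical window_start helper (the branch first-commit time is passed in here rather than read from git log):

```python
from datetime import datetime
from typing import Optional


def window_start(spec_created_at: Optional[str],
                 task_created_ats: list,
                 branch_first_commit_at: Optional[str]) -> Optional[datetime]:
    """Pick the start of the memory window from the first non-null source."""
    task_times = [datetime.fromisoformat(t) for t in task_created_ats if t]
    candidates = [
        datetime.fromisoformat(spec_created_at) if spec_created_at else None,
        min(task_times) if task_times else None,  # earliest task, nulls ignored
        datetime.fromisoformat(branch_first_commit_at)
        if branch_first_commit_at else None,
    ]
    # Order matters: spec first, then tasks, then the branch's first commit.
    return next((c for c in candidates if c is not None), None)
```

Returning None when every source is empty leaves the decision about an unbounded window to the caller instead of guessing a date.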
1.2.0 — Backend-split review workflows
Review skills split by backend so codex/copilot load only their own slice (impl-review 1126 → 70 LOC on codex); FLOWCTL prelude consolidated.
Detail
spec-completion-review drops from 645 to 41 LOC on codex (about 16×), impl-review from 1126 to 70 LOC (16×). RP keeps its cohesive prompt template since it only loads under the RP backend. resolve-pr was evaluated and kept inline (its parallel-vs-serial divergence sits below the 50-line split threshold codified in agent_docs/adding-skills.md). Also drops the dead DROID_PLUGIN_ROOT:-CLAUDE_PLUGIN_ROOT fallback from the Codex mirror’s FLOWCTL prelude (neither env var is set in Codex) and consolidates the canonical prelude to once-per-skill-file. Mechanical refactor only: bash, gating, and verdict semantics unchanged; smoke 127/2 baseline-equivalent. Factory Droid contract re-verified 2026-05-25: DROID_PLUGIN_ROOT fallback + Bash|Execute matcher stay; .factory-plugin/plugin.json fallback dropped as dead code per Factory’s Claude-Code interop guarantee.
1.1.11 — Cross-spec plan-sync setup fix
flowctl init no longer silently flips pre-1.1.3 users’ planSync.crossEpic from on to off: it mirrors the legacy value to the canonical planSync.crossSpec key before the default-merge when canonical is absent. Legacy key preserved through 1.x.
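The ordering is the whole fix: mirror first, merge defaults second. A small sketch of that order, assuming a nested dict config and an off default for crossSpec (the default is inferred from the "on to off" wording above, and migrate_then_merge is a hypothetical name):

```python
import copy

DEFAULTS = {"planSync": {"crossSpec": False}}  # assumed default: off


def migrate_then_merge(user_config: dict) -> dict:
    """Mirror the legacy key to the canonical one, then fill in defaults."""
    config = copy.deepcopy(user_config)  # never mutate the caller's config
    plan_sync = config.setdefault("planSync", {})
    if "crossSpec" not in plan_sync and "crossEpic" in plan_sync:
        plan_sync["crossSpec"] = plan_sync["crossEpic"]  # legacy key is kept
    for key, value in DEFAULTS["planSync"].items():
        plan_sync.setdefault(key, value)  # defaults only fill missing keys
    return config
```

If the default-merge ran first, crossSpec would already be present (as off) by the time the mirror step looked for it, which is the silent flip the release removes.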
1.1.10 — usage.md reference
.flow/usage.md template promoted to a comprehensive CLI reference (100 → 212 lines), adding status, config get/set, per-spec/task set-backend, checkpoint, ralph control, lifecycle commands, and a corrected file-structure diagram.
1.1.9 — Copilot review on Windows (real fix)
flowctl copilot *-review now works on native Windows by delivering the prompt via stdin (subprocess.run(input=…) with --session-id / --resume), sidestepping the CreateProcessW 32,767-char argv cap entirely. Verified by a real-subprocess Windows CI smoke round-tripping a 60 KB prompt. Supersedes the 1.1.8 WSL workaround. Upstream: github/copilot-cli#3398.
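The stdin technique itself is plain standard-library Python. The sketch below round-trips a 60,000-character prompt through a child process; the child is a stand-in Python one-liner, not the Copilot CLI, and send_prompt_via_stdin is a hypothetical helper name.

```python
import subprocess
import sys

# Stand-in child process: reads the whole prompt from stdin and prints its size.
# A real run would invoke the reviewer CLI here instead.
CHILD = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]


def send_prompt_via_stdin(prompt: str) -> str:
    """Deliver the prompt on stdin so its size never counts against argv limits."""
    completed = subprocess.run(
        CHILD, input=prompt, capture_output=True, text=True, check=True
    )
    return completed.stdout.strip()


prompt = "x" * 60_000  # well past the 32,767-character command-line cap
reported = send_prompt_via_stdin(prompt)
```

Only the short, fixed argument list goes through process creation; the prompt travels over a pipe, so its length is not bounded by the command-line limit.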
1.1.8 — Copilot Windows fail-fast guard
Fail-fast guard + WSL pointer for the Windows Copilot argv-cap failure (cryptic OSError winerror 206 before; clean error after). Reported by Simon Flauger (SEMA-CAD). Real fix landed in 1.1.9.
1.1.7 — Codex mirror frontmatter cleanup
Stripped request_user_input from 6 Codex-mirror SKILL.md frontmatters: fn-45 rewrote the prose reference but left it in frontmatter, so Codex agents called the unavailable tool and reintroduced the Default-mode failure. sync-codex.sh guard tightened so it can’t regress.
1.1.6 — prime ruleset-based branch protection
/flow-next:prime SE1 now detects ruleset-based enforcement in addition to classic branch protection, so GHE Enterprise repos protected via repo / org / enterprise rulesets correctly show SE1 ✅. Reported by Georg Keller (SEMA-CAD).
1.1.5 — interview business-scope reframe
Removed deadline / time-budget / sprint-cadence questions from /flow-next:interview --scope=business (agents can’t estimate their own work, and time pressure collapsed the interview into brutal-prioritization). MVP-scope cuts reframed by feature value; budget envelope scoped to infra / vendor / licensing.
1.1.4 — Canonical ## Acceptance Criteria heading
## Acceptance Criteria is the canonical spec heading. Parsing stays tolerant of legacy ## Acceptance and lowercase ## Acceptance criteria; existing specs need no migration.
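One way to accept all three heading spellings is a single anchored pattern. This is an illustrative sketch, not the flowctl parser: the regex and find_acceptance_heading are assumptions, and only the three heading forms come from the entry above.

```python
import re

# Accepts "## Acceptance Criteria", "## Acceptance criteria", and "## Acceptance".
# [ \t] is used instead of \s so a match can never spill onto the next line.
ACCEPTANCE_HEADING = re.compile(
    r"^##[ \t]+Acceptance(?:[ \t]+[Cc]riteria)?[ \t]*$", re.MULTILINE
)


def find_acceptance_heading(markdown: str):
    """Return the matched heading line, or None when the spec has no such section."""
    match = ACCEPTANCE_HEADING.search(markdown)
    return match.group(0).strip() if match else None
```

Anchoring on both ends keeps unrelated headings such as "## Acceptance Tests" from being treated as the criteria section.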
1.1.3 — crossSpec alias + SPEC.md discovery
Cross-spec plan-sync aligned on planSync.crossSpec. Repo-root SPEC.md / spec.md template discovery for project-customized spec scaffolds; /flow-next:setup can opt into a root SPEC.md without clobbering custom templates.
1.0 — Spec-driven foundation
The 1.0 line that stabilized the vocabulary and core workflow:
- Spec vocabulary stabilized (Spec / Task / R-ID / Handover / Receipt / Ralph)
- Symmetric --scope=business|technical|both interview
- Source-tagged capture with mandatory read-back
- PR-as-cognitive-aid generation
- Agent-native memory audit and migration
- PR feedback resolver
- Strategy and glossary grounding
- Ralph autonomous mode with receipts
Maintaining this page (for contributors)
What belongs here — human-readable release highlights:
- new slash commands
- changes to spec or task semantics
- review receipt changes
- migration requirements
- breaking or deprecated behavior
- docs, team workflow, or Ralph changes that alter how people should work
The repository CHANGELOG.md remains canonical — this page summarizes the current public story, not every commit.
Per-release entry format — add to the top of ## Latest:
### X.Y.Z — short title (3-6 words)
**One-sentence summary of what changed and why it matters.**
<details><summary>Detail</summary>
Full prose (lift the substance from the repo CHANGELOG entry). Blank lines around this block so MDX renders the markdown inside.
</details>
Rules: the version is a ### X.Y.Z — title heading (never a bare bullet, since it’s what makes the right-sidebar TOC a version index); the bold one-liner is mandatory; <details> is only for verbose multi-paragraph releases (trivial patches skip it); newest goes at the top of ## Latest; migrate the oldest ## Latest entries down to ## Earlier releases once it grows past ~4-5; bump src/lib/site.ts FLOW_NEXT_VERSION + package.json in the same commit. Full runbook: agent_docs/releasing.md → “Docs-site changelog entry”.
Release flow:
flowchart LR
  Change["Behavior change"] --> Docs["Update docs"]
  Change --> Tests["Run tests"]
  Docs --> Changelog["Update changelog"]
  Tests --> Release["Cut release"]
If Flow-Next behavior changes and the docs site does not, assume the release is incomplete until proven otherwise.