Architecture

Flow-Next is deliberately split between agent-native skills and deterministic flowctl plumbing.

Skills own judgment

Skills perform work that needs codebase reading, judgment, sequencing, and user clarification:

/flow-next:capture
/flow-next:interview
/flow-next:plan
/flow-next:work
/flow-next:impl-review
/flow-next:make-pr

The host agent is the intelligence. Flow-Next does not spawn a second LLM from flowctl just to make a judgment the current agent can make directly.

flowctl owns mechanics

flowctl is pure Python plumbing for operations that need determinism:

Atomic writes to .flow/
Schema validation
Task and spec state transitions
Review receipt validation
Git plumbing
Migration helpers
Backend dispatch wrappers
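
As an illustration of the first item, an atomic write in Python is commonly done by writing to a temporary file in the target directory and renaming it over the destination. The sketch below is not taken from flowctl; the function name atomic_write_json and the JSON formatting choices are assumptions made for the example.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, payload: dict) -> None:
    """Hypothetical helper: write payload so readers never see a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the target so the final rename stays on
    # one filesystem, where os.replace is atomic.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2, sort_keys=True)
            handle.write("\n")
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # swap the finished file into place
    except BaseException:
        # Remove the temp file if anything failed before the rename landed.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

A reader that opens the target file sees either the old content or the new content, never a half-written mix, which is the property a deterministic state store needs.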

Repo-local state

.flow/
├── specs/
├── tasks/
├── memory/
├── review-receipts/
├── bin/
└── usage.md

Keeping workflow state in a repo-local .flow/ directory keeps the install non-invasive and the failure mode obvious.

Control plane

flowchart TB
  Human["Human intent"] --> Skill["/flow-next skill"]
  Skill --> Agent["Host agent judgment"]
  Skill --> Flowctl["flowctl deterministic writes"]
  Agent --> Repo["Codebase"]
  Flowctl --> State[".flow state"]
  Repo --> Review["Review backend"]
  State --> Review
  Review --> Receipt["Receipt"]

The host agent reads code, asks clarifying questions, judges tradeoffs, and edits files. flowctl creates IDs, validates state, writes JSON and markdown atomically, and records transitions. The review backend supplies independent pressure so the same model that wrote the change is not the only reviewer.

Why this split matters

Anything that requires judgment stays in a skill:

Does this spec satisfy the product request?
Which code paths are relevant?
Is a finding introduced by this diff or pre-existing?
What should the PR reviewer read first?

Anything that should be deterministic stays in flowctl:

Allocate the next spec ID.
Mark a task started or done.
Validate dependencies.
Emit machine-readable receipt JSON.
Migrate repo-local .flow/ state.

This keeps Flow-Next portable across harnesses. Claude Code, Codex, and Droid can all run the same workflow because the intelligence is the current host agent, not a hidden service.
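
The deterministic operations listed above have one thing in common: the same input always produces the same result, with no model in the loop. A minimal Python sketch of three of them follows. The status names, the integer ID scheme, and every function name here are hypothetical and chosen only for illustration; they are not flowctl's actual schema or API.

```python
# Hypothetical status graph: a task can only move forward.
ALLOWED_TRANSITIONS = {
    "todo": {"in_progress"},
    "in_progress": {"done"},
    "done": set(),
}


def next_id(existing_ids: list[int]) -> int:
    """Allocate the next ID as one past the highest ID already in use."""
    return max(existing_ids, default=0) + 1


def transition(task: dict, new_status: str) -> dict:
    """Return a copy of task with new_status, or raise if the move is illegal."""
    current = task["status"]
    if new_status not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new_status}")
    return {**task, "status": new_status}


def validate_dependencies(deps: dict[str, list[str]]) -> None:
    """Raise ValueError on an unknown dependency or a dependency cycle."""
    for task_id, needs in deps.items():
        for need in needs:
            if need not in deps:
                raise ValueError(f"{task_id} depends on unknown task {need}")

    visiting: set[str] = set()
    finished: set[str] = set()

    def visit(task_id: str) -> None:
        if task_id in finished:
            return
        if task_id in visiting:
            raise ValueError(f"dependency cycle through {task_id}")
        visiting.add(task_id)
        for need in deps[task_id]:
            visit(need)
        visiting.discard(task_id)
        finished.add(task_id)

    # Sorted iteration keeps the error message stable across runs.
    for task_id in sorted(deps):
        visit(task_id)
```

None of these checks benefits from judgment, which is why operations of this kind sit on the flowctl side of the split rather than in a skill.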

Multi-harness model

Harness	Primary role
Claude Code	Canonical plugin surface and slash-command workflow
OpenAI Codex	Codex mirror with equivalent skills and subagent dispatch
Factory Droid	Cross-platform agent runtime support
RepoPrompt	High-context review and external model review workflows

The docs use slash commands because that is the user-facing workflow. The CLI reference exists for lower-level automation and debugging.

Interactive questions on Codex

Canonical Flow-Next skills use Claude Code’s AskUserQuestion primitive for blocking decisions. The Codex mirror does not call request_user_input, because that tool is unavailable outside Codex Plan mode.

As of Flow-Next 1.1.2, sync-codex.sh rewrites canonical AskUserQuestion invocations into plain-text numbered prompts for Codex:

1. Recommended option
2. Alternative option
3. Other — type your own answer

That gives Codex Desktop Default mode, Codex Plan mode, and Codex CLI the same behavior without runtime mode detection. Claude Code and Factory Droid keep their native blocking-question surfaces.
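
To make the numbered prompt format concrete, here is a small Python sketch that renders a question and its options in that shape. It is only an illustration of the output format: sync-codex.sh is a shell script whose internals are not shown here, and the function name, the question/options arguments, and the OTHER_LABEL constant are assumptions made for the example.

```python
# Wording copied from the sample prompt above; \u2014 is the dash it uses.
OTHER_LABEL = "Other \u2014 type your own answer"


def render_numbered_prompt(question: str, options: list[str]) -> str:
    """Hypothetical renderer: question line, numbered options, free-text fallback."""
    if not options:
        raise ValueError("at least one option is required")
    lines = [question]
    # The free-text choice is always appended last, after the listed options.
    for number, label in enumerate([*options, OTHER_LABEL], start=1):
        lines.append(f"{number}. {label}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        render_numbered_prompt(
            "Which approach should this task take?",
            ["Recommended option", "Alternative option"],
        )
    )
```

Because the result is plain text, it needs no harness-specific tool and reads the same in every Codex mode.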