
Orchestration & Model Routing

flow-next is an orchestration layer, not a single-agent workflow. The host agent conducts: it fans work out to tiered subagents, routes reviews to a different model family than the writer, optionally delegates implementation to a second CLI agent, and runs autonomous build/ship loops. Which model does what is a routing decision — and every routing decision in flow-next is either a parameter or a sentence of intent away. The second kind carries judgment.

The trend of mid-2026 — a frontier model orchestrating while cheaper/faster models do the token-heavy work — is flow-next’s default shape:

| What | Default routing |
|------|-----------------|
| Orchestration, judgment, git | Host session model (your pick, e.g. Fable 5) |
| Planning scouts, gap analysis | Fast judgment tier (subagent tiers) |
| Readiness scanners | Cheapest tier: mechanical scan-and-report |
| Implementation workers | Inherit the session model |
| Plan/impl/completion reviews | A different model family via pluggable backends |
| Autonomous loops | Pilot + land ticks driven by /loop / /goal |

Skills are prompts executed by the host agent, not compiled code. That gives you two genuinely different routing methodologies — use both:

| | Deterministic: parameters | Prompted: agentic intelligence |
|---|---|---|
| What it is | Config keys, flags, per-spec/per-task fields. Machine-resolved, same answer every time | Policy described in natural language. The host judges per item, conditionally, mid-run |
| Example | flowctl config set review.backend codex | “Work the three ready specs: decide per spec, by complexity, whether implementation is delegated or stays on the session model” |
| Reach | Exactly the surfaces that ship | Anything the host can do, including capabilities that don’t exist as parameters |
| When it wins | Headless/Ralph runs, stable team defaults, reproducibility | Per-item complexity calls, conditional escalation, inventing a routing the registry doesn’t have |

The two compose: parameters set the floor, prompting steers above it. Either can be made durable in CLAUDE.md / AGENTS.md — the host reads your instruction files every session, and flow-next skills inherit them automatically.

The review subsystem is the most routable surface — spec grammar backend[:model[:effort]] over rp | codex | copilot | cursor | none:

flowctl config set review.backend codex # project default
flowctl config set review.backend cursor:composer-2.5 # Cursor-billed, folds effort into the model
flowctl config set review.backend codex:gpt-5.4:xhigh # explicit model + effort
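
To make the grammar concrete, here is a minimal Python sketch of how a `backend[:model[:effort]]` string splits into its parts. It is an illustration only, not flow-next's actual parser: `ReviewSpec` and `parse_review_spec` are hypothetical names, and the strictness rules (rejecting empty segments and unknown backends) are assumptions.

```python
from typing import NamedTuple, Optional

# Backend names come from the grammar above; everything else is illustrative.
KNOWN_BACKENDS = {"rp", "codex", "copilot", "cursor", "none"}


class ReviewSpec(NamedTuple):
    backend: str
    model: Optional[str] = None
    effort: Optional[str] = None


def parse_review_spec(spec: str) -> ReviewSpec:
    """Split backend[:model[:effort]] into its parts (hypothetical helper)."""
    parts = spec.strip().split(":")
    # Assumption: one to three segments, none of them empty.
    if not 1 <= len(parts) <= 3 or any(part == "" for part in parts):
        raise ValueError(f"malformed review spec: {spec!r}")
    if parts[0] not in KNOWN_BACKENDS:
        raise ValueError(f"unknown review backend: {parts[0]!r}")
    # Missing trailing segments fall back to the None defaults.
    return ReviewSpec(*parts)


print(parse_review_spec("codex:gpt-5.4:xhigh"))
```

Under these assumptions, `cursor:composer-2.5` parses to a backend and a model with no separate effort, which matches the note that the Cursor form folds effort into the model name.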

Precedence, highest first: per-task review: / per-spec default_review → FLOW_REVIEW_BACKEND → config → backend env → default. A single task can pin its own reviewer and the override routes end-to-end. The cursor backend reaches reviewer models the others can’t (gpt-5.5-high at 1M context, composer-2.5, claude-opus-4-8-thinking-high) on your existing subscription. Details: review workflow, configuration.
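
The precedence chain is a first-match-wins lookup. The sketch below shows that shape in Python under stated assumptions: `resolve_review_backend` and its parameter names are hypothetical, the per-task value is checked before the per-spec one within the top tier, "backend env" is treated as a single backend-specific environment value, and the final default is passed in by the caller because this page does not name it.

```python
from typing import Mapping, Optional


def resolve_review_backend(
    task_review: Optional[str],
    spec_default_review: Optional[str],
    env: Mapping[str, str],
    config_value: Optional[str],
    backend_env_value: Optional[str],
    default: str,
) -> str:
    """Return the first configured value, scanning from highest precedence down."""
    candidates = [
        task_review,                     # per-task review: field
        spec_default_review,             # per-spec default_review
        env.get("FLOW_REVIEW_BACKEND"),  # environment override
        config_value,                    # flowctl config value for review.backend
        backend_env_value,               # assumption: backend-specific env value
    ]
    for value in candidates:
        if value:  # unset (None) and empty strings both fall through
            return value
    return default


# A task-level pin beats every lower layer.
print(resolve_review_backend(
    "cursor:composer-2.5", None, {"FLOW_REVIEW_BACKEND": "codex"}, "rp", None, "none"
))
```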

Opt-in: /flow-next:work offloads the token-heavy part (writing code) to a local codex exec while the host keeps every judgment call — gating, classification, git, review, commit:

flowctl config set work.delegate codex # or per-run: /flow-next:work fn-12 delegate:codex
flowctl config set work.delegateModel gpt-5.5

OFF by default, one-time consent-gated, circuit-breakered, and independently verified — a delegated diff is never trusted on the delegate’s own summary. See work.
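
As a rough illustration of how "circuit-breakered" and "independently verified" fit together, the Python sketch below accepts a delegated diff only when the host's own test run passes, and stops delegating after repeated failures. Everything in it is an assumption made for illustration: the `DelegationBreaker` class, the failure threshold, and the `run_tests` callback are hypothetical and do not describe flow-next's internals.

```python
from typing import Callable


class DelegationBreaker:
    """Hypothetical guard: verify every delegated diff, trip after repeated failures."""

    def __init__(self, max_consecutive_failures: int = 3) -> None:
        # Assumption: the threshold is illustrative, not a flow-next default.
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0

    @property
    def open(self) -> bool:
        """True once delegation should stop for this run."""
        return self.consecutive_failures >= self.max_consecutive_failures

    def accept(self, delegate_summary: str, run_tests: Callable[[], bool]) -> bool:
        """Accept a delegated diff only if the host's own test run passes."""
        if self.open:
            raise RuntimeError("delegation disabled: circuit breaker is open")
        # The delegate's summary is deliberately never consulted for the verdict.
        passed = run_tests()
        self.consecutive_failures = 0 if passed else self.consecutive_failures + 1
        return passed


breaker = DelegationBreaker(max_consecutive_failures=2)
print(breaker.accept("all tests pass", run_tests=lambda: False))
```

The point of the design is that a summary claiming success changes nothing: only the host-side test result moves the state.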

Prompted orchestration — routing with judgment


This is the mode parameters can’t reach: routing policy that’s conditional and per-item, decided against the actual work rather than fixed up front.

Per-item complexity routing — the host classifies, then routes:

Work through the three ready specs. Decide per spec, based on complexity,
how the work stage runs: anything touching auth or the migration you
implement yourself on the session model; plain CRUD is delegated to codex
(delegate:codex). Reviews come from codex either way.

Focus and scope steering — instruction the skill never anticipated, read as intent:

/flow-next:plan fn-12 --depth=deep — focus the research on the migration path; I care about rollback
/flow-next:interview fn-12 — push hard on failure modes and operational edges, skip UI polish
/flow-next:work fn-12 — the UI tasks stay with you; delegate the API plumbing to codex

Conditional escalation — routing that reacts to outcomes:

Run /flow-next:work fn-12 with delegate:codex. If a task's review comes back
NEEDS_WORK twice, stop delegating that task and implement it yourself on the
session model.
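
In flow-next the host applies this rule with judgment, but the rule itself is small enough to write down. A minimal Python sketch of the escalation state, assuming per-task counting and a hypothetical `EscalationPolicy` class (the task ID is a made-up example; neither is part of flow-next):

```python
from collections import Counter


class EscalationPolicy:
    """Hypothetical tracker: stop delegating a task after repeated NEEDS_WORK reviews."""

    def __init__(self, max_needs_work: int = 2) -> None:
        self.max_needs_work = max_needs_work
        self._needs_work: Counter = Counter()

    def record_review(self, task_id: str, verdict: str) -> None:
        """Count NEEDS_WORK verdicts per task; other verdicts are ignored."""
        if verdict == "NEEDS_WORK":
            self._needs_work[task_id] += 1

    def route(self, task_id: str) -> str:
        """Return where the next implementation attempt for this task should run."""
        if self._needs_work[task_id] >= self.max_needs_work:
            return "session"       # host implements on the session model
        return "delegate:codex"    # keep delegating


policy = EscalationPolicy()
policy.record_review("fn-12.1", "NEEDS_WORK")
policy.record_review("fn-12.1", "NEEDS_WORK")
print(policy.route("fn-12.1"))
```

Escalation here is scoped to the one task that failed review twice; other tasks in the same spec keep their delegated route.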

Prompting a capability into existence — there is no fable review backend in the registry; describe it and it exists:

/flow-next:plan-review fn-12 — don't use the configured backend; spawn a
fresh-context subagent on the session model with the same review criteria,
and feed its verdict into the fix loop like any other reviewer.

Backends, reviewers, and delegates are prompts plus plumbing — when a rung you want is missing, the host builds the arrangement on the spot.

The orchestration patterns that emerged in the wild through mid-2026 all have a direct flow-next expression — most need one config key or one sentence:

| Pattern from the field | The idea | flow-next expression |
|------------------------|----------|----------------------|
| Orchestrator → executor | The frontier model plans and judges; a cheaper, highly steerable model (GPT-5.5 via the Codex CLI, on the subscription you already pay for) writes the code | flowctl config set work.delegate codex, or per-run delegate:codex. Host keeps gating/git/review |
| Orchestrator → reader | Token-hungry, low-judgment reads run on fast models that report summaries back | Already the default: scouts run on fast tiers and return digests |
| Cross-family reviewer | The model that writes is never the model that reviews | review.backend codex / cursor:composer-2.5; per-task review: pins exceptions |
| Effort discipline | Orchestrator at high, not max; top effort tiers are token furnaces on routine work | Session effort is yours; work.delegateEffort floors the delegate |
| Token-hungry offload | Computer use and live-app verification go to other agents; results come back as evidence | /flow-next:qa drives the app in its own context; workers run fresh-context and return receipts |

The emergent durable pattern: a standing “which model for what” section in your agent instructions — a ranking of the models you can reach plus routing rules. This is prompted orchestration made durable: the table is interpreted by intelligence, not parsed by a config loader. The host applies it with judgment whenever it dispatches subagents, picks reviewers, or decides to delegate — which is exactly why the rules grant standing permission to escalate. A complete, copy-paste starting point, adapted to the flow-next pipeline:

## Picking models for flow-next workflows and subagents

Rankings, higher = better. Cost reflects what I actually pay (existing
subscriptions), not list price. Intelligence is how hard a problem you can
hand the model unsupervised. Taste covers UI/UX, code quality, API design, copy.

| model | cost | intelligence | taste |
|---------------------------|------|--------------|-------|
| gpt-5.5 (codex CLI) | 9 | 8 | 5 |
| composer-2.5 (cursor CLI) | 9 | 6 | 6 |
| sonnet-5 | 5 | 7 | 7 |
| fable-5 (session model) | 2 | 10 | 9 |

How to apply:

- These are defaults, not limits. Standing permission to override: if a
cheaper model's output doesn't meet the bar, rerun or redo with a smarter
model without asking. Judge the output, not the price tag — escalating
costs less than shipping mediocre work.
- Cost is a tie-breaker only; for anything that ships, intelligence > taste > cost.
- Orchestration, planning, review verdicts, anything ambiguous: session
model. /flow-next:plan, /flow-next:interview, and pilot/land driving stay
here — never delegate judgment.
- Bulk/mechanical implementation (clear spec, low ambiguity): delegate to
gpt-5.5 — /flow-next:work <id> delegate:codex. Config:
work.delegateModel=gpt-5.5, work.delegateEffort=medium.
- Anything user-facing (UI, copy, API design) needs taste >= 7 — keep those
tasks on the session model even when they look mechanical.
- Reviews route to a different family than the writer:
review.backend=codex (or cursor:composer-2.5 for speed). Escalate
NEEDS_WORK disagreements between reviewer and worker to the session model.
- Token-hungry, low-judgment work (codebase analysis, live-app QA driving):
subagents and flow-next scouts — summaries come back, the orchestrator
never holds the raw tokens.
- Mechanics: gpt-5.5 is reached through the Codex CLI (delegate:codex spawns
codex exec); composer-2.5 through cursor-agent (review backend
cursor:composer-2.5). Claude-family models run natively as subagent tiers.

Role labels are durable; model IDs are volatile. Write the table in terms of roles, re-rank as the frontier moves, and the routing rules survive every model generation.

Pilot and land end every tick with machine-readable verdicts so a driver can compose them — a multi-model spec-to-merged-PR pipeline in one prompt:

/loop 30m — one tick: run /flow-next:pilot --review=codex --depth=deep.
If PILOT_VERDICT=DEFERRED_TO_LAND, run /flow-next:land in the same tick.
Delegation is on (work.delegate=codex): mechanical tasks go to gpt-5.5,
UI tasks stay on the session model, reviews come from codex.
Stop when pilot prints NO_WORK and land prints LAND_VERDICT=NO_WORK,
or on any NEEDS_HUMAN.

DEFERRED_TO_LAND exists exactly for this hand-off. For the hardened overnight harness, see Ralph.
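
For a driver that is a script and not a prompted host, composing the verdicts could look like the Python sketch below. It assumes each verdict is printed as a `KEY=VALUE` line (the form `PILOT_VERDICT=DEFERRED_TO_LAND` takes in the prompt above), that pilot's NO_WORK uses the same `PILOT_VERDICT=` prefix, and that NEEDS_HUMAN appears as a verdict value. The function names are hypothetical.

```python
import re
from typing import Optional


def extract_verdict(output: str, key: str) -> Optional[str]:
    """Return the last KEY=VALUE verdict found in a tick's output, if any."""
    pattern = rf"^{re.escape(key)}=(\w+)[ \t]*$"
    matches = re.findall(pattern, output, flags=re.MULTILINE)
    return matches[-1] if matches else None


def should_run_land(pilot_verdict: Optional[str]) -> bool:
    """Pilot hands off to land within the same tick only on DEFERRED_TO_LAND."""
    return pilot_verdict == "DEFERRED_TO_LAND"


def should_stop(pilot_verdict: Optional[str], land_verdict: Optional[str]) -> bool:
    """Stop on any NEEDS_HUMAN, or when both pilot and land report NO_WORK."""
    if "NEEDS_HUMAN" in (pilot_verdict, land_verdict):
        return True
    return pilot_verdict == "NO_WORK" and land_verdict == "NO_WORK"


pilot_output = "building fn-12...\nPILOT_VERDICT=DEFERRED_TO_LAND\n"
verdict = extract_verdict(pilot_output, "PILOT_VERDICT")
print(verdict, should_run_land(verdict), should_stop(verdict, None))
```

Taking the last match in the output is a defensive choice, in case earlier log lines quote a verdict before the final one is printed.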

Newer flow-next versions ship this page’s substance into the repo you’re working in, so the host agent finds it at use time without visiting these docs:

  • .flow/usage.md (installed by setup, read every session) carries an ## Orchestration & model steering section: dogfooded headless-bridge recipes for codex exec and cursor-agent, plus claude -p for the reverse direction; the flow-next shortcuts (delegate:codex, review.backend, per-task review:); and prompted-orchestration examples. The bridges run in every direction: from Claude Code you call codex exec or cursor-agent; from Codex or Cursor you call claude -p or the other CLI. Any harness that can run Bash can be the conductor.
  • CLAUDE.md / AGENTS.md can hold the durable model-routing table above: /flow-next:setup offers, as an optional ceremony step, to scaffold it live — probe-annotated for the CLIs you actually have installed, shown in full before writing, yours to edit after, marker-fenced so /flow-next:uninstall removes it cleanly.

On older installs, copy the table from this page instead — the behavior is identical; the scaffold is a convenience.

  • Judgment stays with the host — a delegated codex exec writes code; it never owns git, task state, or decisions.
  • Consent gates don’t route away — delegation stays opt-in; blast radius surfaced before first use.
  • Merge is human-gated everywhere except the explicitly opted-in land loop.
  • Verification is independent — tests re-run on every delegated diff.
  • Escalation beats thrift — when you downgrade a role, watch the first outputs and revert on the first quality miss.