# `soup loop` — the production data flywheel, all from the CLI
NVIDIA's data-flywheel reference needs a multi-service stack. Observability vendors (Langwatch, Helicone, Galileo) monetize per-trace and have zero upside pushing customers downstream into training. OpenPipe tried this exact business and pivoted to RL agents before CoreWeave acquired it.
soup loop (v0.58.0) ships it as one CLI on a laptop. It connects 8 of Soup's existing uniques — v0.26 Trace-to-Preference, v0.26 Eval-Gated Training, v0.26 Registry lineage, v0.26 Quant-Lobotomy verdicts, v0.26 Soup Cans, v0.25 Autopilot, v0.54 Advise, v0.55 Eval Design, v0.56 Diagnose — into a single loop:
> production traces → preference pairs → eval-gated DPO → canary deploy → auto-rollback
## Six subcommands

```text
soup loop init <served-model> --eval <suite> --baseline registry://<id> \
  --monthly-budget 50usd --max-runs-per-day 3
soup loop status
soup loop watch [--foreground|--detach] [--max-iterations N] [--poll-interval F]
soup loop pause / resume
soup loop canary <adapter> --traffic 5% [--autoroll-on-regress]
soup loop replay [<iteration-id>]
```

## Control plane
The full loop state lives in a single atomic file, `.soup/loop.yaml` (JSON-formatted), written via `tempfile.mkstemp` + `os.replace` with cwd containment, direct `os.lstat` symlink rejection, a 1 MiB size cap, and POSIX `0o600` permissions.
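A minimal sketch of that write path (the function name `write_state_atomic` is hypothetical; the guards are the ones listed above):

```python
import os
import stat
import tempfile

_MAX_BYTES = 1024 * 1024  # 1 MiB cap on the state file

def write_state_atomic(path: str, payload: str) -> None:
    data = payload.encode("utf-8")
    if len(data) > _MAX_BYTES:
        raise ValueError("state exceeds 1 MiB cap")
    # cwd containment: the resolved target must live under the working dir
    target = os.path.realpath(path)
    if not target.startswith(os.getcwd() + os.sep):
        raise ValueError("state file escapes the working directory")
    # direct symlink rejection via lstat (does not follow the link)
    try:
        if stat.S_ISLNK(os.lstat(path).st_mode):
            raise ValueError("refusing to write through a symlink")
    except FileNotFoundError:
        pass  # first write; nothing to check
    # write to a sibling tempfile, then atomically swap it into place
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(target))
    try:
        os.fchmod(fd, 0o600)            # POSIX 0o600 before any bytes land
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, target)         # atomic: readers see old or new, never half
    except BaseException:
        os.unlink(tmp)
        raise
```

Because `os.replace` is atomic on POSIX, a crash mid-write leaves either the old file or the new one, never a truncated mix.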
Frozen `LoopState` schema:

- `served_model` / `eval_suite` / `baseline` / `status` (one of `{running, paused, stopped}`)
- 6 counters (`traces_harvested` / `pairs_distilled` / `runs_started` / `runs_shipped` / `runs_rolled_back` / `runs_skipped_by_budget`)
- canary + budget + daily-cap metadata
- iteration metadata

`to_dict` returns a `MappingProxyType` so callers can't mutate state through it. Only `with_status` and `bumped` are sanctioned mutators.
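The frozen-state pattern reads as a standard frozen dataclass; a sketch with a trimmed field set (the field names come from the schema above, the signatures are assumed):

```python
from dataclasses import dataclass, replace, asdict
from types import MappingProxyType

_STATUSES = {"running", "paused", "stopped"}

@dataclass(frozen=True)
class LoopState:
    served_model: str
    eval_suite: str
    status: str = "running"
    traces_harvested: int = 0   # two of the six counters, for brevity
    runs_shipped: int = 0

    def to_dict(self):
        # Read-only view: callers cannot mutate state through it.
        return MappingProxyType(asdict(self))

    def with_status(self, status: str) -> "LoopState":
        if status not in _STATUSES:
            raise ValueError(f"unknown status: {status}")
        return replace(self, status=status)   # new object, old one untouched

    def bumped(self, counter: str, by: int = 1) -> "LoopState":
        return replace(self, **{counter: getattr(self, counter) + by})
```

Every "mutation" returns a fresh `LoopState`, which is what makes `run_once` trivially testable.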
## Canary router
`soup loop canary <adapter> --traffic 5% --autoroll-on-regress` splits traffic via deterministic SHA-256 hash routing:

- 4-byte slice mod `_HASH_MOD = 10_000` → ±0.01% split granularity
- Per-request `route(policy, request_key)` is pure
- Verdict via `BucketStats.verdict()` returns `OK`/`MAJOR`/`UNKNOWN` using v0.26 Quant-Lobotomy thresholds (5-pct regression band, min 30 samples)
- `BucketStats` is mutable with a `threading.Lock` for live request streams
- Cross-field validation: canary == stable rejected, `traffic_pct ∈ [0, 100]`
- `rollback` returns a new policy with the canary cleared (sticky-on-rollback)
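The split itself is only a few lines; a sketch that takes a bare `traffic_pct` in place of the real policy object:

```python
import hashlib

_HASH_MOD = 10_000  # 10k buckets → ±0.01% split granularity

def route(traffic_pct: float, request_key: str) -> str:
    """Pure and deterministic: the same key always lands in the same bucket,
    so a given user sticks to canary or stable across requests."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % _HASH_MOD  # 4-byte slice mod
    # The first traffic_pct% of buckets go to the canary, the rest to stable.
    return "canary" if bucket < traffic_pct * _HASH_MOD / 100 else "stable"
```

Hash routing beats random routing here because it needs no per-request state: stickiness falls out of determinism.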
## Budget guardrails
`parse_budget_string` accepts `"50usd"` / `"50"` / `"50 USD"` (≤ $1M hard cap). Each iteration runs through `check_budget`, which returns a frozen `BudgetDecision(proceed, reason, projected_total_usd, runs_today)`. Check order:

1. daily-cap (UTC-day rollover via `reset_daily_counter_if_new_day`)
2. estimate-sanity — refuse implausible cost estimates
3. monthly-budget — projected total vs configured cap

Budget-skipped iterations produce no manifests — no half-records to confuse `soup loop replay`.
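A sketch of the three-step check, with assumed signatures for `parse_budget_string` and `check_budget` (the check order is the one listed above):

```python
import re
from dataclasses import dataclass

_HARD_CAP_USD = 1_000_000.0  # the $1M hard cap

def parse_budget_string(raw: str) -> float:
    # Accepts "50usd" / "50" / "50 USD" (case-insensitive, optional spaces).
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(?:usd)?\s*", raw, re.IGNORECASE)
    if not m:
        raise ValueError(f"unparseable budget: {raw!r}")
    value = float(m.group(1))
    if value > _HARD_CAP_USD:
        raise ValueError("budget exceeds $1M hard cap")
    return value

@dataclass(frozen=True)
class BudgetDecision:
    proceed: bool
    reason: str
    projected_total_usd: float
    runs_today: int

def check_budget(spent_usd: float, estimate_usd: float, monthly_cap_usd: float,
                 runs_today: int, max_runs_per_day: int) -> BudgetDecision:
    projected = spent_usd + estimate_usd
    if runs_today >= max_runs_per_day:                  # 1. daily cap
        return BudgetDecision(False, "daily-cap", projected, runs_today)
    if estimate_usd < 0 or estimate_usd > monthly_cap_usd:
        return BudgetDecision(False, "estimate-sanity", projected, runs_today)
    if projected > monthly_cap_usd:                     # 3. monthly budget
        return BudgetDecision(False, "monthly-budget", projected, runs_today)
    return BudgetDecision(True, "ok", projected, runs_today)
```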
## Watch daemon
`soup loop watch` orchestrates 5 stage callables: `HarvestFn` / `TrainFn` / `GateFn` / `DeployFn` / `CostFn`. The default stub bindings are no-ops; live wiring to v0.26 traces, the v0.55 eval-gate, and the v0.30 `/v1/adapters/activate` hot-swap endpoint is operator-driven via `WatchConfig`.
- `run_once(state, config)` is pure with respect to time (testable); `watch(config)` is the long-running daemon. Installs `SIGTERM`/`SIGINT` handlers.
- Reloads state every iteration so external `pause`/`resume` takes effect immediately without restarting the daemon.
- `--detach` spawns `python -m soup_cli.cli loop watch --foreground` via argv-list `subprocess.Popen` — no shell, no string interpolation.
- `maybe_rollback` fires only on a `"MAJOR"` canary verdict.
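The reload-each-iteration loop can be sketched as follows, with the stages reduced to injected callables (all signatures here are assumed, not the real `WatchConfig` wiring):

```python
import signal
import threading

def watch(config, load_state, save_state, run_once,
          poll_interval: float = 60.0, max_iterations=None) -> int:
    # Cooperative shutdown flag flipped by SIGTERM/SIGINT.
    stop = threading.Event()
    def _handle(signum, frame):
        stop.set()
    if threading.current_thread() is threading.main_thread():
        signal.signal(signal.SIGTERM, _handle)
        signal.signal(signal.SIGINT, _handle)
    done = 0
    while not stop.is_set():
        state = load_state()          # re-read every pass, so an external
        if state.status == "stopped": # pause/resume/stop edit takes effect
            break                     # without restarting the daemon
        if state.status == "running":
            save_state(run_once(state, config))  # run_once stays pure w.r.t. time
            done += 1
            if max_iterations is not None and done >= max_iterations:
                break
        stop.wait(poll_interval)      # interruptible sleep
    return done
```

Keeping the clock, signals, and file I/O in `watch` while `run_once` stays pure is what makes the iteration logic unit-testable without a daemon.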
## Iteration manifests
Every iteration writes a frozen `IterationRecord` to `.soup-loops/<iteration_id>/iteration.json`:

```text
iteration_id        20260515T231600-a1f3b2c4
started_at          2026-05-15T23:16:00Z
finished_at         2026-05-15T23:25:12Z
pairs_harvested     89
run_id              run-8f3
gate_verdict        OK        # OK | MAJOR | SKIPPED
canary_verdict      OK        # OK | MAJOR | UNKNOWN | None
shipped             true
rolled_back         false
estimated_cost_usd  2.40
notes               []
```

`new_iteration_id` = UTC timestamp + 8-hex `uuid.uuid4()`. `soup loop replay` walks the directory in chronological order.
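The ID scheme is easy to reproduce; a sketch assuming `new_iteration_id` takes an optional clock for testability:

```python
import uuid
from datetime import datetime, timezone

def new_iteration_id(now=None) -> str:
    """UTC timestamp + 8-hex uuid4 suffix, e.g. 20260515T231600-a1f3b2c4."""
    now = now or datetime.now(timezone.utc)
    # Timestamp-first means lexicographic sort == chronological order,
    # which is what lets replay walk .soup-loops/ sorted by name.
    return f"{now.strftime('%Y%m%dT%H%M%S')}-{uuid.uuid4().hex[:8]}"
```

The random suffix guards against two iterations starting in the same second; the timestamp prefix keeps directory listings replayable in order.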
## Known limitations

- Stage callbacks default to no-op stubs — live trace ingestion, registry baseline auto-pick, and `/v1/adapters/activate` rollout are operator-driven via `WatchConfig`.
- Soup Can packaging of each iteration is deferred to v0.58.1.
- `--detach` is a single-process subprocess; full daemonization (double-fork, session leader, `/dev/null` fds) is deferred.
## See also

- [Trace-to-preference](/docs/trace-to-preference) — the `HarvestFn` half of the loop
- [Eval-gated training](/docs/eval-gate) — the `GateFn` half
- [Registry](/docs/registry) — where `baseline` and shipped runs live
- [Diagnose](/docs/diagnose) — pair with `soup train --diagnose-gate` for a second safety net
- [Advise](/docs/advise) — what to run *before* a loop iteration