`soup eval design` — derive evals from your training data

Before v0.55 you wrote your eval suite by hand. Now Soup drafts one from your data.

`soup eval design`

```bash
soup eval design data.jsonl --goal "polite customer support chat" --num-dimensions 5
```

How it works:

1. TF-IDF salience selects up to num_dimensions (default 5) salient terms across the dataset, computed on a DoS-capped subsample of at most 10,000 rows.

2. Goal-keyword dispatch maps each dimension to a scorer:

- json / code / math → rlvr (verifiable reward)
- classify → exact_match
- extract → regex
- default → judge

3. Output: a frozen EvalDesign (JSON) with one EvalDimension per row.

Scorer allowlist: {exact_match, regex, judge, rlvr}.

`soup eval discover` — canaries

```bash
soup eval discover data.jsonl --num-clusters 5 --per-cluster 3
```

Three sets:

  • Held-out canaries — greedy farthest-first Jaccard-distance clustering (_CLUSTER_SUBSAMPLE = 10_000).
  • Adjacent-skill probes — neighbours that fall just outside training distribution.
  • Memorization probes — 25%-prefix truncation. If the trained model can reproduce the rest of a training row from that prefix, it has memorized the row.

Per-group cap: _MAX_CANARIES_PER_GROUP = 1024.
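Greedy farthest-first selection under Jaccard distance can be sketched like this (a minimal, assumption-laden version: token sets per row, seeding on the first row, no subsampling):

```python
# Minimal sketch of farthest-first Jaccard selection; Soup's
# internals (seeding, subsampling, tokenization) may differ.

def jaccard_distance(a: set, b: set) -> float:
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def farthest_first(rows: list, k: int) -> list:
    """Pick k row indices, each maximally distant from those already chosen."""
    chosen = [0]  # seed with the first row
    while len(chosen) < min(k, len(rows)):
        best = max(
            (i for i in range(len(rows)) if i not in chosen),
            key=lambda i: min(jaccard_distance(rows[i], rows[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen

rows = [{"a", "b"}, {"a", "b"}, {"x", "y"}, {"a", "x"}]
print(farthest_first(rows, 2))  # → [0, 2]: the seed plus the most distant row
```

The "min distance to any chosen point" criterion is what spreads canaries across the dataset instead of clumping them in one dense region.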

`soup eval lock` — pin the suite

```bash
soup eval lock my-design.json
```

Locks the design as a SHA-256-checksummed eval_suite artifact via canonicalise_design_bytes (canonical-JSON for stable hashes across runs). The frozen LockedSuite (path / sha256 / dimension_count) is registered in the v0.26 registry alongside the new canaries artifact kind.
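The canonicalisation step is what makes the checksum stable. A sketch of the assumed behaviour (sorted keys, fixed separators — the real canonicalise_design_bytes may do more):

```python
# Sketch of canonical-JSON hashing for a stable suite checksum.
# Assumed behaviour of canonicalise_design_bytes, not Soup's actual code.
import hashlib
import json

def canonicalise_design_bytes(design: dict) -> bytes:
    # Sorted keys + fixed separators make the byte output independent
    # of dict insertion order and pretty-printing, so the SHA-256
    # digest is identical across runs.
    return json.dumps(design, sort_keys=True, separators=(",", ":")).encode()

design = {"dimensions": [{"name": "politeness", "scorer": "judge"}]}
digest = hashlib.sha256(canonicalise_design_bytes(design)).hexdigest()
```

Two designs that differ only in key order hash identically, which is exactly what a frozen LockedSuite needs.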

`soup eval coverage` — gap analysis

```bash
soup eval coverage my-design.json --task reasoning
```

Checks the locked design against the v0.54.0 TASK_CATEGORIES taxonomy and the _RECOMMENDED_SCORERS allowlist (e.g. reasoning → (rlvr, judge), format_conversion → (regex, rlvr)). Returns a CoverageReport with concrete gap recommendations.
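The core of the check is a set difference against the recommendation table. A hypothetical sketch (the table below copies the two examples above; `coverage_gaps` is an invented helper, not Soup's API):

```python
# Hypothetical shape of the coverage gap check.
# _RECOMMENDED_SCORERS here holds only the two documented examples.
_RECOMMENDED_SCORERS = {
    "reasoning": ("rlvr", "judge"),
    "format_conversion": ("regex", "rlvr"),
}

def coverage_gaps(design_scorers: set, task: str) -> list:
    """Return recommended scorers the locked design is missing for this task."""
    return [s for s in _RECOMMENDED_SCORERS.get(task, ()) if s not in design_scorers]

print(coverage_gaps({"judge"}, "reasoning"))  # → ['rlvr']
```

A non-empty result would surface in the CoverageReport as a concrete "add this scorer" recommendation.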

`soup eval gate-install` — git regression gate

```bash
soup eval gate-install --baseline run-id-7f3a
```

Writes a .git/hooks/pre-push hook (written atomically, POSIX mode 0o755) that:

1. Runs your locked eval suite on the current head.

2. Compares each GateThresholds metric (task_accuracy / refusal_rate / format_validity / p95_latency_ms) against the baseline via paired_bootstrap_ci(baseline, candidate, n_samples, ci_level, seed).

- n_samples ∈ [100, 100_000]

- ci_level ∈ (0, 1)

3. decide_regression uses direction-aware metric handling via _METRIC_DIRECTION — higher-is-better for accuracy, lower-is-better for latency.

4. Refuses the push on a RegressionVerdict of REGRESSED.
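Steps 2–4 can be sketched together. This is a simplified stand-in (percentile bootstrap over paired deltas, two metrics only, string verdicts) under the parameter bounds stated above, not Soup's implementation:

```python
# Sketch of a paired bootstrap CI over per-example deltas, plus a
# direction-aware regression verdict. Simplified; not Soup's code.
import random

_METRIC_DIRECTION = {"task_accuracy": "higher", "p95_latency_ms": "lower"}

def paired_bootstrap_ci(baseline, candidate, n_samples=1000, ci_level=0.95, seed=0):
    assert 100 <= n_samples <= 100_000 and 0 < ci_level < 1
    rng = random.Random(seed)
    deltas = [c - b for b, c in zip(baseline, candidate)]
    # Resample the paired deltas and collect the resampled means.
    means = sorted(
        sum(rng.choice(deltas) for _ in deltas) / len(deltas)
        for _ in range(n_samples)
    )
    lo = means[int((1 - ci_level) / 2 * n_samples)]
    hi = means[int((1 + ci_level) / 2 * n_samples) - 1]
    return lo, hi

def decide_regression(metric, ci):
    lo, hi = ci
    if _METRIC_DIRECTION[metric] == "higher":
        return "REGRESSED" if hi < 0 else "OK"  # CI entirely below zero
    return "REGRESSED" if lo > 0 else "OK"      # lower-is-better: CI above zero
```

Pairing matters: bootstrapping per-example deltas (rather than the two runs independently) cancels example-level difficulty and tightens the interval.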

The hook script is rendered via render_pre_push_hook with shlex.quote for every interpolated path — no shell injection.
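The quoting discipline looks roughly like this. The template and the `soup eval run` invocation inside it are invented for illustration; only the shlex.quote-everything rule is taken from the text above:

```python
# Sketch of safe hook rendering: every interpolated value passes through
# shlex.quote, so spaces and shell metacharacters stay inert.
# Template and inner command are hypothetical, not render_pre_push_hook's real output.
import shlex

def render_pre_push_hook(suite_path: str, baseline_id: str) -> str:
    return "\n".join([
        "#!/bin/sh",
        f"exec soup eval run {shlex.quote(suite_path)} --baseline {shlex.quote(baseline_id)}",
    ])

print(render_pre_push_hook("my suite.json", "run-id-7f3a"))
```

A path like `my suite.json; rm -rf ~` is rendered as one single-quoted token, which is the whole point of routing every interpolation through shlex.quote.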

See also

  • [Quant-check](/docs/quant-check) — same idea, but for quant-induced regression
  • [Eval-gated training](/docs/eval-gate) — halt training when quality drops
  • [Registry](/docs/registry) — where eval_suite and canaries artifacts live