Tracker & Eval Pro (v0.43.0)
`--tracker` allowlist
Closed allowlist: wandb | tensorboard | mlflow | swanlab | trackio | none.
soup train --tracker mlflow
Mutually exclusive with legacy --wandb / --tensorboard via resolve_report_to.
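A minimal sketch of how such an exclusivity check might look. Only the name `resolve_report_to`, the allowlist and the legacy flags come from the notes above; the signature, the error messages and the fallback to "none" are assumptions made for illustration.

```python
from typing import Optional

# Closed allowlist from the section above.
TRACKERS = frozenset({"wandb", "tensorboard", "mlflow", "swanlab", "trackio", "none"})


def resolve_report_to(tracker: Optional[str] = None,
                      wandb: bool = False,
                      tensorboard: bool = False) -> str:
    """Hypothetical signature: collapse the three flags into one tracker name."""
    legacy = [name for name, on in (("wandb", wandb), ("tensorboard", tensorboard)) if on]
    if tracker is not None:
        if legacy:
            raise ValueError("--tracker cannot be combined with --" + " / --".join(legacy))
        if tracker not in TRACKERS:
            raise ValueError(f"unknown tracker {tracker!r}; expected one of {sorted(TRACKERS)}")
        return tracker
    # Assumption: the two legacy flags are also exclusive with each other.
    if len(legacy) > 1:
        raise ValueError("--wandb and --tensorboard cannot be combined")
    # Assumption: no flag at all means no tracker.
    return legacy[0] if legacy else "none"
```

Rejecting unknown names up front is what makes the allowlist "closed": a typo fails at argument parsing rather than deep inside the training loop.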
PostHog telemetry (opt-IN)
Off by default. Setting SOUP_TELEMETRY=1 enables a hardware-info-only schema (soup_version / command / python / os / arch / duration). v0.43.0 contains no network code; the PostHog wire-up lands in v0.43.1.
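To make the opt-in behaviour concrete, here is a rough sketch of a payload builder that returns nothing unless SOUP_TELEMETRY=1 is set. The function name `build_event` and its parameters are hypothetical; only the environment variable and the six schema fields are taken from the text. In line with v0.43.0, nothing is sent anywhere.

```python
import os
import platform
from typing import Optional


def build_event(command: str, duration: float, soup_version: str) -> Optional[dict]:
    """Hypothetical helper: return the event payload, or None unless opted in."""
    if os.environ.get("SOUP_TELEMETRY") != "1":
        return None  # off by default
    return {
        "soup_version": soup_version,
        "command": command,
        "python": platform.python_version(),
        "os": platform.system(),
        "arch": platform.machine(),
        "duration": duration,
    }
```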
NLG metrics
Pure-Python BLEU + ROUGE-1/2/L + effective_tokens_per_second. Closed allowlist NLG_METRICS = frozenset({"bleu","rouge_1","rouge_2","rouge_l"}).
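As an illustration of what "pure-Python" can mean here, the sketch below computes ROUGE-N and ROUGE-L F1 scores with only the standard library. It is not the project's implementation: the function names, lowercased whitespace tokenisation and the choice to report F1 are all assumptions.

```python
from collections import Counter

# Closed allowlist from the section above.
NLG_METRICS = frozenset({"bleu", "rouge_1", "rouge_2", "rouge_l"})


def _f1(overlap: int, n_pred: int, n_ref: int) -> float:
    if overlap == 0:
        return 0.0
    precision, recall = overlap / n_pred, overlap / n_ref
    return 2 * precision * recall / (precision + recall)


def rouge_n(pred: str, ref: str, n: int = 1) -> float:
    """ROUGE-N F1 over whitespace tokens (tokenisation is an assumption)."""
    def ngrams(text: str) -> Counter:
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    p, r = ngrams(pred), ngrams(ref)
    overlap = sum((p & r).values())  # clipped n-gram matches
    return _f1(overlap, sum(p.values()), sum(r.values()))


def rouge_l(pred: str, ref: str) -> float:
    """ROUGE-L F1 from the longest common subsequence of tokens."""
    a, b = pred.lower().split(), ref.lower().split()
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return _f1(prev[-1], len(a), len(b))
```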
KL-divergence calibration
soup eval calibrate --before old.json --after new.json
Classifies the KL delta as OK / MINOR / MAJOR at 0.05 / 0.20 thresholds (mirrors quant-check).
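The thresholding can be sketched as follows. The 0.05 / 0.20 cut-offs come from the text; the function names, the use of nats, and the treatment of values exactly on a threshold are assumptions, and the layout of the JSON files is not modelled here.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) in nats for two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Terms with p_i == 0 contribute nothing; q_i is floored to avoid log(inf).
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def classify_kl(kl: float, minor: float = 0.05, major: float = 0.20) -> str:
    # Assumption: a value exactly on a threshold falls into the more severe bucket.
    if kl < minor:
        return "OK"
    if kl < major:
        return "MINOR"
    return "MAJOR"
```

For example, comparing a uniform two-way distribution against [0.9, 0.1] gives roughly 0.51 nats, which this sketch labels MAJOR.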
Model Arena (Elo)
soup eval arena add chat-llama@v1 chat-llama@v2
K=32 default. 256-model cap, 1M-match cap. Tournament.ratings returns MappingProxyType to prevent mutation.
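A compact sketch of an Elo tournament with these properties. K=32, the two caps and the read-only `ratings` view are from the notes above; the method names `add` and `record`, the initial rating of 1500 and the error types are assumptions.

```python
from types import MappingProxyType


class Tournament:
    """Hypothetical API illustrating K=32, the caps and the read-only ratings view."""

    def __init__(self, k: float = 32.0, initial: float = 1500.0,
                 max_models: int = 256, max_matches: int = 1_000_000) -> None:
        self._k, self._initial = k, initial
        self._max_models, self._max_matches = max_models, max_matches
        self._ratings: dict = {}
        self._matches = 0

    @property
    def ratings(self) -> MappingProxyType:
        # Read-only view: callers can inspect ratings but not assign to them.
        return MappingProxyType(self._ratings)

    def add(self, name: str) -> None:
        if name not in self._ratings:
            if len(self._ratings) >= self._max_models:
                raise ValueError("model cap reached")
            self._ratings[name] = self._initial

    def record(self, winner: str, loser: str) -> None:
        if self._matches >= self._max_matches:
            raise ValueError("match cap reached")
        rw, rl = self._ratings[winner], self._ratings[loser]
        # Standard Elo expected score for the winner.
        expected_w = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))
        delta = self._k * (1.0 - expected_w)
        self._ratings[winner], self._ratings[loser] = rw + delta, rl - delta
        self._matches += 1
```

With two equally rated models, one win moves 16 points from the loser to the winner, and an attempt such as `t.ratings["a"] = 0` raises TypeError because the proxy does not support item assignment.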
New benchmarks
ceval, cmmlu, aider_polyglot (the live Aider Polyglot runner lands in v0.43.1).
Profiling helpers
- memory_snapshot_context: CUDA torch.cuda.memory._record_memory_history wrapper
- detect_anomaly_context: torch.autograd.set_detect_anomaly wrapper
- nccl_bandwidth_check: reference table for h100 / a100 / v100 / rtx40-series (OK ≥80% / MINOR ≥50% / MAJOR <50%)
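The bandwidth verdict can be read as a ratio test against the reference figure for the detected GPU. The sketch below shows only that ratio test; the function name and parameters are hypothetical, and the reference value is passed in by the caller because the actual table entries are not listed here.

```python
def classify_bandwidth(measured_gbps: float, reference_gbps: float) -> str:
    """Classify measured bandwidth as a fraction of a reference figure."""
    if reference_gbps <= 0:
        raise ValueError("reference bandwidth must be positive")
    ratio = measured_gbps / reference_gbps
    if ratio >= 0.80:
        return "OK"
    if ratio >= 0.50:
        return "MINOR"
    return "MAJOR"
```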
VS Code launch.json writer
soup vscode init
Writes .vscode/launch.json with cwd containment + symlink TOCTOU guard.
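One way such a guard could be written is sketched below; it is an illustration under stated assumptions, not the project's code. The function names are hypothetical, and the use of `O_NOFOLLOW` to narrow the check-then-write window is one possible technique, available on POSIX systems only.

```python
import os
from pathlib import Path


def safe_launch_path(cwd: str) -> Path:
    """Return <cwd>/.vscode/launch.json after containment and symlink checks."""
    root = Path(cwd).resolve()
    target = root / ".vscode" / "launch.json"
    # Containment: the fully resolved target must still live under cwd.
    if root not in target.resolve().parents:
        raise ValueError("launch.json would escape the working directory")
    # Symlink guard: refuse if the directory or the file is itself a symlink.
    for part in (target.parent, target):
        if part.is_symlink():
            raise ValueError(f"refusing to write through symlink: {part}")
    return target


def write_launch_json(cwd: str, text: str) -> Path:
    target = safe_launch_path(cwd)
    target.parent.mkdir(exist_ok=True)
    # O_NOFOLLOW (where available) narrows the check-then-write race: the open
    # fails if the final path component became a symlink after the check.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_NOFOLLOW", 0)
    fd = os.open(target, flags, 0o644)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(text)
    return target
```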
soup data demo
4-bundle frozen registry: alpaca_demo / sharegpt_demo / dpo_demo / grpo_demo. Atomic copy via sibling temp file + os.replace.
soup data demo alpaca_demo --output ./train.jsonl
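The atomic-copy pattern mentioned above (sibling temp file, then os.replace) can be sketched like this. The function name `atomic_copy` and the fsync before the rename are assumptions; the pattern itself relies only on standard-library calls.

```python
import os
import shutil
import tempfile
from pathlib import Path


def atomic_copy(src: str, dst: str) -> None:
    """Copy src to dst so readers never observe a partially written dst."""
    dst_path = Path(dst)
    # Sibling temp file: same directory, so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=dst_path.parent,
                               prefix=dst_path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            shutil.copyfileobj(inp, out)
            out.flush()
            os.fsync(out.fileno())  # push bytes to disk before the rename
        os.replace(tmp, dst_path)   # atomically replaces any existing file
    except BaseException:
        # Clean up the temp file if the copy or the rename failed.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Creating the temp file next to the destination matters because a rename across filesystems is not atomic; a temp file in the system temp directory would not give the same guarantee.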