Data Mixing Optimizer & Dynamic Curriculum (v0.48.0 — BETA)

soup data mix --optimize

Runs a Bayesian search over per-dataset mixture weights, scoring each candidate mix against a held-out objective such as val_loss.

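```bash
# Search for mixture weights and write the result to mix.yaml
soup data mix --optimize \
  --datasets sft.jsonl,preference.jsonl,instruct.jsonl \
  --objective val_loss \
  --output mix.yaml

# Build the mixed dataset from the saved weights
soup data mix --apply mix.yaml --output ./mixed.jsonl
```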
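
To make the idea of applying mixture weights concrete, here is a minimal Python sketch of weighted sampling across datasets. It is not the tool's implementation: the function name `mix_datasets`, the dict-based inputs, the `seed` parameter, and sampling with replacement are all assumptions for illustration, and the real mix.yaml schema is not shown here.

```python
import random


def mix_datasets(datasets, weights, total, seed=0):
    """Draw `total` rows, choosing the source dataset for each row by weight.

    `datasets` maps a name to a list of rows and `weights` maps the same
    names to non-negative mixture weights. Both shapes are assumptions made
    for this sketch, not the actual mix.yaml schema.
    """
    names = sorted(datasets)
    if any(weights[name] < 0 for name in names):
        raise ValueError("mixture weights must be non-negative")
    norm = sum(weights[name] for name in names)
    if norm <= 0:
        raise ValueError("at least one weight must be positive")

    rng = random.Random(seed)  # fixed seed keeps the mix reproducible
    probs = [weights[name] / norm for name in names]

    mixed = []
    for _ in range(total):
        # Pick a dataset in proportion to its weight, then one of its rows
        # (sampling with replacement).
        name = rng.choices(names, weights=probs, k=1)[0]
        mixed.append(rng.choice(datasets[name]))
    return mixed


# Hypothetical in-memory data standing in for the three JSONL files.
example = mix_datasets(
    {"sft": ["s1", "s2"], "preference": ["p1"], "instruct": ["i1", "i2"]},
    {"sft": 0.5, "preference": 0.2, "instruct": 0.3},
    total=10,
)
```

Sampling with replacement keeps the sketch short; a real mixer might instead cap each dataset at its size or stream rows from disk.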

BudgetTracker caps both the wall-clock time and the token budget of each search.
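
One way such a cap could work is sketched below. This is a hypothetical stand-in, not BudgetTracker's actual API: the class name `SearchBudget`, the `charge` and `check` methods, the `BudgetExceeded` exception, and the injectable `clock` are all assumed for illustration.

```python
import time


class BudgetExceeded(RuntimeError):
    """Raised when a search runs past its time or token budget."""


class SearchBudget:
    """Hypothetical budget guard; names and behavior are assumptions."""

    def __init__(self, max_seconds, max_tokens, clock=time.monotonic):
        self.max_seconds = max_seconds
        self.max_tokens = max_tokens
        self._clock = clock          # injectable so tests can fake time
        self._start = clock()
        self.tokens_used = 0

    def charge(self, tokens):
        """Record tokens consumed by one trial, then enforce both caps."""
        self.tokens_used += tokens
        self.check()

    def check(self):
        """Raise BudgetExceeded if either cap has been passed."""
        elapsed = self._clock() - self._start
        if elapsed > self.max_seconds:
            raise BudgetExceeded(f"wall-clock budget exceeded: {elapsed:.1f}s")
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded(f"token budget exceeded: {self.tokens_used}")
```

In a search loop, calling `charge(tokens)` after every trial would stop the search at the first trial that crosses either limit. `time.monotonic` is used rather than `time.time` so that system clock adjustments cannot extend or shorten the budget.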

Dynamic curriculum learning

```yaml
training:
  dynamic_curriculum: true
  curriculum_buckets: 5
  curriculum_objective: val_loss
```

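DynamicCurriculumPolicy re-weights the curriculum buckets every N steps based on history.jsonl. compute_bucket_weights then clamps the result so the weights stay on the simplex (non-negative and summing to 1), which keeps the mixture stable.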
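
The clamping step can be illustrated with a short sketch. This is not the real compute_bucket_weights: the function name `sketch_bucket_weights`, the rule that higher loss earns more weight, and the `floor` parameter are assumptions chosen to show one way of keeping weights on the simplex.

```python
def sketch_bucket_weights(losses, floor=0.05):
    """Map per-bucket losses to sampling weights on the probability simplex.

    Assumptions for this sketch: a higher loss means more weight, and every
    bucket keeps at least `floor` probability so none is starved.
    """
    n = len(losses)
    if n == 0:
        raise ValueError("need at least one bucket")
    if not 0 <= floor * n <= 1:
        raise ValueError("floor must satisfy 0 <= floor * n <= 1")

    # Negative values carry no ranking signal here, so treat them as zero.
    scores = [max(loss, 0.0) for loss in losses]
    total = sum(scores)

    # Fall back to uniform when there is nothing to rank the buckets by.
    probs = [s / total for s in scores] if total > 0 else [1.0 / n] * n

    # Blend with a uniform floor: each weight is >= floor and the sum stays 1.
    return [floor + (1.0 - floor * n) * p for p in probs]


# Five buckets, matching curriculum_buckets: 5; the losses are made up.
weights = sketch_bucket_weights([2.0, 1.0, 1.0, 0.0, 0.0])
```

Blending with a uniform floor, rather than clipping and renormalizing, guarantees the lower bound exactly: the weights sum to `floor * n + (1 - floor * n) = 1` by construction.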

```bash
soup runs curriculum-curve --run-id $LAST
```

Renders the per-bucket weight curve over training.

Both features are BETA in v0.48.0. This release also applies symlink containment hardening across all file paths.