# Configuration

Soup uses a single YAML config file for all settings. Run `soup init` to generate one.

## Config Structure

```yaml
base: meta-llama/Llama-3.1-8B-Instruct      # HuggingFace model ID (required)
task: sft                                   # Training task
# backend: unsloth                          # 2-5x faster (pip install 'soup-cli[fast]')
# modality: text                            # text, vision, or audio

data:
  train: ./data/train.jsonl                 # Path to training data
  format: alpaca                            # Data format (auto-detected if omitted)
  val_split: 0.1                            # Validation split ratio
  max_length: 2048                          # Max sequence length (64-1048576)
  # image_dir: ./data/images                # For vision modality
  # audio_dir: ./data/audio                 # For audio modality

training:
  epochs: 3
  lr: 2e-5
  batch_size: auto                          # auto or integer
  quantization: 4bit                        # none, 4bit, 8bit
  # quantization_aware: false               # Enable QAT
  # optimizer: adamw_8bit
  # gradient_checkpointing: true
  lora:
    r: 64
    alpha: 16
    dropout: 0.05
    # target_modules: auto                  # Auto-detected per model
    # use_dora: false                       # Weight-decomposed LoRA

output: ./output
```
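
Before launching a long run, it can be worth sanity-checking the config file yourself. The sketch below is a generic PyYAML example, not part of soup itself; the required-key checks are assumptions based on the structure shown above (only `base` is documented as required).

```python
# Generic sanity check for a soup-style YAML config.
# Uses PyYAML (pip install pyyaml); not part of the soup CLI.
import yaml

CONFIG = """
base: meta-llama/Llama-3.1-8B-Instruct
task: sft
data:
  train: ./data/train.jsonl
  val_split: 0.1
training:
  epochs: 3
  lr: 2e-5
output: ./output
"""

def check_config(text: str) -> dict:
    cfg = yaml.safe_load(text)
    # `base` is the only key the docs mark as required.
    if "base" not in cfg:
        raise ValueError("config is missing required key: base")
    # val_split is a ratio, so it should sit in [0, 1).
    val_split = cfg.get("data", {}).get("val_split", 0.1)
    if not 0.0 <= val_split < 1.0:
        raise ValueError("data.val_split must be in [0, 1)")
    return cfg

cfg = check_config(CONFIG)
print(cfg["base"])  # meta-llama/Llama-3.1-8B-Instruct
```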

## Templates

Soup includes 15 built-in templates:

```bash
soup init --template chat          # Conversational fine-tune
soup init --template code          # Code generation
soup init --template medical       # Domain expert
soup init --template reasoning     # GRPO reasoning (DeepSeek-R1 style)
soup init --template vision        # Vision/multimodal fine-tune
soup init --template audio         # Audio/speech fine-tune
soup init --template kto           # KTO unpaired preference
soup init --template orpo          # ORPO (no reference model)
soup init --template simpo         # SimPO length-normalized preference
soup init --template ipo           # IPO regularized preference
soup init --template rlhf          # Full RLHF pipeline (SFT -> RM -> PPO)
soup init --template pretrain      # Continued pre-training on raw text
soup init --template moe           # MoE fine-tuning (ScatterMoE LoRA)
soup init --template longcontext   # 128k+ context fine-tuning
soup init --template embedding     # Sentence embedding fine-tuning
```

## Task-Specific Config Keys

| Key | Tasks | Description |
|-----|-------|-------------|
| `dpo_beta` | DPO | DPO beta parameter |
| `kto_beta` | KTO | KTO beta parameter |
| `orpo_beta` | ORPO | ORPO beta parameter |
| `simpo_gamma` | SimPO | SimPO gamma parameter |
| `cpo_alpha` | SimPO | CPO alpha parameter |
| `ipo_tau` | IPO | IPO tau parameter |
| `grpo_beta` | GRPO | GRPO beta parameter |
| `num_generations` | GRPO | Number of generations per prompt |
| `reward_fn` | GRPO, PPO | Reward function (accuracy/format/path.py) |
| `reward_model` | PPO | Path to reward model |
| `ppo_epochs` | PPO | PPO training epochs |
| `ppo_clip_ratio` | PPO | PPO clip ratio |
| `ppo_kl_penalty` | PPO | PPO KL penalty |
| `loraplus_lr_ratio` | All | LoRA+ learning rate ratio |
| `use_galore` | All | Enable GaLore optimizer |
| `moe_lora` | All | Target MoE expert layers |
| `moe_aux_loss_coeff` | All | Router load-balancing loss |
| `use_liger` | All | Liger Kernel fused ops |
| `use_flash_attn` | All | FlashAttention v2/v3 |
| `use_ring_attention` | All | Ring FlashAttention |
| `rope_scaling_type` | All | RoPE scaling (linear/dynamic/yarn/longrope) |
| `neftune_alpha` | All | NEFTune noisy embeddings (0-50) |
| `packing` | SFT | Sample packing for efficiency |
| `curriculum` | All | Enable curriculum learning |
| `curriculum_metric` | All | Sort metric (length) |
| `curriculum_buckets` | All | Number of difficulty buckets (1-20) |
| `loss_watchdog` | All | Enable loss watchdog |
| `loss_watchdog_threshold` | All | Loss spike threshold (≤100) |
| `loss_watchdog_patience` | All | Patience before stopping (≤1000) |
| `freeze_layers` | All | Freeze bottom N layers (≤1000) |
| `freeze_ratio` | All | Fraction of layers to freeze |
| `embedding_loss` | Embedding | Loss function |
| `embedding_pooling` | Embedding | Pooling strategy |
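
As an illustration, a GRPO config might combine the base structure with the task-specific keys above. The sketch below is an assumption, not generated output: the task name `grpo`, the top-level placement of the keys, and the numeric values are all illustrative. Compare against the file `soup init --template reasoning` actually produces.

```yaml
base: meta-llama/Llama-3.1-8B-Instruct
task: grpo                 # assumed task name; see your template output

grpo_beta: 0.04            # illustrative value; tune for your run
num_generations: 8         # generations sampled per prompt
reward_fn: accuracy        # built-in reward, or a path like ./rewards/my_fn.py

data:
  train: ./data/train.jsonl
output: ./output
```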