FAQ

General

What models does Soup support?

Any HuggingFace-compatible model. Popular choices include Llama, Mistral, Qwen, Phi, and Gemma families. Vision models (LLaMA-3.2-Vision, Qwen2-VL) and audio models (Qwen2-Audio) are also supported.

Do I need a GPU?

A CUDA-compatible GPU is strongly recommended. Soup supports 4-bit quantization to fit larger models on smaller GPUs. A 24GB GPU (RTX 3090/4090) can fine-tune 7B-8B models comfortably.

How is Soup different from other fine-tuning tools?

Soup is CLI-first and opinionated. One command to train, one YAML to configure. It wraps best practices (Unsloth, FlashAttention, optimal hyperparameters) so you don't have to research them. 23 training methods (text / vision / audio / TTS / classifier / distill / preference / MoLE routing), 17 quantization formats, multimodal support, full pipeline from training to deployment. Plus, you can migrate from LLaMA-Factory, Axolotl, or Unsloth in one command.

What version is current?

Soup CLI v0.70.0 — "loop hardening". Six surfaces that protect the training loop from the failure modes that cost a real GPU-hour: soup train --reward-hack-detector info_rm|rm_ensemble (InfoRM cluster-separation + RM-ensemble divergence, halt on HACK), --uld-strategy wasserstein|topk_align (cross-tokenizer Universal Logit Distillation, no shared vocab needed), --minillm-enabled (MiniLLM reverse-KL with 3 stability tricks bundled), --rl-checkpoint-save-every-steps (mid-epoch PPO/GRPO ckpt — TorchTune still punts this), soup iterative-dpo --rounds N (sample → score → re-pair → retrain driver), and --echo-trap-enabled (RAGEN multi-turn n-gram repetition detector). Sits on top of v0.69 "data engineering pro" (soup build dbt DAG + soup expect Great Expectations + brain-rot detector + Persona-Hub / Magpie synth), v0.68 "anti-trend insurance" (soup compile DSPy/GEPA/TextGrad + soup distill-prompt + soup apple-adapter for Apple FoundationModels + soup local-rl personal feedback flywheel), v0.67 adapter lifecycle finish (CMA-ES merge, VeRA bank, MoLE, soup adapters pr / bisect, soup lock), v0.66 post-train x-rays, v0.65 eval depth, and v0.58 soup loop. 11,824 tests across 268 test files. Apache-2.0 license since v0.29.0. Check with soup version --full.

Can I migrate from LLaMA-Factory / Axolotl / Unsloth?

Yes: soup migrate --from llamafactory config.yaml converts your existing config automatically. Supports LLaMA-Factory YAML, Axolotl YAML, and Unsloth Jupyter notebooks. Use --dry-run to preview without writing.

Are there ready-made configs?

Yes. Soup ships 116 recipes spanning Llama 3.1/3.2/4 (Scout + Maverick), Qwen 2.5/3 (incl. 30B and 235B MoE), QwQ-32B, QVQ-72B, Gemma 3, Mistral, Phi-4, DeepSeek R1/V3 + all 6 R1-Distill sizes, plus v0.51 additions (GPT-OSS 20B/120B, GLM 4.6/5, Kimi K2 / K2-Thinking, MiniMax M2, Granite 4, LFM2, Cogito v2, Mistral Small 3 / Medium 3.5, Magistral / Devstral / Ministral, Baichuan 2), vision (Pixtral, Qwen2-VL, InternVL 3.5, LLaVA-Next, MiniCPM-V, Qwen-Image, DeepSeek-OCR, Paddle-OCR-VL), audio (Qwen2-Audio, Whisper-large-v3, Voxtral, SeamlessM4T-v2), TTS (Orpheus, Sesame-CSM, Llasa, Spark, Oute — v0.52), BitNet (Falcon-E — v0.52), edge (SmolLM2 135M-1.7B, Phi-3.5-mini, LFM2), domain specialists (BioMistral, Meditron, CodeLlama, Magicoder, Mathstral, MedGemma, EmbeddingGemma), plus v0.62 RAG (raft-llama3-8b, ra-dit-retriever, ra-dit-llama3-8b), MLX-native, and multi-GPU (llama3-70b-fsdp2, qwen3-32b-zeropp, deepseek-v3-pipeline). Run soup recipes list to browse, soup recipes search llama to filter, and soup recipes use llama3.1-8b-sft to start training instantly.

Training

How long does training take?

Depends on model size, dataset, and hardware. A 1B model with 10K samples on an RTX 4090 typically takes 15-30 minutes with SFT. With Unsloth backend, 2-5x faster.

Can I resume training from a checkpoint?

Yes: soup train --config soup.yaml --resume auto (latest checkpoint) or --resume ./output/checkpoint-500 (specific checkpoint).

Does Soup support multi-GPU?

Yes. v0.27.0 added topology-aware soup train --gpus N launch, ZeRO++ (quantized weights + grads), FSDP2 + torch.compile, pipeline parallelism (parallelism: pipeline + pipeline_stages), and the DeepSpeed-MII serving backend. See [Multi-GPU Mastery](/docs/multi-gpu).

What training methods are available?

23 methods: SFT, DPO, GRPO, PPO, KTO, ORPO, SimPO, IPO, BCO (v0.40), Pretrain, Embedding, Reward Model, the unified preference dispatcher (v0.40 — set training.preference_loss: dpo|simpo|orpo|ipo|bco to swap loss without renaming the task), PRM (v0.50 Process Reward Model), TTS (v0.52 — Orpheus / Sesame-CSM / Llasa / Spark / Oute), Classifier / Reranker / Cross-Encoder (v0.52), Distill (v0.52 — kl / forward_kl / reverse_kl / js divergences), Unlearn (v0.61 — NPO / SimNPO / RMU), and MoE LoRA Routing (v0.67 — MoLE per-token gating over 2..64 task adapters). Plus soup edit set (v0.61 — ROME / MEMIT / AlphaEdit) and soup steer train (v0.62 — CAA / ITI / RepE) as inference-time / weight-surgery surfaces.

Can I auto-push checkpoints to HuggingFace during training?

Yes — soup train --push-as user/my-model uploads every save_steps checkpoint to HF Hub as a checkpoint-<N> branch. Pair with --hf-resume to pull the latest branch and keep going after a spot-instance preemption. Set HF_ENDPOINT=https://hf.internal.example.com to target a self-hosted Hub. See [HF Hub integration](/docs/hf-hub-integration).

Can I train faster on large-vocab models?

Yes — v0.28.0 adds Cut Cross-Entropy (use_cut_ce: true), which avoids materialising the full [seq × vocab] logits tensor. Best on Llama 3 / Qwen 3 (vocab ≥ 128k). Install with pip install 'soup-cli[cce]'. v0.28 also ships FP8 training on Hopper+ GPUs, tiered gradient checkpointing, kernel auto-composition, cross-document attention masking, and CPU/disk activation offloading. See [Training speed & memory](/docs/training-speed-memory).

Data

What data formats are supported?

18 formats: Alpaca, ShareGPT, ChatML, DPO, KTO, LLaVA, ShareGPT4V, Plaintext, Embedding, Audio, Tool-calling, Auto, PRM, Pre-tokenized, Input-output, Video, Multimodal (v0.42), and RAFT (v0.62 Retrieval-Augmented Fine-Tuning — query + golden_doc + distractors + answer). Format is auto-detected from the first row. Local paths or remote URIs (s3 / gs / az / abfs / oci). Use soup data convert to switch between formats. v0.69 adds soup build (dbt-shaped DAG of dataset transforms with incremental materialisation), soup expect (Great Expectations suite for chat data), and soup data brain-rot (AI-slop detector — refuses to train on clickbait).

How much data do I need?

For SFT, 1K-10K high-quality samples is a good starting point. Quality matters more than quantity. Use soup data filter to check data quality.

Can I generate synthetic data?

Yes: soup data generate --prompt "Create math problems" --count 100. Supports OpenAI, Ollama, Anthropic Claude, vLLM, and custom servers. Includes domain templates (code, conversation, QA, preference, reasoning) and a full quality pipeline.

Deployment

How do I serve my model?

soup serve --model ./output --backend vllm for production. The API is OpenAI-compatible.

Can I export to llama.cpp / Ollama?

Yes: soup export --model ./output --format gguf --quant q4_k_m

What export formats are supported?

GGUF (llama.cpp/Ollama), ONNX (cross-platform), TensorRT-LLM (NVIDIA optimized), AWQ, and GPTQ (quantized deployment).

Troubleshooting

How do I see full error messages?

Use soup --verbose <command> for full tracebacks.

How do I check my environment?

Run soup doctor to check Python version, GPU availability, dependency versions, and get fix suggestions.