FAQ

General

What models does Soup support?

Any HuggingFace-compatible model. Popular choices include Llama, Mistral, Qwen, Phi, and Gemma families. Vision models (LLaMA-3.2-Vision, Qwen2-VL) and audio models (Qwen2-Audio) are also supported.

Do I need a GPU?

A CUDA-compatible GPU is strongly recommended. Soup supports 4-bit quantization to fit larger models on smaller GPUs. A 24GB GPU (RTX 3090/4090) can fine-tune 7B-8B models comfortably.

How is Soup different from other fine-tuning tools?

Soup is CLI-first and opinionated: one command to train, one YAML file to configure. It bundles best practices (Unsloth, FlashAttention, sensible hyperparameter defaults) so you don't have to research them yourself. It offers 11 training methods, multimodal support, and a full pipeline from training to deployment. Plus, you can migrate from LLaMA-Factory, Axolotl, or Unsloth in one command.
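As an illustration of the one-YAML workflow, a minimal config might look like the sketch below. The key names here are assumptions for illustration only, not the authoritative schema; consult the configuration docs for the real field names.

```yaml
# Hypothetical minimal soup.yaml — key names are illustrative, not authoritative
model: meta-llama/Llama-3.2-1B-Instruct   # any HuggingFace-compatible model
method: sft                               # one of the 11 training methods
dataset: ./data/train.json                # format is auto-detected
output_dir: ./output
```

With a config like this in place, training is a single command: soup train --config soup.yaml.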

What version is current?

Soup CLI v0.24.0. Check with soup version --full.

Can I migrate from LLaMA-Factory / Axolotl / Unsloth?

Yes: soup migrate --from llamafactory config.yaml converts your existing config automatically. Supports LLaMA-Factory YAML, Axolotl YAML, and Unsloth Jupyter notebooks. Use --dry-run to preview without writing.
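Using the commands above, a typical migration previews the conversion first, then applies it:

```shell
# Preview the converted config without writing anything
soup migrate --from llamafactory config.yaml --dry-run

# Write the converted Soup config
soup migrate --from llamafactory config.yaml
```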

Are there ready-made configs?

Yes, Soup includes 30 recipes for popular models (Llama 3, Qwen 3, Gemma 3, DeepSeek R1, etc.). Run soup recipes list to browse, soup recipes search llama to find, and soup recipes use llama3-sft to start training instantly.

Training

How long does training take?

That depends on model size, dataset size, and hardware. A 1B model with 10K samples on an RTX 4090 typically takes 15-30 minutes with SFT; with the Unsloth backend, training is typically 2-5x faster.

Can I resume training from a checkpoint?

Yes: soup train --config soup.yaml --resume auto (latest checkpoint) or --resume ./output/checkpoint-500 (specific checkpoint).

Does Soup support multi-GPU?

Yes, via DeepSpeed (ZeRO stages 2-3) and FSDP2. See the Backends & Performance docs.

What training methods are available?

11 methods: SFT, DPO, GRPO, PPO, KTO, ORPO, SimPO, IPO, Pretrain, Embedding, and Reward Model.

Data

What data formats are supported?

Alpaca, ShareGPT, ChatML, DPO, KTO, LLaVA, ShareGPT4V, Plaintext, Embedding, and Audio. Format is auto-detected from the first row. Use soup data convert to switch between formats.
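For reference, a single Alpaca-format record (a widely used instruction-tuning layout) looks like the sample below. The --from/--to flags on the convert command are assumptions for illustration; check soup data convert --help for the actual options.

```shell
# One Alpaca-format sample: instruction / input / output fields
cat > sample.json <<'EOF'
[{"instruction": "Summarize the text.", "input": "Soup is a CLI fine-tuning tool.", "output": "A CLI tool for fine-tuning."}]
EOF

# Hypothetical flags — verify with `soup data convert --help`
soup data convert --from alpaca --to sharegpt sample.json
```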

How much data do I need?

For SFT, 1K-10K high-quality samples is a good starting point. Quality matters more than quantity. Use soup data filter to check data quality.

Can I generate synthetic data?

Yes: soup data generate --prompt "Create math problems" --count 100. Supports OpenAI, Ollama, Anthropic Claude, vLLM, and custom servers. Includes domain templates (code, conversation, QA, preference, reasoning) and a full quality pipeline.
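As a concrete starting point, the command from the answer above works as-is once Soup is installed; how the backing provider (OpenAI, Ollama, etc.) is selected varies, so check soup data generate --help.

```shell
# Generate 100 synthetic math problems with the default provider
soup data generate --prompt "Create math problems" --count 100
```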

Deployment

How do I serve my model?

Run soup serve --model ./output --backend vllm for production serving. The API is OpenAI-compatible, so existing OpenAI client libraries and tools work unchanged.
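Because the server is OpenAI-compatible, any OpenAI-style client can query it. For example, with curl (the host, port, and model name here are illustrative; vLLM-style servers commonly default to port 8000):

```shell
# Query the OpenAI-compatible chat completions endpoint (host/port assumed)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "./output",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```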

Can I export to llama.cpp / Ollama?

Yes: soup export --model ./output --format gguf --quant q4_k_m.
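The exported GGUF file can then be registered with Ollama via a Modelfile, using Ollama's standard workflow. The GGUF filename and model name below are assumptions; use whatever filename the export actually produces.

```shell
# Point a Modelfile at the exported GGUF (filename is illustrative)
echo 'FROM ./output/model-q4_k_m.gguf' > Modelfile

# Register and run the model in Ollama
ollama create my-model -f Modelfile
ollama run my-model
```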

What export formats are supported?

GGUF (llama.cpp/Ollama), ONNX (cross-platform), TensorRT-LLM (NVIDIA optimized), AWQ, and GPTQ (quantized deployment).

Troubleshooting

How do I see full error messages?

Use soup --verbose <command> for full tracebacks.

How do I check my environment?

Run soup doctor to check Python version, GPU availability, dependency versions, and get fix suggestions.