# Autopilot
`soup autopilot` is the zero-config entry point added in v0.25.0. You pass a model, a dataset, and a goal. Autopilot profiles the dataset, model, and GPU, then emits a `soup.yaml` with every hyperparameter chosen and justified.
## Quick start
```bash
soup autopilot \
  --model meta-llama/Llama-3.1-8B \
  --data chats.jsonl \
  --goal chat \
  --gpu-budget 24GB \
  --time-budget 4h
```

## Inputs
| Flag | Meaning |
|---|---|
| `--model` | Any HuggingFace repo or local path |
| `--data` | JSONL dataset (alpaca / sharegpt / chatml / dpo / kto / tool-calling) |
| `--goal` | chat · reasoning · code · classification · tool-calling · alignment · domain-adapt |
| `--gpu-budget` | VRAM budget (defaults to detected GPU) |
| `--time-budget` | Maximum wall-clock time |
| `--output` | Config path (default: `./soup.yaml`) |
| `--dry-run` | Print decisions without writing `soup.yaml` |
| `--run` | Run `soup train` immediately after confirmation |
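The budget flags accept human-readable values such as `24GB` and `4h`. A minimal sketch of how such values might be parsed — the helper names and the exact set of accepted units are illustrative assumptions, not the actual CLI's parser:

```python
import re

def parse_size(text: str) -> int:
    """Parse a human-readable byte size like '24GB' into bytes (hypothetical helper)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(MB|GB|TB)", text.strip(), re.IGNORECASE)
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = float(match.group(1)), match.group(2).upper()
    scale = {"MB": 1024**2, "GB": 1024**3, "TB": 1024**4}[unit]
    return int(value * scale)

def parse_duration(text: str) -> int:
    """Parse a duration like '4h' or '30min' into seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(s|min|h|d)", text.strip())
    if not match:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = float(match.group(1)), match.group(2)
    scale = {"s": 1, "min": 60, "h": 3600, "d": 86400}[unit]
    return int(value * scale)
```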
## What Autopilot decides
1. Task — maps goal → SFT / DPO / GRPO / KTO / pretrain, including the tool-calling format where appropriate.
2. Quantization — chooses none, 8bit, or 4bit based on model_size × 1.2 vs VRAM.
3. PEFT — picks LoRA r=8/16/32, DoRA, or VeRA based on dataset size and VRAM headroom.
4. Batch size × grad_accum — targets effective batch 16–32, computed from VRAM headroom.
5. Learning rate — scales with rank and quantization (e.g. LoRA r=16 → 2e-4, 4bit → ×0.8).
6. Epochs — 5 → 3 → 2 → 1 depending on dataset size.
7. `max_length` — ceil(p95 × 1.1) clamped to model context.
8. Perf flags — auto-enables FlashAttention v2 on Ampere+, Liger Kernel on modern Llama arch, gradient_checkpointing for long context, MLX backend on Apple Silicon.
9. Training intelligence — turns on forgetting detection and checkpoint intelligence by default.
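The sizing rules above can be sketched in Python. This is a simplified illustration, not Autopilot's implementation: only the documented anchors (model_size × 1.2, effective batch 16–32, r=16 → 2e-4, the 4-bit ×0.8 factor, ceil(p95 × 1.1)) come from this page; the remaining thresholds and the rank/epoch tables are assumptions, and the real tool also budgets for optimizer state and activations, so its choices can be more conservative than this sketch.

```python
import math

def choose_quantization(model_gb: float, vram_gb: float) -> str:
    """Pick the lightest quantization whose weight footprint x 1.2 fits the budget."""
    for name, divisor in [("none", 1), ("8bit", 2), ("4bit", 4)]:
        if model_gb / divisor * 1.2 <= vram_gb:
            return name
    return "4bit"  # last resort even when it does not strictly fit

def choose_batch(headroom_gb: float, per_sample_gb: float, target: int = 32) -> tuple[int, int]:
    """Micro-batch is limited by VRAM headroom; grad_accum fills out the effective batch."""
    micro = max(1, int(headroom_gb // per_sample_gb))
    accum = max(1, math.ceil(target / micro))
    return micro, accum

def choose_lr(rank: int, quant: str) -> float:
    """Only r=16 -> 2e-4 and the 4-bit x0.8 factor are documented; other ranks are assumed."""
    base = {8: 3e-4, 16: 2e-4, 32: 1e-4}[rank]
    return base * 0.8 if quant == "4bit" else base

def choose_epochs(n_samples: int) -> int:
    """5 -> 3 -> 2 -> 1 as the dataset grows (size thresholds are assumptions)."""
    for limit, epochs in [(1_000, 5), (10_000, 3), (50_000, 2)]:
        if n_samples < limit:
            return epochs
    return 1

def choose_max_length(p95_tokens: int, model_context: int) -> int:
    """ceil(p95 * 1.1), clamped to the model context; integer math avoids float drift."""
    return min(math.ceil(p95_tokens * 11 / 10), model_context)
```

With numbers close to the worked example below (15k samples, 16GB of headroom at roughly 4GB per micro-batch sample), these rules reproduce the 4 × grad_accum 8 = 32 batch, 2 epochs, and 2e-4 decisions.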
Every decision is printed with a short reason in a Rich panel, so you can see *why* r=16 beat r=32 or why quantization dropped to 4bit.
## Example output

```
╭─ Autopilot Decisions ─────────────────────────────────╮
│ ✓ Quantization: 4bit │
│ reason: 8B model needs ~5GB in 4bit, leaves 19GB │
│ │
│ ✓ PEFT: LoRA r=16, alpha=32 │
│ reason: 15k samples — r=16 balances capacity / │
│ overfitting risk │
│ │
│ ✓ Batch size: 4 × grad_accum 8 = effective 32 │
│ ✓ Learning rate: 2e-4 │
│ ✓ Epochs: 2 │
│ ✓ Max length: 2048 (p95=1820 + 10% margin) │
│ ✓ Flash Attention v2 (Ampere GPU) │
│ ✓ Liger Kernel (modern Llama arch) │
│ ✓ Forgetting detection (mini_mmlu) │
│ ✓ Checkpoint intelligence (judge metric) │
│ │
│ Estimated time: 1h 42min │
│ Estimated VRAM: 18.2GB / 24GB ✓ │
╰────────────────────────────────────────────────────────╯
```

## Safety
- Dataset and output paths are resolved and constrained to the working directory (no path traversal).
- GPU budget is bounded to 1GB–1TB; time budget to 60s–30 days.
- Model names are validated against HuggingFace Hub naming rules.
- Model code is never executed during analysis — Autopilot uses HF Hub metadata only.
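The path and bound checks can be sketched as follows — a minimal illustration of the documented constraints (working-directory containment, 1GB–1TB and 60s–30 day bounds); the function names are hypothetical:

```python
from pathlib import Path

def resolve_in_workdir(user_path: str, workdir: Path) -> Path:
    """Resolve a user-supplied path and reject anything outside the working directory."""
    resolved = (workdir / user_path).resolve()
    if not resolved.is_relative_to(workdir.resolve()):
        raise ValueError(f"path escapes working directory: {user_path}")
    return resolved

def check_gpu_budget(budget_bytes: int) -> int:
    """Enforce the documented 1GB-1TB bound on the VRAM budget."""
    if not 1024**3 <= budget_bytes <= 1024**4:
        raise ValueError("GPU budget must be between 1GB and 1TB")
    return budget_bytes

def check_time_budget(seconds: int) -> int:
    """Enforce the documented 60s-30 day bound on the time budget."""
    if not 60 <= seconds <= 30 * 86400:
        raise ValueError("time budget must be between 60s and 30 days")
    return seconds
```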
## See also
- [Training methods](/docs/training)
- [Backends](/docs/backends) — including MLX
- [Training intelligence](/docs/training-intelligence) — forgetting detection + checkpoint quality
- [Recipes](/docs/recipes) — start here if you don't need Autopilot