# Apple Silicon MLX Backend
Soup v0.25.0 adds a native MLX training backend for M1–M4 Macs. SFT, DPO, and GRPO run on unified memory without CUDA, Rosetta, or x86 emulation.
## Install

```bash
pip install 'soup-cli[mlx]'
```

This pulls `mlx>=0.20` and `mlx-lm>=0.20` as optional dependencies.
## Enable the backend

Set `backend: mlx` in `soup.yaml`, or pass `--backend mlx` to `soup train`:
```yaml
base: mlx-community/Llama-3.1-8B-Instruct-4bit
backend: mlx
task: sft
data:
  train: data.jsonl
  format: chatml
training:
  epochs: 3
  lr: 1e-4
  batch_size: 2
lora:
  r: 16
  alpha: 32
```

## Supported tasks
| Task | MLX | Notes |
|---|:---:|---|
| `sft` | ✓ | LoRA and QLoRA (mlx-community 4bit models) |
| `dpo` | ✓ | Frozen reference model |
| `grpo` | ✓ | Works with built-in reward functions |
| `ppo` | – | Not yet — use CUDA backend |
| `reward_model` | – | Not yet |
| `embedding` | – | Not yet |
| `pretrain` | – | Not yet |
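For the preference task, the config differs from the SFT example mainly in the `task` field and the data it expects. Below is a hedged sketch of a DPO run on the MLX backend; the file name `preferences.jsonl` and the hyperparameter values are illustrative choices, not soup defaults:

```yaml
# Illustrative DPO config for the MLX backend. Field names follow the
# SFT example above; values are example choices, not documented defaults.
base: mlx-community/Llama-3.1-8B-Instruct-4bit
backend: mlx
task: dpo
data:
  train: preferences.jsonl  # assumed name for a preference-pair dataset
training:
  epochs: 1
  lr: 5e-6
  batch_size: 1
lora:
  r: 16
  alpha: 32
```

The reference model is handled for you: as noted in the table above, the MLX backend keeps it frozen.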
## Diagnostics

```bash
soup doctor
```

On Apple Silicon this reports MLX version, chip name, unified memory, and a recommended batch size.
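How `soup doctor` derives its batch-size hint is not documented here. The sketch below is a purely illustrative heuristic (the function, the 4 GB headroom, and the per-sample cost are assumptions, not soup's actual logic) showing how such a recommendation could fall out of the unified-memory figure:

```python
def recommend_batch_size(unified_memory_gb: float,
                         model_footprint_gb: float,
                         per_sample_gb: float = 1.0) -> int:
    """Toy heuristic: reserve headroom for macOS and the runtime,
    then spend whatever unified memory remains on batch slots."""
    headroom_gb = 4.0  # assumed reserve for the OS and MLX overhead
    free_gb = unified_memory_gb - model_footprint_gb - headroom_gb
    if free_gb <= 0:
        return 1  # always allow at least a batch of one
    return max(1, int(free_gb // per_sample_gb))

# e.g. a 16 GB machine with a ~5 GB 4-bit 8B model
print(recommend_batch_size(16, 5))  # → 7
```

The point of the sketch is only that the hint is a function of total unified memory minus the model's resident footprint, which is why `soup doctor` needs the chip and memory information it reports.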
## MLX-native recipes

- `llama3.1-8b-sft-mlx` — M2+, 16 GB
- `qwen3-8b-sft-mlx` — M2+, 16 GB
- `gemma3-9b-sft-mlx` — M2+, 16 GB

Use any of them with `soup recipes use <name>`.
## Limitations

- Single-device only — no distributed MLX training.
- Base models must be in MLX format (typically from the `mlx-community` HF org).
- `bitsandbytes` is unused on this backend — quantization comes from the MLX model itself.
- The Unsloth backend is not MLX-compatible (they're separate execution paths).