# Model Export
Export fine-tuned models to various formats for deployment.
## Merge LoRA Adapter
Merge a LoRA adapter with its base model into a standalone model:
```bash
# Auto-detect base model from adapter_config.json
soup merge --adapter ./output --output ./merged
# Specify base model and dtype
soup merge --adapter ./output --base meta-llama/Llama-3.1-8B --dtype bfloat16
```
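After merging, it's worth a quick check that the standalone model actually loads and generates. A minimal smoke test, assuming `transformers` and `torch` are installed and the merged weights landed in `./merged`:

```bash
# Load the merged model and generate a few tokens as a sanity check
python - <<'EOF'
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("./merged")
model = AutoModelForCausalLM.from_pretrained("./merged")

inputs = tok("The capital of France is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=10)
print(tok.decode(out[0], skip_special_tokens=True))
EOF
```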
## GGUF (llama.cpp / Ollama)

```bash
# Export LoRA adapter (auto-merges with base, then converts)
soup export --model ./output --format gguf --quant q4_k_m
# Different quantizations
soup export --model ./output --format gguf --quant q8_0
soup export --model ./output --format gguf --quant f16
# Export a full (already merged) model
soup export --model ./merged --format gguf
```

Supported quantizations: `q4_0`, `q4_k_m`, `q5_k_m`, `q8_0`, `f16`, `f32`.
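To smoke-test a GGUF file outside of Ollama, llama.cpp can run it directly. A sketch assuming a llama.cpp build is on your `PATH` and the export produced `./output/model.q4_k_m.gguf` (the exact output filename is an assumption):

```bash
# Generate 32 tokens from the quantized model
# (the binary is named `llama-cli` in recent llama.cpp builds)
llama-cli -m ./output/model.q4_k_m.gguf -p "The capital of France is" -n 32
```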
Use with Ollama:
```bash
echo 'FROM ./my-model.q4_k_m.gguf' > Modelfile
ollama create my-model -f Modelfile
ollama run my-model
```
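The Modelfile can also carry a system prompt and sampling parameters using standard Ollama directives; the values below are only examples:

```bash
cat > Modelfile <<'EOF'
FROM ./my-model.q4_k_m.gguf
SYSTEM "You are a helpful assistant."
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
ollama create my-model -f Modelfile
```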
## ONNX

```bash
pip install 'soup-cli[onnx]'
soup export --model ./output --format onnx
soup export --model ./output --format onnx --output ./model_onnx
```
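To confirm the exported graph is loadable, onnxruntime can open it and report its inputs. A minimal sketch, assuming `onnxruntime` is installed and the export wrote `./model_onnx/model.onnx` (the filename is an assumption):

```bash
python - <<'EOF'
import onnxruntime as ort

# Open the exported graph and list its input names and shapes
sess = ort.InferenceSession("./model_onnx/model.onnx",
                            providers=["CPUExecutionProvider"])
for i in sess.get_inputs():
    print(i.name, i.shape)
EOF
```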
## TensorRT-LLM

```bash
pip install 'soup-cli[tensorrt]'
soup export --model ./output --format tensorrt
soup export --model ./output --format tensorrt --output ./model_trt
```
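To exercise the engine from Python, recent TensorRT-LLM releases ship a high-level `LLM` API; the sketch below assumes such a release and that `./model_trt` is a directory the API accepts (the API surface varies between versions, so check your installed version's docs):

```bash
python - <<'EOF'
from tensorrt_llm import LLM, SamplingParams

# Path and API usage are assumptions; verify against your tensorrt_llm version
llm = LLM(model="./model_trt")
outputs = llm.generate(["The capital of France is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
EOF
```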
## AWQ Quantization (v0.23.0+)

```bash
pip install 'soup-cli[awq]'
soup export --model ./output --format awq
soup export --model ./output --format awq --output ./model_awq
```
## GPTQ Quantization (v0.23.0+)

```bash
pip install 'soup-cli[gptq]'
soup export --model ./output --format gptq
soup export --model ./output --format gptq --output ./model_gptq
```
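Both AWQ and GPTQ exports can be loaded back through `transformers`, which reads the quantization config saved with the model. A sketch assuming a CUDA GPU and the matching kernel package (`autoawq` for AWQ, a GPTQ backend for GPTQ) are installed:

```bash
python - <<'EOF'
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "./model_awq"  # or "./model_gptq"
tok = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, device_map="auto")
print(model.config.quantization_config)
EOF
```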
## Deploy to Ollama (v0.18.0+)

Deploy a GGUF model directly to your local Ollama instance:
```bash
# Deploy a GGUF model
soup deploy ollama --model ./output/model.q4_k_m.gguf --name soup-my-model
# Deploy with system prompt and parameters
soup deploy ollama --model ./model.gguf --name soup-chat \
  --system "You are a helpful assistant." \
  --template chatml \
  --parameter temperature=0.7 \
  --parameter top_p=0.9
# Export + deploy in one command
soup export --model ./output --format gguf --deploy ollama
# List Soup-deployed models
soup deploy ollama --list
# Remove a model
soup deploy ollama --remove soup-my-model
```

Auto-detected chat templates: `chatml`, `llama`, `mistral`, `vicuna`, `zephyr` (or `auto` to infer the template from `soup.yaml`).
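Once deployed, the model is also reachable through Ollama's local REST API, not just `ollama run`; the endpoint and payload below are standard Ollama, using the model name from the deploy example:

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "soup-my-model",
  "prompt": "The capital of France is",
  "stream": false
}'
```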
## Push to HuggingFace Hub
```bash
soup push --model ./output --repo your-username/my-model
soup push --model ./output --repo your-username/my-model --private
```

`soup push` auto-generates a model card with training details.
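To verify the push, or to fetch the model on another machine, the `huggingface-cli` bundled with `huggingface_hub` works against the repo directly; `your-username/my-model` is the placeholder from the example above:

```bash
# Log in once (required for private repos), then download the pushed files
huggingface-cli login
huggingface-cli download your-username/my-model --local-dir ./my-model
```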