# Model Export

Export fine-tuned models to various formats for deployment.

## Merge LoRA Adapter

Merge a LoRA adapter with its base model into a standalone model:

```bash
# Auto-detect base model from adapter_config.json
soup merge --adapter ./output --output ./merged

# Specify base model and dtype
soup merge --adapter ./output --base meta-llama/Llama-3.1-8B --dtype bfloat16
```
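Conceptually, merging folds the adapter's low-rank update into the frozen base weights: `W_merged = W + (alpha / r) * B @ A`, where `A` and `B` are the adapter matrices, `r` the rank, and `alpha` the scaling factor. A minimal NumPy sketch of that arithmetic (the shapes and values are illustrative; this mirrors the standard LoRA merge, not soup's exact implementation):

```python
import numpy as np

# Illustrative dimensions: hidden size 8, LoRA rank 2, scaling alpha 4
d, r, alpha = 8, 2, 4

rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))   # frozen base weight
A = rng.standard_normal((r, d))   # LoRA down-projection
B = rng.standard_normal((d, r))   # LoRA up-projection

# Merge: fold the scaled low-rank update into the base weight
W_merged = W + (alpha / r) * (B @ A)

# A forward pass through the merged weight equals the base pass
# plus the scaled adapter pass
x = rng.standard_normal(d)
assert np.allclose(W_merged @ x, W @ x + (alpha / r) * (B @ (A @ x)))
```

After merging, the adapter matrices are no longer needed at inference time, which is what makes the standalone export possible.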

## GGUF (llama.cpp / Ollama)

```bash
# Export LoRA adapter (auto-merges with base, then converts)
soup export --model ./output --format gguf --quant q4_k_m

# Different quantizations
soup export --model ./output --format gguf --quant q8_0
soup export --model ./output --format gguf --quant f16

# Export a full (already merged) model
soup export --model ./merged --format gguf
```

Supported quantizations: `q4_0`, `q4_k_m`, `q5_k_m`, `q8_0`, `f16`, `f32`
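The quantization choice largely determines file size. A rough estimator is bits-per-weight times parameter count; the bits-per-weight figures below are approximate community numbers for llama.cpp quant types (k-quants carry some metadata overhead), not exact values:

```python
# Approximate effective bits per weight for common GGUF quant types
BITS_PER_WEIGHT = {
    "q4_0": 4.5, "q4_k_m": 4.8, "q5_k_m": 5.7,
    "q8_0": 8.5, "f16": 16.0, "f32": 32.0,
}

def estimated_gguf_gb(n_params: float, quant: str) -> float:
    """Estimate GGUF file size in GB for a model with n_params parameters."""
    bits = BITS_PER_WEIGHT[quant]
    return n_params * bits / 8 / 1e9

# An 8B-parameter model at q4_k_m lands near 5 GB
print(round(estimated_gguf_gb(8e9, "q4_k_m"), 1))
```

Treat the output as a ballpark for disk and VRAM planning, not a guarantee.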

Use with Ollama:

```bash
echo 'FROM ./my-model.q4_k_m.gguf' > Modelfile
ollama create my-model -f Modelfile
ollama run my-model
```
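For anything beyond a bare `FROM` line, it can help to generate the Modelfile programmatically. The instruction names (`FROM`, `SYSTEM`, `PARAMETER`) are Ollama's Modelfile syntax; the helper itself is an illustrative sketch, not part of soup:

```python
def make_modelfile(gguf_path, system=None, parameters=None):
    """Build an Ollama Modelfile string for a local GGUF file."""
    lines = [f"FROM {gguf_path}"]
    if system:
        # Triple quotes allow multi-line system prompts in a Modelfile
        lines.append(f'SYSTEM """{system}"""')
    for name, value in (parameters or {}).items():
        lines.append(f"PARAMETER {name} {value}")
    return "\n".join(lines) + "\n"

modelfile = make_modelfile(
    "./my-model.q4_k_m.gguf",
    system="You are a helpful assistant.",
    parameters={"temperature": 0.7, "top_p": 0.9},
)
print(modelfile)
```

Write the result to a file named `Modelfile` and pass it to `ollama create -f` as above.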

## ONNX

```bash
pip install 'soup-cli[onnx]'
soup export --model ./output --format onnx
soup export --model ./output --format onnx --output ./model_onnx
```

## TensorRT-LLM

```bash
pip install 'soup-cli[tensorrt]'
soup export --model ./output --format tensorrt
soup export --model ./output --format tensorrt --output ./model_trt
```

## AWQ Quantization (v0.23.0+)

```bash
pip install 'soup-cli[awq]'
soup export --model ./output --format awq
soup export --model ./output --format awq --output ./model_awq
```

## GPTQ Quantization (v0.23.0+)

```bash
pip install 'soup-cli[gptq]'
soup export --model ./output --format gptq
soup export --model ./output --format gptq --output ./model_gptq
```

## Deploy to Ollama (v0.18.0+)

Deploy a GGUF model directly to your local Ollama instance:

```bash
# Deploy a GGUF model
soup deploy ollama --model ./output/model.q4_k_m.gguf --name soup-my-model

# Deploy with system prompt and parameters
soup deploy ollama --model ./model.gguf --name soup-chat \
  --system "You are a helpful assistant." \
  --template chatml \
  --parameter temperature=0.7 \
  --parameter top_p=0.9

# Export + deploy in one command
soup export --model ./output --format gguf --deploy ollama

# List Soup-deployed models
soup deploy ollama --list

# Remove a model
soup deploy ollama --remove soup-my-model
```

Recognized chat templates: `chatml`, `llama`, `mistral`, `vicuna`, `zephyr` (or `auto` to infer the template from `soup.yaml`).
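A chat template controls how conversation turns are serialized into the prompt string the model actually sees. For instance, the ChatML format wraps each turn in `<|im_start|>`/`<|im_end|>` markers. A small sketch of that rendering (the marker format is standard ChatML; the helper is illustrative, not soup's code):

```python
def chatml_prompt(messages):
    """Render (role, content) pairs in the ChatML format."""
    parts = [f"<|im_start|>{role}\n{content}<|im_end|>" for role, content in messages]
    # A trailing open tag cues the model to generate the assistant turn
    parts.append("<|im_start|>assistant")
    return "\n".join(parts) + "\n"

prompt = chatml_prompt([
    ("system", "You are a helpful assistant."),
    ("user", "Hello!"),
])
print(prompt)
```

Using the wrong template for a model usually degrades output quality sharply, which is why matching the template to the base model (or letting `auto` infer it) matters.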

## Push to HuggingFace Hub

```bash
soup push --model ./output --repo your-username/my-model
soup push --model ./output --repo your-username/my-model --private
```

Auto-generates a model card with training details.
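As a rough illustration of what such a generated card can contain — YAML front matter for Hub metadata plus a training summary — here is a sketch; the field names and layout are invented for the example, not soup's actual card template:

```python
def render_model_card(base_model, dataset, epochs, learning_rate):
    """Render a minimal README.md-style model card (illustrative fields)."""
    return (
        "---\n"
        f"base_model: {base_model}\n"
        "---\n\n"
        f"# Fine-tuned from {base_model}\n\n"
        "## Training details\n"
        f"- Dataset: {dataset}\n"
        f"- Epochs: {epochs}\n"
        f"- Learning rate: {learning_rate}\n"
    )

card = render_model_card("meta-llama/Llama-3.1-8B", "my-dataset", 3, 2e-4)
print(card)
```

The YAML front matter is what the Hub parses for model metadata; the markdown body below it is free-form.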