Tool-Calling Fine-Tuning
Soup v0.25.0 adds an end-to-end pipeline for training models that call functions.
Data format
A new tool-calling format was added to soup_cli/data/formats.py:
```json
{
  "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}
      }
    }
  ],
  "tool_calls": [
    {"function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"}}
  ]
}
```

The detector expects `messages`, `tools`, and `tool_calls` to all be present. Tool definitions are embedded in the system message and `tool_calls` are emitted as assistant turns during normalization.
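The detection and normalization steps described above can be pictured roughly as follows. This is a minimal sketch, not the code in `soup_cli/data/formats.py`: the function names `looks_like_tool_calling` and `normalize` are hypothetical, and the way tools are serialized into the system message (JSON text under an "Available tools:" header, with all calls carried by a single assistant turn) is an assumption made for illustration.

```python
import json

REQUIRED_KEYS = ("messages", "tools", "tool_calls")


def looks_like_tool_calling(row: dict) -> bool:
    """Return True when all three top-level keys are present."""
    return all(key in row for key in REQUIRED_KEYS)


def normalize(row: dict) -> list[dict]:
    """Fold tools into a system message and tool calls into an assistant turn."""
    system = {
        "role": "system",
        # Assumption: tool definitions are serialized as indented JSON text.
        "content": "Available tools:\n" + json.dumps(row["tools"], indent=2),
    }
    # Assumption: one assistant turn carries every expected tool call.
    assistant = {"role": "assistant", "content": "", "tool_calls": row["tool_calls"]}
    return [system, *row["messages"], assistant]


row = {
    "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
            },
        }
    ],
    "tool_calls": [
        {"function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"}}
    ],
}

if looks_like_tool_calling(row):
    normalized = normalize(row)
    print([message["role"] for message in normalized])
```

Requiring all three keys keeps the check strict, so a plain chat row that only has `messages` would not be mistaken for a tool-calling example.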
Generate tool-call training data
```bash
soup data generate \
  --template tool-calling \
  --provider openai \
  --count 1000 \
  --out tool_calls.jsonl
```

The synth template lives in `soup_cli/data/templates/tool_calling.py` and is configurable across API domains (weather, search, database, filesystem).
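Generated rows may be worth a quick sanity check before training. The sketch below is not part of Soup; `validate_row` and `validate_file` are hypothetical helpers that assume the JSONL layout shown in the data format section. They flag calls to tools that were never declared and `arguments` strings that do not decode to a JSON object.

```python
import json


def validate_row(row: dict) -> list[str]:
    """Return a list of problems found in one tool-calling example."""
    problems = []
    declared = {tool["function"]["name"] for tool in row.get("tools", [])}
    for call in row.get("tool_calls", []):
        name = call["function"]["name"]
        if name not in declared:
            problems.append(f"call to undeclared tool: {name}")
        try:
            args = json.loads(call["function"]["arguments"])
        except (TypeError, json.JSONDecodeError):
            problems.append(f"arguments for {name} are not valid JSON")
            continue
        if not isinstance(args, dict):
            problems.append(f"arguments for {name} are not a JSON object")
    return problems


def validate_file(path: str) -> dict[int, list[str]]:
    """Map 1-based line numbers to problems; clean lines are omitted."""
    report = {}
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # skip blank lines
            problems = validate_row(json.loads(line))
            if problems:
                report[number] = problems
    return report
```

Calling `validate_file("tool_calls.jsonl")` on the file produced above would return an empty dict if every row passes these two checks.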
Train
Use a ready-made recipe:
```bash
soup recipes use qwen3-8b-tools
soup train
```

Or start from a recipe clone and edit:
```yaml
base: Qwen/Qwen3-8B
task: sft
data:
  train: ./tool_calls.jsonl
  format: tool-calling
training:
  epochs: 3
  lr: 2e-4
  lora: { r: 16, alpha: 32 }
```

Evaluation
soup eval custom ships three tool-call scoring functions:
- `tool_call_match`: exact function name + arguments
- `tool_call_name_match`: function name only
- `tool_call_args_subset`: partial credit for matching a subset of arguments
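To make the difference between the three scorers concrete, here is one way they could be written. This is a sketch under stated assumptions, not Soup's implementation: the signatures (a predicted and an expected call, each shaped like an entry of `tool_calls`), the float return values, and the `unit` argument in the example are all illustrative.

```python
import json


def _parse(call: dict) -> tuple[str, dict]:
    """Split a tool call into its function name and decoded arguments."""
    function = call["function"]
    return function["name"], json.loads(function["arguments"])


def tool_call_match(predicted: dict, expected: dict) -> float:
    """1.0 when name and decoded arguments are both identical, else 0.0."""
    return float(_parse(predicted) == _parse(expected))


def tool_call_name_match(predicted: dict, expected: dict) -> float:
    """1.0 when only the function names agree, else 0.0."""
    return float(predicted["function"]["name"] == expected["function"]["name"])


def tool_call_args_subset(predicted: dict, expected: dict) -> float:
    """Fraction of expected arguments reproduced with the same value."""
    pred_name, pred_args = _parse(predicted)
    exp_name, exp_args = _parse(expected)
    if pred_name != exp_name:
        return 0.0
    if not exp_args:
        return 1.0  # nothing to match, so full credit
    hits = sum(
        1 for key, value in exp_args.items()
        if key in pred_args and pred_args[key] == value
    )
    return hits / len(exp_args)


# Illustrative pair: the prediction omits the hypothetical "unit" argument.
expected = {"function": {"name": "get_weather",
                         "arguments": '{"city": "Tokyo", "unit": "celsius"}'}}
predicted = {"function": {"name": "get_weather",
                          "arguments": '{"city": "Tokyo"}'}}

print(tool_call_match(predicted, expected))        # 0.0
print(tool_call_name_match(predicted, expected))   # 1.0
print(tool_call_args_subset(predicted, expected))  # 0.5
```

Comparing decoded arguments instead of raw strings means key order and whitespace in the `arguments` JSON do not affect the score in this sketch.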
Recipes
- `qwen3-8b-tools`: Qwen 3 8B, 4-bit, LoRA r=16
- `llama4-scout-tools`: Llama 4 Scout 17B, 4-bit, LoRA r=16
See also
- [Data formats](/docs/data-formats)
- [Autopilot](/docs/autopilot): `--goal tool-calling` picks this format automatically