# Long Context (v0.49.0)

Three RoPE-scaling strategies, plus LongLoRA S² shifted-sparse attention, for 128k+ context fine-tuning.
## YaRN

```yaml
training:
  rope_scaling:
    type: yarn
    factor: 8.0
    original_max_position_embeddings: 8192
    beta_fast: 32
    beta_slow: 1
```

The full math kernel ships in v0.49.0.
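For orientation, here is a minimal NumPy sketch of the NTK-by-parts interpolation that `beta_fast` / `beta_slow` control — an illustration of the general YaRN recipe, not Soup's v0.49.0 kernel (the ramp is taken over rotation count rather than dimension index for brevity):

```python
import math
import numpy as np

def yarn_inv_freq(dim=128, base=10000.0, factor=8.0,
                  orig_max_pos=8192, beta_fast=32, beta_slow=1):
    # Standard RoPE inverse frequencies, one per even dimension pair.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    # Position-interpolated frequencies (stretched by `factor`).
    inv_freq_interp = inv_freq / factor

    # Full rotations each dimension completes over the original window.
    rotations = orig_max_pos * inv_freq / (2 * math.pi)

    # Blend: dims with >= beta_fast rotations keep their original frequency,
    # dims with <= beta_slow rotations are fully interpolated, and everything
    # in between is linearly ramped.
    ramp = np.clip((beta_fast - rotations) / (beta_fast - beta_slow), 0.0, 1.0)
    return (1.0 - ramp) * inv_freq + ramp * inv_freq_interp
```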
## Dynamic NTK

```yaml
training:
  rope_scaling:
    type: dynamic
    factor: 4.0
```
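Dynamic NTK leaves RoPE untouched up to the original window and rescales the base as the sequence grows beyond it. A short sketch of the commonly used rescaling rule (illustrative; the function and parameter names are assumptions, not Soup internals):

```python
def dynamic_ntk_base(seq_len, base=10000.0, factor=4.0,
                     orig_max_pos=8192, dim=128):
    # Below the original window, leave RoPE untouched.
    if seq_len <= orig_max_pos:
        return base
    # Past the window, grow the base so the lowest-frequency dimension
    # still spans the current sequence length.
    scale = (factor * seq_len / orig_max_pos) - (factor - 1)
    return base * scale ** (dim / (dim - 2))
```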
## LongLoRA S² shifted-sparse attention

```yaml
training:
  longlora_s2: true
  longlora_s2_group_size: 2048
```

Schema-only in v0.49.0; the forward-pass override lands in v0.49.1.
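S² attention restricts attention to fixed-size token groups and shifts half of the heads by half a group so information still flows across group boundaries. A PyTorch sketch of that grouping step (conceptual only — the actual forward-pass override lands in v0.49.1; the tensor layout is an assumption):

```python
import torch

def s2_group(x, group_size=2048):
    # x: (batch, seq_len, num_heads, head_dim); seq_len must divide
    # evenly into groups of `group_size`.
    bsz, seq_len, n_heads, head_dim = x.shape
    assert seq_len % group_size == 0

    x = x.clone()
    # Shift the second half of the heads by half a group so the two head
    # sets see overlapping group boundaries.
    x[:, :, n_heads // 2:] = torch.roll(
        x[:, :, n_heads // 2:], shifts=-group_size // 2, dims=1
    )
    # Fold groups into the batch dimension; each group then attends only
    # within itself under ordinary full attention.
    return x.reshape(bsz * seq_len // group_size, group_size, n_heads, head_dim)
```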
## Llama 3.1 NTK-aware scaling

Set the standard `rope_scaling: { type: llama3, ... }` and Soup wires up the Llama 3.1-style NTK-aware schedule.
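The Llama 3.1 schedule keeps high-frequency RoPE dimensions intact, divides low-frequency ones by the scaling factor, and blends smoothly in between based on wavelength. A NumPy sketch of that smoothing (illustrative; the defaults shown are the published Llama 3.1 values, not Soup settings):

```python
import math
import numpy as np

def llama3_inv_freq(inv_freq, factor=8.0, low_freq_factor=1.0,
                    high_freq_factor=4.0, orig_max_pos=8192):
    # inv_freq: standard RoPE inverse-frequency array.
    wavelen = 2 * math.pi / inv_freq          # wavelength in tokens per dim
    low_freq_wavelen = orig_max_pos / low_freq_factor
    high_freq_wavelen = orig_max_pos / high_freq_factor

    # Smooth blend between untouched and fully interpolated frequencies.
    smooth = (orig_max_pos / wavelen - low_freq_factor) / (
        high_freq_factor - low_freq_factor
    )
    smooth = np.clip(smooth, 0.0, 1.0)
    scaled = (1.0 - smooth) * inv_freq / factor + smooth * inv_freq

    # Short wavelengths stay unchanged; long wavelengths are fully
    # interpolated by `factor`.
    scaled = np.where(wavelen < high_freq_wavelen, inv_freq, scaled)
    scaled = np.where(wavelen > low_freq_wavelen, inv_freq / factor, scaled)
    return scaled
```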
Gates: `validate_longlora_compat` and `is_llama_model` reject incompatible architectures at config load. Pair with [Multipack](/docs/multipack) to keep variable-length samples efficient on long-context runs.
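Purely for illustration, a hypothetical sketch of the kind of check such gates perform at config load (the supported-architecture set below is a placeholder, not Soup's actual logic):

```python
def validate_long_context_config(cfg, model_type):
    """Hypothetical config-load gate: fail fast on incompatible combinations."""
    if cfg.get("longlora_s2") and model_type not in {"llama"}:  # placeholder set
        raise ValueError("longlora_s2 is not supported for this architecture")
    rope = cfg.get("rope_scaling") or {}
    if rope.get("type") == "llama3" and model_type != "llama":
        raise ValueError("rope_scaling type 'llama3' requires a Llama model")
```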