Retrieval-Augmented FT & Activation Steering (v0.62.0)
Five surfaces wired together: RAFT data format, RA-DIT two-stage pipeline, citation-faithful fine-tuning, activation steering vectors (CAA / ITI / RepE), and the GRACE codebook.
`data.format: raft` — UC Berkeley RAFT data
data:
  train: ./raft_data.jsonl
  format: raft
  max_length: 4096

Per-row schema:
{
  "query": "What is the capital of France?",
  "golden_doc": "France is a country in Europe...",
  "distractor_docs": ["Spain is...", "Germany is..."],
  "answer": "Paris"
}

The model learns to use retrieved context (golden + distractors) to answer queries: context-aware completion, not isolated Q&A.
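To make the row-to-prompt step concrete, here is a minimal sketch of how one RAFT row could be flattened into a prompt/completion pair. The helper name `build_raft_example` and the prompt template are assumptions for illustration; the trainer's actual template is not documented here.

```python
import json
import random

def build_raft_example(row, seed=0):
    """Turn one RAFT row into a (prompt, completion) pair.

    Hypothetical helper: the template below is an assumption,
    not the trainer's actual prompt format.
    """
    docs = [row["golden_doc"], *row["distractor_docs"]]
    # Shuffle so the golden document's position carries no signal.
    random.Random(seed).shuffle(docs)
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(docs))
    prompt = f"Context:\n{context}\n\nQuestion: {row['query']}\nAnswer:"
    return prompt, " " + row["answer"]

row = json.loads("""{
  "query": "What is the capital of France?",
  "golden_doc": "France is a country in Europe...",
  "distractor_docs": ["Spain is...", "Germany is..."],
  "answer": "Paris"
}""")
prompt, completion = build_raft_example(row)
print(prompt)
print(completion)
```

Shuffling the golden document in with the distractors is the point of the format: the model has to read the context rather than rely on document position.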
`training.ra_dit_stage` — two-stage pipeline
RA-DIT (Retrieval-Augmented Dual Instruction Tuning) chains:
# Stage 1: contrastive retriever
task: embedding
training:
  ra_dit_stage: retriever
  ra_dit_retriever_model: sentence-transformers/all-MiniLM-L6-v2

# Stage 2: RAFT generator
task: sft
data:
  format: raft
training:
  ra_dit_stage: generator

Both stages reuse the existing trainer wrappers. Live orchestration (a single `soup train` chaining both) ships in v0.62.1.
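The docs describe stage 1 only as "contrastive". One common contrastive objective is InfoNCE with in-batch negatives, sketched below in plain Python. Whether the embedding trainer uses exactly this loss is an assumption, and `info_nce` and the `temperature` default are illustrative names and values.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def info_nce(query_vecs, doc_vecs, temperature=0.05):
    """Mean InfoNCE loss. doc_vecs[i] is the positive for query_vecs[i];
    every other document in the batch acts as a negative."""
    total = 0.0
    for i, q in enumerate(query_vecs):
        logits = [cosine(q, d) / temperature for d in doc_vecs]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(query_vecs)

queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each query matches its own document
swapped = [[0.1, 0.9], [0.9, 0.1]]   # positives point at the wrong document
loss_aligned = info_nce(queries, aligned)
loss_swapped = info_nce(queries, swapped)
print(loss_aligned < loss_swapped)
```

The loss is small when each query is closest to its paired document and large when the pairing is wrong, which is the signal that pulls queries toward golden docs and away from distractors.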
`training.citation_faithful` — enforce source attribution
data:
  format: raft
training:
  citation_faithful: true
  citation_style: bracket  # bracket | inline | footnote
  citation_recall_threshold: 0.85

When `citation_faithful` is `true`, the trainer masks the loss to emphasize citation spans, so the model learns to cite sources from the retrieved docs. The final save is refused if citation recall falls below `citation_recall_threshold`.
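The save gate can be pictured with a small sketch for the `bracket` style. It assumes citation recall means the fraction of expected source ids that appear as bracket citations in the model output; the docs do not define the metric, and `cited_ids`, `citation_recall`, and `may_save` are hypothetical names.

```python
import re

def cited_ids(text):
    """Collect bracket-style citation ids such as [1] or [2, 3]."""
    ids = set()
    for group in re.findall(r"\[(\d+(?:\s*,\s*\d+)*)\]", text):
        ids.update(int(n) for n in group.split(","))
    return ids

def citation_recall(outputs, expected):
    """Fraction of expected source ids that the outputs actually cite
    (an assumed definition of recall, for illustration only)."""
    hit = total = 0
    for text, gold in zip(outputs, expected):
        hit += len(cited_ids(text) & gold)
        total += len(gold)
    return hit / total if total else 1.0

def may_save(recall, threshold=0.85):
    # Mirrors the documented rule: refuse the save when recall < threshold.
    return recall >= threshold

outputs = ["Paris is the capital [1].", "Madrid is in Spain."]
expected = [{1}, {2}]
recall = citation_recall(outputs, expected)
print(recall, may_save(recall))
```

In this toy run only one of the two expected citations is present, so the recall falls below the 0.85 threshold and the save would be refused.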
`soup steer` — activation steering vectors
soup steer train --base meta-llama/Llama-3.1-8B-Instruct \
  --method caa --name safety-v1 \
  --pairs ./safety_pairs.jsonl --layer 16
soup steer apply --name safety-v1 --strength 1.5
soup steer list

Three methods:
- CAA (Contrastive Activation Addition) — add a learned vector to the residual stream.
- ITI (Inference-Time Intervention) — shift specific attention heads.
- RepE (Representation Engineering) — PCA-based direction extraction.
`|strength| ≤ 10` is enforced. Vectors register as the new `steering_vector` artifact kind in the v0.26 Registry.
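As a rough illustration of the CAA method listed above, the steering vector can be taken as the mean activation difference between positive and negative examples, then added to a hidden state with a bounded strength. The toy lists below stand in for residual-stream activations at the chosen layer, and the function names are hypothetical, not the `soup steer` internals.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def caa_vector(pos_acts, neg_acts):
    """Steering vector = mean(positive activations) - mean(negative activations)."""
    pos_mean = mean_vector(pos_acts)
    neg_mean = mean_vector(neg_acts)
    return [p - n for p, n in zip(pos_mean, neg_mean)]

def apply_steering(hidden, vector, strength):
    """Add the scaled steering vector to one hidden state."""
    # Same bound as the documented |strength| <= 10 check.
    if abs(strength) > 10:
        raise ValueError("|strength| must be <= 10")
    return [h + strength * v for h, v in zip(hidden, vector)]

# Toy activations (assumed data, two examples per side, hidden size 2).
pos = [[1.0, 2.0], [3.0, 2.0]]
neg = [[0.0, 1.0], [2.0, 1.0]]
vec = caa_vector(pos, neg)
steered = apply_steering([0.5, 0.5], vec, strength=1.5)
print(vec, steered)
```

A negative strength pushes the hidden state in the opposite direction, which is why the bound is on the absolute value.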
Pairs JSONL:
{"positive": "You are a helpful AI.", "negative": "You are a harmful AI."}

`training.grace_codebook` — discrete latent codebook
training:
  grace_codebook: true
  grace_codebook_size: 1024

GRACE (Generalization-Regularized Adaptive Codebook Embedding) discretizes the latent activation space into a learned codebook. It reduces overfitting on small datasets and is useful for thousands of sequential edits without norm blow-up. It is schema-only in v0.62.0 and goes live in v0.62.1.
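Since the feature is schema-only in this release, the following is only a sketch of what "discretizing" an activation against a codebook could look like: a nearest-entry lookup, as in standard vector quantization. Treating GRACE as nearest-neighbour quantization is an assumption, and `quantize` is a hypothetical helper.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def quantize(activation, codebook):
    """Return (index, entry) of the codebook entry nearest to the activation."""
    index = min(
        range(len(codebook)),
        key=lambda i: squared_distance(activation, codebook[i]),
    )
    return index, codebook[index]

# Toy codebook with 3 entries; the config above would use 1024.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
index, entry = quantize([0.9, 0.8], codebook)
print(index, entry)
```

Because every activation is replaced by one of a fixed set of entries, the output norm is bounded by the codebook, which is one way to read the "without norm blow-up" claim.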
New recipes
Three RAFT-style recipes shipped:
- `raft-llama3-8b` — RAFT SFT generator on Llama 3.1 8B.
- `ra-dit-retriever` — sentence-transformer contrastive stage.
- `ra-dit-llama3-8b` — full RA-DIT stage-2 generator.
Numbers
+215 new tests in v0.62.0 (9571 → 9786). Security: 0 CRITICAL, 0 HIGH, 4 MEDIUM, 11 LOW.
See also
- [Trace ecosystem](/docs/trace-ecosystem) — v0.63 `soup ingest` produces the JSONL that feeds the RAFT generator.
- [Unlearning](/docs/unlearning) — surgical edits as a harder counterpart to soft steering.
- [Data Forge](/docs/data-forge) — quality moat for the source docs that become golden / distractor pairs.