Unlearning & Knowledge Edit (v0.61.0)
GDPR right-to-be-forgotten as a first-class task. Surgical fact patches without retraining.
`task: unlearn` — remove specific data, keep the rest
Three methods, all wired as live trainers:
| Method | Loss shape | Needs reference model | Notes |
|---|---|---|---|
| NPO (Negative Preference Optimization) | DPO-shaped, negative-only | yes | Pair with retain set |
| SimNPO | Length-normalized NPO | no | Faster on long sequences |
| RMU (Representation Misdirection Unlearning) | Residual-stream noise on forget inputs | no | Best for concept-level removal |
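To make the "Loss shape" column concrete, here is a minimal sketch of the per-sample NPO and SimNPO objectives as they are usually written in the literature. It is an illustration and not Soup's trainer code: the function names and the `beta` / `gamma` defaults are assumptions made for this example.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def npo_loss(logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """NPO on one forget sample: (2/beta) * log(1 + (pi_theta / pi_ref)^beta).

    Both inputs are summed token log-probabilities of the forget sequence,
    under the model being trained and under the frozen reference model.
    """
    return (2.0 / beta) * softplus(beta * (logp_policy - logp_ref))


def simnpo_loss(logp_policy: float, num_tokens: int,
                beta: float = 2.5, gamma: float = 0.0) -> float:
    """SimNPO: no reference model; the log-probability is length-normalized."""
    return (2.0 / beta) * softplus(beta * logp_policy / num_tokens + gamma)
```

Both losses shrink as the model assigns lower probability to the forget sequence. NPO measures that drop relative to the reference model, while SimNPO divides by sequence length instead, which is why it needs no reference model.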

    # soup.yaml
    base: meta-llama/Llama-3.1-8B
    task: unlearn
    training:
      epochs: 3
      unlearn_method: simnpo
      unlearn_alpha: 0.7
    data:
      forget_set: ./private_data_to_remove.jsonl
      retain_set: ./general_knowledge.jsonl
    output: ./unlearned_model

`unlearn_alpha` (0.0–10.0, default 0.5) is the retain-set weight. `forget_set` is required; `retain_set` is optional but strongly recommended for NPO.
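The constraints above can be checked before a run starts. The sketch below is a hypothetical pre-flight validator, not part of Soup: the helper names are invented, the method spellings `npo` and `rmu` are assumed by analogy with `simnpo`, and the way `unlearn_alpha` combines the two loss terms is an assumption based only on its description as the retain-set weight.

```python
KNOWN_METHODS = {"npo", "simnpo", "rmu"}  # assumed spellings; only "simnpo" is documented


def validate_unlearn_config(cfg: dict) -> list[str]:
    """Hypothetical validator: raises ValueError on hard errors, returns warnings."""
    training = cfg.get("training", {})
    data = cfg.get("data", {})

    method = training.get("unlearn_method")
    if method not in KNOWN_METHODS:
        raise ValueError(f"unknown unlearn_method: {method!r}")

    alpha = training.get("unlearn_alpha", 0.5)  # documented default
    if not 0.0 <= alpha <= 10.0:
        raise ValueError(f"unlearn_alpha must be in [0.0, 10.0], got {alpha}")

    if not data.get("forget_set"):
        raise ValueError("data.forget_set is required")

    warnings = []
    if method == "npo" and not data.get("retain_set"):
        warnings.append("retain_set is strongly recommended for NPO")
    return warnings


def total_loss(forget_loss: float, retain_loss: float, alpha: float = 0.5) -> float:
    """Assumed combination: alpha scales the retain-set term."""
    return forget_loss + alpha * retain_loss


config = {
    "task": "unlearn",
    "training": {"unlearn_method": "simnpo", "unlearn_alpha": 0.7},
    "data": {
        "forget_set": "./private_data_to_remove.jsonl",
        "retain_set": "./general_knowledge.jsonl",
    },
}
warnings = validate_unlearn_config(config)
```

Under this reading, a larger `unlearn_alpha` favors preserving behavior on the retain set over aggressive forgetting.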
`soup eval unlearning` — TOFU / MUSE / WMDP

    soup eval unlearning <run-id> --benchmark tofu

Three benchmarks at launch:
- TOFU — Task of Fictitious Unlearning. Forget Quality + Model Utility + PrivLeak.
- MUSE — Machine Unlearning Six-Way Evaluation.
- WMDP — Weapons of Mass Destruction Proxy.
Scores roll up into an OK / MINOR / MAJOR verdict mirroring v0.56 diagnose.
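One plausible way to roll several scores up into a single verdict is to grade each score and report the worst grade. The sketch below illustrates that idea only. The cutoffs, the example scores, and the worst-of rule are all assumptions; Soup's actual thresholds are not documented here.

```python
from enum import IntEnum


class Verdict(IntEnum):
    OK = 0
    MINOR = 1
    MAJOR = 2


def grade(score: float, minor_below: float, major_below: float) -> Verdict:
    """Grade one higher-is-better score against two placeholder cutoffs."""
    if score < major_below:
        return Verdict.MAJOR
    if score < minor_below:
        return Verdict.MINOR
    return Verdict.OK


def roll_up(verdicts: list[Verdict]) -> Verdict:
    """Assumed rule: the overall verdict is the worst individual verdict."""
    return max(verdicts, default=Verdict.OK)


# Made-up scores and cutoffs, for illustration only.
overall = roll_up([
    grade(0.92, minor_below=0.8, major_below=0.5),  # e.g. a utility-style score
    grade(0.65, minor_below=0.8, major_below=0.5),  # e.g. a forget-style score
])
```

A worst-of rule keeps a single weak metric from being averaged away by strong ones, which suits a pass / warn / fail style report.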
`soup edit set` — ROME / MEMIT / AlphaEdit
Surgically patch facts without retraining:

    soup edit set --base meta-llama/Llama-3.1-8B-Instruct \
      --method rome \
      --subject "The capital of France is" \
      --target "Lyon" \
      --plan-only

- ROME — Rank-One Model Editing, targets MLP layers (Meng 2022).
- MEMIT — Mass-Editing Memory in a Transformer (Meng 2023).
- AlphaEdit — null-space-constrained fact patching: the weight update is projected so that preserved knowledge is left unchanged (Fang 2024).
v0.61.0 ships the plan / validation surface; live application lands in v0.61.1.
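The core idea behind a ROME-style edit is a rank-one change to one MLP weight matrix, so that a chosen key vector maps to a new value vector. The sketch below shows that algebra in plain Python. It is a toy illustration, not Soup's implementation: the function names are hypothetical, and the default `u = key` assumes an identity key covariance, whereas ROME uses a covariance estimated from data.

```python
def matvec(W: list[list[float]], x: list[float]) -> list[float]:
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rank_one_edit(W, key, value, u=None):
    """Return W' = W + r u^T / (u . key), with r = value - W key.

    By construction W' maps `key` exactly to `value`. `u` stands in for
    C^{-1} key; it defaults to `key`, i.e. an identity covariance C.
    """
    u = key if u is None else u
    residual = [v - wk for v, wk in zip(value, matvec(W, key))]
    denom = sum(ui * ki for ui, ki in zip(u, key))
    if denom == 0:
        raise ValueError("u and key are orthogonal; the edit is undefined")
    return [
        [w + r * ui / denom for w, ui in zip(row, u)]
        for row, r in zip(W, residual)
    ]


W = [[1.0, 0.0], [0.0, 1.0]]
key, value = [1.0, 2.0], [3.0, -1.0]
W_edited = rank_one_edit(W, key, value)
```

With the identity-covariance default, any input orthogonal to `key` produces the same output before and after the edit, which is the sense in which the patch is surgical.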
`soup edit diff` — citation diff visualizer

    soup edit diff registry://before_id registry://after_id \
      --probes ./probes.jsonl --top-k 10

Compares two Registry entries and surfaces which facts changed. Citation visualizer overlays before / after answers next to the supporting evidence.
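Conceptually, a fact diff runs the same probes against both models and keeps the answers that changed. The sketch below illustrates that with plain dictionaries. It is hypothetical throughout: the prompt-to-answer mapping, the `changed_facts` helper, and ranking by string dissimilarity are assumptions, and the real `probes.jsonl` schema is not documented here.

```python
from difflib import SequenceMatcher


def changed_facts(before: dict[str, str], after: dict[str, str], top_k: int = 10):
    """Return up to top_k (change, prompt, old, new) rows, largest change first.

    `before` and `after` map a probe prompt to the answer each model gave.
    `change` is 1 minus the string similarity of the two answers.
    """
    rows = []
    for prompt, old in before.items():
        new = after.get(prompt)
        if new is None or new == old:
            continue  # probe missing in the second model, or answer unchanged
        change = 1.0 - SequenceMatcher(None, old, new).ratio()
        rows.append((change, prompt, old, new))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:top_k]


before_answers = {"The capital of France is": "Paris", "The capital of Italy is": "Rome"}
after_answers = {"The capital of France is": "Lyon", "The capital of Italy is": "Rome"}
diff = changed_facts(before_answers, after_answers, top_k=10)
```

Probes whose answers are identical drop out, so only the edited fact remains in `diff`.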
Sequential Edit Governor
Built into both `soup edit` and the training pipeline:
- Edit count overflow — caps per-base-model edits (default 10) to prevent cascading drift.
- Norm blowup — rejects any edit that would amplify weight norms beyond a configured threshold.
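The two checks above can be sketched as a small gatekeeper object. This is an illustrative model and not Soup's governor: the class, exception, and parameter names are hypothetical, and the norm threshold of 1.5 is a placeholder. Only the default cap of 10 edits comes from the description above.

```python
class EditRejected(Exception):
    """Raised when the governor refuses an edit."""


class SequentialEditGovernor:
    def __init__(self, max_edits: int = 10, max_norm_ratio: float = 1.5):
        self.max_edits = max_edits            # documented default cap
        self.max_norm_ratio = max_norm_ratio  # placeholder threshold
        self.edit_counts: dict[str, int] = {}

    def approve(self, base_model: str, norm_before: float, norm_after: float) -> None:
        """Record the edit if it passes both checks, else raise EditRejected."""
        count = self.edit_counts.get(base_model, 0)
        if count >= self.max_edits:
            raise EditRejected(f"edit count overflow: {count} edits on {base_model}")
        if norm_before > 0 and norm_after / norm_before > self.max_norm_ratio:
            raise EditRejected("norm blowup: edit amplifies weight norm past threshold")
        self.edit_counts[base_model] = count + 1


governor = SequentialEditGovernor()
governor.approve("meta-llama/Llama-3.1-8B-Instruct", norm_before=10.0, norm_after=10.4)
```

Rejected edits are not counted against the cap in this sketch, so a single bad edit does not use up the budget for later valid ones.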
Numbers
+193 new tests in v0.61.0 (9446 → 9571). Security: 5 HIGH, 11 MEDIUM, 11 LOW fixes.
See also
- [Diagnose](/docs/diagnose) — pair unlearning with the 6-probe report card to verify the forget actually happened.
- [RAG & steering](/docs/rag-and-steering) — v0.62 control-vector steering, a soft alternative to surgical edits.
- [Post-train x-rays (v0.66)](/docs/post-train-xrays) — `soup probe sae-diff` and `soup probe sleeper` verify unlearning at the mechanistic level (Sparse Autoencoder feature movement + calibrated defection classifier), beyond Forget Quality / Model Utility / PrivLeak benchmark scores.