Unlearning & Knowledge Edit (v0.61.0)

GDPR right-to-be-forgotten as a first-class task. Surgical fact patches without retraining.

`task: unlearn` — remove specific data, keep the rest

Three methods, all wired as live trainers:

| Method | Loss shape | Needs reference model | Notes |
| --- | --- | --- | --- |
| NPO (Negative Preference Optimization) | DPO-shaped, negative-only | yes | Pair with retain set |
| SimNPO | Length-normalized NPO | no | Faster on long sequences |
| RMU (Representation Misdirection Unlearning) | Residual-stream noise on forget inputs | no | Best for concept-level removal |

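
To make the first two loss shapes concrete, here is a minimal Python sketch of the per-sequence NPO and SimNPO objectives as they are usually written in the literature. It is not Soup's trainer code: the function names and the `beta` / `gamma` defaults are assumptions chosen for illustration, and the inputs are summed token log-probabilities of one forget-set completion.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def npo_loss(logp_theta: float, logp_ref: float, beta: float = 0.1) -> float:
    """NPO on one forget sequence.

    loss = -(2 / beta) * log sigmoid(-beta * log(pi_theta / pi_ref))
    `beta` default is an illustrative assumption.
    """
    log_ratio = logp_theta - logp_ref
    return -(2.0 / beta) * log_sigmoid(-beta * log_ratio)


def simnpo_loss(logp_theta: float, num_tokens: int,
                beta: float = 2.5, gamma: float = 0.0) -> float:
    """SimNPO on one forget sequence: reference-free, length-normalized.

    loss = -(2 / beta) * log sigmoid(-(beta / |y|) * log pi_theta - gamma)
    `beta` and `gamma` defaults are illustrative assumptions.
    """
    return -(2.0 / beta) * log_sigmoid(-(beta / num_tokens) * logp_theta - gamma)
```

Both losses shrink as the model assigns less probability to the forget completion. SimNPO needs only the current model's log-probability, which is why the table lists no reference model for it.
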
```yaml
# soup.yaml
base: meta-llama/Llama-3.1-8B
task: unlearn
training:
  epochs: 3
  unlearn_method: simnpo
  unlearn_alpha: 0.7
data:
  forget_set: ./private_data_to_remove.jsonl
  retain_set: ./general_knowledge.jsonl
output: ./unlearned_model
```

`unlearn_alpha` (0.0–10.0, default 0.5) is the retain-set weight. `forget_set` is required; `retain_set` is optional but strongly recommended for NPO.
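
One plausible reading of "retain-set weight" is an additive objective: the forget loss plus `unlearn_alpha` times the retain loss. The sketch below assumes that form. `validate_unlearn_alpha` and `combined_loss` are hypothetical helpers, not Soup functions; only the 0.0–10.0 range, the 0.5 default, and the optional retain set come from the text above.

```python
from typing import Optional


def validate_unlearn_alpha(alpha: float) -> float:
    """Reject values outside the documented 0.0-10.0 range."""
    if not 0.0 <= alpha <= 10.0:
        raise ValueError(f"unlearn_alpha must be in [0.0, 10.0], got {alpha}")
    return alpha


def combined_loss(forget_loss: float,
                  retain_loss: Optional[float] = None,
                  alpha: float = 0.5) -> float:
    """Assumed objective: forget_loss + alpha * retain_loss.

    When no retain set is configured, only the forget term remains.
    """
    alpha = validate_unlearn_alpha(alpha)
    if retain_loss is None:
        return forget_loss
    return forget_loss + alpha * retain_loss
```

Under this assumption, a larger `unlearn_alpha` pulls harder toward preserving retain-set behavior, and `unlearn_alpha: 0.0` reduces to forgetting with no utility term.
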

`soup eval unlearning` — TOFU / MUSE / WMDP

```bash
soup eval unlearning <run-id> --benchmark tofu
```

Three benchmarks at launch:

  • TOFU — Task of Fictitious Unlearning. Forget Quality + Model Utility + PrivLeak.
  • MUSE — Machine Unlearning Six-Way Evaluation.
  • WMDP — Weapons of Mass Destruction Proxy.

Scores roll up into an OK / MINOR / MAJOR verdict mirroring v0.56 diagnose.

`soup edit set` — ROME / MEMIT / AlphaEdit

Surgically patch facts without retraining:

```bash
soup edit set --base meta-llama/Llama-3.1-8B-Instruct \
  --method rome \
  --subject "The capital of France is" \
  --target "Lyon" \
  --plan-only
```

  • ROME — Rank-One Model Editing, targets MLP layers (Meng 2022).
  • MEMIT — Mass-Editing Memory in a Transformer (Meng 2023).
  • AlphaEdit — null-space-constrained fact patching.
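
For intuition about what a rank-one edit does to an MLP weight matrix, the sketch below applies the closed-form ROME-style update W' = W + (v − Wk)(C⁻¹k)ᵀ / ((C⁻¹k)ᵀk) in plain Python. It is an illustration only, not Soup's editor: `rank_one_edit` and `matvec` are hypothetical names, the key `k` and target value `v` are supplied by hand rather than derived from a model, and the key covariance defaults to the identity when `C_inv` is omitted.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rank_one_edit(W, key, value, C_inv=None):
    """Return W' such that W' @ key == value, changing W by a rank-one term.

    W:      d_out x d_in weight matrix (list of rows)
    key:    d_in vector representing the subject
    value:  d_out vector encoding the new fact
    C_inv:  optional inverse key covariance (identity is assumed if None)
    """
    u = key if C_inv is None else matvec(C_inv, key)      # C^-1 k
    denom = sum(ui * ki for ui, ki in zip(u, key))         # (C^-1 k)^T k
    if denom == 0:
        raise ValueError("key has zero norm under the given covariance")
    residual = [vi - wi for vi, wi in zip(value, matvec(W, key))]  # v - W k
    # Add the outer product residual * u^T / denom to W.
    return [[w + r * uj / denom for w, uj in zip(row, u)]
            for row, r in zip(W, residual)]
```

With the identity covariance, any input orthogonal to `key` maps to the same output before and after the edit, which is the sense in which the patch is "surgical".
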

v0.61.0 ships the plan / validation surface; live application lands in v0.61.1.

`soup edit diff` — citation diff visualizer

```bash
soup edit diff registry://before_id registry://after_id \
  --probes ./probes.jsonl --top-k 10
```

Compares two Registry entries and surfaces which facts changed. The citation visualizer shows the before and after answers side by side with the supporting evidence.
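
Conceptually, "surfacing which facts changed" comes down to comparing the two models' answers probe by probe. The sketch below shows that comparison on in-memory dictionaries. `diff_answers` is a hypothetical helper, and the mapping from probe text to answer is an assumed shape, not the format of `probes.jsonl` or of Soup's output.

```python
def diff_answers(before: dict, after: dict) -> list:
    """Return (probe, before_answer, after_answer) for every changed answer.

    Only probes present in both mappings are compared; surrounding
    whitespace is ignored so formatting noise does not count as a change.
    """
    changed = []
    for probe in sorted(before.keys() & after.keys()):
        if before[probe].strip() != after[probe].strip():
            changed.append((probe, before[probe], after[probe]))
    return changed
```
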

Sequential Edit Governor

Built into both `soup edit` and the training pipeline:

  • Edit count overflow — caps per-base-model edits (default 10) to prevent cascading drift.
  • Norm blowup — rejects any edit that would amplify weight norms beyond a configured threshold.
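
The two checks above can be sketched as a small gatekeeper that every proposed edit passes through. This is an assumption-laden illustration rather than Soup's implementation: `EditGovernor`, its `check` method, the use of a Frobenius-norm ratio, and the `max_norm_ratio` value of 1.5 are all hypothetical. Only the default cap of 10 edits per base model comes from the text.

```python
import math
from dataclasses import dataclass, field


@dataclass
class EditGovernor:
    max_edits: int = 10          # default cap stated in the text
    max_norm_ratio: float = 1.5  # hypothetical threshold, not a Soup default
    counts: dict = field(default_factory=dict)  # accepted edits per base model

    def check(self, base_model: str, old_weights: list, new_weights: list) -> str:
        """Accept or reject one proposed edit to a flat list of weights."""
        if self.counts.get(base_model, 0) >= self.max_edits:
            return "rejected: edit count overflow"
        old_norm = math.sqrt(sum(w * w for w in old_weights))
        new_norm = math.sqrt(sum(w * w for w in new_weights))
        if old_norm == 0.0:
            ratio = math.inf if new_norm > 0.0 else 1.0
        else:
            ratio = new_norm / old_norm
        if ratio > self.max_norm_ratio:
            return "rejected: norm blowup"
        # Only accepted edits count toward the per-model cap.
        self.counts[base_model] = self.counts.get(base_model, 0) + 1
        return "accepted"
```

Tracking counts per base model means a rejected edit does not use up budget, and edits to one base model do not affect the cap for another.
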

Numbers

+193 new tests in v0.61.0 (9446 → 9571). Security: 5 HIGH, 11 MEDIUM, 11 LOW fixes.

See also

  • [Diagnose](/docs/diagnose) — pair unlearning with the 6-probe report card to verify the forget actually happened.
  • [RAG & steering](/docs/rag-and-steering) — v0.62 control-vector steering, a soft alternative to surgical edits.
  • [Post-train x-rays (v0.66)](/docs/post-train-xrays) — `soup probe sae-diff` and `soup probe sleeper` verify unlearning at the mechanistic level (Sparse Autoencoder feature movement + calibrated defection classifier), beyond Forget Quality / Model Utility / PrivLeak benchmark scores.