Quant-Lobotomy Checker

Quantization — 16-bit → 4-bit, say — often destroys niche capability you care about while the aggregate score looks fine. soup eval quant-check renders an OK / MINOR / MAJOR verdict per task so you never ship a stealth regression.

Run it

bash
soup eval quant-check \
  --before ./output/merged \
  --after ./output/merged.q4_k_m.gguf \
  --tasks critical_eval.jsonl

Both --before and --after accept either a filesystem path or registry://<id> pointing at a [registry](/docs/registry) entry.

Output

Quant check
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━┓
┃ Task                      ┃ Before ┃ After ┃ Delta ┃ Verdict ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━┩
│ math.competition_hard     │ 0.540  │ 0.310 │ -0.230│ MAJOR   │
│ tool_call.multi_tool      │ 0.760  │ 0.510 │ -0.250│ MAJOR   │
│ instruction.long_context  │ 0.880  │ 0.840 │ -0.040│ MINOR   │
│ json_schema.deep_nested   │ 0.910  │ 0.905 │ -0.005│ OK      │
└───────────────────────────┴────────┴───────┴───────┴─────────┘

Verdict thresholds

The classifier uses absolute drops in score:

  • OK — drop < 0.02 (or the score actually improved)
  • MINOR0.02 ≤ drop < 0.05
  • MAJORdrop ≥ 0.05

Output formats

--format accepts table (default, Rich-rendered), json, or markdown. Pipe the output to a file if you want to persist the report:

bash
soup eval quant-check --before X --after Y --tasks t.jsonl --format json > report.json

Tasks file

Reuse any [custom eval JSONL](/docs/experiments):

jsonl
{"prompt": "Solve: ...", "expected": "42", "category": "math.competition_hard", "scoring": "exact"}
{"prompt": "Capital of France?", "expected": "Paris", "scoring": "contains"}

See also

  • [Export](/docs/export) — GGUF / AWQ / GPTQ
  • [Evaluation](/docs/experiments)