Quant-Lobotomy Checker
Quantization (16-bit → 4-bit, say) often destroys a niche capability you care about while the aggregate score looks fine. soup eval quant-check reports an OK / MINOR / MAJOR verdict per task, so a stealth regression shows up before you ship.
Run it
```bash
soup eval quant-check \
  --before ./output/merged \
  --after ./output/merged.q4_k_m.gguf \
  --tasks critical_eval.jsonl
```

Both --before and --after accept either a filesystem path or registry://<id> pointing at a [registry](/docs/registry) entry.
Output
```
                          Quant check
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━┓
┃ Task                      ┃ Before ┃ After ┃ Delta ┃ Verdict ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━┩
│ math.competition_hard     │ 0.540  │ 0.310 │ -0.230│ MAJOR   │
│ tool_call.multi_tool      │ 0.760  │ 0.510 │ -0.250│ MAJOR   │
│ instruction.long_context  │ 0.880  │ 0.840 │ -0.040│ MINOR   │
│ json_schema.deep_nested   │ 0.910  │ 0.905 │ -0.005│ OK      │
└───────────────────────────┴────────┴───────┴───────┴─────────┘
```

Verdict thresholds
The classifier uses the absolute drop in score (Before minus After) for each task:
- OK: drop < 0.02 (or the score actually improved)
- MINOR: 0.02 ≤ drop < 0.05
- MAJOR: drop ≥ 0.05
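
The thresholds above can be sketched in a few lines of Python. This is an illustrative re-implementation, not soup's actual code: the function name classify and the constant names are hypothetical, and the sample scores are the ones from the Output table.

```python
# Hypothetical sketch of the documented thresholds; not soup's real implementation.
MINOR_THRESHOLD = 0.02  # drops below this are OK
MAJOR_THRESHOLD = 0.05  # drops at or above this are MAJOR


def classify(before: float, after: float) -> str:
    """Return "OK", "MINOR", or "MAJOR" for one task's before/after scores."""
    drop = before - after  # positive when the quantized model scores lower
    if drop >= MAJOR_THRESHOLD:
        return "MAJOR"
    if drop >= MINOR_THRESHOLD:
        return "MINOR"
    return "OK"  # small drop, no change, or an improvement


if __name__ == "__main__":
    # Scores copied from the example table in the Output section.
    rows = {
        "math.competition_hard": (0.540, 0.310),
        "tool_call.multi_tool": (0.760, 0.510),
        "instruction.long_context": (0.880, 0.840),
        "json_schema.deep_nested": (0.910, 0.905),
    }
    for task, (before, after) in rows.items():
        print(f"{task}: {after - before:+.3f} {classify(before, after)}")
```

Because the verdict depends on the absolute drop rather than a relative one, a task that starts at a low score is judged by the same 0.02 and 0.05 cut-offs as a task that starts near 1.0.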
Output formats
--format accepts table (default, Rich-rendered), json, or markdown. Redirect the output to a file if you want to persist the report:
```bash
soup eval quant-check --before X --after Y --tasks t.jsonl --format json > report.json
```

Tasks file
Reuse any [custom eval JSONL](/docs/experiments):
```jsonl
{"prompt": "Solve: ...", "expected": "42", "category": "math.competition_hard", "scoring": "exact"}
{"prompt": "Capital of France?", "expected": "Paris", "scoring": "contains"}
```

See also
- [Export](/docs/export) — GGUF / AWQ / GPTQ
- [Evaluation](/docs/experiments)