Optimizer Zoo (v0.41.0)

A closed allowlist of 33+ optimizers covering HF-native, bitsandbytes-backed, and v0.41 additions.

Supported optimizers

HF-native: adamw_torch, adamw_torch_fused, adafactor, sgd, and the rest of the HF set.

bitsandbytes-backed: adamw_bnb_8bit, paged_adamw_8bit, lion_8bit, etc.

New in v0.41:

  • BAdam — block-coordinate descent for full-parameter fine-tuning
  • APOLLO (apollo_adamw) — gradient-scaled training in low rank
  • Adam-mini — half the optimizer memory of AdamW
  • lomo / adalomo — low-memory optimization with optional adaptive scaling
  • grokadamw — Grokfast-style AdamW
  • schedule_free_adamw / schedule_free_sgd
  • muon, dion, came_pytorch
  • TorchAO ao_adamw_fp8 / ao_adamw_4bit / ao_adamw_8bit
Select one by name in the training config:

```yaml
training:
  optimizer: muon
```

validate_optimizer_name rejects non-string, empty, null-byte-containing, and >64-character inputs with actionable messages, and lowercases names for deterministic lookup. Unknown names fail at config load.
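
A minimal sketch of those rules, assuming a set-based allowlist; the function body, allowlist contents, and error messages here are illustrative, not the library's actual implementation.

```python
# Hypothetical sketch; the real allowlist and messages live in the library.
ALLOWED_OPTIMIZERS = {"adamw_torch", "adamw_bnb_8bit", "muon", "apollo_adamw"}  # abridged

def validate_optimizer_name(name):
    if not isinstance(name, str):
        raise TypeError(f"optimizer must be a string, got {type(name).__name__}")
    if not name:
        raise ValueError("optimizer must not be empty")
    if "\x00" in name:
        raise ValueError("optimizer must not contain null bytes")
    if len(name) > 64:
        raise ValueError("optimizer name is longer than 64 characters")
    name = name.lower()  # deterministic lookup
    if name not in ALLOWED_OPTIMIZERS:
        raise ValueError(f"unknown optimizer {name!r}; valid names: {sorted(ALLOWED_OPTIMIZERS)}")
    return name
```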

Per-module LR groups

Map regex patterns → learning rates with first-match-wins routing.

```yaml
training:
  lr: 2e-4   # base / fallback
  lr_groups:
    - { pattern: "model\\.embed_tokens", lr: 5e-5 }
    - { pattern: ".*lora_A.*", lr: 1e-3 }
    - { pattern: ".*lora_B.*", lr: 1e-3 }
```

The list is capped at 32 entries. Each pattern must compile via re.compile, and a ReDoS probe runs against it. Each lr must be finite (NaN / ±inf rejected) and lie in (0, 1]; bools and duplicate patterns are rejected.
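
A minimal sketch of the validation plus first-match-wins routing, assuming parameters are matched against their named_parameters() names with re.search; build_param_groups and the exact matching semantics are assumptions, not the library's API.

```python
import math
import re

def build_param_groups(model, lr_groups, base_lr):
    if len(lr_groups) > 32:
        raise ValueError("lr_groups is capped at 32 entries")
    seen, compiled = set(), []
    for group in lr_groups:
        pattern, lr = group["pattern"], group["lr"]
        if pattern in seen:
            raise ValueError(f"duplicate lr_groups pattern: {pattern!r}")
        seen.add(pattern)
        if isinstance(lr, bool) or not math.isfinite(lr) or not 0 < lr <= 1:
            raise ValueError(f"lr must be a finite float in (0, 1], got {lr!r}")
        compiled.append((re.compile(pattern), lr))  # raises re.error on a bad regex
    buckets = [{"params": [], "lr": lr} for _, lr in compiled]
    fallback = {"params": [], "lr": base_lr}
    for name, param in model.named_parameters():
        for (regex, _), bucket in zip(compiled, buckets):
            if regex.search(name):  # first match wins
                bucket["params"].append(param)
                break
        else:
            fallback["params"].append(param)
    return [g for g in buckets + [fallback] if g["params"]]
```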

LoftQ quantization-aware LoRA init

```yaml
training:
  lora:
    init_strategy: loftq   # random | pissa | olora | loftq
    loftq_iter: 5          # [1, 10]
    loftq_bits: 4          # 2 | 4 | 8
```

Builds the LoftQ config via peft (imported lazily); LoftQ init is incompatible with both DoRA and VeRA.
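
A minimal sketch of that wiring, assuming a hypothetical build_lora_config helper and a dict-shaped config; LoftQConfig, LoraConfig, and init_lora_weights="loftq" are real peft APIs, the rest is illustrative.

```python
def build_lora_config(cfg):
    # Lazy import, as noted above: peft is only needed when LoRA is configured.
    from peft import LoftQConfig, LoraConfig  # real peft classes

    kwargs = {"r": 16, "lora_alpha": 32}  # illustrative LoRA hyperparameters
    if cfg.get("init_strategy") == "loftq":
        if cfg.get("use_dora") or cfg.get("use_vera"):  # hypothetical flags
            raise ValueError("loftq init is incompatible with DoRA and VeRA")
        kwargs["init_lora_weights"] = "loftq"
        kwargs["loftq_config"] = LoftQConfig(
            loftq_bits=cfg.get("loftq_bits", 4),  # 2 | 4 | 8
            loftq_iter=cfg.get("loftq_iter", 5),  # [1, 10]
        )
    return LoraConfig(**kwargs)
```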

LLaMA Pro block expansion

```yaml
training:
  expand_layers: 4              # [1, 64]
  freeze_trainable_layers: 32   # required when expand_layers set
```

Schema lands in v0.41.0; live patch ships in v0.41.1 (same stub-then-live pattern as v0.27.0 MII / v0.37.0 multipack).
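
A minimal sketch of the cross-field schema rule as it could be checked at config load; validate_llama_pro and the dict-shaped config are assumptions.

```python
def validate_llama_pro(cfg):
    # Hypothetical helper; only the rules it encodes come from the schema above.
    expand = cfg.get("expand_layers")
    if expand is None:
        return
    if not 1 <= expand <= 64:
        raise ValueError("expand_layers must be in [1, 64]")
    if cfg.get("freeze_trainable_layers") is None:
        raise ValueError("freeze_trainable_layers is required when expand_layers is set")
```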

Friendly load_in_X aliases

load_in_8bit: true remaps to the 8bit quantization format; load_in_16bit: true remaps to none. The two aliases are mutually exclusive, and combining either with an explicit Quant Menu format raises an error rather than silently overriding it.
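
A minimal sketch of the remapping, assuming the Quant Menu format lives under a quantization key; the helper and field names are illustrative.

```python
def resolve_quant_aliases(cfg):
    # Hypothetical sketch; the real field names (e.g. "quantization") are assumptions.
    load_8 = cfg.pop("load_in_8bit", False)
    load_16 = cfg.pop("load_in_16bit", False)
    if load_8 and load_16:
        raise ValueError("load_in_8bit and load_in_16bit are mutually exclusive")
    if (load_8 or load_16) and "quantization" in cfg:
        raise ValueError("load_in_X aliases cannot be combined with an explicit quantization format")
    if load_8:
        cfg["quantization"] = "8bit"   # load_in_8bit -> 8bit quantization
    elif load_16:
        cfg["quantization"] = "none"   # load_in_16bit -> no quantization
    return cfg
```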