
b7301

Dec 6, 2025

[!WARNING] Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
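If a deployment script downloads and unpacks the Linux release asset, it will need to switch from zip extraction to tar.gz extraction. Below is a minimal sketch in Python; the asset filename is hypothetical and only stands in for whatever the script already downloads.

```python
# Minimal sketch: handle both the old .zip and the upcoming .tar.gz
# Linux release archives. The filename below is a hypothetical example.
import tarfile
import zipfile
from pathlib import Path

asset = Path("llama-b7301-bin-ubuntu-x64.tar.gz")  # hypothetical asset name
dest = Path("llama.cpp-release")

if asset.name.endswith(".zip"):
    # Old behaviour: Linux releases shipped as .zip archives.
    with zipfile.ZipFile(asset) as zf:
        zf.extractall(dest)
else:
    # New behaviour: Linux releases ship as .tar.gz archives.
    with tarfile.open(asset, "r:gz") as tf:
        tf.extractall(dest)
```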

llama : remove quantization sanity check (#17788)

  • llama : remove quantization sanity check

This commit removes the quantization sanity check for attention layers.

The motivation for this is that some models are hybrid models with recurrent layers, expert layers, and attention layers. For these models the current check fails because the expert layers are not taken into account (see the sketch after this list). After consideration, it was decided that this check is not strictly necessary and can be removed to allow for more flexible model architectures.

  • llama : remove unused pruned_attention_w and is_clip_model vars
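To illustrate why a check like this breaks for hybrid architectures, here is a hypothetical sketch (not the actual llama.cpp code): a sanity check that expects every layer to contribute quantized attention weights rejects a valid model as soon as some layers are recurrent or expert layers instead.

```python
# Hypothetical illustration of the removed sanity check, assuming a check
# that compares quantized-attention layer count against total layer count.

def check_attention_quantization(layers):
    """Fail unless every layer has quantized attention weights --
    the kind of assumption that does not hold for hybrid models."""
    quantized_attn = sum(1 for layer in layers if layer.get("attn_quantized"))
    if quantized_attn != len(layers):
        raise ValueError(
            f"only {quantized_attn}/{len(layers)} layers have quantized attention weights"
        )

# A hybrid model: attention layers interleaved with recurrent and expert layers.
hybrid_model = [
    {"type": "attention", "attn_quantized": True},
    {"type": "recurrent"},   # recurrent layer, no attention weights
    {"type": "expert"},      # expert (MoE) layer, not counted as attention
    {"type": "attention", "attn_quantized": True},
]

# The check fails even though every attention layer was quantized correctly,
# which is why a per-layer-count check of this kind had to be removed.
try:
    check_attention_quantization(hybrid_model)
except ValueError as err:
    print("sanity check rejected a valid hybrid model:", err)
```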

macOS/iOS:

Linux:

Windows: