b8278

Mar 11, 2026
llama.cpp CLI b8278

llama-quant : correct n_attention_wv usage (#20357)

  • llama-quant : correct n_attention_wv usage

In #19770, I introduced a regression in how the quantize_state_impl counters were initialized: I was incrementing n_attention_wv and using it in the same loop, when its value should already be final by the time tensor types are decided in llama_tensor_get_type_impl (for use_more_bits).

I never observed a difference in any of my tests:

  • it was only after @bartowski kindly pointed this out that I realized it was incorrect. (Thanks!)
  • simplify
