
b8779

Apr 13, 2026
Meta/llama.cpp · CLI · vb8779

vulkan: Flash Attention DP4A shader for quantized KV cache (#20797)

  • use integer dot product for quantized KV flash attention

  • small improvements

  • fix SHMEM_STAGING indexing

  • add missing KV type quants

  • fixes

  • add supported quants to FA tests

  • re-add fast paths for <8-bit quants

  • fix mmq gate and shmem checks
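
For readers unfamiliar with the term, the idea behind a DP4A-style integer dot product on quantized data can be sketched as below. This is an illustrative CPU emulation in Python, not the Vulkan shader from the change: the helper names (`pack_i8x4`, `dp4a`, `quantize_block`, `quantized_dot`), the symmetric 8-bit scheme, and the block size of 32 are all assumptions made for the example. It shows four signed 8-bit values packed into one 32-bit word, multiplied pairwise and accumulated as integers, with the per-block float scales applied once at the end.

```python
import struct

def pack_i8x4(vals):
    """Pack four signed 8-bit integers into one unsigned 32-bit word."""
    return struct.unpack("<I", struct.pack("<4b", *vals))[0]

def dp4a(a, b, acc):
    """Emulate a DP4A operation: acc + sum of four int8 * int8 products."""
    a_bytes = struct.unpack("<4b", struct.pack("<I", a))
    b_bytes = struct.unpack("<4b", struct.pack("<I", b))
    return acc + sum(x * y for x, y in zip(a_bytes, b_bytes))

def quantize_block(xs):
    """Symmetric 8-bit quantization (hypothetical scheme): (scale, int8 list)."""
    amax = max(abs(x) for x in xs)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-127, min(127, round(x / scale))) for x in xs]
    return scale, q

def quantized_dot(q_vec, k_vec, block=32):
    """Approximate dot(q_vec, k_vec) using integer arithmetic per block."""
    assert len(q_vec) == len(k_vec)
    assert len(q_vec) % block == 0 and block % 4 == 0
    total = 0.0
    for i in range(0, len(q_vec), block):
        sq, qq = quantize_block(q_vec[i:i + block])
        sk, kq = quantize_block(k_vec[i:i + block])
        acc = 0  # integer accumulator for this block
        for j in range(0, block, 4):
            acc = dp4a(pack_i8x4(qq[j:j + 4]), pack_i8x4(kq[j:j + 4]), acc)
        # Apply both float scales once per block, after the integer work.
        total += sq * sk * acc
    return total
```

The point of the design is that the inner loop touches only integers, so the float multiply happens once per block rather than once per element. On hardware with a native instruction of this kind, each `dp4a` call in the sketch would correspond to a single operation.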

macOS/iOS:

Linux:

Windows:

openEuler: