b8779

Apr 13, 2026

Meta/llama.cpp · CLI · vb8779

vulkan: Flash Attention DP4A shader for quantized KV cache (#20797)

use integer dot product for quantized KV flash attention
small improvements
fix SHMEM_STAGING indexing
add missing KV type quants
fixes
add supported quants to FA tests
re-add fast paths for <8-bit quants
fix mmq gate and shmem checks
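
For readers unfamiliar with the term, DP4A is a packed integer dot product: four signed 8-bit values from each operand are multiplied pairwise and summed into a 32-bit accumulator. The sketch below emulates that operation in Python and applies it to a dot product between two 8-bit quantized blocks, which is roughly the kind of Q·K product flash attention computes against a quantized KV cache. It is illustrative only and is not the shader from this change: the helper names (`pack_i8x4`, `dp4a`, `quantize_block`, `quantized_dot`) and the simple one-scale-per-block layout are assumptions made for this example, not ggml's actual quantization formats.

```python
import struct


def pack_i8x4(vals):
    """Pack four signed 8-bit integers into one 32-bit word (little-endian)."""
    return struct.unpack("<I", struct.pack("<4b", *vals))[0]


def dp4a(a, b, acc):
    """Emulate DP4A: acc plus the sum of four int8 * int8 products."""
    a_bytes = struct.unpack("<4b", struct.pack("<I", a))
    b_bytes = struct.unpack("<4b", struct.pack("<I", b))
    return acc + sum(x * y for x, y in zip(a_bytes, b_bytes))


def quantize_block(values):
    """Symmetric 8-bit quantization (hypothetical layout): one float scale
    per block plus the values packed four to a 32-bit word.
    The block length must be a multiple of 4."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    words = [pack_i8x4(q[i:i + 4]) for i in range(0, len(q), 4)]
    return scale, words


def quantized_dot(q_block, k_block):
    """Dot product of two quantized blocks: integer accumulation first,
    then a single floating-point rescale at the end."""
    q_scale, q_words = q_block
    k_scale, k_words = k_block
    acc = 0
    for qw, kw in zip(q_words, k_words):
        acc = dp4a(qw, kw, acc)
    return acc * q_scale * k_scale


query = [0.5, -1.0, 0.25, 2.0, -0.75, 1.5, 0.0, -2.0]
key = [1.0, 0.5, -0.5, 0.25, 2.0, -1.0, 0.75, -0.25]

exact = sum(a * b for a, b in zip(query, key))
approx = quantized_dot(quantize_block(query), quantize_block(key))
print(f"exact={exact:.4f} approx={approx:.4f}")
```

The point of the design is that the inner loop stays in integer arithmetic, so the quantized values never need to be dequantized to floats element by element; only the accumulated sum is rescaled.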

