b8470

Mar 22, 2026
llama.cpp release b8470

ggml-cuda: native bf16 flash attention for vec kernel (#20525)

  • ggml-cuda: native bf16 flash attention for vec and tile kernels

The mma kernel still converts bf16 to fp16 before launch; native bf16 support for mma is TODO.

  • ggml-cuda: address code owner review feedback

Reverted the tile kernel changes to avoid a larger refactor.

  • fix CI failures on Turing and HIP

  • fix bf16 vec kernel compile on HIP v_dot2 platforms

  • add comments


Co-authored-by: Johannes Gäßler johannesg@5d6.de
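
The motivation for a native bf16 path (rather than converting bf16 to fp16 before the kernel runs) is numerical: bf16 keeps float32's 8-bit exponent and so its full dynamic range, while fp16 overflows above 65504. The sketch below is not llama.cpp code; it just emulates bf16 rounding-toward-zero by masking the low 16 bits of a float32, to show where a bf16-to-fp16 conversion can destroy a value that bf16 represents fine.

```python
import numpy as np

def to_bf16(x):
    # Emulate bf16 by keeping only the top 16 bits of the float32
    # representation (sign + 8 exponent bits + 7 mantissa bits).
    u = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (u & np.uint32(0xFFFF0000)).view(np.float32)

x = np.float32(1.0e5)
print(to_bf16(x))       # ~1e5 (99840.0): small relative error, range preserved
print(np.float16(x))    # inf: fp16's largest finite value is 65504
```

Attention logits are dot products that can grow large before softmax, so keeping bf16's exponent range end to end avoids this class of overflow without an extra conversion pass.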

Downloads: macOS/iOS, Linux, Windows, openEuler