b9089

May 9, 2026
Meta/llama.cpp · CLI · vb9089

SYCL: reduce allocation overhead during flash attention (#22732)

  • SYCL: reduce allocation overhead during flash attention

  • tidy up whitespace

  • add a note about the flag

  • move ggml_sycl_fattn_* into fattn-buffers.hpp

  • refactor implementation into fattn-buffers.cpp

  • move new_fattn_kv_buffers back into ggml-sycl.cpp
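The notes above do not say how the allocation overhead was reduced. One common approach to this kind of problem is a grow-only scratch buffer that is reused across calls and reallocated only when a request exceeds its current capacity. The sketch below illustrates that general idea in plain host-side C++; `ScratchBuffer` and `count_allocations` are hypothetical names for illustration and are not taken from `fattn-buffers.hpp`, `fattn-buffers.cpp`, or any other part of the SYCL backend.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical grow-only scratch buffer (illustration only, not ggml-sycl code).
// It keeps the largest allocation seen so far and hands it out again on later
// requests, so repeated calls with similar sizes do not allocate each time.
struct ScratchBuffer {
    std::unique_ptr<unsigned char[]> data;
    std::size_t capacity    = 0;  // bytes currently owned
    std::size_t alloc_count = 0;  // number of real allocations performed

    // Returns a pointer to at least `size` bytes. Reallocates only on growth;
    // previous contents are discarded, which is fine for scratch space.
    unsigned char * get(std::size_t size) {
        assert(size > 0 && "this sketch expects non-empty requests");
        if (size > capacity) {
            data.reset(new unsigned char[size]);
            capacity = size;
            ++alloc_count;
        }
        return data.get();
    }
};

// Replays a sequence of request sizes against one buffer and reports how many
// real allocations were needed.
std::size_t count_allocations(const std::vector<std::size_t> & sizes) {
    ScratchBuffer buf;
    for (std::size_t s : sizes) {
        buf.get(s);
    }
    return buf.alloc_count;
}
```

With request sizes of 1024, 512, 1024, 2048, and 2048 bytes, this sketch allocates twice (once for 1024, once on growth to 2048) instead of five times. A real device backend would also have to deal with device memory, queue ordering, and buffer lifetime, none of which is modeled here.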

macOS/iOS:

Linux:

Android:

Windows:

openEuler: