b9089

May 9, 2026

SYCL: reduce allocation overhead during flash attention (#22732)

SYCL: reduce allocation overhead during flash attention
tidy up whitespace
add a note about the flag
move ggml_sycl_fattn_* into fattn-buffers.hpp
refactor implementation into fattn-buffers.cpp
move new_fattn_kv_buffers back into ggml-sycl.cpp

macOS/iOS:

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:
