b7820

Jan 24, 2026


ggml-hexagon: flash-attn opt (#19025)

Optimize flash attention kernel by improving score computation and online softmax update
wip
Refactor online softmax update in flash attention kernel for improved performance
Optimize flash attention kernel by replacing float array with HVX_Vector for score computation
wip
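
For readers unfamiliar with the term, the "online softmax update" mentioned above can be illustrated with a minimal scalar sketch. This is not the Hexagon HVX kernel from the PR; the function names, the block size, and the omission of the usual 1/sqrt(d) score scaling are all assumptions made for illustration. It only shows the general technique: keys and values are processed in blocks while a running maximum, a running normalizer, and a rescaled accumulator are maintained, so the full score vector never has to be stored.

```python
import math


def attention_reference(q, keys, values):
    """Plain softmax attention for one query (hypothetical helper for comparison)."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def attention_online(q, keys, values, block_size=2):
    """Same result, computed block by block with an online softmax update."""
    dim = len(values[0])
    running_max = -math.inf   # largest score seen so far
    running_sum = 0.0         # softmax normalizer relative to running_max
    acc = [0.0] * dim         # unnormalized output relative to running_max

    for start in range(0, len(keys), block_size):
        k_block = keys[start:start + block_size]
        v_block = values[start:start + block_size]

        # Score computation for this block (dot products with the query).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in k_block]

        # Rescale previous state to the new maximum; exp(-inf) is 0.0 on the first block.
        new_max = max(running_max, max(scores))
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]

        # Fold the current block into the running state.
        for s, v in zip(scores, v_block):
            w = math.exp(s - new_max)
            running_sum += w
            for d in range(dim):
                acc[d] += w * v[d]

        running_max = new_max

    return [a / running_sum for a in acc]
```

Whatever the block size, the blocked version should agree with the reference up to floating-point rounding. A vectorized kernel would typically keep the per-block scores in vector registers rather than a scalar array, which is the kind of change the commit messages describe.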

