b7820
ggml-hexagon: flash-attn opt (#19025)
Optimize flash attention kernel by improving score computation and online softmax update
wip
Refactor online softmax update in flash attention kernel for improved performance
Optimize flash attention kernel by replacing float array with HVX_Vector for score computation
wip
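The commits above refer to the online (streaming) softmax update that flash-attention kernels use so that a KV block can be folded into the result without ever materializing the full row of attention scores. As a rough illustration of that technique only, here is a minimal scalar C sketch; it is not the actual ggml-hexagon HVX code, and the function name and signature are hypothetical.

```c
#include <float.h>
#include <math.h>

// Illustrative scalar sketch of the online softmax update in a
// flash-attention style kernel. The caller initializes *m to -FLT_MAX,
// *d to 0, and acc[] to 0 before the first block. The real Hexagon
// kernel vectorizes these loops with HVX_Vector registers.
static void online_softmax_update(
        float       *acc,      // output accumulator, length head_dim
        float       *m,        // running maximum of scores seen so far
        float       *d,        // running softmax denominator
        const float *scores,   // scores for the current KV block, length n
        const float *v,        // value rows for the block, n x head_dim
        int          n,
        int          head_dim) {
    // 1. Update the running maximum with this block's scores.
    float m_new = *m;
    for (int i = 0; i < n; ++i) {
        if (scores[i] > m_new) m_new = scores[i];
    }

    // 2. Rescale the previous accumulator and denominator by exp(m_old - m_new).
    const float scale = expf(*m - m_new);
    *d *= scale;
    for (int j = 0; j < head_dim; ++j) {
        acc[j] *= scale;
    }

    // 3. Fold in the current block using the updated maximum.
    for (int i = 0; i < n; ++i) {
        const float p = expf(scores[i] - m_new);
        *d += p;
        for (int j = 0; j < head_dim; ++j) {
            acc[j] += p * v[i*head_dim + j];
        }
    }

    *m = m_new;
    // After the last block, the caller divides acc[] by *d to get the attention output.
}
```

The HVX_Vector change mentioned in the commit log presumably keeps the block scores in vector registers rather than spilling them to a scalar float array, which is what makes the rescale and accumulate steps above amenable to vectorization on Hexagon.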
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: