b7513
ggml-hexagon: gelu optimization (#18151)
feat: working gelu with src0 put on vtcm
feat: gelu ping-pong for both in and out
fix: fix compile error
break: distinguish dma ddr->vtcm and vtcm->ddr operation
fix: fix dma queue size
break: update dma api to either pop src or dst ptr
fix: fix activation vtcm allocation issue for src1 when swapped
refactor: ping-pong gelu logic to avoid unnecessary if else
dma: improved queue interface and prefetch handling
gelu: fix N+2 block prefetch
Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: