b7845
ggml-cpu: aarm64: q6_K repack gemm and gemv (and generic) implementations (i8mm) #18860 (#18888)
Boilerplate for q6_K repack
q6_K repack to q6_Kx8 implementation
Signed-off-by: Alberto Cabrera alberto.cabrera@liquid.ai
q6_K generic gemv and gemm
wip, gemm_q6_K 8x8
Still WIP: loading of q8s, q6h and q6l
first working version of q6_K gemm
Moved q6 loads outside of sb block, Unrolled inner loop
Replaced modulo with mask
First implementation of GEMV
ggml_vdotq_s32 -> vdotq_s32
Reduce width of accumulators in q6_K gemv
Bsums instead of calc bias. Preload scales to use vget_lane. Unroll.
Reuse scales in GEMM (same GEMV opt)
Added todos for bsum and different qh repack
Arch fallback
VSLIQ for merging qh and ql
Removed TODO, already tested
Apply suggestions
Co-authored-by: Georgi Gerganov ggerganov@gmail.com
Removed unused import
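Three of the optimizations named in the commit list (VSLIQ for merging qh and ql, bsums instead of a per-element bias, and replacing modulo with a mask) can be illustrated with a portable scalar sketch. This is not the code from the PR: the real kernels use NEON intrinsics, and every function name below is hypothetical. It also assumes the usual q6_K convention that each 6-bit quant is stored as 4 low bits plus 2 high bits and carries an offset of 32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of NEON VSLI (shift left and insert) on a single byte:
 * shift b left by n bits and keep the low n bits of a. */
uint8_t vsli_u8(uint8_t a, uint8_t b, int n) {
    assert(n >= 0 && n < 8);
    uint8_t low_mask = (uint8_t)((1u << n) - 1u);
    return (uint8_t)(((unsigned)b << n) | (a & low_mask));
}

/* Rebuild one unsigned 6-bit quant from its low nibble and its high 2 bits.
 * Both inputs are assumed to be isolated already (hypothetical helper). */
uint8_t q6_merge(uint8_t ql_nibble, uint8_t qh_2bits) {
    return vsli_u8(ql_nibble, qh_2bits, 4);
}

/* Reference dot product: subtract the offset of 32 from every quant. */
int32_t dot_offset_per_element(const uint8_t *q, const int8_t *y, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += ((int32_t)q[i] - 32) * (int32_t)y[i];
    }
    return acc;
}

/* Same result with the offset hoisted out of the loop:
 * sum((q - 32) * y) == sum(q * y) - 32 * sum(y).
 * In a real kernel sum(y) would come precomputed with the q8 block. */
int32_t dot_offset_from_bsum(const uint8_t *q, const int8_t *y, size_t n) {
    int32_t acc = 0;
    int32_t bsum = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += (int32_t)q[i] * (int32_t)y[i];
        bsum += (int32_t)y[i];
    }
    return acc - 32 * bsum;
}

/* For an unsigned index and a power-of-two period, modulo and mask agree. */
unsigned wrap_mod(unsigned i)  { return i % 8u; }
unsigned wrap_mask(unsigned i) { return i & 7u; }
```

Hoisting the offset removes one subtraction per element from the inner loop, and the single-instruction merge replaces a separate shift and OR. Whether either change pays off on a given core is something only the PR's benchmarks can confirm.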
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: