b8057
ggml-cpu: FA add GEMM microkernel (#19422)
add guard for sizeless vector types
fix case where DV % GGML_F32_EPR != 0
move memset out of the loop
move another memset out of the loop
use RM=4 for arm
simd_gemm: convert everything to int
convert everything to size_t to avoid warnings
fixup
add pragma for ignoring aggressive loop optimizations
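
For readers unfamiliar with the term, a GEMM microkernel computes a small tile of the output matrix with accumulators held in locals, and needs a tail path for dimensions that are not a multiple of the tile size (the kind of case the `DV % GGML_F32_EPR != 0` fix refers to). The following is only a minimal scalar sketch of that idea and is not the code from this change: `sketch_gemm_tiled`, `SKETCH_RM`, and `SKETCH_RN` are hypothetical names, the tile width of 4 columns is an assumption, and the real kernel uses SIMD registers rather than a plain array.

```c
#include <stddef.h>

/* Hypothetical tile sizes for illustration only. SKETCH_RM = 4 echoes the
 * "use RM=4 for arm" note above; SKETCH_RN is an arbitrary assumption. */
#define SKETCH_RM 4
#define SKETCH_RN 4

/* C[m x n] += A[m x k] * B[k x n], all row-major.
 * Scalar stand-in for a register-tiled microkernel: each (i0, j0) tile is
 * accumulated in a small local array before being written back once. */
static void sketch_gemm_tiled(const float *a, const float *b, float *c,
                              size_t m, size_t n, size_t k) {
    for (size_t i0 = 0; i0 < m; i0 += SKETCH_RM) {
        /* Tail handling: the last tile may have fewer than SKETCH_RM rows. */
        const size_t mr = (m - i0 < SKETCH_RM) ? (m - i0) : SKETCH_RM;
        for (size_t j0 = 0; j0 < n; j0 += SKETCH_RN) {
            /* Same for columns when n is not a multiple of SKETCH_RN. */
            const size_t nr = (n - j0 < SKETCH_RN) ? (n - j0) : SKETCH_RN;
            float acc[SKETCH_RM][SKETCH_RN] = {{0.0f}};

            for (size_t p = 0; p < k; ++p) {
                for (size_t i = 0; i < mr; ++i) {
                    const float av = a[(i0 + i) * k + p];
                    for (size_t j = 0; j < nr; ++j) {
                        acc[i][j] += av * b[p * n + (j0 + j)];
                    }
                }
            }

            /* Write the finished tile back to C once. */
            for (size_t i = 0; i < mr; ++i) {
                for (size_t j = 0; j < nr; ++j) {
                    c[(i0 + i) * n + (j0 + j)] += acc[i][j];
                }
            }
        }
    }
}
```

Clamping `mr` and `nr` per tile keeps the inner loops in bounds for shapes such as 5 x 6, at the cost of a slower path on the last row or column tile.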
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: