b7703
model: try to improve Qwen3 Next (#18683)
- qwen3next: simplify qkvz projection
- use ggml_swiglu_split
- revert swiglu_split, but remove redundant repeat()
- fix missing reshape
- rm 2 redundant transposes
- move mul_mat(k,q) to outside of chunking
- rm redundant cont
- improve g_cs_chunk
- add comments about no cont
- use std::pair instead of ggml_concat
- vectorize key_gdiff calculation
- rm unused tensor
- avoid ggml_concat inside loop
- bring back ggml_concat as it may not work on other backends
- nits
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: