b7703

Jan 11, 2026
Meta/llama.cpp CLI vb7703

model: try to improve Qwen3 Next (#18683)

  • qwen3next: simplify qkvz projection

  • use ggml_swiglu_split

  • revert swiglu_split, but remove redundant repeat()

  • fix missing reshape

  • rm 2 redundant transposes

  • move mul_mat(k,q) to outside of chunking

  • rm redundant cont

  • improve g_cs_chunk

  • add comments about no cont

  • use std::pair instead of ggml_concat

  • vectorize key_gdiff calculation

  • rm unused tensor

  • avoid ggml_concat inside loop

  • bring back ggml_concat as it may not work on other backends

  • nits
