
b7819

Jan 23, 2026
Meta/llama.cpp · CLI · vb7819

graph : utilize ggml_build_forward_select() to avoid reallocations (#18898)

  • graph : avoid branches between embedding and token inputs

  • models : make deepstack graphs (e.g. Qwen3 VL) have constant topology

  • ci : enable -DGGML_SCHED_NO_REALLOC=ON for server CI

  • cont : pad token embeddings to n_embd_inp
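The idea behind these changes can be illustrated with a toy model. The sketch below is not ggml or llama.cpp code: every name in it (`build_branching`, `build_select`, `pad_to_width`) is hypothetical, and the use of zero-padding is an assumption. It only shows why keeping both input paths in the graph and selecting one of them gives a constant topology, which is what lets a scheduler skip reallocation when the input kind changes.

```python
def build_branching(use_embd: bool) -> list[str]:
    """Branching build: the op list differs depending on the input kind."""
    if use_embd:
        return ["input_embd", "layers"]
    return ["input_tokens", "get_rows", "layers"]


def build_select(use_embd: bool) -> tuple[list[str], int]:
    """Select-style build: both input paths are always present.

    Only the selected index changes between calls, so the op list
    (the topology) is identical for token and embedding inputs.
    """
    ops = ["input_tokens", "get_rows", "input_embd", "select", "layers"]
    selected = 1 if use_embd else 0
    return ops, selected


def pad_to_width(row: list[float], n_embd_inp: int) -> list[float]:
    """Pad a token embedding row to the width of the embedding input path.

    Zero-padding is an assumption made for this sketch; it makes both
    candidate inputs the same shape so either can be selected.
    """
    if len(row) > n_embd_inp:
        raise ValueError("row is wider than n_embd_inp")
    return row + [0.0] * (n_embd_inp - len(row))


if __name__ == "__main__":
    # The branching build changes shape with the input kind.
    print(build_branching(True) != build_branching(False))
    # The select build keeps the same op list; only the index differs.
    print(build_select(True)[0] == build_select(False)[0])
    print(pad_to_width([1.0, 2.0], 4))
```

In the branching version a switch from tokens to embeddings produces a different graph, so buffers planned for one layout may not fit the other. In the select version the layout is fixed up front, at the cost of always carrying both input paths.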
