b9460

Jun 1, 2026
Meta / llama.cpp / CLI / vb9460

llama: limit max outputs of llama_context (#23861)

  • llama: save more VRAM by reserving n_outputs == n_seqs when possible

  • add n_outputs_per_seq

  • move n_outputs_max to server-context

  • change ubatch to batch everywhere
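To give a rough sense of why limiting the number of reserved outputs can save memory, here is a back-of-the-envelope sketch. It is not code from the PR. It assumes the logits buffer holds one float32 per vocabulary entry per output row, and that the reservation is `n_outputs_per_seq` rows per sequence capped at the batch size. The helper names and the example sizes (`n_vocab`, `n_batch`, `n_seqs`) are hypothetical and chosen only for illustration.

```python
# Rough estimate of a logits buffer under two reservation policies.
# Assumptions (not taken from the release notes): float32 logits, one row of
# n_vocab values per output, and illustrative sizes for vocab and batch.

BYTES_PER_FLOAT32 = 4


def logits_buffer_bytes(n_outputs: int, n_vocab: int) -> int:
    """Size in bytes of an n_outputs x n_vocab float32 logits buffer."""
    return n_outputs * n_vocab * BYTES_PER_FLOAT32


def reserved_outputs(n_batch: int, n_seqs: int, n_outputs_per_seq: int = 1) -> int:
    """Hypothetical rule: n_outputs_per_seq rows per sequence, capped at the batch size."""
    return min(n_batch, n_seqs * n_outputs_per_seq)


n_vocab = 128_000  # example vocabulary size
n_batch = 2048     # example batch size in tokens
n_seqs = 4         # example number of parallel sequences

# Worst case: every token in the batch produces an output row.
worst_case = logits_buffer_bytes(n_batch, n_vocab)
# Limited case: only one output row per sequence is reserved.
limited = logits_buffer_bytes(reserved_outputs(n_batch, n_seqs), n_vocab)

print(f"one output per token:    {worst_case / 2**20:.1f} MiB")
print(f"one output per sequence: {limited / 2**20:.1f} MiB")
```

Under these assumptions the reservation scales with the number of sequences rather than the number of tokens in the batch, which is where the saving would come from during ordinary token-by-token generation.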

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI: