b8658

Apr 3, 2026
Meta/llama.cpp · CLI · vb8658

server: save and clear idle slots on new task (--clear-idle) (#20993)

  • server: clear idle slots KV from VRAM (LLAMA_KV_KEEP_ONLY_ACTIVE)

  • server: move idle slot KV clearing to slot release

The save "cost" is now paid by the finishing request.

  • server: add --kv-clear-idle flag, enable by default

  • server: skip clearing last idle slot, clear on launch

  • server: test --no-kv-clear-idle flag

  • server: simplify on-release clearing loop

  • server: remove on-release KV clearing, keep launch-only

  • cont : clean-up

  • tests: update log strings after --clear-idle rename

  • tests: use debug tags instead of log message matching

  • test: fix Windows CI by dropping temp log file unlink


Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
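
The release title describes saving and clearing idle slots when a new task arrives. As a rough illustration of that idea only, here is a toy Python model. `Slot`, `ToyServer`, `start_task`, `finish_task`, and `saved` are hypothetical names invented for this sketch. It does not reflect the actual llama.cpp server code, where the saved state is stored, or which slots the real implementation skips.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    """Hypothetical stand-in for a server slot; not a llama.cpp type."""
    slot_id: int
    kv_tokens: list = field(default_factory=list)  # pretend KV cache held in VRAM
    busy: bool = False


class ToyServer:
    """Toy model: on a new task, save and clear the KV of every other idle slot."""

    def __init__(self, n_slots, clear_idle=True):
        self.slots = [Slot(i) for i in range(n_slots)]
        self.clear_idle = clear_idle  # assumption: mirrors the idea behind --clear-idle
        self.saved = {}               # slot_id -> KV contents saved outside "VRAM"

    def start_task(self, slot_id, prompt_tokens):
        if self.clear_idle:
            for s in self.slots:
                # Only idle slots that still hold KV data are saved and cleared.
                if s.slot_id != slot_id and not s.busy and s.kv_tokens:
                    self.saved[s.slot_id] = s.kv_tokens
                    s.kv_tokens = []
        slot = self.slots[slot_id]
        slot.busy = True
        slot.kv_tokens = list(prompt_tokens)

    def finish_task(self, slot_id):
        # The slot becomes idle but keeps its KV until another task starts.
        self.slots[slot_id].busy = False


if __name__ == "__main__":
    server = ToyServer(n_slots=2)
    server.start_task(0, [1, 2, 3])
    server.finish_task(0)
    server.start_task(1, [4, 5])
    print(server.slots[0].kv_tokens)  # idle slot 0 was cleared
    print(server.saved[0])            # its contents were saved first
```

With `clear_idle=False`, the toy model leaves idle slots untouched, which is the behavior a disabling flag would presumably select.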