b7492

Dec 21, 2025

server: add auto-sleep after N seconds of idle (#18228)

  • implement sleeping at queue level

  • implement server-context suspend

  • add test

  • add docs

  • optimization: add fast path

  • make sure to free llama_init

  • nits

  • fix use-after-free

  • allow /models to be accessed during sleeping, fix use-after-free

  • don't allow accessing /models during sleep, it is not thread-safe

  • fix data race on accessing props and model_meta

  • small clean up

  • trailing whitespace

  • rm outdated comments

Builds available for: macOS/iOS, Linux, Windows, openEuler.