b7492
server: add auto-sleep after N seconds of idle (#18228)
- implement sleeping at queue level
- implement server-context suspend
- add test
- add docs
- optimization: add fast path
- make sure to free llama_init
- nits
- fix use-after-free
- allow /models to be accessed during sleeping, fix use-after-free
- don't allow accessing /models during sleep, it is not thread-safe
- fix data race on accessing props and model_meta
- small clean up
- trailing whitespace
- rm outdated comments
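The commit log above says sleeping is implemented at the queue level, with the server context suspended after N seconds of idle. Below is a minimal sketch of that idea. It assumes a single worker thread that repeatedly calls `step()`, and that "sleep" and "wake" are caller-supplied callbacks that free and reload the model/context. `IdleSleepQueue`, `post`, `step` and `is_sleeping` are hypothetical names, not llama.cpp's actual server types or API. The "fast path" optimization mentioned in the log is not modeled.

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical illustration only: IdleSleepQueue is not a llama.cpp type.
// A single worker thread is assumed to call step() in a loop.
class IdleSleepQueue {
public:
    using Clock = std::chrono::steady_clock;

    IdleSleepQueue(Clock::duration idle_timeout,
                   std::function<void()> on_sleep,  // e.g. free model/context
                   std::function<void()> on_wake)   // e.g. reload model/context
        : idle_timeout_(idle_timeout),
          on_sleep_(std::move(on_sleep)),
          on_wake_(std::move(on_wake)),
          last_activity_(Clock::now()) {
        assert(on_sleep_ && on_wake_);
    }

    // Called from request handlers: enqueue work and mark the server as active.
    void post(std::function<void()> task, Clock::time_point now = Clock::now()) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push_back(std::move(task));
        last_activity_ = now;
    }

    // One worker-loop iteration. Returns true if a task was executed.
    bool step(Clock::time_point now = Clock::now()) {
        std::function<void()> task;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (tasks_.empty()) {
                // Idle: suspend once the timeout has elapsed since the last activity.
                if (!sleeping_ && now - last_activity_ >= idle_timeout_) {
                    sleeping_ = true;
                    on_sleep_();
                }
                return false;
            }
            // Work arrived while asleep: restore resources before running it.
            if (sleeping_) {
                sleeping_ = false;
                on_wake_();
            }
            task = std::move(tasks_.front());
            tasks_.pop_front();
            last_activity_ = now;
        }
        task();  // run outside the lock so handlers can keep posting
        return true;
    }

    // Guarded by the same mutex, so readers never race with the worker.
    bool is_sleeping() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return sleeping_;
    }

private:
    Clock::duration idle_timeout_;
    std::function<void()> on_sleep_;
    std::function<void()> on_wake_;
    Clock::time_point last_activity_;
    mutable std::mutex mutex_;
    std::deque<std::function<void()>> tasks_;
    bool sleeping_ = false;
};
```

In this sketch the sleep and wake callbacks run while the queue mutex is held, so a `post()` that arrives during a reload simply waits until the context is back. That trades some request latency for never exposing a half-freed or half-loaded context to other threads. Passing explicit `now` values to `post()` and `step()` makes the idle logic testable without real waiting.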
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: