
b8646

Apr 3, 2026
Meta/llama.cpp · CLI · vb8646

rpc : reuse compute graph buffers (#21299)

On the server side, reuse the buffer that backs the ggml context used to create the compute graph, instead of allocating a new one each time. This partially addresses a memory leak caused by the CUDA backend using buffer addresses as cache keys.
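To illustrate why reuse helps, here is a toy model of a cache keyed by buffer address. It is a sketch only, not the ggml RPC or CUDA backend code: `toy_cache`, `run_fresh`, and `run_reused` are hypothetical names, and the fresh buffers are deliberately kept alive so their addresses are guaranteed to be distinct (a real allocator may or may not hand back a freed address).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a backend-side cache that uses a buffer
// address as its key. Entries are never evicted in this toy model.
struct toy_cache {
    std::unordered_map<const void *, int> entries;

    void touch(const void * key) { entries[key] += 1; }
};

// One new buffer per request: every request presents a new address,
// so the cache gains one entry per request.
static size_t run_fresh(size_t n_requests, size_t buf_size) {
    assert(buf_size > 0);
    toy_cache cache;
    // Kept alive on purpose so that all addresses are distinct.
    std::vector<std::unique_ptr<char[]>> live;
    for (size_t i = 0; i < n_requests; ++i) {
        live.push_back(std::make_unique<char[]>(buf_size));
        cache.touch(live.back().get());
    }
    return cache.entries.size();
}

// One buffer reused for all requests: the address stays the same,
// so the cache holds a single entry regardless of request count.
static size_t run_reused(size_t n_requests, size_t buf_size) {
    assert(buf_size > 0);
    toy_cache cache;
    std::vector<char> buf(buf_size);
    for (size_t i = 0; i < n_requests; ++i) {
        cache.touch(buf.data());
    }
    return cache.entries.size();
}
```

In this model, `run_fresh(8, 64)` leaves eight cache entries while `run_reused(8, 64)` leaves one, which mirrors how a stable buffer address keeps an address-keyed cache from growing.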

ref: #21265
ref: #20315
