b9110

May 11, 2026

Meta / llama.cpp / CLI / vb9110

docs: fix metrics endpoint description in server README (#22879)

Documented the model query parameter, which is required in router mode.
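
As a rough client-side sketch, the snippet below builds a metrics URL that carries the model query parameter. The /metrics path, the localhost:8080 address, and the model name "my-model" are assumptions for illustration and are not stated in this release.

```python
from typing import Optional
from urllib.parse import urlencode, urlsplit, urlunsplit


def metrics_url(base_url: str, model: Optional[str] = None) -> str:
    """Return the metrics URL, appending ?model=... when a model name is given."""
    parts = urlsplit(base_url)
    # Assumed endpoint path; adjust if the server exposes metrics elsewhere.
    path = parts.path.rstrip("/") + "/metrics"
    # urlencode percent-escapes the model name so it is safe in a query string.
    query = urlencode({"model": model}) if model else ""
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


# Hypothetical server address and model name.
print(metrics_url("http://localhost:8080", "my-model"))
```

Omitting the model argument yields the plain URL, which would be the form to use when the server is not running in router mode.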

Removed metrics:

llamacpp:kv_cache_usage_ratio
llamacpp:kv_cache_tokens

Added metrics:

llamacpp:prompt_seconds_total
llamacpp:tokens_predicted_seconds_total
llamacpp:n_decode_total
llamacpp:n_busy_slots_per_decode
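
A minimal sketch of reading the four metrics above out of a scrape, assuming the endpoint returns the Prometheus text exposition format with one unlabeled "name value" sample per line and no timestamps. The sample text and its numbers are invented for illustration.

```python
from typing import Dict

# Metric names taken from the "Added metrics" list above.
ADDED_METRICS = (
    "llamacpp:prompt_seconds_total",
    "llamacpp:tokens_predicted_seconds_total",
    "llamacpp:n_decode_total",
    "llamacpp:n_busy_slots_per_decode",
)


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse unlabeled 'name value' sample lines, skipping comments and blanks."""
    values: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # "# HELP" and "# TYPE" lines carry no sample value
        name, _, value = line.rpartition(" ")
        values[name] = float(value)
    return values


# Invented sample output; real values will differ.
sample = """\
# HELP llamacpp:n_decode_total (help text omitted)
llamacpp:prompt_seconds_total 1.5
llamacpp:tokens_predicted_seconds_total 8.0
llamacpp:n_decode_total 40
llamacpp:n_busy_slots_per_decode 2
"""

metrics = parse_metrics(sample)
for name in ADDED_METRICS:
    print(name, metrics.get(name))
```

Using .get() means a name missing from the scrape prints as None instead of raising, which helps when comparing output from builds that expose different metric sets.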

server: fix metric type for n_busy_slots_per_decode
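
To check which type a server build reports for a metric, one option is to read the "# TYPE" comment lines from the scrape. The sketch below assumes the Prometheus text exposition format; the sample uses a placeholder metric name and makes no claim about the type llama.cpp declares.

```python
from typing import Dict


def metric_types(text: str) -> Dict[str, str]:
    """Map metric name to declared type, read from '# TYPE <name> <type>' lines."""
    types: Dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0] == "#" and fields[1] == "TYPE":
            types[fields[2]] = fields[3]
    return types


# Placeholder sample, not real server output.
sample = """\
# HELP example:requests_total Placeholder help text.
# TYPE example:requests_total counter
example:requests_total 7
"""

print(metric_types(sample))
```

Against a real scrape, looking up "llamacpp:n_busy_slots_per_decode" in the returned dictionary shows the type that build declares.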

Android:

Android arm64 (CPU)
