b9110

May 11, 2026
Meta/llama.cpp · CLI · vb9110

docs: fix metrics endpoint description in server README (#22879)

  • docs: fix metrics endpoint description in server README

Describes the model query parameter, which is required in router mode.

Removed metrics:

  • llamacpp:kv_cache_usage_ratio
  • llamacpp:kv_cache_tokens

Added metrics:

  • llamacpp:prompt_seconds_total
  • llamacpp:tokens_predicted_seconds_total
  • llamacpp:n_decode_total
  • llamacpp:n_busy_slots_per_decode

  • server: fix metrics type for n_busy_slots_per_decode metric
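
To make the changes above concrete, here is a minimal Python sketch of reading the metrics endpoint and picking out the llamacpp: values. It assumes the server was started with metrics enabled and serves Prometheus text format at /metrics. The base URL, the model name, and the helper names (metrics_url, parse_metrics, fetch_metrics) are illustrative assumptions, not part of llama.cpp. The parser also assumes plain unlabeled samples of the form "name value".

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def metrics_url(base_url, model=None):
    """Build the /metrics URL; pass `model` when the server runs in router mode."""
    url = base_url.rstrip("/") + "/metrics"
    if model is not None:
        url += "?" + urlencode({"model": model})
    return url


def parse_metrics(text, prefix="llamacpp:"):
    """Parse Prometheus text exposition into {short_name: float}.

    Comment lines (# HELP / # TYPE) and samples without the prefix are skipped.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, rest = line.partition(" ")
        if not name.startswith(prefix):
            continue
        try:
            values[name[len(prefix):]] = float(rest.split()[0])
        except (ValueError, IndexError):
            continue  # malformed sample line, ignore it
    return values


def fetch_metrics(base_url, model=None, timeout=5.0):
    """Fetch and parse metrics from a running server (hypothetical helper)."""
    with urlopen(metrics_url(base_url, model), timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

For example, fetch_metrics("http://localhost:8080", model="my-model") would query a router-mode server for one model, where both the address and "my-model" are placeholders. Code that still reads kv_cache_usage_ratio or kv_cache_tokens should tolerate their absence, for instance by using dict.get on the parsed result.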
