
b8906

Apr 23, 2026
Meta/llama.cpp · CLI · vb8906

server: (anthropic API) fix prefix caching (#21793)

When testing Claude Code against llama.cpp, I noticed that only 18577 tokens of the prompt were reused from the cache (n_past 18577), even when the context was 60k tokens or more. The llama-server log shows:

slot update_slots: id  3 | task 10342 | old: ... ; cch= | defa0;You are
slot update_slots: id  3 | task 10342 | new: ... ; cch= | 1c8b4;
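Prefix caching can only reuse the part of the cached prompt that matches the new prompt from the start, so a single value that changes on every request caps reuse at that position. The sketch below illustrates this under simplifying assumptions: characters stand in for tokens, the prompt strings are invented, and `common_prefix_len` is a hypothetical helper, not llama.cpp code.

```python
from collections.abc import Sequence


def common_prefix_len(old: Sequence, new: Sequence) -> int:
    """Return how many leading items the cached and new prompts share."""
    n = 0
    for a, b in zip(old, new):
        if a != b:
            break  # first mismatch: everything from here on must be recomputed
        n += 1
    return n


# Invented prompts that differ only in the per-request cch value.
# Characters stand in for tokens to keep the sketch self-contained.
cached = "system prefix; cch=defa0;You are a helpful assistant."
incoming = "system prefix; cch=1c8b4;You are a helpful assistant."

reused = common_prefix_len(cached, incoming)
print(f"reusable: {reused} of {len(incoming)}")
```

Everything after the checksum, including the long, otherwise identical remainder of the prompt, falls outside the reusable prefix.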

I observed that the cch value changed on every request. From what I've read, the x-anthropic-billing-header system message appears to be handled specially inside the Anthropic API. I could remove it, but it sometimes ends with a meaningful string. So instead, I replace the changing cch checksum with the constant fffff.
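A minimal sketch of that normalization, assuming the checksum appears as `cch=` followed by exactly five hex digits and a `;`, as in the log excerpt (`cch=defa0;`, `cch=1c8b4;`). The regex and the `normalize_cch` name are illustrative assumptions, not the actual llama-server implementation.

```python
import re

# Assumed shape: "cch=" + exactly 5 hex digits + ";".
CCH_RE = re.compile(r"cch=[0-9a-fA-F]{5};")


def normalize_cch(text: str) -> str:
    """Replace the per-request cch checksum with the constant fffff."""
    return CCH_RE.sub("cch=fffff;", text)


# Two requests that differ only in the checksum normalize to the same text,
# so the prompt prefix stays identical across requests.
print(normalize_cch("...; cch=defa0;You are"))
print(normalize_cch("...; cch=1c8b4;You are"))
```

Because the replacement is a fixed string, the normalized prompt is byte-for-byte stable and the cached prefix can extend past the billing header.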

I'm treating this as a detail of the Anthropic message body API. I think this is the right approach, but by all means please correct me!

The checksum is currently always 5 hexadecimal characters, but I've written the replacement defensively in case the protocol changes.
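One possible defensive variant, sketched here as an assumption rather than the code in this change: accept a hex run of any length after `cch=` and mask each digit with `f`, and only touch text that contains the `x-anthropic-billing-header` marker. The function name and the marker check are hypothetical.

```python
import re

# Hypothetical: match a hex run of any length instead of exactly 5 digits.
CCH_ANY_RE = re.compile(r"(cch=)([0-9a-fA-F]+)")


def normalize_cch_defensive(text: str) -> str:
    """Mask every hex digit of the cch value with 'f', whatever its length."""
    # Assumption: only the billing-header system message carries the checksum,
    # so any other text is returned unchanged.
    if "x-anthropic-billing-header" not in text:
        return text
    return CCH_ANY_RE.sub(lambda m: m.group(1) + "f" * len(m.group(2)), text)


print(normalize_cch_defensive("x-anthropic-billing-header: cch=defa0;"))
print(normalize_cch_defensive("x-anthropic-billing-header: cch=1c8b4a;"))
print(normalize_cch_defensive("user text mentioning cch=abc"))
```

Keeping the masked value the same length as the original is one design choice; replacing any length with a fixed `fffff` would also keep the prefix stable.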
