b7925

Feb 3, 2026


CUDA: use mmvq for mul-mat-id for small batch sizes (#18958)

CUDA: use mmvq for mul-mat-id for small batch sizes
add mmvq too
Fix perf issue on Ampere. Use mmvf mm-id only for non-NVIDIA GPUs
templatize multi_token_path
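
For readers unfamiliar with the operation named in the title, the following is a minimal plain-Python sketch of a mul-mat-id style computation (an expert-indexed matrix multiply, as used in mixture-of-experts layers) with a batch-size dispatch between two paths. It is not the CUDA code from this change. The cutoff `SMALL_BATCH_MAX`, the function names, and the data layout are all assumptions made for illustration; the release note does not state the actual threshold or kernel interfaces.

```python
# Illustrative only: a reference for an expert-indexed matrix multiply.
# SMALL_BATCH_MAX and every function name here are hypothetical,
# not llama.cpp or ggml APIs.

SMALL_BATCH_MAX = 8  # assumed cutoff; the real threshold is not given in the note


def mat_vec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def mul_mat_id_vector_path(experts, tokens, ids):
    """Small-batch path: one matrix-vector product per (token, selected expert)."""
    return [[mat_vec(experts[e], tok) for e in tok_ids]
            for tok, tok_ids in zip(tokens, ids)]


def mul_mat_id_grouped_path(experts, tokens, ids):
    """Large-batch path: group (token, slot) pairs by expert, then process each group."""
    out = [[None] * len(tok_ids) for tok_ids in ids]
    groups = {}
    for t, tok_ids in enumerate(ids):
        for slot, e in enumerate(tok_ids):
            groups.setdefault(e, []).append((t, slot))
    for e, members in groups.items():
        for t, slot in members:
            out[t][slot] = mat_vec(experts[e], tokens[t])
    return out


def mul_mat_id(experts, tokens, ids):
    """Dispatch on batch size; both paths return the same values."""
    if len(tokens) <= SMALL_BATCH_MAX:
        return mul_mat_id_vector_path(experts, tokens, ids)
    return mul_mat_id_grouped_path(experts, tokens, ids)
```

Here `experts` is a list of weight matrices, `tokens` a list of input vectors, and `ids[t]` the expert indices selected for token `t`. The point of the sketch is only that the two paths are interchangeable in result, so choosing between them by batch size is purely a performance decision.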

