Back to feed

b7382

Dec 13, 2025
Meta/llama.cppCLIvb7382

[!WARNING] Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.

vulkan: Multi-pass softmax for large number of cols (#17892)

When the number of cols is large, split each row across multiple workgroups. There are three phases that communicate partial results through temp buffers: (1) compute max partials (2) take max of partials, compute sum(exp(x-max)) partials (3) sum partials, compute scaled result

macOS/iOS:

Linux:

Windows:

openEuler: