
b7616

Jan 2, 2026
Meta/llama.cpp CLI vb7616

vulkan: Optimize GGML_OP_CUMSUM (#18417)

  • vulkan: Optimize GGML_OP_CUMSUM

There are two paths: the preexisting one, which processes a whole row per workgroup in a single shader, and a new one that splits each row into multiple blocks and runs two passes. The first pass computes partial sums within each block; the second adds the block partials to compute the final result. The multipass shader is used when there are a small number of large rows.
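As a rough CPU-side illustration of the two-pass idea (not the Vulkan shader itself), the sketch below computes an inclusive prefix sum per block and then adds the totals of the preceding blocks. The function name `cumsum_two_pass` and the `block_size` parameter are hypothetical and chosen only for this example; on the GPU each block would be handled by a separate workgroup.

```python
from itertools import accumulate


def cumsum_two_pass(row, block_size):
    """Inclusive prefix sum of `row`, computed in two passes over blocks.

    Illustrative sketch only: names and structure are assumptions,
    not taken from the llama.cpp Vulkan backend.
    """
    starts = range(0, len(row), block_size)

    # Pass 1: every block computes its own inclusive prefix sum,
    # independently of all other blocks, and records its total.
    out = []
    block_totals = []
    for start in starts:
        partial = list(accumulate(row[start:start + block_size]))
        out.extend(partial)
        block_totals.append(partial[-1])

    # Pass 2: add the sum of all preceding blocks to each element.
    offset = 0
    for block_index, start in enumerate(starts):
        for i in range(start, min(start + block_size, len(row))):
            out[i] += offset
        offset += block_totals[block_index]

    return out


print(cumsum_two_pass([1, 2, 3, 4, 5, 6, 7], block_size=3))
```

Because pass 1 has no dependency between blocks, a single long row can keep many workgroups busy, which is consistent with the multipass shader being chosen for a small number of large rows.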

In the whole-row shader, handle multiple elements per invocation.
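One way to picture "multiple elements per invocation" is the sequential simulation below. It assumes each invocation owns `elem_per_thread` contiguous elements, scans them locally, and then receives an offset from a scan over the per-invocation totals. The function name, the workgroup size of 4, and the contiguous layout are assumptions made for illustration; the actual shader may organize the work differently.

```python
def cumsum_whole_row(row, num_invocations=4, elem_per_thread=2):
    """Simulate one workgroup scanning a whole row.

    Illustrative sketch only: the layout and parameter names are
    assumptions, not the real shader code.
    """
    out = [0] * len(row)
    chunk = num_invocations * elem_per_thread
    carry = 0  # total of all chunks processed so far

    def bounds(base, inv):
        # Contiguous slice owned by invocation `inv`, clamped to the row.
        lo = min(base + inv * elem_per_thread, len(row))
        hi = min(lo + elem_per_thread, len(row))
        return lo, hi

    for base in range(0, len(row), chunk):
        # Step 1: each invocation scans its own elements locally.
        totals = []
        for inv in range(num_invocations):
            lo, hi = bounds(base, inv)
            running = 0
            for i in range(lo, hi):
                running += row[i]
                out[i] = running
            totals.append(running)

        # Step 2: exclusive scan over the per-invocation totals
        # (a workgroup-level scan on a GPU), added back to each slice.
        offset = carry
        for inv in range(num_invocations):
            lo, hi = bounds(base, inv)
            for i in range(lo, hi):
                out[i] += offset
            offset += totals[inv]
        carry = offset

    return out


print(cumsum_whole_row(list(range(1, 12))))
```

With more elements per invocation, the cross-invocation scan in step 2 covers fewer entries per element processed, which is the kind of trade-off a per-vendor `ELEM_PER_THREAD` setting would tune.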

  • use 2 ELEM_PER_THREAD for AMD/Intel

  • address feedback
