b8315

Mar 13, 2026
Meta/llama.cpp · CLI · vb8315

vulkan: fix SSM_CONV PP scaling with large ubatch sizes (#20379)

  • vulkan: optimize SSM_CONV workgroup dispatch for large ubatch

Tile tokens into 2D workgroups (32x16) to reduce workgroup launch overhead at large ubatch sizes. Add a vec4 fast path for nc=4 (the common d_conv size). Fixes prompt-processing (PP) performance degradation with ubatch > 512.

Ref: ggml-org/llama.cpp#18725

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

  • vulkan: remove unused shared memory declaration in SSM_CONV

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


Co-authored-by: Progeny Alpha <ProgenyAlpha@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
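To make the first commit's two changes concrete, here is a rough Python sketch of the arithmetic involved. It is not the shader code from the PR. The 32x16 tile size and nc=4 come from the commit message; the axis assignment (32 along channels, 16 along tokens), the one-workgroup-row-per-token baseline, and every function name are assumptions made for illustration.

```python
# Illustrative sketch only. Tile sizes are taken from the commit message;
# which axis gets 32 and which gets 16 is an assumption.
TILE_CHANNELS = 32  # assumed: invocations along the channel (d_inner) axis
TILE_TOKENS = 16    # assumed: invocations along the token axis


def ceil_div(a: int, b: int) -> int:
    """Round-up integer division, as used when sizing a dispatch grid."""
    return (a + b - 1) // b


def workgroups_1d(d_inner: int, n_tokens: int) -> int:
    """Hypothetical baseline: each token gets its own row of workgroups."""
    return ceil_div(d_inner, TILE_CHANNELS) * n_tokens


def workgroups_2d(d_inner: int, n_tokens: int) -> int:
    """2D tiling: each workgroup covers a 32-channel x 16-token tile."""
    return ceil_div(d_inner, TILE_CHANNELS) * ceil_div(n_tokens, TILE_TOKENS)


def ssm_conv_generic(x, w, n_tokens):
    """Sliding dot product for one channel.

    x holds len(w) - 1 + n_tokens input samples, w holds the nc filter taps.
    """
    nc = len(w)
    return [sum(x[t + j] * w[j] for j in range(nc)) for t in range(n_tokens)]


def ssm_conv_nc4(x, w, n_tokens):
    """Unrolled nc=4 variant, analogous to one vec4 dot product per output."""
    assert len(w) == 4
    w0, w1, w2, w3 = w
    return [
        x[t] * w0 + x[t + 1] * w1 + x[t + 2] * w2 + x[t + 3] * w3
        for t in range(n_tokens)
    ]


if __name__ == "__main__":
    print(workgroups_1d(4096, 2048), workgroups_2d(4096, 2048))
    print(ssm_conv_nc4([1, 2, 3, 4, 5, 6], [1, 0, -1, 2], 3))
```

Under these assumptions, a ubatch of 2048 tokens with d_inner = 4096 needs 262144 workgroups in the per-token layout and 16384 with 2D tiling, a 16x reduction in launches. The actual baseline dispatch and the measured speedup are described in the PR itself, not here.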
