b8190

Mar 3, 2026
Meta/llama.cpp, CLI, vb8190

ggml webgpu: fix workgroup dispatch limit for large batch sizes (#19965)

  • ggml-webgpu: fix workgroup dispatch limit for large batch sizes

WebGPU limits the number of dispatched workgroups to 65535 per dimension. Large MUL_MAT operations whose batch sizes pushed the workgroup count past this limit would fail.

  • add compute_2d_workgroups() helper to split total workgroup ID across X/Y dimensions

  • update mul_mat_reg_tile.wgsl to reconstruct linear workgroup ID from 2D dispatch

  • update mul_mat_subgroup_matrix.wgsl to reconstruct linear workgroup ID from 2D dispatch

  • update mul_mat.wgsl to compute global index from 2D workgroup coordinates

  • refactor all three mul_mat dispatch paths to use the shared helper
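The split described in the bullets above can be sketched on the CPU as follows. This is a minimal illustration, not the code from the PR: `split_workgroups_2d`, `linear_workgroup_id`, `Dispatch2D`, and `kMaxWgPerDim` are hypothetical names, and the real `compute_2d_workgroups()` helper may use a different signature or splitting strategy. The second function mirrors what a shader would do to rebuild a linear workgroup ID from its 2D coordinates.

```cpp
#include <cassert>
#include <cstdint>

// Assumed per-dimension dispatch limit (65535, as stated in the notes above).
constexpr uint32_t kMaxWgPerDim = 65535;

// Hypothetical result type: number of workgroups to dispatch in X and Y.
struct Dispatch2D {
    uint32_t x;
    uint32_t y;
};

// Split a linear workgroup count into an X/Y grid where both dimensions
// stay within max_per_dim and x * y >= total.
inline Dispatch2D split_workgroups_2d(uint64_t total, uint32_t max_per_dim = kMaxWgPerDim) {
    assert(total > 0 && max_per_dim > 0);
    assert(total <= uint64_t(max_per_dim) * max_per_dim);  // must fit in two dimensions

    // Fewest rows that keep every row within the limit.
    uint64_t y = (total + max_per_dim - 1) / max_per_dim;
    // Spread the work evenly over those rows to keep over-dispatch small.
    uint64_t x = (total + y - 1) / y;

    return Dispatch2D{ uint32_t(x), uint32_t(y) };
}

// Rebuild the linear workgroup ID from 2D coordinates, given the X extent
// of the dispatch grid (row-major order).
inline uint64_t linear_workgroup_id(uint32_t wg_x, uint32_t wg_y, uint32_t grid_x) {
    return uint64_t(wg_y) * grid_x + wg_x;
}
```

With these assumptions, a request for 131072 workgroups becomes a 43691 x 3 grid, which is one workgroup more than needed. That surplus is why the dispatched grid can exceed the real workload.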

  • ggml-webgpu: add bounds checking for over-dispatched workgroups

A 2D dispatch can launch more workgroups than needed when the total workgroup count does not factor evenly into an X by Y grid within the 65535 per-dimension limit. The extra workgroups would compute out-of-range batch indices, causing memory corruption.

  • add batch_idx bound check to mul_mat_reg_tile.wgsl and mul_mat_subgroup_matrix.wgsl to prevent over-dispatched workgroups from accessing invalid memory

  • fixes test failures with large batch sizes (e.g., bs=[128, 1024])
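The effect of the bounds check can be illustrated with a small CPU emulation of a 2D dispatch. This is a sketch under stated assumptions, not the shader code from the PR: `simulate_dispatch`, `DispatchStats`, and the `wg_per_batch` parameter are hypothetical, and the actual WGSL shaders may derive `batch_idx` differently. Each emulated workgroup rebuilds its linear ID, derives a batch index, and exits early if that index is out of range.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counters: workgroups that did real work vs. those that
// were over-dispatched and returned early.
struct DispatchStats {
    uint64_t active;
    uint64_t skipped;
};

// Emulate every workgroup of a grid_x * grid_y dispatch on the CPU.
// wg_per_batch is the assumed number of workgroups needed per batch.
inline DispatchStats simulate_dispatch(uint32_t grid_x, uint32_t grid_y,
                                       uint32_t wg_per_batch, uint32_t total_batches) {
    assert(wg_per_batch > 0);
    DispatchStats stats{0, 0};
    for (uint32_t wy = 0; wy < grid_y; ++wy) {
        for (uint32_t wx = 0; wx < grid_x; ++wx) {
            // Linear workgroup ID reconstructed from the 2D coordinates.
            uint64_t linear_id = uint64_t(wy) * grid_x + wx;
            uint64_t batch_idx = linear_id / wg_per_batch;

            // Bounds check: surplus workgroups must not touch any buffer.
            if (batch_idx >= total_batches) {
                ++stats.skipped;
                continue;
            }
            ++stats.active;  // a real shader would compute its output tile here
        }
    }
    return stats;
}
```

For example, a 4 x 3 grid with two workgroups per batch and five batches dispatches 12 workgroups for 10 units of work, so the last two take the early exit instead of indexing past the end of the data.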

  • ggml-webgpu: add back TODO for splitting large sizes into batches

  • Optimize 2d workgroup provisioning

  • Set some parameters that increase speed


Co-authored-by: Reese Levine reeselevine1@gmail.com
