b8392

Mar 17, 2026
Meta/llama.cpp CLI vb8392

kleidiai : fix MUL_MAT support for batched (3D) inputs (#20620)

  • kleidiai : fix MUL_MAT support for batched (3D) inputs

The supports_op() check incorrectly rejected MUL_MAT operations with 3D inputs (ne[2] > 1), even though compute_forward_qx() already handles batched inputs correctly by looping over ne12.

This caused models with Q4_0/Q8_0 weights to crash during graph scheduling when n_seq_max > 1. At load time the weights were placed in KLEIDIAI buffers because supports_op() was probed with 2D inputs, but at runtime the same operations received 3D inputs, which supports_op() then rejected.

Also relax the buffer check to allow supports_op() to be called during weight loading when src[0]->buffer is NULL.

Fixes #20608

  • kleidiai : supports_op() should accept batched 3D inputs but still reject 4D inputs
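To make the shape logic above concrete, here is a minimal C sketch built on a simplified tensor type. All names (`sketch_tensor`, `sketch_supports_mul_mat_old`, `sketch_supports_mul_mat`, `sketch_matmul_2d`, `sketch_matmul_batched`, `backend_buf`) are hypothetical and are not the ggml or KleidiAI API. The comparison against `backend_buf` is an assumed stand-in for whatever buffer check the real supports_op() performs, and the plain float loops stand in for the quantized Q4_0/Q8_0 kernels behind compute_forward_qx(). The sketch contrasts an old-style check that rejects ne[2] > 1 and a NULL buffer with a relaxed check that accepts both while still rejecting 4D inputs. It also shows why batching is cheap to support: a 3D input is just a loop of 2D products over ne12.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a ggml tensor: only the fields this sketch needs.
 * ne[0] is the innermost (contiguous) dimension, ne[3] the outermost. */
struct sketch_tensor {
    int64_t     ne[4];
    const void *buffer; /* NULL while weights are still being loaded */
};

/* Assumed shape of the check before the fix: a missing weight buffer and
 * any batch dimension ne[2] > 1 are both treated as unsupported. */
static bool sketch_supports_mul_mat_old(const struct sketch_tensor *src0,
                                        const struct sketch_tensor *src1,
                                        const void *backend_buf) {
    assert(src0 != NULL && src1 != NULL);
    if (src0->buffer == NULL || src0->buffer != backend_buf) return false;
    return src1->ne[2] == 1 && src1->ne[3] == 1;
}

/* Check after the fix: a NULL buffer is tolerated (weight loading), and
 * 2D or batched 3D inputs are accepted, while 4D inputs are not. */
static bool sketch_supports_mul_mat(const struct sketch_tensor *src0,
                                    const struct sketch_tensor *src1,
                                    const void *backend_buf) {
    assert(src0 != NULL && src1 != NULL);
    if (src0->buffer != NULL && src0->buffer != backend_buf) return false;
    return src1->ne[3] == 1;
}

/* Naive float 2D kernel: y[n][m] = sum_k w[m][k] * x[n][k]. */
static void sketch_matmul_2d(const float *w, const float *x, float *y,
                             int64_t K, int64_t M, int64_t N) {
    for (int64_t n = 0; n < N; n++) {
        for (int64_t m = 0; m < M; m++) {
            float acc = 0.0f;
            for (int64_t k = 0; k < K; k++) {
                acc += w[m * K + k] * x[n * K + k];
            }
            y[n * M + m] = acc;
        }
    }
}

/* Batched 3D input: the same 2D weights are applied to each of the ne12
 * slices of x, so in this sketch batching needs nothing beyond this loop. */
static void sketch_matmul_batched(const float *w, const float *x, float *y,
                                  int64_t K, int64_t M, int64_t N, int64_t ne12) {
    assert(K > 0 && M > 0 && N > 0 && ne12 > 0);
    for (int64_t b = 0; b < ne12; b++) {
        sketch_matmul_2d(w, x + b * N * K, y + b * N * M, K, M, N);
    }
}
```

For example, given a src0 whose buffer matches `backend_buf`, a src1 with ne = {64, 8, 4, 1} is rejected by the old predicate and accepted by the new one, while ne = {64, 8, 4, 2} is rejected by both.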
