Back to feed

b9266

May 21, 2026
Meta/llama.cppCLIvb9266

llama-graph: fix null-buffer crash in llm_graph_input_attn_kv_iswa for SWA-only models (#23131)

When a model has zero non-SWA attention layers (e.g. a SWA-only slice of Gemma 4), the base KV cache has no layer tensors. The input tensors (self_k_idxs, self_v_idxs, self_kq_mask) are created as graph input nodes but never consumed by any compute node, so the backend scheduler never allocates a buffer for them. Calling mctx->get_base()->set_input_k_idxs() on an unallocated tensor then hits GGML_ASSERT(buffer) at ggml-backend.cpp:194.

The same scenario applies symmetrically: if a model had zero SWA layers, the SWA tensors would be unallocated.

Fix: guard both the base and SWA set_input calls with null/buffer checks, matching the pattern already used by llm_graph_input_mem_hybrid_iswa::set_input (line ~674) which has the comment: 'base tensors may not be allocated if there are no non-SWA attention layers'.

Also fix can_reuse() in the same class to skip the ne[0] and kq_mask checks for unallocated tensors, preventing a null-dereference on the reuse path.

macOS/iOS:

Linux:

Android:

Windows:

openEuler: