
b8480

Mar 23, 2026
Meta/llama.cpp · CLI · vb8480

CANN: add RoPE cache preload before ACL graph capture (#20747)

ACL graph capture disallows host-to-device memcpy and device memory malloc/free on the captured stream. Pre-load the RoPE cache before capture so that:

  • Host-to-device copies and allocations run on the non-captured stream
  • Cache metadata is populated and the memory pool is warmed up
  • During capture, only on-device computations are recorded; host-side and allocation branches are skipped
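
The preload-then-capture ordering described above can be sketched with a small mock. This is an illustration only, not the CANN backend's code: `MockStream`, `RopeCache`, `upload`, `ensure_rope_cache`, and `rope_op` are hypothetical names, a `std::vector` stands in for device memory, and a thrown exception stands in for the capture-time restriction on allocations and host-to-device copies.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a device stream; `capturing` mimics graph capture mode.
struct MockStream {
    bool capturing = false;
    int  recorded_ops = 0;  // on-device ops recorded while capturing
};

// Hypothetical RoPE cache: a buffer plus the metadata that says whether it is usable.
struct RopeCache {
    std::vector<float> device_buf;  // stands in for device memory
    std::size_t cached_len = 0;     // metadata: number of positions covered
    bool valid = false;
};

// Mock allocation + host-to-device copy; both are rejected on a capturing stream.
void upload(MockStream& s, RopeCache& c, const std::vector<float>& host) {
    if (s.capturing) throw std::runtime_error("alloc/H2D copy during capture");
    c.device_buf = host;
    c.cached_len = host.size();
    c.valid = true;
}

// Makes sure the cache covers `len` positions; uploads only on a miss.
void ensure_rope_cache(MockStream& s, RopeCache& c, std::size_t len) {
    if (c.valid && c.cached_len >= len) return;  // hit: host-side branch is skipped
    std::vector<float> host(len);
    for (std::size_t i = 0; i < len; ++i) host[i] = static_cast<float>(i);  // placeholder values
    upload(s, c, host);
}

// RoPE op: legal inside capture only when the cache was preloaded.
void rope_op(MockStream& s, RopeCache& c, std::size_t len) {
    ensure_rope_cache(s, c, len);
    if (s.capturing) ++s.recorded_ops;  // only the on-device work is recorded
}

// Preload on the non-captured stream, then capture; returns the recorded op count.
int run_with_preload(std::size_t len) {
    MockStream s;
    RopeCache c;
    ensure_rope_cache(s, c, len);  // preload before capture begins
    s.capturing = true;
    rope_op(s, c, len);
    s.capturing = false;
    return s.recorded_ops;
}

// Capturing without a preload hits the forbidden upload path.
bool capture_without_preload_fails(std::size_t len) {
    MockStream s;
    RopeCache c;
    s.capturing = true;
    try {
        rope_op(s, c, len);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

Calling `run_with_preload(128)` records a single op because the cache check hits and the upload branch never runs during capture, while `capture_without_preload_fails(128)` shows the failure mode the preload is meant to avoid.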
