b8184

Mar 1, 2026
Meta/llama.cpp · CLI · vb8184

vulkan: improve partial offloading performance on AMD (#19976)

  • vulkan: fix and enable cpy_tensor_async function

  • use transfer_queue for async transfers on AMD, synchronize with timeline semaphore

  • update offload_op logic

  • fix missing transfer submission

  • disable async transfer queue on AMD GCN

  • revert op batch size change

  • fix cpy_tensor_async checks
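The ordering idea behind "use transfer_queue for async transfers ... synchronize with timeline semaphore" can be illustrated with a host-only model. This is a conceptual sketch, not the llama.cpp Vulkan backend code: `TimelineSemaphore` and `run_offload_demo` are hypothetical names, and two threads stand in for the transfer and compute queues. In real Vulkan the same pattern would use a timeline `VkSemaphore` (core since Vulkan 1.2) that one queue submission signals and another waits on.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical host-side stand-in for a timeline semaphore: a 64-bit
// counter that only moves forward, with waiters blocking until it
// reaches a target value.
class TimelineSemaphore {
public:
    // Advance the counter to `value` (ignored if it would move backwards).
    void signal(uint64_t value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (value > counter_) counter_ = value;
        }
        cv_.notify_all();
    }

    // Block until the counter is at least `value`.
    void wait(uint64_t value) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return counter_ >= value; });
    }

    uint64_t value() {
        std::lock_guard<std::mutex> lock(mutex_);
        return counter_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    uint64_t counter_ = 0;
};

// Models one offloaded op: the "transfer queue" thread copies the tensor
// data, then signals; the "compute queue" thread waits on that signal
// before reading the copied data.
int run_offload_demo() {
    std::vector<int> src = {1, 2, 3, 4};
    std::vector<int> dst(src.size(), 0);
    TimelineSemaphore sem;
    const uint64_t copy_done = 1;  // timeline value meaning "copy finished"

    std::thread transfer([&] {
        dst = src;              // stand-in for the async tensor copy
        sem.signal(copy_done);  // publish completion on the timeline
    });

    int sum = 0;
    std::thread compute([&] {
        sem.wait(copy_done);    // do not touch dst before the copy lands
        for (int v : dst) sum += v;
    });

    transfer.join();
    compute.join();
    assert(sem.value() == copy_done);
    return sum;
}
```

Because the counter only increases, one semaphore can order many copies: each transfer signals the next value and each dependent compute step waits on the value it needs, so the two sides can overlap without a separate fence per copy.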
