
b9291

May 22, 2026
Meta/llama.cpp · CLI · vb9291

SYCL: improve MoE prefill throughput (#23142)

  • change k_copy_src1_to_contiguous so that it uses a precomputed contiguous mapping, in which all rows "owned" by an expert sit in a single slice with known start and end offsets
  • replace the previous O(n_as * n_routed_rows) procedure with a counting-sort-based one that runs in O(n_as + n_routed_rows)
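The two changes above can be illustrated with a serial, host-side sketch of counting-sort grouping. This is not the code from #23142. `ExpertMapping`, `build_expert_mapping`, `expert_of_row` and the field names are hypothetical, and the sketch assumes that each routed row is assigned to exactly one expert index in `[0, n_as)`. It makes one histogram pass over the rows, one exclusive prefix sum over the experts, and one scatter pass. The total work is therefore O(n_as + n_routed_rows), instead of rescanning every row once per expert. A SYCL kernel would parallelize these passes, which this sketch does not attempt.

```cpp
#include <cassert>
#include <vector>

// Hypothetical mapping: the rows owned by expert e occupy slots [start[e], end[e]).
struct ExpertMapping {
    std::vector<int> start;    // first slot owned by each expert
    std::vector<int> end;      // one past the last slot owned by each expert
    std::vector<int> src_row;  // src_row[slot] = routed row copied into that slot
};

// Hypothetical helper: groups routed rows by expert with a stable counting sort.
ExpertMapping build_expert_mapping(const std::vector<int>& expert_of_row, int n_as) {
    const int n_rows = static_cast<int>(expert_of_row.size());
    ExpertMapping m;
    m.start.assign(n_as, 0);
    m.end.assign(n_as, 0);
    m.src_row.assign(n_rows, -1);

    // Pass 1: histogram of rows per expert, O(n_routed_rows).
    std::vector<int> count(n_as, 0);
    for (int e : expert_of_row) {
        assert(e >= 0 && e < n_as);  // assumed: every row has a valid expert index
        ++count[e];
    }

    // Pass 2: exclusive prefix sum gives each expert a contiguous slice, O(n_as).
    int offset = 0;
    for (int e = 0; e < n_as; ++e) {
        m.start[e] = offset;
        offset += count[e];
        m.end[e] = offset;
    }

    // Pass 3: scatter row indices into their expert's slice, O(n_routed_rows).
    // Rows keep their original relative order inside each slice (stable).
    std::vector<int> cursor = m.start;
    for (int row = 0; row < n_rows; ++row) {
        m.src_row[cursor[expert_of_row[row]]++] = row;
    }
    return m;
}
```

Consider `expert_of_row = {2, 0, 2, 1, 0, 2}` with `n_as = 3`. The sketch yields `start = {0, 2, 3}`, `end = {2, 3, 6}` and `src_row = {1, 4, 3, 0, 2, 5}`. Expert 2 therefore owns slots 3 to 5, which hold rows 0, 2 and 5. Once the offsets are precomputed, a copy kernel can look up where each row goes rather than searching for it.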
