
b7275

Dec 4, 2025
Meta/llama.cpp · CLI · vb7275

[!WARNING] Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.

metal: TRI, FILL, EXPM1, SOFTPLUS (#16623)

  • feat(wip): Port initial TRI impl from previous work

The kernel does not work and is not optimized, but the code compiles and runs, so this will be the starting point now that the core op has been merged.

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • fix: Remove argument for constant val override

This was added in the original draft, but later removed. With this, the kernel now passes tests.

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • feat: Move the ttype conditional to templating to avoid conditional in kernel

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • fix: Type fixes

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

Co-authored-by: Georgi Gerganov ggerganov@gmail.com

  • feat: Add softplus for metal

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • feat: Add EXPM1 for metal

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • feat: Add FILL for metal

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • refactor: Branchless version of tri using _ggml_vec_tri_cmp as a mask

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • fix: Remove unused arguments

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • refactor: Use select instead of branch for softplus non-vec

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com


Signed-off-by: Gabe Goodhart ghart@us.ibm.com

Co-authored-by: Georgi Gerganov ggerganov@gmail.com
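The commits above mention three branch-avoidance ideas: choosing the triangle type through templating, using the comparison result as a mask, and using a select for softplus. The following is a minimal CPU-side sketch of those patterns in plain C++, not the Metal kernels from this release. All names (`TriType`, `tri_keep`, `tri_rows`, `softplus_select`) are hypothetical, and both the meaning of each triangle type and the softplus cutoff of 20 are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical names throughout; this is not the ggml or Metal source.
enum class TriType { Lower, LowerDiag, Upper, UpperDiag };

// The triangle type is a template parameter, so the comparison is fixed at
// compile time and the per-element loop carries no runtime switch on the type.
template <TriType T>
inline bool tri_keep(std::size_t row, std::size_t col) {
    return T == TriType::Lower     ? col <  row
         : T == TriType::LowerDiag ? col <= row
         : T == TriType::Upper     ? col >  row
         :                           col >= row;
}

// Branchless masking: the comparison result (0 or 1) is converted to a float
// and multiplied into the value instead of guarding the store with an if.
template <TriType T>
void tri_rows(const float* src, float* dst, std::size_t n_rows, std::size_t n_cols) {
    assert(src != nullptr && dst != nullptr);
    for (std::size_t r = 0; r < n_rows; ++r) {
        for (std::size_t c = 0; c < n_cols; ++c) {
            const float mask = static_cast<float>(tri_keep<T>(r, c));
            dst[r * n_cols + c] = src[r * n_cols + c] * mask;
        }
    }
}

// Select-style softplus: both candidates are computed unconditionally and one
// is chosen at the end. The cutoff of 20 is an assumed value; above it,
// log(1 + exp(x)) is numerically indistinguishable from x in float.
inline float softplus_select(float x) {
    const float soft = std::log1p(std::exp(x)); // may overflow to inf, then discarded
    return x > 20.0f ? x : soft;
}
```

One caveat with the multiply-by-mask form: a NaN or Inf in a masked-out position stays NaN after multiplication by zero, whereas a select-based form would write an exact zero there. Whether that matters depends on the inputs the op is expected to see.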
