
b9075

May 8, 2026
Meta/llama.cpp · CLI · vb9075

cuda: fuse snake activation (mul, sin, sqr, mul, add) (#22667)

  • cuda: fuse snake activation (mul, sin, sqr, mul, add)

Add ggml_cuda_op_snake_fused with F32 / F16 / BF16 templates. The matcher recognizes the naive five-op decomposition (mul, sin, sqr, mul, add) that audio decoders such as BigVGAN and Vocos emit for the snake activation y = x + sin(a*x)^2 * inv_b, and rewrites it into a single elementwise kernel.

Add test_snake_fuse, which compares the naive CPU path against the fused CUDA path for F32, F16, and BF16.
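
As a rough illustration of what the fusion saves, the C++ sketch below contrasts the naive five-op chain, which materializes four intermediate buffers, with a single fused pass over y = x + sin(a*x)^2 * inv_b. It is a CPU reference written for this note, not the ggml_cuda_op_snake_fused kernel. The layout (C channels of T samples each, stored as x[c * T + t], with one a and one inv_b value per channel) and the names snake_naive and snake_fused are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed layout: x[c * T + t], with one a[c] and one inv_b[c] per channel.

// Naive path: five separate elementwise passes, each writing a full buffer,
// mirroring the mul -> sin -> sqr -> mul -> add chain in the graph.
std::vector<float> snake_naive(const std::vector<float>& x,
                               const std::vector<float>& a,
                               const std::vector<float>& inv_b,
                               std::size_t T) {
    assert(T > 0 && x.size() == a.size() * T && a.size() == inv_b.size());
    const std::size_t n = x.size();
    std::vector<float> ax(n), s(n), s2(n), scaled(n), y(n);
    for (std::size_t i = 0; i < n; ++i) ax[i] = a[i / T] * x[i];          // mul
    for (std::size_t i = 0; i < n; ++i) s[i] = std::sin(ax[i]);            // sin
    for (std::size_t i = 0; i < n; ++i) s2[i] = s[i] * s[i];               // sqr
    for (std::size_t i = 0; i < n; ++i) scaled[i] = s2[i] * inv_b[i / T];  // mul
    for (std::size_t i = 0; i < n; ++i) y[i] = x[i] + scaled[i];           // add
    return y;
}

// Fused path: one pass, each element read once and written once,
// with no intermediate buffers.
std::vector<float> snake_fused(const std::vector<float>& x,
                               const std::vector<float>& a,
                               const std::vector<float>& inv_b,
                               std::size_t T) {
    assert(T > 0 && x.size() == a.size() * T && a.size() == inv_b.size());
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const std::size_t c = i / T;  // channel index of element i
        const float s = std::sin(a[c] * x[i]);
        y[i] = x[i] + s * s * inv_b[c];
    }
    return y;
}
```

Both paths perform the same float operations in the same order, so in F32 they should agree to within rounding (a compiler that contracts a multiply and add into an FMA can shift the last bit). Comparisons involving F16 or BF16 would typically need a looser tolerance.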

  • cuda: address review feedback from @am17an

Use ggml_cuda_cast for F32/F16/BF16 conversions and rename kernel_snake to snake_kernel to match upstream conventions.

  • cuda: snake fusion fastdiv on T_len, Suggested-by: @am17an
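
fastdiv replaces a division by a divisor that is fixed for the whole launch with a multiply-high, an add, and a shift, which avoids integer division, a relatively expensive operation on GPUs. The sketch below is a host-side C++ version of the standard multiply-by-magic-number construction. It assumes the snake kernel divides a flat element index by T_len to find each element's channel; FastDiv, make_fastdiv, fastdiv, and fastmod are hypothetical names, not the CUDA backend's actual helpers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: precomputed constants for dividing by a fixed divisor d.
struct FastDiv {
    std::uint32_t mp;  // magic multiplier (low 32 bits of the full multiplier)
    std::uint32_t L;   // shift amount, ceil(log2(d))
    std::uint32_t d;   // the divisor itself, kept for computing remainders
};

// Host-side setup, done once per divisor.
FastDiv make_fastdiv(std::uint32_t d) {
    assert(d != 0);
    std::uint32_t L = 0;
    while (L < 32 && (std::uint64_t{1} << L) < d) ++L;  // L = ceil(log2(d))
    // mp = floor(2^32 * (2^L - d) / d) + 1, which always fits in 32 bits
    const std::uint64_t mp =
        (std::uint64_t{1} << 32) * ((std::uint64_t{1} << L) - d) / d + 1;
    return FastDiv{static_cast<std::uint32_t>(mp), L, d};
}

// n / d without a divide: multiply-high, add, shift.
std::uint32_t fastdiv(std::uint32_t n, const FastDiv& f) {
    // High 32 bits of n * mp (what CUDA's __umulhi returns on the device).
    const std::uint64_t hi = (static_cast<std::uint64_t>(n) * f.mp) >> 32;
    // Add in 64 bits so hi + n cannot overflow for any 32-bit n.
    return static_cast<std::uint32_t>((hi + n) >> f.L);
}

// n % d, reusing the quotient.
std::uint32_t fastmod(std::uint32_t n, const FastDiv& f) {
    return n - fastdiv(n, f) * f.d;
}
```

Under the assumed layout, the channel of flat element i would be fastdiv(i, make_fastdiv(T_len)), with make_fastdiv evaluated once on the host and its result passed to the kernel, so each thread pays only a multiply-high, an add, and a shift.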

  • Update tests/test-backend-ops.cpp

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

  • cuda: snake fusion check add->type matches x->type

Address review feedback from @am17an

  • cuda: snake fusion check add->type matches x->type

Moved the check for readability; behavior is equivalent. Address review feedback from @am17an.
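
The type check guards the rewrite: the fused kernel is templated on a single element type, so presumably it writes its result in x's type, and a final add producing a different type would no longer match what the graph expects. The sketch below shows one way such a matcher could look over a hypothetical mini-graph; Node, Op, Type, uses, and matches_snake are invented for illustration and are not ggml structures. It omits checks a real fusion pass also needs, such as ensuring that the intermediate results have no other consumers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mini-graph, not ggml's data structures.
enum class Op { Mul, Sin, Sqr, Add, Other };
enum class Type { F32, F16, BF16 };

struct Node {
    Op op;
    Type type;
    std::array<const Node*, 2> src;  // operands; nullptr when unused
};

// True when either operand of n is p.
static bool uses(const Node* n, const Node* p) {
    return n->src[0] == p || n->src[1] == p;
}

// Checks whether nodes[i..i+4] compute y = x + sin(a*x)^2 * inv_b as
// mul -> sin -> sqr -> mul -> add, and whether fusing keeps the output type.
bool matches_snake(const std::vector<const Node*>& nodes, std::size_t i) {
    if (i + 5 > nodes.size()) return false;
    const Node* n[5];
    for (std::size_t k = 0; k < 5; ++k) {
        n[k] = nodes[i + k];
        assert(n[k] != nullptr);  // graph entries are assumed non-null
    }
    const Node *mul0 = n[0], *sn = n[1], *sq = n[2], *mul1 = n[3], *add = n[4];

    // 1. Op kinds must appear in the expected order.
    if (mul0->op != Op::Mul || sn->op != Op::Sin || sq->op != Op::Sqr ||
        mul1->op != Op::Mul || add->op != Op::Add) {
        return false;
    }
    // 2. Each op must consume the result of the previous one.
    if (sn->src[0] != mul0 || sq->src[0] != sn || !uses(mul1, sq) || !uses(add, mul1)) {
        return false;
    }
    // 3. x is the other operand of the add; it must also feed the first mul.
    const Node* x = add->src[0] == mul1 ? add->src[1] : add->src[0];
    if (x == nullptr || !uses(mul0, x)) return false;

    // 4. Type guard: the add's output type must match x's type.
    return add->type == x->type;
}
```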
