b8749
ggml-webgpu: address quantization precision and backend lifecycle management (#21521)
ggml(webgpu): fix busy-polling in Emscripten in waitAny after #20618, and remove the busy webgpu log
Merge with upstream
Fix GET_ROWS packed integer NaN when using f16 as memory buffer in shader quants
Update Unary wgsl EXP and EXPM1 for f16 stability
Fix GET_ROWS IQ4_XS struct for NaN f16 canonicalization
Fix numerical precision for unary sqrt when working with f16
Fix NaN canonicalization for packed integers using f16
Update err threshold for binary div ops when using f16
backend: Keep one Dawn/WebGPU instance alive for the lifetime of the static backend
clean: uncomment existing code logs
clean: clean the unnecessary debug info
Refactor and generalize dequant helpers
Remove deprecated quant structs
Refactor shader defines to reduce repetition
Remove error override for F16 type
fix: fix the accidental removal of the proper initialization of ctx
clean: clean legacy and format code
fix: did not modify tests ops
Co-authored-by: Jeremy J. Hartmann <jeremy@mtion.tv>
macOS/iOS:
- macOS Apple Silicon (arm64)
- macOS Apple Silicon (arm64, KleidiAI enabled)
- macOS Intel (x64)
- iOS XCFramework
Linux:
- Ubuntu x64 (CPU)
- Ubuntu arm64 (CPU)
- Ubuntu s390x (CPU)
- Ubuntu x64 (Vulkan)
- Ubuntu arm64 (Vulkan)
- Ubuntu x64 (ROCm 7.2)
- Ubuntu x64 (OpenVINO)
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: