b8811
ggml-webgpu: compute pass batching and removing profiling overhead (#21873)
- Update register tiling matmul to use f32 accumulation
- Fix profiling code
- Fix register tiling matmul for Chrome; I'm blaming Dawn
- Update batch tuning value for iOS
- Compile fix
- Fix use of new load function
- Move to a single query set for GPU profiling
- Move to batching compute passes when not profiling
- Refactor build_multi
- Remove iOS throttling now that compute passes are batched
macOS/iOS:
- macOS Apple Silicon (arm64)
- macOS Apple Silicon (arm64, KleidiAI enabled)
- macOS Intel (x64)
- iOS XCFramework
Linux:
- Ubuntu x64 (CPU)
- Ubuntu arm64 (CPU)
- Ubuntu s390x (CPU)
- Ubuntu x64 (Vulkan)
- Ubuntu arm64 (Vulkan)
- Ubuntu x64 (ROCm 7.2)
- Ubuntu x64 (OpenVINO)
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: