b7871
HIP: add mmf for CDNA (#18896)
- refactor mmf rows_per_block
- speed up compile
- pass cdna compile
- fix cuda error
- clean up mmf
- f32 mmf
- clean float mma
- fix mmf error
- faster mmf
- extend tile k
- fix compile error
- Revert "extend tile k"
  This reverts commit 4d2ef3d483932659801a59a5af0b6b48f6ffd5c7.
- fix smem overflow
- speed up compiling mmf
- speed up compile for hip
- 512 block for cdna
- config pad size
- fix as comment
- update select logic
- move some code to cuh
- fix as comment
- correct cdna3 config
Co-authored-by: zhang hui <you@example.com>
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: