AI Changelog Aggregator

hexagon: optimize HMX matmul operations (#21071)

optimize hmx_mat_mul functions by calculating row and column tiles upfront
refactor core_dot_chunk_fp16 to use size_t for tile counts and improve readability
wip
set scale outside of loop
wip
refactor core_mma_chunk_fp16 and mat_mul_qk_0_d16a32 to use size_t for tile counts
wip
wip
refactor transfer_output_chunk_fp16_to_fp32 to use size_t for dimensions
refactor core_dot_chunk_fp16 to use size_t for tile row stride calculation
wip
refactor hmx_mat_mul functions to use hvx_vec_splat_f16 for column scales initialization
refactor hmx_mat_mul_permuted_w16a32_batched to streamline scale setting and locking
refactor core_dot_chunk_fp16 to improve tile stride calculations for output
refactor hmx_mat_mul functions to use Q6_V_vsplat_R for column scales initialization
fix compiling error
wip
optimize row and column tile indexing in core_mma_chunk_fp16 function
wip
Revert "wip"

This reverts commit cde679eff79c4a28dd2d89d32f710015e09592b6.
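
Several of the entries above describe the same pattern: compute the row and column tile counts once before the loops, hold them in size_t, and keep loop-invariant values such as the scale out of the inner loop. The following is a minimal sketch of that pattern in plain C, not the code from this change. It uses float rather than fp16 and no HVX/HMX intrinsics; TILE, tiles_for and tiled_matmul_scaled are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4 /* hypothetical tile edge, not the real HMX tile size */

/* Number of tiles needed to cover n elements (ceiling division). */
static size_t tiles_for(size_t n) {
    return (n + TILE - 1) / TILE;
}

/* out[m x n] = scale * (a[m x k] * b[k x n]), all row-major.
 * The output is walked tile by tile. */
static void tiled_matmul_scaled(float *out, const float *a, const float *b,
                                size_t m, size_t k, size_t n, float scale) {
    assert(out != NULL && a != NULL && b != NULL);

    /* Tile counts are computed once, before any loop runs. */
    const size_t n_row_tiles = tiles_for(m);
    const size_t n_col_tiles = tiles_for(n);

    for (size_t rt = 0; rt < n_row_tiles; ++rt) {
        const size_t r0 = rt * TILE;
        const size_t r1 = (r0 + TILE < m) ? r0 + TILE : m; /* clamp last tile */

        for (size_t ct = 0; ct < n_col_tiles; ++ct) {
            const size_t c0 = ct * TILE;
            const size_t c1 = (c0 + TILE < n) ? c0 + TILE : n;

            for (size_t r = r0; r < r1; ++r) {
                for (size_t c = c0; c < c1; ++c) {
                    float acc = 0.0f;
                    for (size_t i = 0; i < k; ++i) {
                        acc += a[r * k + i] * b[i * n + c];
                    }
                    /* scale is loop-invariant: passed in once, never recomputed here */
                    out[r * n + c] = acc * scale;
                }
            }
        }
    }
}
```

Using size_t for the counts and strides keeps the index arithmetic in one unsigned type, which avoids mixed signed/unsigned conversions when computing offsets such as r * n + c.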
