b9045
mtmd: add granite-speech support (ibm-granite/granite-4.0-1b-speech) (#22101)
- mtmd: add granite-speech support (ibm-granite/granite-4.0-1b-speech)
Conformer encoder with Shaw relative position encoding, QFormer projector, log-mel spectrogram with frame stacking.
Encoder uses GLU gating, folded batch norm, and SSM depthwise conv. QFormer compresses encoder output via windowed cross-attention (window=15, queries=3) into the LLM embedding space.
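The compression ratio implied by window=15 and queries=3 can be sketched as below. This is an illustrative calculation only: the helper name `qformer_output_len` is hypothetical, and padding the last partial window up to a full window is an assumption, not something stated in this commit.

```python
import math


def qformer_output_len(n_frames: int, window: int = 15, queries: int = 3) -> int:
    """Return how many projector embeddings n_frames encoder frames produce.

    Assumption: the encoder output is split into non-overlapping windows of
    `window` frames, the last partial window is padded to full size, and each
    window is summarized by `queries` learned query vectors.
    """
    if n_frames < 0:
        raise ValueError("n_frames must be non-negative")
    n_windows = math.ceil(n_frames / window)
    return n_windows * queries


if __name__ == "__main__":
    for frames in (15, 16, 150):
        print(frames, "encoder frames ->", qformer_output_len(frames), "embeddings")
```

Under these assumptions every 15 encoder frames become 3 embeddings, a 5x reduction in sequence length before the LLM sees the audio.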
Audio preprocessing: reflect-padded STFT, 80-bin mel filterbank, dynamic range compression, 2x frame stacking (80->160 mel).
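The 2x frame stacking step (80->160 mel) amounts to concatenating each pair of consecutive mel frames. A minimal pure-Python sketch follows; the function name `stack_frames` is hypothetical, and dropping a trailing unpaired frame is an assumption rather than the confirmed behavior of the mtmd preprocessor.

```python
def stack_frames(mel, factor=2):
    """Concatenate every `factor` consecutive frames into one wider frame.

    mel: list of frames, each a list of n_mel floats (e.g. 80 bins).
    Returns len(mel) // factor frames of n_mel * factor values each.
    Assumption: trailing frames that do not fill a full group are dropped.
    """
    usable = len(mel) - len(mel) % factor
    stacked = []
    for start in range(0, usable, factor):
        row = []
        for offset in range(factor):
            row.extend(mel[start + offset])
        stacked.append(row)
    return stacked


if __name__ == "__main__":
    # Five dummy frames of 80 bins; frame t is filled with the value t.
    frames = [[float(t)] * 80 for t in range(5)]
    out = stack_frames(frames)
    print(len(out), "stacked frames of width", len(out[0]))
```

Stacking halves the frame rate while doubling the feature width, so the encoder processes half as many time steps.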
GGUF converter handles batch norm folding at export time, fused K/V split, and Conv1d weight reshaping.
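Batch norm folding at export time relies on the fact that inference-mode batch norm is a per-channel affine transform, which can be absorbed into the weights and bias of the preceding convolution. The sketch below shows the standard identity in plain Python; `fold_batch_norm` and its argument layout are hypothetical and do not reflect the converter's actual code.

```python
import math


def fold_batch_norm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-mode batch norm into the preceding per-channel conv.

    weight: list of per-output-channel kernels (each a list of floats).
    bias, gamma, beta, mean, var: per-channel lists of floats.
    Uses scale = gamma / sqrt(var + eps), so that
      BN(conv(x)) == conv'(x) with w' = w * scale, b' = (b - mean) * scale + beta.
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, m, v in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_w.append([w * scale for w in w_c])
        folded_b.append((b_c - m) * scale + bt)
    return folded_w, folded_b


if __name__ == "__main__":
    w, b = fold_batch_norm(
        weight=[[1.0, 2.0]], bias=[0.5],
        gamma=[3.0], beta=[0.1], mean=[0.5], var=[4.0], eps=0.0,
    )
    print("folded weight:", w, "folded bias:", b)
```

Folding once in the converter means the runtime graph needs no separate normalization op for these layers.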
Tested against HF transformers reference: token-for-token match on 30s/60s audio clips with greedy decoding.
- mtmd: rename gs_ prefixed tensors to generic/architecture names
- mtmd: use tensor_mapping.py for all granite_speech tensors
- convert: fold GraniteSpeechTextModel into GraniteModel
- mtmd: replace n_layer hack with explicit has_standard_layers flag
- mtmd: replace hardcoded magic numbers with GGUF hparams for granite speech
- mtmd: align KEY_A_ define spacing
- convert: register GraniteModel for GraniteSpeechForConditionalGeneration
- convert: fix ty type-check for GraniteSpeechMmprojModel registration
- mtmd: align TN_ define spacing
- mtmd: use generic layer loop for granite speech tensor loading
- mtmd: merge qformer_proj_layer into clip_layer
- mtmd: granite_speech remove redundant ggml_build_forward_expand on inputs
- mtmd: granite_speech add comment explaining why build_attn is not used
- mtmd: granite_speech hard-code eps in cpp, remove from GGUF metadata
- gguf: add spacing between granite_speech tensor mapping blocks
- mtmd: make generic audio layer_norm_eps read optional
- mtmd: granite_speech keep encoder eps in GGUF, only hard-code projector eps
- mtmd: align defines and struct fields in clip-impl.h and clip-model.h
- mtmd: fix alignment and ordering issues across granite speech files
- convert: granite_speech use filter_tensors instead of modify_tensors for skipping
macOS/iOS:
- macOS Apple Silicon (arm64)
- macOS Apple Silicon (arm64, KleidiAI enabled)
- macOS Intel (x64)
- iOS XCFramework
Linux:
- Ubuntu x64 (CPU)
- Ubuntu arm64 (CPU)
- Ubuntu s390x (CPU)
- Ubuntu x64 (Vulkan)
- Ubuntu arm64 (Vulkan)
- Ubuntu x64 (ROCm 7.2)
- Ubuntu x64 (OpenVINO)
- Ubuntu x64 (SYCL FP32)
- Ubuntu x64 (SYCL FP16)
Android:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: