b7689
mtmd: Add Gemma3n multimodal support with MobileNetV5 vision encoder (#18256)
Add Gemma3nVisionModel, a converter for the MobileNetV5 vision encoder, to convert_hf_to_gguf.py. Add gemma3n to the vision projectors in gguf-py/gguf/constants.py.
Add mobilenetv5 impl
Fix comments, remove unused vars
Fix permute and remove transpose of projection weights
Fix comments, remove debugging prints from hf_to_gguf
- Hard-code image_mean = 0 and image_std = 1
- Use available tensor mapping logic
- Remove redundant chat template replacement of soft tokens placeholder with media placeholder
- Move mobilenetv5 helper declarations to the clip_graph_mobilenetv5 struct and definitions to mobilenetv5.cpp
- Remove unused clip_is_gemma3n func declarations and definitions
- Remove redundant rescale_image_u8_to_f32 func and use normalize_image_u8_to_f32 with zero mean and unit std
- Calculate n_patches using image_size / patch_size
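Why a dedicated rescale helper becomes redundant can be shown with a small sketch. It assumes normalization takes the usual per-channel form (x / 255 - mean) / std over interleaved RGB bytes; the Python function names below are hypothetical stand-ins, not the C++ functions in clip.cpp. With image_mean = 0 and image_std = 1, normalization reduces to plain rescaling into [0, 1].

```python
from typing import List, Sequence


def normalize_u8_to_f32(pixels: Sequence[int],
                        mean: Sequence[float],
                        std: Sequence[float]) -> List[float]:
    """Hypothetical helper: (x / 255 - mean[c]) / std[c] for interleaved RGB bytes."""
    out = []
    for i, p in enumerate(pixels):
        c = i % 3  # channel index (R, G, B, R, G, B, ...)
        out.append((p / 255.0 - mean[c]) / std[c])
    return out


def rescale_u8_to_f32(pixels: Sequence[int]) -> List[float]:
    """Hypothetical plain rescale to [0, 1], the kind of helper the commit drops."""
    return [p / 255.0 for p in pixels]


# With zero mean and unit std both paths produce identical values.
pixels = [0, 128, 255, 64, 32, 16]
assert normalize_u8_to_f32(pixels, [0.0] * 3, [1.0] * 3) == rescale_u8_to_f32(pixels)
```

Keeping a single normalization path means the model-specific behavior lives entirely in the image_mean/image_std values rather than in a separate code path.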
Remove obsolete comments
- convert_hf_to_gguf.py & constants.py & tensor_mapping.py: Use explicit mapping: a custom map for double-indexed blocks and tensor_mapping.py for the rest
- convert_hf_to_gguf.py: Unsqueeze the stem bias and layer scale tensors to the correct shape during GGUF conversion
- mobilenetv5.cpp: Remove explicit reshaping of the stem bias and layer scale, which is now handled during GGUF conversion; replace fprintf with LOG_*
- clip.cpp: Remove unused embedding and hard_emb_norm tensor loading
- Rename tensors to v.conv..., v.blk..., v.msfa... to better align with already existing terminology
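A minimal sketch of what a custom map for double-indexed blocks might look like: MobileNetV5 blocks carry two indices (stage, and block within the stage), whereas a generic per-block mapping is keyed by a single block index. The HF name prefix, the fallback table, and the exact GGUF output names below are assumptions for illustration; only the v.blk / v.conv prefixes come from the notes above.

```python
import re
from typing import Dict, Optional

# Hypothetical HF layout: ...blocks.<stage>.<block>.<rest>
_BLOCK_RE = re.compile(r"^model\.vision_tower\.timm_model\.blocks\.(\d+)\.(\d+)\.(.+)$")

# Hypothetical stand-in for the generic (single-index) mapping used for everything else.
_GENERIC_MAP: Dict[str, str] = {
    "model.vision_tower.timm_model.conv_stem.conv.weight": "v.conv_stem.weight",
}


def map_block_tensor(hf_name: str) -> Optional[str]:
    """Map a double-indexed block tensor; return None for any other tensor."""
    m = _BLOCK_RE.match(hf_name)
    if m is None:
        return None
    stage, block, rest = m.groups()
    # Keep both indices so blocks from different stages cannot collide.
    return f"v.blk.{stage}.{block}.{rest}"


def map_tensor(hf_name: str) -> str:
    """Try the custom block map first, then fall back to the generic table."""
    mapped = map_block_tensor(hf_name)
    if mapped is not None:
        return mapped
    return _GENERIC_MAP[hf_name]  # raises KeyError for unmapped tensors


print(map_tensor("model.vision_tower.timm_model.blocks.2.5.dw_mid.conv.weight"))
```

Checking the custom map first keeps the two-index special case isolated, so the generic mapping never has to understand nested block indices.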
Fix stem conv bias name
Remove explicit handling of bias term for stem conv
- Change order of addition in "project_per_layer_inputs" to support broadcasting of vision inp_per_layer
- Simplify the vision embeddings path of "get_per_layer_inputs" to output a broadcastable [n_embd_altup, n_layer, 1] tensor
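Why the order of addition matters can be sketched with a shape-only model of ggml_add, assuming ggml's rule that only the second operand is broadcast (repeated) into the shape of the first. Shapes use ggml's ne order, innermost dimension first, matching the [n_embd_altup, n_layer, 1] notation above. The helper names, the `per_layer_proj` label, and the sizes are hypothetical.

```python
from typing import Tuple

# ggml-style shape: (ne0, ne1, ne2, ne3), innermost dimension first
Shape = Tuple[int, int, int, int]


def can_repeat(small: Shape, big: Shape) -> bool:
    """True if `small` can be tiled to fill `big` (each dim of big is a multiple)."""
    return all(s > 0 and b % s == 0 for s, b in zip(small, big))


def add_result_shape(a: Shape, b: Shape) -> Shape:
    """Shape-only model of ggml_add(ctx, a, b): b is broadcast into a, never the reverse."""
    if not can_repeat(b, a):
        raise ValueError(f"cannot broadcast {b} into {a}")
    return a


# Illustrative sizes only
n_embd_altup, n_layer, n_tokens = 256, 30, 256
per_layer_proj = (n_embd_altup, n_layer, n_tokens, 1)  # has a token dimension
vision_inp_per_layer = (n_embd_altup, n_layer, 1, 1)   # [n_embd_altup, n_layer, 1]

print(add_result_shape(per_layer_proj, vision_inp_per_layer))  # (256, 30, 256, 1)
try:
    add_result_shape(vision_inp_per_layer, per_layer_proj)
except ValueError as err:
    print("reversed order fails:", err)
```

Under this rule the operand with the full token dimension must come first, which is why a single [n_embd_altup, n_layer, 1] vision input can be shared across all tokens without being materialized per token.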
clean up conversion script
fix code style
also preserve audio tensors
trailing space
split arch A and V
rm unused gemma3 func
fix alignment
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: