b7689

Jan 10, 2026

Meta / llama.cpp / CLI / vb7689

mtmd: Add Gemma3n multimodal support with MobileNetV5 vision encoder (#18256)

Add Gemma3nVisionModel, a MobileNetV5 vision encoder converter, to convert_hf_to_gguf.py. Add gemma3n to the vision projectors in gguf-py/gguf/constants.py.
Add mobilenetv5 impl
Fix comments, remove unused vars
Fix permute and remove transpose of projection weights
Fix comments, remove debugging prints from hf_to_gguf
Hard-code image_mean = 0 and image_std = 1

Use available tensor mapping logic
Remove redundant chat template replacement of soft tokens placeholder with media placeholder

Move mobilenetv5 helper declarations to the clip_graph_mobilenetv5 struct and their definitions to mobilenetv5.cpp
Remove unused clip_is_gemma3n func declarations and definitions

Remove redundant rescale_image_u8_to_f32 func and use normalize_image_u8_to_f32 with zero mean and unit std
Calculate n_patches using image_size / patch_size
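
The two changes above can be illustrated with a minimal Python sketch. It is not the code from clip.cpp. It assumes that normalization computes (x / 255 - mean) / std per channel, in which case zero mean and unit std reduce it to a plain rescale to [0, 1]. It also reads "image_size / patch_size" as the number of patches per side, which is an assumption. The function names and the sizes 768 and 16 are hypothetical.

```python
def normalize_u8_to_f32(pixels, mean, std):
    """Scale 8-bit channel values to [0, 1], then apply per-channel mean/std."""
    out = []
    for i, p in enumerate(pixels):
        c = i % 3  # assumes an interleaved RGB layout
        out.append((p / 255.0 - mean[c]) / std[c])
    return out


def patches_per_side(image_size, patch_size):
    """Number of patches along one image side (hypothetical helper)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    return image_size // patch_size


rgb = [0, 128, 255, 64, 32, 16]  # two RGB pixels, illustrative values
normalized = normalize_u8_to_f32(rgb, mean=(0.0, 0.0, 0.0), std=(1.0, 1.0, 1.0))
rescaled = [p / 255.0 for p in rgb]  # what a separate rescale helper would compute
side = patches_per_side(768, 16)  # hypothetical sizes
```

Under these assumptions `normalized` and `rescaled` hold the same values, which is why a dedicated rescale function becomes redundant.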

Remove obsolete comments
convert_hf_to_gguf.py, constants.py, tensor_mapping.py: Use explicit mapping, with a custom map for double-indexed blocks and tensor_mapping.py for the rest
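
As a rough illustration of this split, the sketch below routes names that carry two block indices through a custom rule and everything else through a lookup table. All tensor names, the `example.` prefix, the regular expression, and the `GENERIC_MAP` table are hypothetical placeholders, not the names used in convert_hf_to_gguf.py or tensor_mapping.py.

```python
import re

# Hypothetical pattern for blocks addressed by two indices (stage, block).
DOUBLE_INDEXED = re.compile(r"^blocks\.(\d+)\.(\d+)\.(.+)$")

# Hypothetical stand-in for the generic table kept in tensor_mapping.py.
GENERIC_MAP = {
    "stem.weight": "example.stem.weight",
}


def map_tensor_name(name):
    """Map a source tensor name to its target name."""
    m = DOUBLE_INDEXED.match(name)
    if m:
        # Custom rule: keep both indices in the target name.
        stage, block, rest = m.groups()
        return f"example.blk.{stage}.{block}.{rest}"
    if name in GENERIC_MAP:
        # Everything else goes through the generic table.
        return GENERIC_MAP[name]
    raise KeyError(f"no mapping for tensor {name!r}")
```

Failing loudly on an unmapped name, rather than passing it through, is a design choice of this sketch; it makes a missing entry visible at conversion time.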

convert_hf_to_gguf.py: Unsqueeze Stem Bias and Layer scale tensors to correct shape while converting to gguf
mobilenetv5.cpp: Remove explicit reshaping of Stem Bias and Layer scale, which is now handled during conversion to gguf; replace fprintf with LOG_*
clip.cpp: Remove unused embedding and hard_emb_norm tensor loading

Rename tensors to v.conv..., v.blk..., v.msfa... to better align with already existing terminology
Fix stem conv bias name
Remove explicit handling of bias term for stem conv
Change order of addition in "project_per_layer_inputs" to support broadcasting of vision inp_per_layer

Simplify the vision embeddings path of "get_per_layer_inputs" to output [n_embd_altup, n_layer, 1], broadcastable
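
Why the order of addition matters can be sketched with shapes alone. The snippet assumes the add follows the ggml convention that only the second operand is repeated to match the first, and that repetition requires every dimension of the larger shape to be a multiple of the smaller one. The helper names and the sizes are hypothetical.

```python
def can_repeat(small, big):
    """True if `small` can be tiled to `big` (each dim of `big` is a multiple)."""
    return len(small) == len(big) and all(b % s == 0 for s, b in zip(small, big))


def add_shape(a, b):
    """Result shape of add(a, b) when only the second operand may broadcast."""
    if not can_repeat(b, a):
        raise ValueError(f"cannot broadcast {b} to {a}")
    return a


n_embd_altup, n_layer, n_tokens = 256, 30, 7  # hypothetical sizes
full = (n_embd_altup, n_layer, n_tokens)      # per-token projected inputs
vision_per_layer = (n_embd_altup, n_layer, 1)  # broadcastable vision inputs

result = add_shape(full, vision_per_layer)  # works: second operand is repeated
```

With the operands swapped, `add_shape(vision_per_layer, full)` raises, because the larger tensor cannot be repeated into the smaller one.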

clean up conversion script
fix code style
also preserve audio tensors
trailing space
split arch A and V
rm unused gemma3 func
fix alignment

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

