AI Changelog Aggregator

mtmd: Add DeepSeekOCR Support (#17400)

mtmd: llama.cpp DeepSeekOCR support init commit
loading sam tensors
mtmd: fix vision model processing
deepseek-ocr clip-vit model impl
mtmd: add DeepSeek-OCR LM support with standard attention
mtmd: successfully runs DeepSeek-OCR LM in llama-cli
mtmd: Fix RoPE type for DeepSeek-OCR LM.
loading LM; testing vision model loading
sam warmup working
sam erroneous return corrected
clip-vit: corrected cls_embd concat
clip-vit: model convert qkv_proj split
corrected combining of image encoders' results
fix: update callback for ffn_moe_weighted and add callback for attn_out in deepseek2 model
concat image_newline and image_seperator tokens
visual_model warmup (technically) works
window partitioning using standard ggml ops
sam implementation without using CPU only ops
clip: fixed warnings
Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr
mtmd: fix get_rel_pos
mtmd: fixed the wrong scaler for get_rel_pos
image encoding technically works but the output can't be checked since image decoding fails
mtmd: minor changes
mtmd: add native resolution support
- image encoding debugged

issues fixed were mainly related to wrong config values such as n_patches
configs need to be corrected in the converter

mtmd: correct token order
- dynamic resizing

changes are concerning PR https://github.com/sfallah/llama.cpp/pull/4

mtmd: quick fix token order
mtmd: fix dangling pointer
mtmd: SAM numerically works
mtmd: debug CLIP-L (vit_pre_ln)
mtmd: debug CLIP-L & first working DeepSeek-OCR model
mtmd: add --dsocr-mode CLI argument for DeepSeek-OCR resolution control; all native resolution modes work
mtmd: simplify SAM patch embedding
mtmd: adapt Pillow image resizing function
mtmd: simplify DeepSeek-OCR dynamic resolution preprocessing
mtmd: remove --dsocr-mode argument
mtmd: refactor code & remove unused helper functions
mtmd: fix tensor names for image newlines and view separator
clean up
reverting automatically removed spaces
reverting automatically removed spaces
mtmd: fixed bad ocr check in Deepseek2 (LM)
mtmd: support combined QKV projection in build_vit
using common build_attn in sam
corrected the code branch taken when flash-attn is disabled, enabling use of the --flash-attn option
mtmd: minor fix
minor formatting and style
fixed flake8 lint issues
minor editorconfig-check fixes
minor editorconfig-check fixes
mtmd: simplify get_rel_pos
mtmd: make sam hparams configurable
mtmd: add detailed comments for resize_bicubic_pillow
mtmd: fixed wrong input setting
mtmd: convert model in FP16
mtmd: minor fix
mtmd: remove tweak to llama-mtmd-cli & deepseek-ocr template
fix: test-1.jpg OCR issue with small (640) resolution; set min resolution to base (1024) and max to large (1280) for dynamic resolution
minor: editorconfig-check fix
merge with changes from https://github.com/ggml-org/llama.cpp/pull/17909; added new opt to tests.sh to disable flash-attn
minor: editorconfig-check fix
testing deepseek-ocr: quick and dirty test script comparing results of Qwen2.5-VL vs DeepSeek-OCR
quick and (potentially) dirty merge with https://github.com/ggml-org/llama.cpp/pull/17909
refactoring, one single builder function and static helpers
added deepseek-ocr test to tests.sh
minor formatting fixes
check with fixed expected results
minor formatting
editorconfig-check fix
merge with changes from https://github.com/ggml-org/llama.cpp/pull/18042
minor

added GLM-4.6V to big tests
added missing deps for python test

convert: minor fix
mtmd: format code
convert: quick fix
convert: quick fix
minor python formatting
fixed merge build issue
merge resolved

fixed issues in convert
tested several deepseek models

minor fix
minor
Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

- removed clip_is_deepseekocr

removed redundant RESIZE_ALGO_BICUBIC_PILLOW resize-algo
simplified image-preprocessing
removed/simplified debug functions

- cleaning commented out code
fixing instability issues by reintroducing resize_bicubic_pillow
- use f16 model for deepseek-ocr test

ignore llama-arch test for deepseek-ocr

rename fc_w --> mm_fc_w
add links to OCR discussion
cleaner loading code
add missing .weight to some tensors
add default jinja template (to be used by server)
move test model to ggml-org
rolling back upscale change
Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

Co-authored-by: bluebread <hotbread70127@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
