
b7301

Dec 6, 2025

[!WARNING] Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
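If a deployment script downloads and unpacks the Linux release asset, it will need to switch from zip extraction to tar.gz extraction. Below is a minimal sketch in Python; the asset filename is hypothetical and only stands in for whatever the script already downloads.

```python
# Minimal sketch: handle both the old .zip and the upcoming .tar.gz
# Linux release archives. The filename below is a hypothetical example.
import tarfile
import zipfile
from pathlib import Path

asset = Path("llama-b7301-bin-ubuntu-x64.tar.gz")  # hypothetical asset name
dest = Path("llama.cpp-release")

if asset.name.endswith(".zip"):
    # Old behaviour: Linux releases shipped as .zip archives.
    with zipfile.ZipFile(asset) as zf:
        zf.extractall(dest)
else:
    # New behaviour: Linux releases ship as .tar.gz archives.
    with tarfile.open(asset, "r:gz") as tf:
        tf.extractall(dest)
```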

llama : remove quantization sanity check (#17788)

  • llama : remove quantization sanity check

This commit removes the quantization sanity check for attention layers.

The motivation for this is that some models are hybrid models with recurrent layers, expert layers, and attention layers. For these models the current check fails because the expert layers are not taken into account (see the sketch after this list). After consideration, it was decided that this check is not strictly necessary and can be removed to allow for more flexible model architectures.

  • llama : remove unused pruned_attention_w and is_clip_model vars
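To illustrate why a check like this breaks for hybrid architectures, here is a hypothetical sketch (not the actual llama.cpp code): a sanity check that expects every layer to contribute quantized attention weights rejects a valid model as soon as some layers are recurrent or expert layers instead.

```python
# Hypothetical illustration of the removed sanity check, assuming a check
# that compares quantized-attention layer count against total layer count.

def check_attention_quantization(layers):
    """Fail unless every layer has quantized attention weights --
    the kind of assumption that does not hold for hybrid models."""
    quantized_attn = sum(1 for layer in layers if layer.get("attn_quantized"))
    if quantized_attn != len(layers):
        raise ValueError(
            f"only {quantized_attn}/{len(layers)} layers have quantized attention weights"
        )

# A hybrid model: attention layers interleaved with recurrent and expert layers.
hybrid_model = [
    {"type": "attention", "attn_quantized": True},
    {"type": "recurrent"},   # recurrent layer, no attention weights
    {"type": "expert"},      # expert (MoE) layer, not counted as attention
    {"type": "attention", "attn_quantized": True},
]

# The check fails even though every attention layer was quantized correctly,
# which is why a per-layer-count check of this kind had to be removed.
try:
    check_attention_quantization(hybrid_model)
except ValueError as err:
    print("sanity check rejected a valid hybrid model:", err)
```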

macOS/iOS:

Linux:

Windows: