Releases: ggml-org/llama.cpp
b7005
cuda/vulkan : bicubic interpolation (#17022)
* vulkan : implement upscale with bicubic interpolation
* cuda : implement upscale with bicubic interpolation
* tests : add ggml_interpolate with GGML_SCALE_MODE_BICUBIC to backend tests
* adapt OpenCL backend to not support the OP in that case so tests don't fail
* print scale mode & flags in test-backend-ops
b7003
mtmd : fix embedding size for image input (#17123)
b7002
vulkan: fix memory allocations (#17122)
b6999
server : handle failures to restore host cache (#17078)
* server : handle failures to restore host cache
* server : add tests for the prompt cache
b6996
vulkan: iGPU memory reporting fix (#17110)
* vulkan: use all device-local heaps for memory availability reporting
* use all available heaps for iGPU memory reporting
* Allow multiple memory types per buffer request for devices with split heaps

Co-authored-by: Giuseppe Scrivano
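The commit titles suggest the reported total changes from a single heap to a sum over several heaps. The C sketch below illustrates that accounting; `heap_info` is a hypothetical stand-in for the two `VkMemoryHeap` fields it needs, and the policy (every device-local heap on a discrete GPU, every heap on an integrated GPU) is one reading of the commit titles, not the actual backend logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the VkMemoryHeap fields used here.
   0x1 matches the value of VK_MEMORY_HEAP_DEVICE_LOCAL_BIT. */
typedef struct {
    uint64_t size;
    uint32_t flags;
} heap_info;

#define HEAP_DEVICE_LOCAL 0x1u

/* Sum the heaps counted toward "available memory": all device-local heaps,
   or all heaps when the device is an integrated GPU (assumed policy). */
static uint64_t reported_memory(const heap_info *heaps, size_t n, int is_igpu) {
    assert(heaps != NULL || n == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        if (is_igpu || (heaps[i].flags & HEAP_DEVICE_LOCAL)) {
            total += heaps[i].size;
        }
    }
    return total;
}
```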
b6995
vulkan: fix mmq out of bounds reads (#17108)
* vulkan: fix mmq out of bounds reads, streamline outdated matmul host code
* fix mul_mat_id quantization call
* fix compiler warnings
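Out-of-bounds reads in tiled matmul kernels commonly occur when a matrix dimension is not a multiple of the tile size, so the last tile extends past the edge. The C sketch below shows the usual guard of zero-filling out-of-range elements; `TILE` and `load_tile_checked` are hypothetical, and this is not the shader change from the PR.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4 /* hypothetical tile edge; real shaders choose their own sizes */

/* Copy the TILE x TILE block starting at (row0, col0) of a row-major
   rows x cols matrix into tile, writing 0 for any element that lies past
   the matrix edge instead of reading out of bounds. Assumes row0, col0 >= 0. */
static void load_tile_checked(const float *m, int rows, int cols,
                              int row0, int col0, float tile[TILE][TILE]) {
    assert(m != NULL && rows > 0 && cols > 0);
    for (int r = 0; r < TILE; ++r) {
        for (int c = 0; c < TILE; ++c) {
            const int gr = row0 + r;
            const int gc = col0 + c;
            tile[r][c] = (gr < rows && gc < cols) ? m[(size_t)gr * cols + gc] : 0.0f;
        }
    }
}
```

Zero padding is safe for a matmul because the padded products contribute nothing to the accumulated dot products.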
b6994
vulkan: fuse mul_mat_id + mul (#17095)
* vulkan: fuse mul_mat_id + mul
  This pattern comes up in Qwen3 MoE models.
* split mul_mat_id fusion tests into a separate class
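In MoE layers, `mul_mat_id` multiplies the input by the weight matrices of the experts selected for each token, and the following `mul` presumably scales each expert's output by its router weight. The C sketch below contrasts the unfused two-pass form with a fused single pass for one token. The function names and the memory layout are hypothetical; the actual fusion happens inside Vulkan shaders.

```c
#include <assert.h>
#include <stddef.h>

/* Unfused reference: for each of the n_used selected experts, compute
   W[ids[e]] * x into out, then a second pass multiplies each expert's
   rows by its router weight scale[e]. */
static void mul_mat_id_then_mul(const float *w, int rows, int cols,
                                const float *x, const int *ids,
                                const float *scale, int n_used, float *out) {
    for (int e = 0; e < n_used; ++e) {
        const float *we = w + (size_t)ids[e] * rows * cols;
        for (int r = 0; r < rows; ++r) {
            float acc = 0.0f;
            for (int c = 0; c < cols; ++c) acc += we[(size_t)r * cols + c] * x[c];
            out[(size_t)e * rows + r] = acc; /* unscaled intermediate is stored */
        }
    }
    for (int e = 0; e < n_used; ++e) { /* ...and read back in a second pass */
        for (int r = 0; r < rows; ++r) out[(size_t)e * rows + r] *= scale[e];
    }
}

/* Fused: apply scale[e] while the dot product is still in a register,
   so the unscaled intermediate is never written and re-read. */
static void mul_mat_id_mul_fused(const float *w, int rows, int cols,
                                 const float *x, const int *ids,
                                 const float *scale, int n_used, float *out) {
    assert(rows > 0 && cols > 0 && n_used >= 0);
    for (int e = 0; e < n_used; ++e) {
        const float *we = w + (size_t)ids[e] * rows * cols;
        for (int r = 0; r < rows; ++r) {
            float acc = 0.0f;
            for (int c = 0; c < cols; ++c) acc += we[(size_t)r * cols + c] * x[c];
            out[(size_t)e * rows + r] = acc * scale[e];
        }
    }
}
```

Both versions produce the same result; the fused one saves a full write and re-read of the expert outputs, which is where the speedup of such fusions usually comes from.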
b6993
metal : retain src and dst buffers during async ops (#17101)
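Retaining a buffer for the duration of an asynchronous operation keeps it alive even if its owner releases it before the GPU finishes. The C sketch below illustrates that pattern with a manual reference count; `buffer`, `submit_copy`, and `complete_copy` are hypothetical stand-ins for Metal buffers and command-buffer completion handlers, not the ggml-metal API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted buffer standing in for a GPU buffer. */
typedef struct {
    int refcount;
    float *data;
} buffer;

static buffer *buffer_new(size_t n) {
    buffer *b = malloc(sizeof *b);
    assert(b != NULL);
    b->refcount = 1;
    b->data = calloc(n, sizeof *b->data);
    assert(b->data != NULL);
    return b;
}

static void buffer_retain(buffer *b) { b->refcount++; }

/* Drop one reference; returns 1 if this call freed the buffer. */
static int buffer_release(buffer *b) {
    if (--b->refcount == 0) {
        free(b->data);
        free(b);
        return 1;
    }
    return 0;
}

/* An in-flight op holds its own references to src and dst. */
typedef struct {
    buffer *src;
    buffer *dst;
} async_op;

/* Submission: retain both buffers so they outlive the caller's references. */
static async_op submit_copy(buffer *src, buffer *dst) {
    buffer_retain(src);
    buffer_retain(dst);
    async_op op = { src, dst };
    return op;
}

/* Completion: do the work, then drop the op's references. */
static void complete_copy(async_op *op, size_t n) {
    for (size_t i = 0; i < n; ++i) op->dst->data[i] = op->src->data[i];
    buffer_release(op->src);
    buffer_release(op->dst);
}
```

Without the retain in `submit_copy`, a caller that releases `src` right after submitting would leave the completion handler reading freed memory.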
b6992
arg: add --cache-list argument to list cached models (#17073)
* arg: add --cache-list argument to list cached models
* new manifest naming format
* improve naming
* Update common/arg.cpp

Co-authored-by: Georgi Gerganov
b6990
vulkan: Use spec constants for conv2d s/d/p and kernel W/H (#16978)
* vulkan: use spec constants for conv2d s/d/p and kernel W/H
  Also add unroll hints, which seem to help.
* lock around map lookup
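Here s/d/p are read as stride, dilation, and padding. The C sketch below shows the standard convolution output-size formula (assumed here to match ggml's) and a direct conv2d loop in which those values and the kernel W/H control the inner loops. When they are Vulkan specialization constants, they are fixed at pipeline creation, so the shader compiler can fold them and unroll the kernel loops. This is a hypothetical single-channel CPU reference that uses one stride/padding/dilation for both axes, not the shader from the PR.

```c
#include <assert.h>

/* Output extent along one axis for input size in, kernel size k,
   stride s, padding p, dilation d. */
static int conv_out_size(int in, int k, int s, int p, int d) {
    assert(s > 0 && d > 0 && k > 0);
    return (in + 2 * p - d * (k - 1) - 1) / s + 1;
}

/* Direct single-channel conv2d with zero padding. s, p, d, kw, kh are
   ordinary arguments here; as specialization constants they would be
   compile-time values, letting the two kernel loops be unrolled. */
static void conv2d_direct(const float *src, int iw, int ih,
                          const float *kern, int kw, int kh,
                          int s, int p, int d, float *dst) {
    const int ow = conv_out_size(iw, kw, s, p, d);
    const int oh = conv_out_size(ih, kh, s, p, d);
    for (int oy = 0; oy < oh; ++oy) {
        for (int ox = 0; ox < ow; ++ox) {
            float acc = 0.0f;
            for (int ky = 0; ky < kh; ++ky) {
                for (int kx = 0; kx < kw; ++kx) {
                    const int iy = oy * s - p + ky * d;
                    const int ix = ox * s - p + kx * d;
                    if (iy >= 0 && iy < ih && ix >= 0 && ix < iw) {
                        acc += src[iy * iw + ix] * kern[ky * kw + kx];
                    }
                }
            }
            dst[oy * ow + ox] = acc;
        }
    }
}
```

For example, a 224-wide input with a 7-wide kernel, stride 2, and padding 3 gives an output width of 112.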