Releases: ggml-org/llama.cpp

b7003

09 Nov 18:10
b8595b1

mtmd : fix embedding size for image input (#17123)

b7002

09 Nov 16:52
392e09a

vulkan: fix memory allocations (#17122)

b6999

09 Nov 12:54
cb1adf8

server : handle failures to restore host cache (#17078)

* server : handle failures to restore host cache

* server : add tests for the prompt cache

b6996

09 Nov 09:57
7f3e9d3

vulkan: iGPU memory reporting fix (#17110)

* vulkan: use all device-local heaps for memory availability reporting

Co-authored-by: Giuseppe Scrivano <[email protected]>

* use all available heaps for iGPU memory reporting

* Allow multiple memory types per buffer request for devices with split heaps

---------

Co-authored-by: Giuseppe Scrivano <[email protected]>
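
One reading of the first two changes above is that the reported memory now counts every device-local heap rather than a single one, and every heap on an integrated GPU. The sketch below illustrates that reading only and is not the backend's code: `Heap`, `kDeviceLocal`, and `total_reported_bytes` are hypothetical stand-ins for Vulkan's `VkMemoryHeap`, `VK_MEMORY_HEAP_DEVICE_LOCAL_BIT`, and the actual reporting logic.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified mirror of a Vulkan memory heap description.
struct Heap {
    uint64_t size_bytes;
    uint32_t flags;
};

// Hypothetical flag playing the role of VK_MEMORY_HEAP_DEVICE_LOCAL_BIT.
constexpr uint32_t kDeviceLocal = 0x1;

// Sum heap sizes: every heap on an integrated GPU,
// otherwise only the heaps marked device-local.
inline uint64_t total_reported_bytes(const std::vector<Heap>& heaps, bool is_igpu) {
    uint64_t total = 0;
    for (const Heap& h : heaps) {
        if (is_igpu || (h.flags & kDeviceLocal)) {
            total += h.size_bytes;
        }
    }
    return total;
}
```

With heaps of 4, 8, and 2 bytes where only the first and last are device-local, the function returns 6 for a discrete GPU and 14 for an iGPU.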

b6995

09 Nov 09:56
8a3519b

vulkan: fix mmq out of bounds reads (#17108)

* vulkan: fix mmq out of bounds reads, streamline outdated matmul host code

* fix mul_mat_id quantization call

* Fix compiler warnings

b6994

09 Nov 09:41
80a6cf6

vulkan: fuse mul_mat_id + mul (#17095)

* vulkan: fuse mul_mat_id + mul

This comes up in qwen3 moe.

* split mul_mat_id fusion tests into a separate class

b6993

09 Nov 06:56
0750a59

metal : retain src and dst buffers during async ops (#17101)

b6992

08 Nov 22:14
aa3b7a9

arg: add --cache-list argument to list cached models (#17073)

* arg: add --cache-list argument to list cached models

* new manifest naming format

* improve naming

* Update common/arg.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>

b6990

08 Nov 20:44
53d7d21

vulkan: Use spec constants for conv2d s/d/p and kernel W/H (#16978)

* vulkan: Use spec constants for conv2d s/d/p and kernel W/H

Also add some unroll hints, which seem to help.

* lock around map lookup

b6989

08 Nov 14:06
eeee367

server: fix correct time_ms calculation in prompt_progress (#17093)

* fix: correct time_ms calculation in send_partial_response

The time_ms field was calculated incorrectly: the division happened
before the subtraction, so only the start timestamp was divided by 1000.

Before: (ggml_time_us() - slot.t_start_process_prompt / 1000)
After: (ggml_time_us() - slot.t_start_process_prompt) / 1000

* docs : document time_ms field in prompt_progress
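
The precedence issue behind the time_ms fix can be reproduced in isolation. A minimal sketch, assuming plain 64-bit microsecond timestamps: the function names `elapsed_ms_buggy` and `elapsed_ms_fixed` are hypothetical and not taken from the server code, with `now_us` standing in for `ggml_time_us()` and `start_us` for `slot.t_start_process_prompt`.

```cpp
#include <cstdint>

// Buggy form: '/' binds tighter than '-', so only start_us is divided
// and the result is still dominated by the raw microsecond clock value.
inline int64_t elapsed_ms_buggy(int64_t now_us, int64_t start_us) {
    return now_us - start_us / 1000;
}

// Fixed form: subtract first to get elapsed microseconds,
// then convert the difference to milliseconds.
inline int64_t elapsed_ms_fixed(int64_t now_us, int64_t start_us) {
    return (now_us - start_us) / 1000;
}
```

For `now_us = 5000000` and `start_us = 2000000`, the fixed form yields 3000 ms, while the buggy form yields 4998000, which is neither milliseconds nor an elapsed time.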