Releases · ggml-org/llama.cpp

10 Nov 09:37

1032256

b7005

cuda/vulkan : bicubic interpolation (#17022)

* vulkan : implement upscale with bicubic interpolation

* cuda : implement upscale with bicubic interpolation

* tests : add ggml_interpolate with GGML_SCALE_MODE_BICUBIC to backend tests

* adapt the OpenCL backend to report the op as unsupported in that case so the tests don't fail

* print scale mode & flags in test-backend-ops
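
Bicubic upscaling computes each output pixel as a weighted sum of the 4x4 nearest input pixels, with weights from a cubic kernel applied separably along each axis. The following is a minimal pure-Python sketch of that idea, not the ggml CUDA or Vulkan kernel. It assumes the Keys cubic kernel with a = -0.75, half-pixel sample centres and edge clamping; the names `cubic_weight` and `upscale_bicubic` are hypothetical, and the constants and coordinate convention behind `GGML_SCALE_MODE_BICUBIC` may differ.

```python
import math


def cubic_weight(x: float, a: float = -0.75) -> float:
    """Keys cubic convolution kernel; a = -0.75 is an assumed constant."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2.0:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0


def upscale_bicubic(img, out_h, out_w):
    """Upscale a 2D list of floats to out_h x out_w (hypothetical helper)."""
    in_h, in_w = len(img), len(img[0])
    sy, sx = in_h / out_h, in_w / out_w
    out = [[0.0] * out_w for _ in range(out_h)]
    for oy in range(out_h):
        fy = (oy + 0.5) * sy - 0.5  # source row, half-pixel centres
        y0 = math.floor(fy)
        ty = fy - y0                # fractional offset in [0, 1)
        for ox in range(out_w):
            fx = (ox + 0.5) * sx - 0.5
            x0 = math.floor(fx)
            tx = fx - x0
            acc = 0.0
            # Weighted sum over the 4x4 neighbourhood y0-1..y0+2, x0-1..x0+2.
            for m in range(-1, 3):
                wy = cubic_weight(m - ty)
                yy = min(max(y0 + m, 0), in_h - 1)  # clamp at the borders
                for n in range(-1, 3):
                    wx = cubic_weight(n - tx)
                    xx = min(max(x0 + n, 0), in_w - 1)
                    acc += wy * wx * img[yy][xx]
            out[oy][ox] = acc
    return out
```

The kernel weights for any fractional offset sum to 1, so a constant image stays constant, and at a scale factor of 1 the sketch returns the input unchanged.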

Assets 16

09 Nov 18:10

github-actions

b7003

b8595b1

mtmd : fix embedding size for image input (#17123)

Assets 16

09 Nov 16:52

github-actions

b7002

392e09a

vulkan: fix memory allocations (#17122)

Assets 16

09 Nov 12:54

github-actions

b6999

cb1adf8

server : handle failures to restore host cache (#17078)

* server : handle failures to restore host cache

* server : add tests for the prompt cache

Assets 16

09 Nov 09:57

github-actions

b6996

7f3e9d3

vulkan: iGPU memory reporting fix (#17110)

* vulkan: use all device-local heaps for memory availability reporting

Co-authored-by: Giuseppe Scrivano <[email protected]>

* use all available heaps for iGPU memory reporting

* Allow multiple memory types per buffer request for devices with split heaps

---------

Co-authored-by: Giuseppe Scrivano <[email protected]>
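
Reporting memory as the sum over all device-local heaps (or over all heaps on an iGPU) can be sketched as follows. `MemoryHeap` and `available_device_memory` are hypothetical stand-ins for the `VkMemoryHeap` data the Vulkan backend queries; `free_bytes` assumes some per-heap availability figure (for example one derived from `VK_EXT_memory_budget`), and counting every heap on an iGPU follows the "use all available heaps" bullet above.

```python
from dataclasses import dataclass

# Same bit value as VK_MEMORY_HEAP_DEVICE_LOCAL_BIT in the Vulkan headers.
VK_MEMORY_HEAP_DEVICE_LOCAL_BIT = 0x1


@dataclass
class MemoryHeap:
    """Hypothetical stand-in for a VkMemoryHeap plus an availability figure."""
    size: int        # total heap size in bytes
    free_bytes: int  # assumed: bytes currently available in this heap
    flags: int       # VkMemoryHeapFlags bitmask


def available_device_memory(heaps, is_igpu):
    """Return (free, total) summed over every heap that counts as device memory."""
    free = total = 0
    for heap in heaps:
        # Assumption: on an iGPU the heaps are backed by shared system RAM,
        # so all of them are counted; otherwise only device-local heaps are.
        if is_igpu or heap.flags & VK_MEMORY_HEAP_DEVICE_LOCAL_BIT:
            free += heap.free_bytes
            total += heap.size
    return free, total
```

With two device-local heaps of 2 GiB and 6 GiB, a discrete device would report 8 GiB total instead of the size of either heap alone.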

Assets 16

09 Nov 09:56

github-actions

b6995

8a3519b

vulkan: fix mmq out of bounds reads (#17108)

* vulkan: fix mmq out of bounds reads, streamline outdated matmul host code

* fix mul_mat_id quantization call

* Fix compiler warnings
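
One common way out-of-bounds reads arise in tiled matrix multiplication is a matrix dimension that is not a multiple of the tile size, so the last tile runs past the end of the buffer. A usual guard is to bounds-check each index and substitute zero for elements outside the matrix. The following is a hypothetical pure-Python sketch of that guard, not the Vulkan mmq shader; `load_tile` and `tiled_matmul` are illustrative names, and quantization, which the real mmq path handles, is ignored.

```python
def load_tile(mat, rows, cols, r0, c0, tile):
    """Load a tile x tile block of a flat row-major matrix, zero-padding
    any element that lies outside the rows x cols bounds."""
    block = []
    for r in range(r0, r0 + tile):
        row = []
        for c in range(c0, c0 + tile):
            if r < rows and c < cols:
                row.append(mat[r * cols + c])
            else:
                row.append(0.0)  # guard: never index past the buffer
        block.append(row)
    return block


def tiled_matmul(a, b, m, k, n, tile=4):
    """C = A @ B for flat row-major A (m x k) and B (k x n)."""
    c = [0.0] * (m * n)
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for p0 in range(0, k, tile):
                ta = load_tile(a, m, k, i0, p0, tile)
                tb = load_tile(b, k, n, p0, j0, tile)
                for i in range(tile):
                    for j in range(tile):
                        # Only write outputs that exist; padded rows/cols are skipped.
                        if i0 + i < m and j0 + j < n:
                            c[(i0 + i) * n + j0 + j] += sum(
                                ta[i][p] * tb[p][j] for p in range(tile)
                            )
    return c
```

Zero padding is safe along the shared k dimension because padded elements contribute nothing to the dot products.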

Assets 16

09 Nov 09:41

github-actions

b6994

80a6cf6

vulkan: fuse mul_mat_id + mul (#17095)

* vulkan: fuse mul_mat_id + mul

This pattern comes up in Qwen3 MoE models.

* split mul_mat_id fusion tests into a separate class
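
In a mixture-of-experts layer, `mul_mat_id` multiplies each token by the weight matrices of the experts selected for it, and a following `mul` can then scale each expert's output. The sketch below assumes that the `mul` is a per-token, per-expert-slot scale such as a router weight; `matvec`, `unfused` and `fused` are hypothetical helpers, not ggml functions. Fusing yields the same values while skipping the materialised intermediate.

```python
def matvec(w, v):
    """Dense matrix-vector product; w is a list of rows."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def unfused(experts, x, ids, scale):
    """mul_mat_id followed by a separate mul, with a materialised intermediate."""
    # tmp[t][s] = experts[ids[t][s]] @ x[t]
    tmp = [[matvec(experts[e], x[t]) for e in ids[t]] for t in range(len(x))]
    return [
        [[val * scale[t][s] for val in tmp[t][s]] for s in range(len(ids[t]))]
        for t in range(len(x))
    ]


def fused(experts, x, ids, scale):
    """Same result, but each dot product is scaled as soon as it is computed."""
    out = []
    for t, slots in enumerate(ids):
        out.append([
            [sum(wi * vi for wi, vi in zip(row, x[t])) * scale[t][s]
             for row in experts[e]]
            for s, e in enumerate(slots)
        ])
    return out
```

On a GPU the benefit of fusion is mainly one fewer kernel launch and one fewer round trip of the intermediate tensor through memory.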

Assets 16

09 Nov 06:56

github-actions

b6993

0750a59

metal : retain src and dst buffers during async ops (#17101)
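
Retaining buffers for the lifetime of an asynchronous op is a reference-counting pattern: the queue takes its own reference at submit time and drops it only after the op has run, so a caller that releases its reference early cannot free memory the op still reads or writes. The following is a hypothetical Python model of that pattern (`Buffer`, `AsyncQueue` and `copy_op` are illustrative names), not the Metal backend's code.

```python
class Buffer:
    """Hypothetical reference-counted buffer; freed when the count drops to zero."""

    def __init__(self, data):
        self.data = list(data)
        self.refcount = 1  # the creator holds one reference
        self.freed = False

    def retain(self):
        assert not self.freed, "retain after free"
        self.refcount += 1

    def release(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True
            self.data = None  # simulate the memory going away


class AsyncQueue:
    """Hypothetical queue that runs submitted ops later, in wait()."""

    def __init__(self):
        self.pending = []

    def submit(self, op, src, dst):
        # Keep both buffers alive until the op has actually run.
        src.retain()
        dst.retain()
        self.pending.append((op, src, dst))

    def wait(self):
        for op, src, dst in self.pending:
            op(src, dst)
            src.release()  # drop the queue's references only after completion
            dst.release()
        self.pending.clear()


def copy_op(src, dst):
    """Example op: copy src's contents into dst."""
    dst.data[:] = src.data
```

Without the `retain` calls in `submit`, releasing `src` before `wait()` would free it while the copy is still pending.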

Assets 16

08 Nov 22:14

github-actions

b6992

aa3b7a9

arg: add --cache-list argument to list cached models (#17073)

* arg: add --cache-list argument to list cached models

* new manifest naming format

* improve naming

* Update common/arg.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>

Assets 16

08 Nov 20:44

github-actions

b6990

53d7d21

vulkan: Use spec constants for conv2d s/d/p and kernel W/H (#16978)

* vulkan: Use spec constants for conv2d s/d/p and kernel W/H

Also add some unroll hints, which seem to help.

* lock around map lookup
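
Stride (s), padding (p) and dilation (d), together with the kernel width and height, determine a 2D convolution's output size. Because specialization constants are fixed when a Vulkan pipeline is created, each combination baked in this way needs its own compiled pipeline. The sketch below shows the standard per-axis output-size formula and a hypothetical pipeline cache keyed on those parameters, with a lock around the map lookup. `conv2d_out_size`, `ConvKey`, `PipelineCache` and the `compile_fn` callback are illustrative names, and using a single value per parameter for both axes is an assumption.

```python
import threading
from typing import Callable, Dict, NamedTuple


def conv2d_out_size(in_size: int, k: int, stride: int, pad: int, dilation: int) -> int:
    """Standard output length of a convolution along one axis."""
    return (in_size + 2 * pad - dilation * (k - 1) - 1) // stride + 1


class ConvKey(NamedTuple):
    """Hypothetical key: one pipeline variant per constant combination."""
    stride: int
    pad: int
    dilation: int
    kw: int
    kh: int


class PipelineCache:
    """Compiles each specialised pipeline once; the lock guards the map."""

    def __init__(self, compile_fn: Callable[[ConvKey], object]):
        self._compile = compile_fn
        self._lock = threading.Lock()
        self._pipelines: Dict[ConvKey, object] = {}

    def get(self, key: ConvKey) -> object:
        # Without the lock, two threads could race on the lookup and insert.
        with self._lock:
            pipeline = self._pipelines.get(key)
            if pipeline is None:
                pipeline = self._compile(key)
                self._pipelines[key] = pipeline
            return pipeline
```

Holding the lock during compilation keeps the sketch simple at the cost of serialising compiles; a finer-grained scheme could release it while a new variant is built.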

Assets 16
