Releases · ggml-org/llama.cpp
b6987
CUDA: skip fusion for repeating adds in bias (#17080)
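For context, folding a bias add into a matmul epilogue is only safe when the bias is applied exactly once per output element; if the add's operand is smaller than the output and therefore repeats (broadcasts) across higher dimensions, the fused path would apply the wrong values. The sketch below illustrates that shape check in plain C++; the `Tensor` struct and `can_fuse_bias_add` are hypothetical names, not llama.cpp's actual CUDA code.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical standalone sketch; Tensor and can_fuse_bias_add are
// illustrative names, not llama.cpp's actual types.
struct Tensor {
    int64_t ne[4]; // elements per dimension, ggml-style
};

// Fusing the add into the matmul epilogue is only safe when the bias
// matches the output shape in every dimension; if it is smaller in any
// dimension, the add repeats (broadcasts), and the fused epilogue would
// apply the wrong bias values, so fusion must be skipped.
static bool can_fuse_bias_add(const Tensor & out, const Tensor & bias) {
    for (int d = 0; d < 4; ++d) {
        if (bias.ne[d] != out.ne[d]) {
            return false; // repeating add: use the separate add kernel
        }
    }
    return true;
}

int main() {
    Tensor out  = {{4096, 32, 8, 1}};
    Tensor bias = {{4096,  1, 1, 1}}; // repeats across dims 1 and 2
    printf("fuse: %d\n", can_fuse_bias_add(out, bias)); // prints 0
}
```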
b6986
vulkan: Increase BK to 32; use BK/4 for non-CM mul_mm.comp (#16636)
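BK here is the K-dimension tile size of the shader's blocked matrix multiply: larger K-tiles amortize memory loads at the cost of more shared memory per workgroup. Below is a minimal CPU analogue of K-blocking, purely illustrative rather than the shader code itself.

```cpp
#include <vector>
#include <cstdio>

constexpr int BK = 32; // K-tile: how many k-elements each pass consumes

// C[m][n] += A[m][k] * B[k][n], processing K in blocks of BK so each
// tile of A and B stays hot in cache (shared memory, in the shader).
void mul_mm_blocked(const std::vector<float> & A, const std::vector<float> & B,
                    std::vector<float> & C, int M, int N, int K) {
    for (int k0 = 0; k0 < K; k0 += BK) {
        const int kend = k0 + BK < K ? k0 + BK : K;
        for (int m = 0; m < M; ++m) {
            for (int n = 0; n < N; ++n) {
                float sum = 0.0f;
                for (int k = k0; k < kend; ++k) {
                    sum += A[m*K + k] * B[k*N + n];
                }
                C[m*N + n] += sum;
            }
        }
    }
}

int main() {
    const int M = 4, N = 4, K = 64;
    std::vector<float> A(M*K, 1.0f), B(K*N, 1.0f), C(M*N, 0.0f);
    mul_mm_blocked(A, B, C, M, N, K);
    printf("C[0] = %.1f\n", C[0]); // expect 64.0
}
```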
b6985
ggml: disable vxe for cross-compilation by default (#16966) Otherwise, compilation fails because -mvx and -mzvector are enabled without the corresponding -march options being set.
b6984
vulkan: fuse rms_norm + mul + rope (+ view + set_rows) (#16977) This change combines the rms_norm+mul and rope+view+set_rows fusions to allow fusing the whole sequence together. This comes up in Qwen3, Bailing, and some other models.
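Fusions like this generally work by pattern-matching a linear chain of ops in the compute graph, where each op's output feeds only the next op, so the whole chain can run as a single pass. Here is a hedged sketch of such a matcher; the `Node`/`Op` types and `matches_chain` are illustrative, not ggml's actual graph API.

```cpp
#include <vector>
#include <cstdio>

// Illustrative op tags and node layout; not ggml's real definitions.
enum Op { RMS_NORM, MUL, ROPE, VIEW, SET_ROWS, OTHER };

struct Node {
    Op op;
    const Node * src0; // first input
};

// Returns true if nodes[i..] matches the fusable chain, with each op
// consuming the previous one as its first input (a linear chain, so the
// intermediate results are not needed elsewhere).
bool matches_chain(const std::vector<Node> & nodes, size_t i,
                   const std::vector<Op> & pattern) {
    if (i + pattern.size() > nodes.size()) return false;
    for (size_t j = 0; j < pattern.size(); ++j) {
        if (nodes[i + j].op != pattern[j]) return false;
        if (j > 0 && nodes[i + j].src0 != &nodes[i + j - 1]) return false;
    }
    return true;
}

int main() {
    std::vector<Node> g(5);
    const Op ops[5] = {RMS_NORM, MUL, ROPE, VIEW, SET_ROWS};
    for (int i = 0; i < 5; ++i) {
        g[i].op   = ops[i];
        g[i].src0 = i > 0 ? &g[i - 1] : nullptr;
    }
    printf("fusable: %d\n", matches_chain(g, 0, {RMS_NORM, MUL, ROPE, VIEW, SET_ROWS}));
}
```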
b6983
vulkan: Fix test-thread-safety crashes (#17024) The std::map pipeline_flash_attn_f32_f16 could be searched and inserted into at the same time, so accesses need to hold the lock. To be safe, the lock is held for all of ggml_vk_load_shaders.
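The underlying rule is general: std::map is not safe for a concurrent find-and-insert, since both operations touch the tree's internals, so every access path must take the same mutex. A simplified stand-in (the names, such as `get_or_create_pipeline`, are hypothetical):

```cpp
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <cstdio>

// Simplified stand-in for a pipeline cache like the one guarded in
// ggml_vk_load_shaders; not the actual llama.cpp code.
static std::map<std::string, int> pipeline_cache;
static std::mutex cache_mutex;

// Both lookup and insertion mutate the map's internal tree, so a reader
// racing an inserter is undefined behavior; a single lock covering the
// whole find-or-create makes the operation atomic.
int get_or_create_pipeline(const std::string & key) {
    std::lock_guard<std::mutex> lock(cache_mutex);
    auto it = pipeline_cache.find(key);
    if (it != pipeline_cache.end()) {
        return it->second;
    }
    const int handle = (int) pipeline_cache.size(); // stand-in for shader compilation
    pipeline_cache.emplace(key, handle);
    return handle;
}

int main() {
    std::thread t1([]{ get_or_create_pipeline("fa_f32_f16"); });
    std::thread t2([]{ get_or_create_pipeline("fa_f32_f16"); });
    t1.join(); t2.join();
    printf("cached pipelines: %zu\n", pipeline_cache.size()); // prints 1
}
```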
b6982
CUDA: fix MMQ stream-k fixup ne1 indices (#17089)
b6981
ggml webgpu: faster matrix multiplication/matrix-vector multiplication …
b6980
CUDA: properly handle nb00=nb02 case for cpy (#17081)
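ggml tensors carry per-dimension byte strides (nb), and a copy kernel walks those strides rather than assuming contiguous memory; layouts where two strides coincide, as in the nb00 == nb02 case here, can make different index tuples map to overlapping addresses and so need dedicated handling. Below is a minimal stride-driven copy on the CPU, with an illustrative `View` struct rather than ggml's real types.

```cpp
#include <cstdint>
#include <cstring>
#include <cstdio>

// Minimal stride-driven copy in the spirit of ggml's cpy: ne[] counts
// elements per dimension, nb[] gives byte strides per dimension. When
// strides coincide (e.g. nb[0] == nb[2]), distinct index tuples can
// address the same bytes, which is why such layouts need special care
// in the real kernel.
struct View {
    char *  data;
    int64_t ne[3];
    int64_t nb[3];
};

void cpy_f32(const View & src, View & dst) {
    for (int64_t i2 = 0; i2 < src.ne[2]; ++i2)
    for (int64_t i1 = 0; i1 < src.ne[1]; ++i1)
    for (int64_t i0 = 0; i0 < src.ne[0]; ++i0) {
        const char * s = src.data + i0*src.nb[0] + i1*src.nb[1] + i2*src.nb[2];
        char *       d = dst.data + i0*dst.nb[0] + i1*dst.nb[1] + i2*dst.nb[2];
        memcpy(d, s, sizeof(float));
    }
}

int main() {
    float a[8] = {0, 1, 2, 3, 4, 5, 6, 7}, b[8] = {0};
    View src = {(char *) a, {2, 2, 2}, {4, 8, 16}}; // contiguous f32 strides
    View dst = {(char *) b, {2, 2, 2}, {4, 8, 16}};
    cpy_f32(src, dst);
    printf("b[7] = %.0f\n", b[7]); // expect 7
}
```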
b6979
vulkan : refactor buffer handling in vk_op_f32 (#16840)
* vulkan : refactor/simplify buffer handling in vk_op_* functions
* Combine UMA handling into ggml_vk_tensor_subbuffer
b6978
CUDA: fix should_use_mmvf for ne11 == 1 (#17085)