Releases: ggml-org/llama.cpp

b6987

08 Nov 10:31
c1b1876

CUDA: skip fusion for repeating adds in bias (#17080)

b6986

08 Nov 10:27
b8a5cfd

vulkan: Increase BK to 32; use BK/4 for non-CM mul_mm.comp (#16636)

Signed-off-by: Stefan Savic <[email protected]>
Co-authored-by: Stefan Savic <[email protected]>

b6985

08 Nov 10:13
08416eb

ggml: disable vxe for cross-compilation by default (#16966)

Otherwise compilation fails because -mvx and -mzvector are enabled
without the corresponding -march options being set.

b6984

08 Nov 09:09
b4e335d

vulkan: fuse rms_norm + mul + rope (+ view + set_rows) (#16977)

This change combines the rms_norm+mul and rope+view+set_rows fusions to
allow fusing the whole sequence together. This comes up in Qwen3, Bailing,
and some other models.
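
As a rough CPU-side illustration of what fusing this sequence means, the sketch below computes rms_norm, mul, and rope as three separate passes and then as a single pass without intermediate buffers. It is not the Vulkan shader from the change: the function names, the `eps` and `base` parameters, and the adjacent-pair rotation layout are all assumptions made for the example, and the view + set_rows steps are omitted.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Unfused pass 1: RMS-normalize one row (eps is an assumed stabilizer).
std::vector<float> rms_norm(const std::vector<float>& x, float eps) {
    float sum_sq = 0.0f;
    for (float v : x) sum_sq += v * v;
    const float scale = 1.0f / std::sqrt(sum_sq / static_cast<float>(x.size()) + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i] * scale;
    return y;
}

// Unfused pass 2: element-wise multiply by a weight row.
std::vector<float> mul(const std::vector<float>& a, const std::vector<float>& w) {
    std::vector<float> y(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) y[i] = a[i] * w[i];
    return y;
}

// Unfused pass 3: rotate adjacent pairs (x[i], x[i+1]) by a position-dependent
// angle. Assumes an even-length row; the pair layout is an assumption.
std::vector<float> rope(const std::vector<float>& x, int pos, float base) {
    const std::size_t n = x.size();
    std::vector<float> y(n);
    for (std::size_t i = 0; i + 1 < n; i += 2) {
        const float theta = static_cast<float>(pos) *
            std::pow(base, -static_cast<float>(i) / static_cast<float>(n));
        const float c = std::cos(theta);
        const float s = std::sin(theta);
        y[i]     = x[i] * c - x[i + 1] * s;
        y[i + 1] = x[i] * s + x[i + 1] * c;
    }
    return y;
}

// Fused version (hypothetical name): the same arithmetic in one function,
// writing only the final output and no intermediate rows.
std::vector<float> fused_rms_norm_mul_rope(const std::vector<float>& x,
                                           const std::vector<float>& w,
                                           int pos, float eps, float base) {
    const std::size_t n = x.size();
    float sum_sq = 0.0f;
    for (float v : x) sum_sq += v * v;
    const float scale = 1.0f / std::sqrt(sum_sq / static_cast<float>(n) + eps);
    std::vector<float> y(n);
    for (std::size_t i = 0; i + 1 < n; i += 2) {
        const float a = (x[i] * scale) * w[i];          // rms_norm + mul
        const float b = (x[i + 1] * scale) * w[i + 1];
        const float theta = static_cast<float>(pos) *
            std::pow(base, -static_cast<float>(i) / static_cast<float>(n));
        const float c = std::cos(theta);
        const float s = std::sin(theta);
        y[i]     = a * c - b * s;                       // rope
        y[i + 1] = a * s + b * c;
    }
    return y;
}
```

Both paths should agree up to floating-point rounding. The fused form avoids writing and re-reading the two intermediate rows, which is the general motivation for operator fusion on a GPU.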

b6983

08 Nov 09:43
d6fe40f

vulkan: Fix test-thread-safety crashes (#17024)

The std::map pipeline_flash_attn_f32_f16 could be searched and inserted into at
the same time, so both operations need to hold the lock. To be safe, hold the
lock for all of ggml_vk_load_shaders.
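
The general pattern described here, guarding both the lookup and the insertion of a shared std::map with one lock, can be sketched as follows. This is a minimal illustration and not the actual ggml Vulkan code: `PipelineCache`, `get_or_create`, and the string payload are hypothetical names chosen for the example.

```cpp
#include <map>
#include <mutex>
#include <string>

// Hypothetical cache: a std::map shared between threads.
struct PipelineCache {
    std::mutex mtx;
    std::map<int, std::string> pipelines;
    int created = 0;  // counts how many entries were actually built

    // The lock is held across the whole find-or-insert sequence, so a
    // concurrent insert can never run while another thread is searching.
    const std::string& get_or_create(int key) {
        std::lock_guard<std::mutex> guard(mtx);
        auto it = pipelines.find(key);
        if (it == pipelines.end()) {
            it = pipelines.emplace(key, "pipeline_" + std::to_string(key)).first;
            ++created;
        }
        // References to std::map elements stay valid across later inserts.
        return it->second;
    }
};
```

Holding the lock for the entire function is coarser than strictly necessary, but it removes any window in which an unlocked search can overlap an insert.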

b6982

08 Nov 08:47
e14e842

CUDA: fix MMQ stream-k fixup ne1 indices (#17089)

b6981

08 Nov 04:02
647b960

ggml webgpu: faster matrix multiplication/matrix-vector multiplicatio…

b6980

08 Nov 00:10
299f5d7

CUDA: properly handle nb00=nb02 case for cpy (#17081)

b6979

07 Nov 22:57
ac76d36

vulkan : refactor buffer handling in vk_op_f32 (#16840)

* vulkan : refactor/simplify buffer handling in vk_op_* functions

* Combine UMA handling into ggml_vk_tensor_subbuffer

b6978

07 Nov 22:50
6515610

CUDA: fix should_use_mmvf for ne11 == 1 (#17085)

* CUDA: fix should_use_mmvf for ne11 == 1

* Apply suggestion from @am17an

Co-authored-by: Aman Gupta <[email protected]>
