Releases · ggml-org/llama.cpp
b6980
CUDA: properly handle nb00=nb02 case for cpy (#17081)
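For context on the title: ggml tensors carry per-dimension byte strides nb[0..3] next to the element counts ne[0..3], and in the copy kernels nb00/nb02 are src0's strides for dims 0 and 2. Below is a minimal sketch (illustrative only, not the CUDA kernel) of one way those two strides can legitimately coincide, which the kernel has to tolerate:

```cpp
// Illustrative sketch, not the actual llama.cpp kernel: how ggml byte
// strides nb[] are derived for a contiguous tensor, and one way the
// nb00 == nb02 case referenced above can arise.
#include <cstdio>
#include <cstdint>

int main() {
    // ne[i] = number of elements in dimension i, nb[i] = stride in bytes.
    int64_t ne[4] = {1, 1, 8, 4};       // singleton first two dims
    size_t  elsz  = sizeof(float);

    size_t nb[4];
    nb[0] = elsz;                       // innermost stride = element size
    for (int i = 1; i < 4; ++i) {
        nb[i] = nb[i - 1] * ne[i - 1];  // contiguous layout
    }

    // With ne[0] == ne[1] == 1 the strides collapse: nb[0] == nb[1] == nb[2].
    // A copy kernel indexing src + i0*nb[0] + i1*nb[1] + i2*nb[2] + i3*nb[3]
    // must not assume the strides are distinct.
    printf("nb = {%zu, %zu, %zu, %zu}\n", nb[0], nb[1], nb[2], nb[3]);
    return 0;
}
```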
b6979
vulkan : refactor buffer handling in vk_op_f32 (#16840)

* vulkan : refactor/simplify buffer handling in vk_op_* functions
* Combine UMA handling into ggml_vk_tensor_subbuffer
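As a rough mental model of what a tensor "subbuffer" unifies here, the sketch below is hypothetical (the names are not the actual ggml-vulkan types): every operand is described by one (buffer, offset, size) triple, so UMA and non-UMA paths can share the same lookup instead of special-casing each operand:

```cpp
// Hypothetical sketch, not the actual ggml-vulkan types: a tensor operand
// reduced to one (buffer, offset, size) triple. On UMA systems the triple
// can point at host-visible memory; on discrete GPUs it points into a
// device allocation; either way the vk_op_* code consumes the same struct.
#include <cstddef>

struct vk_buffer; // opaque backing allocation (stand-in)

struct vk_subbuffer_sketch {
    vk_buffer * buffer = nullptr; // backing allocation
    size_t      offset = 0;       // byte offset of the tensor within it
    size_t      size   = 0;       // byte size of the tensor's data
};
```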
b6978
CUDA: fix should_use_mmvf for ne11 == 1 (#17085)

* CUDA: fix should_use_mmvf for ne11 == 1
* Apply suggestion from @am17an

Co-authored-by: Aman Gupta <[email protected]>
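For readers unfamiliar with the naming: in ggml's mul_mat, ne11 is the number of columns of the second operand, so ne11 == 1 is the pure matrix-vector (GEMV) case that the mmvf kernel targets. A hypothetical sketch of the kind of dispatch predicate involved (the real should_use_mmvf considers more factors; the name and conditions here are illustrative only):

```cpp
#include <cstdint>

// Illustrative only: prefer the dedicated mat-vec float kernel when the
// second operand has a single column and the operand types are supported.
static bool should_use_mat_vec_f_sketch(int64_t ne11, bool types_supported) {
    return types_supported && ne11 == 1;
}
```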
b6977
bench : cache the llama_context state at computed depth (#16944)

* bench : cache llama_context state at depth
* cont : handle failures to restore the old state
* cont : print information when the state is being reused
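The caching idea can be sketched with the public llama.h state API (llama_state_get_size / llama_state_get_data / llama_state_set_data). This is a minimal sketch, not the llama-bench implementation: error handling is reduced to a boolean and the model/context setup is assumed.

```cpp
#include <vector>
#include <cstdint>
#include "llama.h"

// Save the context state once after the prompt has been processed to the
// desired depth...
static std::vector<uint8_t> save_state(llama_context * ctx) {
    std::vector<uint8_t> buf(llama_state_get_size(ctx));
    const size_t written = llama_state_get_data(ctx, buf.data(), buf.size());
    buf.resize(written);
    return buf;
}

// ...then restore it before each repetition instead of re-evaluating the
// prompt. Returns false if the state could not be applied in full.
static bool restore_state(llama_context * ctx, const std::vector<uint8_t> & buf) {
    return llama_state_set_data(ctx, buf.data(), buf.size()) == buf.size();
}
```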
b6976
hparams : add n_embd_inp() to support extended embed (#16928)

* add n_embd_full to support extended embed
* don't change output
* rename to n_embd_inp
* restore n_embd where applicable
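A hypothetical sketch of the distinction this introduces (field and method names are illustrative, not the actual llama.cpp hparams): models with an extended input embedding feed the graph a wider vector than the base n_embd, so input-side code asks for n_embd_inp() while output-side code keeps using n_embd.

```cpp
#include <cstdint>

struct hparams_sketch {
    uint32_t n_embd     = 4096; // base embedding width (used on the output side)
    uint32_t n_embd_ext = 0;    // extra input-only embedding width, if any

    // width of the vector actually fed into the graph
    uint32_t n_embd_inp() const { return n_embd + n_embd_ext; }
};
```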
b6975
kv-cache : pad the cache size to 256 for performance (#17046)

* kv-cache : pad the size of the small SWA cache for performance
* context : pad the total context to 256
* cont : future-proof the swa pad
* server : adjust test params to new logic
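The padding arithmetic implied here is rounding a requested size up to the next multiple of 256. A minimal sketch (the helper name is illustrative; in-tree, ggml's GGML_PAD macro expresses the same rounding):

```cpp
#include <cstdint>
#include <cstdio>

static uint32_t pad_to_256(uint32_t n) {
    return (n + 255u) & ~255u; // next multiple of 256 (a power of two)
}

int main() {
    // e.g. a requested context of 1000 tokens is padded to 1024,
    // while 4096 is already a multiple of 256 and stays 4096.
    printf("%u -> %u\n", 1000u, pad_to_256(1000u));
    printf("%u -> %u\n", 4096u, pad_to_256(4096u));
    return 0;
}
```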
b6974
Revert "ggml-cpu: detect correct cpu flags for arm64 (#16229) (#16239…
b6973
ggml-cpu: detect correct cpu flags for arm64 (#16229) (#16239)

When using GCC 9 or GCC 12 on the arm64 platform of Ubuntu 20.04, the command "gcc -mcpu=native -E -v -" fails to detect the correct CPU flags, which results in compilation failures for certain extended instructions. The correct CPU flags can be obtained by using "gcc -march" instead.

Signed-off-by: lizhenneng <[email protected]>
Co-authored-by: lizhenneng <[email protected]>
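An illustrative probe of the difference described above (not the actual ggml build logic, which lives in CMake): preprocess an empty input in verbose mode and inspect the flags the compiler reports. On the affected GCC 9/12 + Ubuntu 20.04 arm64 setups, "-mcpu=native" reports incomplete flags while "-march=native" reports the correct ones.

```cpp
#include <cstdio>
#include <string>

// Run a shell command and capture its combined output.
static std::string run_probe(const char * cmd) {
    std::string out;
    if (FILE * p = popen(cmd, "r")) {
        char buf[256];
        while (fgets(buf, sizeof(buf), p)) out += buf;
        pclose(p);
    }
    return out;
}

int main() {
    // "2>&1" because the verbose flag report goes to stderr.
    std::string mcpu  = run_probe("gcc -mcpu=native  -E -v - </dev/null 2>&1");
    std::string march = run_probe("gcc -march=native -E -v - </dev/null 2>&1");
    printf("-mcpu=native probe:\n%s\n-march=native probe:\n%s\n",
           mcpu.c_str(), march.c_str());
    return 0;
}
```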
b6972
server : print the samplers chain for each request (#17070)
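For context, a request's samplers chain is an ordered list of llama_sampler objects, and printing it amounts to walking the chain and logging each element's name. A minimal sketch using the public llama.h sampling API (the exact server log format is not reproduced here):

```cpp
#include <cstdio>
#include "llama.h"

int main() {
    llama_sampler * chain = llama_sampler_chain_init(llama_sampler_chain_default_params());

    // Order matters: each sampler filters/transforms the candidates for the next.
    llama_sampler_chain_add(chain, llama_sampler_init_top_k(40));
    llama_sampler_chain_add(chain, llama_sampler_init_top_p(0.95f, 1));
    llama_sampler_chain_add(chain, llama_sampler_init_temp(0.80f));
    llama_sampler_chain_add(chain, llama_sampler_init_dist(LLAMA_DEFAULT_SEED));

    // Walk the chain and print each sampler's name.
    for (int i = 0; i < llama_sampler_chain_n(chain); ++i) {
        printf("%s%s", i ? " -> " : "", llama_sampler_name(llama_sampler_chain_get(chain, i)));
    }
    printf("\n");

    llama_sampler_free(chain);
    return 0;
}
```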
b6971
common: move download functions to download.(cpp|h) (#17059)

* common: move download functions to download.(cpp|h)
* rm unused includes
* minor cleanup

Co-authored-by: Georgi Gerganov <[email protected]>