Releases · ggml-org/llama.cpp
b6980
CUDA: properly handle nb00=nb02 case for cpy (#17081)
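For context on the title: ggml tensors carry per-dimension byte strides nb[0..3] next to the element counts ne[0..3], and in the copy kernels nb00/nb02 are src0's strides for dims 0 and 2. Below is a minimal sketch (illustrative only, not the CUDA kernel) of one way those two strides can legitimately coincide, which the kernel has to tolerate:

```cpp
// Illustrative sketch, not the actual llama.cpp kernel: how ggml byte
// strides nb[] are derived for a contiguous tensor, and one way the
// nb00 == nb02 case referenced above can arise.
#include <cstdio>
#include <cstdint>

int main() {
    // ne[i] = number of elements in dimension i, nb[i] = stride in bytes.
    int64_t ne[4] = {1, 1, 8, 4};       // singleton first two dims
    size_t  elsz  = sizeof(float);

    size_t nb[4];
    nb[0] = elsz;                       // innermost stride = element size
    for (int i = 1; i < 4; ++i) {
        nb[i] = nb[i - 1] * ne[i - 1];  // contiguous layout
    }

    // With ne[0] == ne[1] == 1 the strides collapse: nb[0] == nb[1] == nb[2].
    // A copy kernel indexing src + i0*nb[0] + i1*nb[1] + i2*nb[2] + i3*nb[3]
    // must not assume the strides are distinct.
    printf("nb = {%zu, %zu, %zu, %zu}\n", nb[0], nb[1], nb[2], nb[3]);
    return 0;
}
```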
b6979
vulkan : refactor buffer handling in vk_op_f32 (#16840)

* vulkan : refactor/simplify buffer handling in vk_op_* functions
* Combine UMA handling into ggml_vk_tensor_subbuffer
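As a rough mental model of what a tensor "subbuffer" unifies here, the sketch below is hypothetical (the names are not the actual ggml-vulkan types): every operand is described by one (buffer, offset, size) triple, so UMA and non-UMA paths can share the same lookup instead of special-casing each operand:

```cpp
// Hypothetical sketch, not the actual ggml-vulkan types: a tensor operand
// reduced to one (buffer, offset, size) triple. On UMA systems the triple
// can point at host-visible memory; on discrete GPUs it points into a
// device allocation; either way the vk_op_* code consumes the same struct.
#include <cstddef>

struct vk_buffer; // opaque backing allocation (stand-in)

struct vk_subbuffer_sketch {
    vk_buffer * buffer = nullptr; // backing allocation
    size_t      offset = 0;       // byte offset of the tensor within it
    size_t      size   = 0;       // byte size of the tensor's data
};
```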
b6978
CUDA: fix should_use_mmvf for ne11 == 1 (#17085)

* CUDA: fix should_use_mmvf for ne11 == 1
* Apply suggestion from @am17an

Co-authored-by: Aman Gupta <[email protected]>
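For readers unfamiliar with the naming: in ggml's mul_mat, ne11 is the number of columns of the second operand, so ne11 == 1 is the pure matrix-vector (GEMV) case that the mmvf kernel targets. A hypothetical sketch of the kind of dispatch predicate involved (the real should_use_mmvf considers more factors; the name and conditions here are illustrative only):

```cpp
#include <cstdint>

// Illustrative only: prefer the dedicated mat-vec float kernel when the
// second operand has a single column and the operand types are supported.
static bool should_use_mat_vec_f_sketch(int64_t ne11, bool types_supported) {
    return types_supported && ne11 == 1;
}
```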
b6977
bench : cache the llama_context state at computed depth (#16944)

* bench : cache llama_context state at depth
* cont : handle failures to restore the old state
* cont : print information when the state is being reused
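The caching idea can be sketched with the public llama.h state API (llama_state_get_size / llama_state_get_data / llama_state_set_data). This is a minimal sketch, not the llama-bench implementation: error handling is reduced to a boolean and the model/context setup is assumed.

```cpp
#include <vector>
#include <cstdint>
#include "llama.h"

// Save the context state once after the prompt has been processed to the
// desired depth...
static std::vector<uint8_t> save_state(llama_context * ctx) {
    std::vector<uint8_t> buf(llama_state_get_size(ctx));
    const size_t written = llama_state_get_data(ctx, buf.data(), buf.size());
    buf.resize(written);
    return buf;
}

// ...then restore it before each repetition instead of re-evaluating the
// prompt. Returns false if the state could not be applied in full.
static bool restore_state(llama_context * ctx, const std::vector<uint8_t> & buf) {
    return llama_state_set_data(ctx, buf.data(), buf.size()) == buf.size();
}
```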
b6976
hparams : add n_embd_inp() to support extended embed (#16928)

* add n_embd_full to support extended embed
* don't change output
* rename to n_embd_inp
* restore n_embd where applicable
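A hypothetical sketch of the distinction this introduces (field and method names are illustrative, not the actual llama.cpp hparams): models with an extended input embedding feed the graph a wider vector than the base n_embd, so input-side code asks for n_embd_inp() while output-side code keeps using n_embd.

```cpp
#include <cstdint>

struct hparams_sketch {
    uint32_t n_embd     = 4096; // base embedding width (used on the output side)
    uint32_t n_embd_ext = 0;    // extra input-only embedding width, if any

    // width of the vector actually fed into the graph
    uint32_t n_embd_inp() const { return n_embd + n_embd_ext; }
};
```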
b6975
kv-cache : pad the cache size to 256 for performance (#17046)

* kv-cache : pad the size of the small SWA cache for performance
* context : pad the total context to 256
* cont : future-proof the swa pad
* server : adjust test params to new logic
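The padding arithmetic implied here is rounding a requested size up to the next multiple of 256. A minimal sketch (the helper name is illustrative; in-tree, ggml's GGML_PAD macro expresses the same rounding):

```cpp
#include <cstdint>
#include <cstdio>

static uint32_t pad_to_256(uint32_t n) {
    return (n + 255u) & ~255u; // next multiple of 256 (a power of two)
}

int main() {
    // e.g. a requested context of 1000 tokens is padded to 1024,
    // while 4096 is already a multiple of 256 and stays 4096.
    printf("%u -> %u\n", 1000u, pad_to_256(1000u));
    printf("%u -> %u\n", 4096u, pad_to_256(4096u));
    return 0;
}
```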
b6974
Revert "ggml-cpu: detect correct cpu flags for arm64 (#16229) (#16239…
b6973
ggml-cpu: detect correct cpu flags for arm64 (#16229) (#16239)

When using GCC 9 or GCC 12 on the arm64 platform of Ubuntu 20.04, the command "gcc -mcpu=native -E -v -" fails to detect the correct CPU flags, which results in compilation failures for certain extended instructions. The correct CPU flags can be obtained by using "gcc -march" instead.

Signed-off-by: lizhenneng <[email protected]>
Co-authored-by: lizhenneng <[email protected]>
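An illustrative probe of the difference described above (not the actual ggml build logic, which lives in CMake): preprocess an empty input in verbose mode and inspect the flags the compiler reports. On the affected GCC 9/12 + Ubuntu 20.04 arm64 setups, "-mcpu=native" reports incomplete flags while "-march=native" reports the correct ones.

```cpp
#include <cstdio>
#include <string>

// Run a shell command and capture its combined output.
static std::string run_probe(const char * cmd) {
    std::string out;
    if (FILE * p = popen(cmd, "r")) {
        char buf[256];
        while (fgets(buf, sizeof(buf), p)) out += buf;
        pclose(p);
    }
    return out;
}

int main() {
    // "2>&1" because the verbose flag report goes to stderr.
    std::string mcpu  = run_probe("gcc -mcpu=native  -E -v - </dev/null 2>&1");
    std::string march = run_probe("gcc -march=native -E -v - </dev/null 2>&1");
    printf("-mcpu=native probe:\n%s\n-march=native probe:\n%s\n",
           mcpu.c_str(), march.c_str());
    return 0;
}
```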
b6972
server : print the samplers chain for each request (#17070)
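For context, a request's samplers chain is an ordered list of llama_sampler objects, and printing it amounts to walking the chain and logging each element's name. A minimal sketch using the public llama.h sampling API (the exact server log format is not reproduced here):

```cpp
#include <cstdio>
#include "llama.h"

int main() {
    llama_sampler * chain = llama_sampler_chain_init(llama_sampler_chain_default_params());

    // Order matters: each sampler filters/transforms the candidates for the next.
    llama_sampler_chain_add(chain, llama_sampler_init_top_k(40));
    llama_sampler_chain_add(chain, llama_sampler_init_top_p(0.95f, 1));
    llama_sampler_chain_add(chain, llama_sampler_init_temp(0.80f));
    llama_sampler_chain_add(chain, llama_sampler_init_dist(LLAMA_DEFAULT_SEED));

    // Walk the chain and print each sampler's name.
    for (int i = 0; i < llama_sampler_chain_n(chain); ++i) {
        printf("%s%s", i ? " -> " : "", llama_sampler_name(llama_sampler_chain_get(chain, i)));
    }
    printf("\n");

    llama_sampler_free(chain);
    return 0;
}
```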
b6971
common: move download functions to download.(cpp|h) (#17059)

* common: move download functions to download.(cpp|h)
* rm unused includes
* minor cleanup

Co-authored-by: Georgi Gerganov <[email protected]>