Summary
An attacker-supplied GGUF model vocabulary can trigger a buffer overflow in llama.cpp’s vocabulary-loading code. The helper _try_copy, defined inside llama_vocab::impl::token_to_piece() in llama.cpp/src/vocab.cpp, casts a size_t token length to int32_t. For a sufficiently large token the cast can yield a negative value, so the length check (if (length < (int32_t) size)) does not fire and memcpy is still called with the full, oversized size. A malicious model can thereby write past the end of the destination buffer, which can lead to memory corruption and potentially code execution.
Details
The vulnerability lies in the function:
llama.cpp/src/vocab.cpp
llama_vocab::impl::token_to_piece(llama_token token,
                                  char * buf,
                                  int32_t length,
                                  int32_t lstrip,
                                  bool special) const
Specifically, the inline helper _try_copy casts a size_t to int32_t before comparing it with length, and it does not handle values above INT32_MAX. For such values the cast can wrap to a negative number, so the length check is bypassed and memcpy runs unchecked.
// File: llama.cpp/src/vocab.cpp (around line 2570)
auto _try_copy = [=](const char * token, size_t size) -> int32_t {
    // 1) Skip up to `lstrip` leading spaces in the token string.
    for (int32_t i = 0; i < lstrip && size && *token == ' '; ++i) {
        token++;
        size--;
    }
    // 2) Bound check (VULNERABLE):
    //    - `length` is the number of bytes the caller says `buf` can hold (signed int32_t).
    //    - `size` is the unsigned token length (size_t). If size > INT32_MAX, the cast to
    //      int32_t does not preserve the value and can produce a negative number.
    if (length < (int32_t) size) {
        // Intention: return a negative error code when the token is too large to fit.
        // But when size > INT32_MAX, (int32_t) size can be negative
        // (e.g. size = 2,147,483,648 becomes −2,147,483,648).
        // For any non-negative length, (length < negative) is false, so this branch is skipped.
        return -(int32_t) size;
    }
    // 3) Unchecked memcpy (VULNERABLE):
    //    Even if `size` is far larger than `length`, execution reaches this memcpy,
    //    because the check above evaluated to false when (int32_t) size wrapped negative.
    //    This copies `size` bytes into `buf`, overrunning the buffer whenever size > length.
    memcpy(buf, token, size);
    // 4) Return the number of bytes copied (signed).
    //    Note: this cast also wraps if size > INT32_MAX, but by then the out-of-bounds
    //    write has already happened.
    return (int32_t) size;
};
Why This Check Fails for Extremely Large Tokens:
- Unsigned size vs. signed length:
  - size is size_t (e.g., 64-bit on most platforms).
  - length is int32_t (maximum positive value = 2,147,483,647).
- Cast overflow:
  - If token_text.size() > INT32_MAX, then (int32_t) size no longer holds the real length. On two’s-complement platforms it wraps, and for sizes from 2,147,483,648 up to 4,294,967,295 the result is negative. For example:
    size_t size = 2,147,483,648 // one more than INT32_MAX
    (int32_t) size → −2,147,483,648
  - The comparison if (length < (int32_t) size) then becomes effectively if (small_positive < large_negative), which is false for any non-negative length.
- Unchecked memcpy:
  - Because the bound check is bypassed, the code executes memcpy(buf, token, size).
  - Even though buf only has room for length bytes, memcpy uses the full (very large) size, overflowing the buffer by up to billions of bytes.
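The bypass described above can be reproduced in isolation. The following is a minimal sketch, not code from llama.cpp: narrow_check_rejects is a hypothetical stand-in that applies the same narrowing cast and comparison as _try_copy, and the buffer capacity of 64 used in the note below is an arbitrary assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the bound check in _try_copy (illustration only).
// Returns true when the check would reject the token as too large for the buffer.
static bool narrow_check_rejects(int32_t length, size_t size) {
    // Same pattern as the vulnerable code: size_t is narrowed to int32_t
    // before the comparison, so values above INT32_MAX are not preserved.
    return length < (int32_t) size;
}
```

With a capacity of 64, narrow_check_rejects(64, 128) reports the token as too large, as intended. On mainstream two's-complement targets, narrow_check_rejects(64, (size_t) INT32_MAX + 1) reports that the token fits, because the cast yields INT32_MIN and 64 is not less than it.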
Callers and Code Paths
Any “token → string” conversion can overflow if token_text.size() > INT32_MAX. Notable call sites include:
- Model loading (each GGUF token string passes through token_to_piece())
- Detokenization (llama_vocab::impl::detokenize(...))
- Grammar routines (llama_grammar_apply_impl, llama_grammar_accept_impl)
- Sampling & infill (llama_sampler_infill_apply, etc.)
- Public API (llama_token_to_piece(...))
As soon as llama.cpp loads the oversized token, it will crash with a buffer‐overflow in _try_copy().

Impact
Vulnerability Type
- Buffer overflow caused by the size_t to int32_t cast in _try_copy().
Attack Vector
- A malicious GGUF model containing a token whose token_text.size() exceeds INT32_MAX.
- The oversized size_t bypasses the length check and triggers an unchecked memcpy.
Affected Component
- llama_vocab::impl::token_to_piece(), which is invoked by:
  - Grammar routines (llama_grammar_apply_impl, llama_grammar_accept_impl)
  - Sampling & infill (llama_sampler_infill_apply, etc.)
  - Public API (llama_token_to_piece())
Severity
Consequences
Who Is Impacted
Mitigation & Recommendations
- Patch _try_copy so that length and size are compared in an unsigned context.
- Ensure that size values above INT32_MAX cannot bypass the bound check.
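The mitigation can be sketched as follows. This is a hedged illustration of the unsigned comparison, not the upstream patch: try_copy_checked is a hypothetical free function (the real helper is a lambda that captures buf and length), and returning INT32_MIN for lengths that cannot be represented in int32_t is an assumed convention.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper (illustration only): copies `size` bytes of `token`
// into `buf` only when they fit into `length` bytes.
static int32_t try_copy_checked(char * buf, int32_t length, const char * token, size_t size) {
    // Compare in an unsigned context: a negative capacity fits nothing, and
    // otherwise `length` is widened to size_t instead of narrowing `size`.
    if (length < 0 || static_cast<size_t>(length) < size) {
        // The required size cannot always be reported through int32_t.
        // Assumption: use INT32_MIN as a sentinel for such tokens.
        if (size > static_cast<size_t>(std::numeric_limits<int32_t>::max())) {
            return std::numeric_limits<int32_t>::min();
        }
        return -static_cast<int32_t>(size); // negated required size, as in the original helper
    }
    // Here size <= length <= INT32_MAX, so the copy stays in bounds
    // and the cast below is lossless.
    std::memcpy(buf, token, size);
    return static_cast<int32_t>(size);
}
```

Widening length rather than narrowing size means no token length can alias a small or negative int32_t value before the comparison. Callers that resize their buffer based on a negative return value would need to treat the assumed sentinel as a hard error.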