Commit afbb4f4

xw285cornell authored and facebook-github-bot committed
vllm fix check on max vocab size (#22471)
vllm fix check on max vocab size (#22471)

Summary: tokenizer.vocab_size and model.vocab_size can differ. For the Qwen model, the tokenizer's max token id is 151643 while the model config has `"vocab_size": 151936`. Under the old check, sending an id between 151643 and 151936 would be rejected, even though in practice the tokenizer just decodes such ids to ''. Accepting ids in that range is arguably still valid, because the model can legitimately produce those token ids.

Test Plan: Sent 151860 and the request passed; sent 152860 and it was rejected as an invalid token.

Reviewed By: tensormeta, houseroad

Differential Revision: D79840114
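To make the rationale concrete, here is a minimal standalone sketch of the new bound. The numbers come from the commit summary and test plan; validate_prompt_ids is a hypothetical helper for illustration, not the vLLM API:

    # Hypothetical value from the summary above: Qwen's tokenizer max
    # token id (151643) is lower than the model's padded vocab size (151936).
    MODEL_VOCAB_SIZE = 151936

    def validate_prompt_ids(prompt_ids, vocab_size):
        """Reject only ids that the model's embedding table cannot hold."""
        max_input_id = max(prompt_ids, default=0)
        if max_input_id > vocab_size - 1:
            raise ValueError(f"Token id {max_input_id} is out of vocabulary")

    validate_prompt_ids([151860], MODEL_VOCAB_SIZE)    # passes: within model vocab
    # validate_prompt_ids([152860], MODEL_VOCAB_SIZE)  # raises ValueError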
1 parent c55bc1d · commit afbb4f4

File tree

1 file changed: +1 addition, -1 deletion

vllm/v1/engine/processor.py

Lines changed: 1 addition & 1 deletion

@@ -382,7 +382,7 @@ def _validate_model_input(
         else:
             tokenizer = self.tokenizer.get_lora_tokenizer(lora_request)
         max_input_id = max(prompt_ids, default=0)
-        if max_input_id > tokenizer.max_token_id:
+        if max_input_id > self.model_config.get_vocab_size() - 1:
             raise ValueError(
                 f"Token id {max_input_id} is out of vocabulary")
