Commit afbb4f4

xw285cornell authored and facebook-github-bot committed
vllm fix check on max vocab size (#22471)
vllm fix check on max vocab size (#22471)

Summary: tokenizer.vocab_size and model.vocab_size can differ. For the Qwen model, the tokenizer's max token id is 151643 while the model config has `"vocab_size": 151936`. Under the old check, sending an id between 151643 and 151936 would be rejected, even though in practice the tokenizer just decodes such ids to ''. Accepting ids in that range is arguably still valid, because the model can legitimately produce those token ids.

Test Plan: Sent 151860 and the request passed; sent 152860 and it was rejected as an invalid token.

Reviewed By: tensormeta, houseroad

Differential Revision: D79840114
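To make the rationale concrete, here is a minimal standalone sketch of the new bound. The numbers come from the commit summary and test plan; validate_prompt_ids is a hypothetical helper for illustration, not the vLLM API:

    # Hypothetical value from the summary above: Qwen's tokenizer max
    # token id (151643) is lower than the model's padded vocab size (151936).
    MODEL_VOCAB_SIZE = 151936

    def validate_prompt_ids(prompt_ids, vocab_size):
        """Reject only ids that the model's embedding table cannot hold."""
        max_input_id = max(prompt_ids, default=0)
        if max_input_id > vocab_size - 1:
            raise ValueError(f"Token id {max_input_id} is out of vocabulary")

    validate_prompt_ids([151860], MODEL_VOCAB_SIZE)    # passes: within model vocab
    # validate_prompt_ids([152860], MODEL_VOCAB_SIZE)  # raises ValueError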
1 parent c55bc1d · commit afbb4f4

File tree

1 file changed: +1 addition, -1 deletion

vllm/v1/engine/processor.py

Lines changed: 1 addition & 1 deletion

@@ -382,7 +382,7 @@ def _validate_model_input(
         else:
             tokenizer = self.tokenizer.get_lora_tokenizer(lora_request)
         max_input_id = max(prompt_ids, default=0)
-        if max_input_id > tokenizer.max_token_id:
+        if max_input_id > self.model_config.get_vocab_size() - 1:
             raise ValueError(
                 f"Token id {max_input_id} is out of vocabulary")
