@rahul-tuli
Member

@rahul-tuli rahul-tuli commented Sep 29, 2025

Eagle3 drafters were incorrectly inheriting the verifier's quantization
configuration instead of using their own, causing a KeyError when loading
unquantized drafter weights alongside a quantized verifier.

This implements a clean inheritance pattern where:

  • The base LlamaDecoderLayer has a configurable get_quant_config() method
  • The Eagle3 LlamaDecoderLayer overrides it to use the drafter's quantization config
  • The override uses the existing VllmConfig.get_quantization_config() infrastructure
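
The override pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `QuantConfig`, `ModelConfig`, `VllmConfigSketch`, and the two layer classes are hypothetical stand-ins, and the one-argument `get_quantization_config()` here is a simplified assumption rather than vLLM's actual signature.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuantConfig:
    """Hypothetical stand-in for a quantization config object."""
    method: str


@dataclass
class ModelConfig:
    """Hypothetical per-model config; quantization=None means unquantized."""
    name: str
    quantization: Optional[str] = None


@dataclass
class VllmConfigSketch:
    """Hypothetical stand-in for VllmConfig (not the real class)."""
    verifier: ModelConfig
    drafter: ModelConfig
    # Quantization config already resolved for the verifier.
    quant_config: Optional[QuantConfig] = None

    @staticmethod
    def get_quantization_config(model_config: ModelConfig) -> Optional[QuantConfig]:
        # Assumed, simplified signature: derive a config from one model only.
        if model_config.quantization is None:
            return None
        return QuantConfig(method=model_config.quantization)


class BaseDecoderLayer:
    def __init__(self, vllm_config: VllmConfigSketch) -> None:
        # The layer asks an overridable hook which quant config to use.
        self.quant_config = self.get_quant_config(vllm_config)

    def get_quant_config(self, vllm_config: VllmConfigSketch) -> Optional[QuantConfig]:
        # Default behavior: use the verifier's (global) quantization config.
        return vllm_config.quant_config


class Eagle3DecoderLayer(BaseDecoderLayer):
    def get_quant_config(self, vllm_config: VllmConfigSketch) -> Optional[QuantConfig]:
        # Override: resolve the config from the drafter's own model config.
        return VllmConfigSketch.get_quantization_config(vllm_config.drafter)


# Quantized verifier paired with an unquantized drafter.
config = VllmConfigSketch(
    verifier=ModelConfig("verifier", quantization="fp8"),
    drafter=ModelConfig("drafter"),
    quant_config=QuantConfig("fp8"),
)
base_layer = BaseDecoderLayer(config)
eagle_layer = Eagle3DecoderLayer(config)
```

In this sketch the base layer picks up the verifier's config, while the Eagle3 layer resolves `None` for the unquantized drafter, so its layers would be built to match the drafter's unquantized weights.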

…ance

Eagle3 drafters were incorrectly inheriting the verifier's quantization
configuration instead of using their own, causing KeyError when loading
unquantized drafter weights with quantized verifiers.

This implements a clean inheritance pattern where:
- Base LlamaDecoderLayer has configurable get_quant_config() method
- Eagle3 LlamaDecoderLayer overrides to use drafter's quantization config
- Uses existing VllmConfig._get_quantization_config() infrastructure

Fixes speculative decoding with quantized verifier + unquantized drafter.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: [email protected]

Signed-off-by: Rahul Tuli <[email protected]>
@rahul-tuli
Member Author

Landed on vllm main!

@rahul-tuli rahul-tuli closed this Sep 29, 2025