@babusid commented Nov 6, 2025

This updated implementation of longrope supports both `long_factor` and `short_factor`, the per-dimension scaling factors that Microsoft's Phi-3 (and later) models provide in the `rope_scaling` section of their HF configs. In the canonical HF implementation of longrope, the `ext_factors` change once the sequence length exceeds a pre-configured threshold (`original_max_position_embeddings`): from that point on, the long factors are used instead of the short ones. This patch supports that by packing both sets of scaling factors into a single argument and selecting between them dynamically inside the returned prim_func.
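As a rough illustration of the "one packed argument, selected at runtime" idea, here is a plain-Python sketch. It is not the PR's TIR code. The helper names `pack_ext_factors` and `select_ext_factor` are hypothetical, and so is the layout, which assumes the short factors are stored first and the long factors second in one flat buffer. The kernel then only needs an offset chosen from the sequence length.

```python
from typing import List, Sequence


def pack_ext_factors(short_factor: Sequence[float], long_factor: Sequence[float]) -> List[float]:
    """Hypothetical layout: concatenate both factor sets as [short..., long...]."""
    if len(short_factor) != len(long_factor):
        raise ValueError("short_factor and long_factor must have the same length")
    return list(short_factor) + list(long_factor)


def select_ext_factor(
    packed: Sequence[float],
    i: int,
    seq_len: int,
    original_max_position_embeddings: int,
) -> float:
    """Return the scaling factor for frequency index i.

    Reads from the second (long) half once seq_len exceeds the threshold,
    otherwise from the first (short) half.
    """
    half = len(packed) // 2
    offset = half if seq_len > original_max_position_embeddings else 0
    return packed[offset + i]


if __name__ == "__main__":
    packed = pack_ext_factors([1.0, 1.5], [4.0, 8.0])
    print(select_ext_factor(packed, 1, seq_len=2048, original_max_position_embeddings=4096))
    print(select_ext_factor(packed, 1, seq_len=8192, original_max_position_embeddings=4096))
```

Passing a single buffer keeps the function signature fixed, so the same compiled function can serve both regimes. The only runtime decision is a scalar comparison.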

The canonical HF implementation is here:
https://github.com/huggingface/transformers/blob/7b325cd573e40bbb12951b8446176c96e8b1afaa/src/transformers/modeling_rope_utils.py#L521

The link above points directly to the switching logic between long and short factors, which has been replicated in this PR.
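For reference, the switching logic can be summarized in dependency-free Python. This is a sketch modeled on the linked HF function, not a copy of it. The function name `longrope_inv_freq` is hypothetical. The sketch assumes one factor per rotary frequency pair (`dim // 2` entries) and leaves out the attention scaling factor that HF also computes.

```python
from typing import List, Sequence


def longrope_inv_freq(
    dim: int,
    base: float,
    short_factor: Sequence[float],
    long_factor: Sequence[float],
    seq_len: int,
    original_max_position_embeddings: int,
) -> List[float]:
    """Inverse RoPE frequencies with longrope ext_factors (hypothetical helper)."""
    # Switch to the long factors only once seq_len exceeds the original context length.
    ext_factors = long_factor if seq_len > original_max_position_embeddings else short_factor
    if len(ext_factors) != dim // 2:
        raise ValueError("expected one scaling factor per frequency pair (dim // 2)")
    # inv_freq[k] = 1 / (ext_factors[k] * base ** (2k / dim)), for k in [0, dim/2)
    return [1.0 / (ext_factors[k] * base ** (2 * k / dim)) for k in range(dim // 2)]


if __name__ == "__main__":
    print(longrope_inv_freq(4, 10000.0, [1.0, 1.0], [2.0, 4.0], 10, 100))
    print(longrope_inv_freq(4, 10000.0, [1.0, 1.0], [2.0, 4.0], 200, 100))
```

Note that the comparison is strict. A sequence exactly at `original_max_position_embeddings` still uses the short factors.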
