Adjusted Longrope embedding function to match Huggingface Implementation #18422
This updated implementation of LongRoPE takes both `long_factors` and `short_factors` into account; these are the per-dimension scaling arrays provided via the HF configs for MSFT's Phi-3+ models. In the canonical HF implementation of LongRoPE, once the sequence length exceeds a pre-configured threshold, a different set of `ext_factors` must be used than before. This patch enables that by packing both sets of scaling factors into one argument and selecting which set to use dynamically inside the returned `prim_func`. The HF implementation can be found here:
https://github.com/huggingface/transformers/blob/7b325cd573e40bbb12951b8446176c96e8b1afaa/src/transformers/modeling_rope_utils.py#L521
The link above points directly to the switching logic between long and short factors, which has been replicated in this PR.
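For reference, here is a minimal Python sketch of that switching logic, modeled on the linked HF code. This is not the PR's TVM `prim_func`; the function and argument names are assumptions for illustration only.

```python
import numpy as np

def longrope_inv_freq(head_dim: int,
                      rope_theta: float,
                      seq_len: int,
                      original_max_position_embeddings: int,
                      short_factors: np.ndarray,
                      long_factors: np.ndarray) -> np.ndarray:
    """Illustrative sketch (hypothetical helper, not part of this PR):
    pick the LongRoPE scaling factors the way Hugging Face does, then
    build the scaled RoPE inverse frequencies.

    `short_factors` / `long_factors` are the per-dimension scaling
    arrays of length head_dim // 2 taken from the model config
    (e.g. Phi-3).
    """
    # HF switches factor sets once the sequence length exceeds the
    # original (pre-extension) context window.
    if seq_len > original_max_position_embeddings:
        ext_factors = long_factors
    else:
        ext_factors = short_factors

    # Standard RoPE inverse frequencies, divided elementwise by the
    # chosen extension factors.
    dim_range = np.arange(0, head_dim, 2, dtype=np.float64) / head_dim
    inv_freq = 1.0 / (ext_factors * rope_theta ** dim_range)
    return inv_freq
```

The PR performs the equivalent selection at runtime inside the generated `prim_func`, so a single compiled kernel can serve both the short- and long-context regimes.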