Move original_max_position_embeddings to rope params
#42513
base: main
Conversation
…e TODOs from Joao
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
…me we init rope compute fn
run-slow: phi3, phi, llama, mistral, mistral, qwen2_vl, deepseek_v3, qwen2, gemma2, gemma3
This comment contains models: ["models/deepseek_v3", "models/gemma2", "models/gemma3", "models/llama", "models/mistral", "models/phi", "models/phi3", "models/qwen2", "models/qwen2_vl"]
CI Results (Model CI Report): ❌ Failed tests
    original_max_position_embeddings (`int`, *optional*):
-       Used with 'dynamic', 'longrope' and 'llama3'. The original max position embeddings used during
+       Used with 'yarn', 'longrope' and 'llama3'. The original max position embeddings used during
        pretraining.
`dynamic` uses `config.max_position_embeddings` and doesn't require us to set an explicit `original_max_position_embeddings` in the rope dict.
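A minimal sketch of that distinction, with a hypothetical helper name (`resolve_original_max_positions` is not a transformers function): the explicit value is only needed for the rope types that rescale against the pretraining length, while `dynamic` can fall back to the config-level value.

```python
# Hypothetical sketch, not the transformers implementation.
def resolve_original_max_positions(rope_parameters: dict, max_position_embeddings: int) -> int:
    rope_type = rope_parameters.get("rope_type", "default")
    if rope_type in ("yarn", "longrope", "llama3"):
        # these variants rescale frequencies relative to the pretraining length,
        # so the rope dict has to carry the value explicitly
        return rope_parameters["original_max_position_embeddings"]
    # 'dynamic' (and 'default') only need the current config value
    return max_position_embeddings


print(resolve_original_max_positions(
    {"rope_type": "yarn", "factor": 32.0, "original_max_position_embeddings": 4096}, 131072))  # 4096
print(resolve_original_max_positions({"rope_type": "dynamic", "factor": 2.0}, 131072))         # 131072
```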
    def test_model_rope_scaling_frequencies(self):
        """Tests the frequency properties of the different RoPE scaling types on the model RoPE layer."""
        config, _ = self.model_tester.prepare_config_and_inputs_for_common()
        config.layer_types = ["full_attention", "sliding_attention"]
The below code sets rope params for two layer types, but the dummy config doesn't always get initialized with both. This line makes sure that the layer types are in line with the rope params.
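An illustrative sketch, using a hypothetical dummy config rather than the real model tester, of why per-layer-type rope params have to match the declared `layer_types`:

```python
# Hypothetical dummy config, for illustration only.
class DummyConfig:
    def __init__(self):
        # without pinning this, a dummy config might only declare "full_attention"
        self.layer_types = ["full_attention", "sliding_attention"]
        # one rope parameterization per layer type, as exercised by the test
        self.rope_parameters = {
            "full_attention": {"rope_type": "default", "rope_theta": 10000.0},
            "sliding_attention": {"rope_type": "default", "rope_theta": 10000.0},
        }


config = DummyConfig()
# every rope-param key must refer to a layer type the config actually declares
assert set(config.rope_parameters) <= set(config.layer_types)
```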
        rope_type = self.rope_type
        original_inv_freq = self.original_inv_freq
        prefix = ""
        original_max_position_embeddings = self.config.rope_parameters["original_max_position_embeddings"]
Safe to assume it already exists: we move `original_max_position_embeddings` to its correct location at config init time.
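A minimal sketch of that init-time move, assuming a hypothetical helper (`standardize_rope_parameters` is not the actual transformers standardization code): a legacy top-level `original_max_position_embeddings` is folded into the rope dict once, so downstream code can index it directly.

```python
# Hypothetical helper, for illustration only.
def standardize_rope_parameters(config_dict: dict) -> dict:
    rope_parameters = dict(config_dict.get("rope_parameters") or {})
    legacy_value = config_dict.pop("original_max_position_embeddings", None)
    if legacy_value is not None:
        # only fill the field if the checkpoint did not already set it inside the dict
        rope_parameters.setdefault("original_max_position_embeddings", legacy_value)
    config_dict["rope_parameters"] = rope_parameters
    return config_dict


cfg = standardize_rope_parameters({
    "max_position_embeddings": 131072,
    "original_max_position_embeddings": 4096,
    "rope_parameters": {"rope_type": "yarn", "factor": 32.0},
})
assert cfg["rope_parameters"]["original_max_position_embeddings"] == 4096
```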
[For maintainers] Suggested jobs to run (before merge): run-slow: gemma3
What does this PR do?
As per the title, this resolves the TODO from Joao and moves the patching for `original_max_position_embeddings` inside the rope dict standardization. That way, `original_max_position_embeddings` is moved to the correct field once at init time and we can delete the similar patches from the individual rope functions. Note that this is not a breaking change; we only move near-duplicate code to a single place.

cc @hmellor
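A hedged before/after sketch with hypothetical function names (not the actual rope init functions in transformers) of why the per-function fallback patches become redundant once the value is standardized at config init time:

```python
# Hypothetical names, for illustration only.
def original_max_before(rope_parameters: dict, max_position_embeddings: int) -> int:
    # before: each rope init function carried its own fallback patch
    return rope_parameters.get("original_max_position_embeddings", max_position_embeddings)


def original_max_after(rope_parameters: dict) -> int:
    # after: the key is guaranteed to be in the dict, so the fallback can be deleted
    return rope_parameters["original_max_position_embeddings"]


params = {"rope_type": "yarn", "factor": 32.0, "original_max_position_embeddings": 4096}
assert original_max_before(params, 131072) == original_max_after(params) == 4096
```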