@zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Dec 1, 2025

What does this PR do?

As per the title, this resolves the TODO from Joao and moves the patching of original_max_position_embeddings into the rope dict standardization. That way, original_max_position_embeddings is moved to the correct field once, at init time, and we can delete the similar patches from the individual RoPE functions. Note that this is not a breaking change: we only move near-duplicate code to a single place.
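
As a rough illustration of the idea (not the code in this PR), a one-time standardization step could look like the sketch below. The helper name `standardize_rope_params`, the plain-dict config, and the example values are all hypothetical. The sketch copies the legacy value into the rope dict and leaves the top-level key in place, which is an assumption rather than something stated here.

```python
from copy import deepcopy


def standardize_rope_params(config_dict):
    """Hypothetical helper: return a copy whose rope dict carries
    original_max_position_embeddings, taken from the top level if needed."""
    cfg = deepcopy(config_dict)
    rope = cfg.setdefault("rope_parameters", {})
    # Assumption: legacy configs may store the value at the top level.
    legacy_value = cfg.get("original_max_position_embeddings")
    # Only fill the rope dict when it does not already define the value.
    if legacy_value is not None and "original_max_position_embeddings" not in rope:
        rope["original_max_position_embeddings"] = legacy_value
    return cfg


# Illustrative values only.
legacy = {
    "max_position_embeddings": 131072,
    "original_max_position_embeddings": 4096,
    "rope_parameters": {"rope_type": "longrope", "rope_theta": 10000.0},
}
standardized = standardize_rope_params(legacy)
```

With the value placed once at init time, downstream rope functions in this sketch can read `standardized["rope_parameters"]["original_max_position_embeddings"]` directly instead of each repeating the fallback.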

cc @hmellor

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp
Member Author

run-slow: phi3, phi, llama, mistral, mistral, qwen2_vl, deepseek_v3, qwen2, gemma2, gemma3

@github-actions
Contributor

github-actions bot commented Dec 2, 2025

This comment contains run-slow, running the specified jobs:

models: ["models/deepseek_v3", "models/gemma2", "models/gemma3", "models/llama", "models/mistral", "models/phi", "models/phi3", "models/qwen2", "models/qwen2_vl"]
quantizations: []

@zucchini-nlp
Member Author

run-slow: phi3, phi, llama, mistral, mistral, qwen2_vl, deepseek_v3, qwen2, gemma2, gemma3

@github-actions
Contributor

github-actions bot commented Dec 2, 2025

CI Results

Workflow Run ⚙️

Model CI Report

❌ Failed tests

  • gemma3:
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_can_load_with_global_device_set
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_can_load_with_global_device_set
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_bc_torch_dtype
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_can_load_ignoring_mismatched_shapes
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_can_load_with_device_context_manager
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_can_use_safetensors
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_cannot_load_with_meta_device_context_manager
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_causal_lm_can_accept_training_kwargs
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_config
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_config_attn_implementation_setter
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_correct_missing_keys
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_cpu_offload
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_disk_offload_bin
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_disk_offload_safetensors
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_generate
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_00_fp16_pad_left_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_01_fp16_pad_left
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_02_fp16_pad_left_no_attn_mask_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_03_fp16_pad_left_no_attn_mask
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_04_fp16_pad_right_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_05_fp16_pad_right
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_06_fp16_pad_right_no_attn_mask_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_07_fp16_pad_right_no_attn_mask
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_08_fp32_pad_left_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_09_fp32_pad_left
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_10_fp32_pad_left_no_attn_mask_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_11_fp32_pad_left_no_attn_mask
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_12_fp32_pad_right_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_13_fp32_pad_right
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_14_fp32_pad_right_no_attn_mask_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_15_fp32_pad_right_no_attn_mask
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_16_bf16_pad_left_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_17_bf16_pad_left
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_18_bf16_pad_left_no_attn_mask_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_19_bf16_pad_left_no_attn_mask
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_20_bf16_pad_right_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_21_bf16_pad_right
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_22_bf16_pad_right_no_attn_mask_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_23_bf16_pad_right_no_attn_mask
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_eager_matches_sdpa_inference_24_fp32_pad_left_output_attentions
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_generation_beyond_sliding_window_tiny_model
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_load_save_without_tied_weights
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_model_rope_scaling_frequencies
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_model_weights_reload_no_missing_tied_weights
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_save_load
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_sdpa_can_compile_dynamic
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_sdpa_can_dispatch_non_composite_models
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_sdpa_can_dispatch_on_flash
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_automodelforcausallm
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_bc_torch_dtype
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_can_load_from_already_mapped_keys
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_can_load_ignoring_mismatched_shapes
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_can_load_with_device_context_manager
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_can_use_safetensors
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_cannot_load_with_meta_device_context_manager
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_config_attn_implementation_setter
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_generate
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_00_fp16_pad_left_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_01_fp16_pad_left
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_02_fp16_pad_left_no_attn_mask_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_03_fp16_pad_left_no_attn_mask
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_04_fp16_pad_right_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_05_fp16_pad_right
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_06_fp16_pad_right_no_attn_mask_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_07_fp16_pad_right_no_attn_mask
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_08_fp32_pad_left_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_09_fp32_pad_left
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_10_fp32_pad_left_no_attn_mask_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_11_fp32_pad_left_no_attn_mask
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_12_fp32_pad_right_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_13_fp32_pad_right
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_14_fp32_pad_right_no_attn_mask_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_15_fp32_pad_right_no_attn_mask
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_16_bf16_pad_left_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_17_bf16_pad_left
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_18_bf16_pad_left_no_attn_mask_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_19_bf16_pad_left_no_attn_mask
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_20_bf16_pad_right_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_21_bf16_pad_right
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_22_bf16_pad_right_no_attn_mask_sdpa_kernels
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_23_bf16_pad_right_no_attn_mask
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_eager_matches_sdpa_inference_24_fp32_pad_left_output_attentions
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_load_save_without_tied_weights
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_model_weights_reload_no_missing_tied_weights
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_reverse_loading_mapping
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_save_load
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_sdpa_can_compile_dynamic
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_sdpa_can_dispatch_non_composite_models
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_dynamic_sliding_window_is_default
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_export_text_only
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_generation_beyond_sliding_window_1_sdpa
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_generation_beyond_sliding_window_2_eager
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_1b_text_only
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_4b_batch
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_4b_batch_crops
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_4b_bf16
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_4b_crops
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_4b_multiimage

  • phi3:
    tests/models/phi3/test_modeling_phi3.py::Phi3IntegrationTest::test_model_phi3_mini_128k_instruct_logits

Comment on lines 583 to 585
  original_max_position_embeddings (`int`, *optional*):
-     Used with 'dynamic', 'longrope' and 'llama3'. The original max position embeddings used during
+     Used with 'yarn', 'longrope' and 'llama3'. The original max position embeddings used during
      pretraining.
Member Author

dynamic uses config.max_position_embeddings and doesn't require us to set an explicit original_max_position_embeddings in the rope dict.
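
To make the point concrete, here is a minimal sketch of the commonly used dynamic NTK scaling formula, written in plain Python under the assumption that it mirrors what the 'dynamic' rope type computes. The function name and arguments are illustrative. The only length reference it needs is `max_position_embeddings`; no separate original value appears.

```python
def dynamic_ntk_inv_freq(base, dim, factor, max_position_embeddings, seq_len):
    """Illustrative dynamic NTK inverse frequencies for one attention head."""
    # Sequences shorter than the configured maximum are left unscaled.
    seq_len = max(seq_len, max_position_embeddings)
    # Grow the rotary base as the sequence exceeds max_position_embeddings.
    scale = (factor * seq_len / max_position_embeddings) - (factor - 1)
    scaled_base = base * scale ** (dim / (dim - 2))
    # One inverse frequency per pair of head dimensions.
    return [1.0 / (scaled_base ** (i / dim)) for i in range(0, dim, 2)]


short = dynamic_ntk_inv_freq(10000.0, 64, 2.0, 2048, seq_len=1024)
long = dynamic_ntk_inv_freq(10000.0, 64, 2.0, 2048, seq_len=4096)
```

In this sketch `short` matches the unscaled frequencies, while `long` has lower frequencies in every dimension except the first.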

def test_model_rope_scaling_frequencies(self):
    """Tests the frequency properties of the different RoPE scaling types on the model RoPE layer."""
    config, _ = self.model_tester.prepare_config_and_inputs_for_common()
    config.layer_types = ["full_attention", "sliding_attention"]
Member Author

The code below sets rope params for two layer types, but the dummy config isn't always initialized with both. This line makes sure the layer types are in line with the rope params.
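
The mismatch described here can be sketched as a small consistency check. This assumes the rope params form a mapping keyed by layer type; the helper name `rope_layer_type_mismatch` and the parameter values are hypothetical and not part of the test suite.

```python
def rope_layer_type_mismatch(layer_types, rope_parameters):
    """Return (layer types without rope params, rope params without a layer type)."""
    declared = set(layer_types)
    configured = set(rope_parameters)
    return sorted(declared - configured), sorted(configured - declared)


# Illustrative per-layer-type rope params, as a test might set them.
rope_parameters = {
    "full_attention": {"rope_type": "linear", "factor": 2.0, "rope_theta": 10000.0},
    "sliding_attention": {"rope_type": "default", "rope_theta": 10000.0},
}

# A dummy config that was initialized with only one layer type.
only_full = rope_layer_type_mismatch(["full_attention"], rope_parameters)
# The same check after layer_types is set explicitly to both types.
both = rope_layer_type_mismatch(["full_attention", "sliding_attention"], rope_parameters)
```

Here `only_full` reports `"sliding_attention"` as rope params with no matching layer type, and `both` reports no mismatch in either direction.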

rope_type = self.rope_type
original_inv_freq = self.original_inv_freq
prefix = ""
original_max_position_embeddings = self.config.rope_parameters["original_max_position_embeddings"]
Member Author

@zucchini-nlp zucchini-nlp Dec 2, 2025

It is safe to assume it already exists: we move original_max_position_embeddings to its correct location at config init time.

@zucchini-nlp
Member Author

run-slow: phi3, phi, llama, mistral, mistral, qwen2_vl, deepseek_v3, qwen2, gemma2, gemma3

@github-actions
Contributor

github-actions bot commented Dec 2, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: gemma3

@zucchini-nlp
Member Author

run-slow: phi3, phi, llama, mistral, mistral, qwen2_vl, deepseek_v3, qwen2, gemma2, gemma3


2 participants