
Conversation

Contributor

@Aznix07 Aznix07 commented Nov 4, 2025

What does this PR do?

Fixes a KeyError: 'rope_parameters_factor' in the GPT-OSS weight conversion script that was preventing users from converting GPT-OSS model weights to HuggingFace format.

Fixes #42003

Problem

After PR #39847 standardized RoPE parameter handling across models, the GPT-OSS conversion script was still trying to access individual parameter keys that no longer exist in the original config:

# Old code - these keys don't exist anymore
"beta_fast": float(original_config.pop("rope_ntk_beta")), 
"beta_slow": float(original_config.pop("rope_ntk_alpha")),
"factor": float(original_config.pop("rope_parameters_factor")),

Reproduction

huggingface-cli download openai/gpt-oss-20b
python src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py \
    --input_dir path/to/original \
    --output_dir ./output

# KeyError: 'rope_parameters_factor'

Solution

Added proper handling for both config formats:

# Handle both old and new config formats for rope_parameters
if "rope_parameters" in original_config:
    # New format: rope_parameters already exists as a dict
    rope_parameters = original_config.pop("rope_parameters")
    # Ensure rope_type is set
    if "rope_type" not in rope_parameters:
        rope_parameters["rope_type"] = "yarn"
else:
    # Old format: construct rope_parameters from individual keys
    # Fall back to default values if the rope_parameters keys are missing
    rope_parameters = {
        # "beta_fast": float(original_config.pop("rope_ntk_beta")),
        # "beta_slow": float(original_config.pop("rope_ntk_alpha")),
        # "factor": float(original_config.pop("rope_parameters_factor")),
        "rope_type": "yarn",
        "truncate": False,
        "original_max_position_embeddings": 4096,
        "beta_fast": 32.0,
        "beta_slow": 1.0,
        "factor": 32.0,
    }

Testing

✅ Verified conversion script no longer throws KeyError
✅ Tested fallback logic with both config formats
✅ Default values match configuration_gpt_oss.py

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? (N/A - bug fix in conversion script)
  • Did you write any new necessary tests? (Added manual verification test)

Who can review?

@zucchini-nlp

Member

@zucchini-nlp zucchini-nlp left a comment


I am quite lost here since I didn't work with GPT-OSS. To make sure I understand: the original_config is an HF config object, iiuc, and we need to make sure it is YaRN if no params are saved in the config?

Comment on lines 172 to 174
# "beta_fast": float(original_config.pop("rope_ntk_beta")),
# "beta_slow": float(original_config.pop("rope_ntk_alpha")),
# "factor": float(original_config.pop("rope_parameters_factor")),
Member


can we hardcode the values safely? I think it would be better to obtain them from the config

Contributor Author

Aznix07 commented Nov 6, 2025

Thanks for reviewing @zucchini-nlp!

Let me clarify:

  1. original_config is the raw config loaded from the checkpoint's config.json file (not an HF config object yet)
  2. After PR #39847 (🚨 [v5] Refactor RoPE for layer types), the HF config class expects rope_parameters as a dict, but the conversion script needs to handle both old and new checkpoint formats since the changes are backward compatible

You're absolutely correct. We should try to obtain the values from the config, with defaults as a fallback. I'll update the script to use .pop(key, default) to handle cases where these keys might not exist in the original checkpoint, and remove the commented lines.

Updating now..

Contributor

github-actions bot commented Nov 6, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: gpt_oss

Contributor Author

Aznix07 commented Nov 6, 2025

Updated! Changes made:

  • Removed all commented code
  • Now using .pop(key, default_value) to try obtaining values from the original checkpoint first
  • Falls back to defaults if keys don't exist (roughly as sketched below)
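
For reference, a minimal sketch of the updated fallback logic described above (key names follow the old-format keys from the earlier snippet, defaults mirror configuration_gpt_oss.py; the exact merged code is in the PR diff):

# Sketch of the updated handling (not the exact merged diff): try the
# original checkpoint keys first, fall back to the GPT-OSS defaults
if "rope_parameters" in original_config:
    # New-format checkpoints already ship rope_parameters as a dict
    rope_parameters = original_config.pop("rope_parameters")
    rope_parameters.setdefault("rope_type", "yarn")
else:
    # Old-format checkpoints: pop individual keys, defaulting when absent
    rope_parameters = {
        "rope_type": "yarn",
        "truncate": False,
        "original_max_position_embeddings": 4096,
        "beta_fast": float(original_config.pop("rope_ntk_beta", 32.0)),
        "beta_slow": float(original_config.pop("rope_ntk_alpha", 1.0)),
        "factor": float(original_config.pop("rope_parameters_factor", 32.0)),
    }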

Ready for another look!

@shubhagr-qc

Thanks @Aznix07 for making the effort to solve the issue raised. I noticed that the issue is mostly because the model's config.json contains "rope_scaling_factor" as the key instead of the "rope_parameters_factor" the script is looking for.

Member

@zucchini-nlp zucchini-nlp left a comment


It's still not clear why the original_config has rope parameters if it is not in HF format yet. I believe there are some configs already with "rope_parameters" to be converted.

Lgtm then!

Contributor Author

Aznix07 commented Nov 6, 2025

Thank you @zucchini-nlp for the review and approval!

@shubhagr-qc Thanks for identifying rope_scaling_factor. The current implementation should handle it safely with the default fallback. If there are other edge cases, we can address them in a follow-up PR.
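
Purely as an illustration (hypothetical, not part of the merged change), a follow-up could read the legacy rope_scaling_factor key noted above before falling back to the hardcoded default:

# Hypothetical follow-up sketch: prefer the standardized key, then the
# legacy "rope_scaling_factor" key mentioned above, then the 32.0 default
if "rope_parameters_factor" in original_config:
    factor = float(original_config.pop("rope_parameters_factor"))
else:
    factor = float(original_config.pop("rope_scaling_factor", 32.0))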

@zucchini-nlp zucchini-nlp enabled auto-merge (squash) November 6, 2025 12:35
@zucchini-nlp zucchini-nlp merged commit fe5ca9d into huggingface:main Nov 6, 2025
14 checks passed
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Aznix07 Aznix07 deleted the fix-gpt-oss-conversion branch November 7, 2025 05:35
