Skip to content

Conversation

@ScottTodd
Copy link
Member

Motivation

On #2368, we found that a design decision made back on #965 resulted in the default release version suffix +rocmsdk{formatted_date} sorting to "more recent" than nightly release version suffixes like +rocm7.10.0a20251124.

This changes the default and dev release versions to use a lower sorting 'devrocm' version suffix instead of 'rocmsdk'.

Technical Details

See the spec at https://packaging.python.org/en/latest/specifications/version-specifiers/#local-version-identifiers.

Notes on how we use the versions:

  • When a workflow is triggered without specifying rocm_version, the default is used since --version-suffix is not set:

    - name: Determine optional arguments passed to `build_prod_wheels.py`
    if: ${{ inputs.rocm_version }}
    run: |
    pip install packaging
    python build_tools/github_actions/determine_version.py \
    --rocm-version ${{ inputs.rocm_version }}

  • A related scenario is possible (and will be more common in the future) when a "dev release" with a version like torch-2.9.1+rocm7.11.0.dev0.c9ae912c3887ae8538f12a99704fc3973a186e83 is installed:

    >>> from packaging.version import Version, parse
    
    # Dev release is newer than a final release (!)
    # This is because the dev release has more "segments" in the local version.
    >>> Version("2.9.1+rocm7.11.0.dev0") > Version("2.9.1+rocm7.11.0")
    True
    
    # Alpha (nightly) release is older than a final release (expected)
    # This is because the numeric '0' always sorts greater than the lexicographically
    # sorted '0a2025'.
    >>> Version("2.9.1+rocm7.11.0a2025") > Version("2.9.1+rocm7.11.0")
    False
  • More versions to consider:

    # in sorted order
    2.9.0+rocmsdk20251202            # CURRENT build_prod_wheels.py default
    2.9.0+rocm7.10.0.dev0-efed3c3    # possible dev release
    2.9.0+rocm7.10.0a20251124        # CURRENT nightly "alpha" release
    2.9.0+devrocm7.10.0.dev0-efed3c3 # possible dev release
    from packaging.version import Version
    >>> version_1 = Version("2.9.0+rocmsdk20251202")
    >>> version_2 = Version("2.9.0+rocm7.10.0.dev0-efed3c3")
    >>> version_3 = Version("2.9.0+rocm7.10.0a20251124")
    >>> version_4 = Version("2.9.0+devrocm7.10.0.dev0-efed3c3")
    >>> version_1 > version_2
    True
    >>> version_2 > version_3
    True
    >>> version_3 > version_4
    True

Test Plan

New unit tests show the sorting behavior.

Copy link
Contributor

@HereThereBeDragons HereThereBeDragons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think i am missing a few explanations here. e.g. why you change to devrocm.
and we probably can put the derive_suffix just as part of the build_prod_wheels?

type: string
rocm_version:
description: ROCm version to pip install
description: ROCm version to pip install (e.g. "7.10.0a20251124")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if i see it correctly these changes are already included in #2394

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I made this PR first but decided the description changes would be a good fit there too. I'll remove them from this PR.

self.assertEqual(
optional_build_prod_arguments,
"--rocm-sdk-version ==7.0.0.dev0+515115ea2cb85a0b71b5507ce56a627d14c7ae73 --version-suffix +rocm7.0.0.dev0-515115ea2cb85a0b71b5507ce56a627d14c7ae73",
"--rocm-sdk-version ==7.0.0.dev0+515115ea2cb85a0b71b5507ce56a627d14c7ae73 --version-suffix +devrocm7.0.0.dev0-515115ea2cb85a0b71b5507ce56a627d14c7ae73",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no? i dont think this is correct to change to devrocm. i think it needs to stay rocm

# | ----------- | ------------------ | -------------------------- |
# | final | 7.10.0 | +rocm7.10.0 |
# | nightly | 7.10.0a20251124 | +rocm7.10.0a20251124 |
# | dev | 7.10.0.dev0+efed3c | +devrocm7.10.0.dev0-efed3c |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as i wrote above, i dont understand why devrocm is used. we are not using it.

e.g.
torch-2.10.0a0+rocm7.10.0.dev0.c7bfa586d12394ce5585c974d969985a72145a92-cp311-cp311-win_amd64.whl

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That dev version is a problem:

>>> from packaging.version import Version, parse
>>> Version("2.10.0a0+rocm7.10.0.dev0.c7bfa58") > Version("2.10.0a0+rocm7.10.0")
True

Final versions should be considered newer than dev versions, but the text after the + follows the rules of https://packaging.python.org/en/latest/specifications/version-specifiers/#local-version-identifiers

Additionally a local version with a great number of segments will always compare as greater than a local version with fewer segments, as long as the shorter local version’s segments match the beginning of the longer local version’s segments exactly.

Nightly versions do not have the same issue:

>>> from packaging.version import Version, parse
>>> Version("2.10.0a0+rocm7.10.0a20251203") > Version("2.10.0a0+rocm7.10.0")
False

The difference there is the 0a20251203 component is newer than 0 due to

When comparing a numeric and lexicographic segment, the numeric section always compares as greater than the lexicographic segment.

We could do this:

-7.10.0.dev0.c7bfa58
+7.10.0dev0.c7bfa58

but then a dev version would still be newer than an alpha/nightly version:

>>> from packaging.version import Version, parse
>>> Version("2.10.0a+7.10.0dev0.c7bfa58") > Version("2.10.0a0+rocm7.10.0a20251203")
True

Back on #965, @marbre said about using rocmsdk instead of rocm for dev versions:

The only downside of the proposed local version identifier is that the ones used for the dev packages result in a version that is evaluated larger as the ones of the nightly packages. With this, dev packages and nightly releases cannot be easily mixed and we might want to revise this later.

So here I'm flipping that "evaluated larger" to "evaluated smaller" by picking devrocm which is lower than rocm and rocmsdk.

"--version-suffix",
default=f"+rocmsdk{formatted_date}",
help="PyTorch version suffix",
default=f"+devrocm{formatted_date}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

devrocm

default=f"+rocmsdk{formatted_date}",
help="PyTorch version suffix",
default=f"+devrocm{formatted_date}",
help="PyTorch version suffix (favor computing with build_tools/github_actions/determine_version.py and passing explicitly)",
Copy link
Contributor

@HereThereBeDragons HereThereBeDragons Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from my understanding it was a bit different what we wanted to do:

  • if --rocm-sdk-version is given, derive the suffix ourselves (just add +rocm as a prefix)
  • if --rocm-sdk-version is not given, extract it from the py venv?

or did i miss here something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I can/should still make that change to remove the default here and infer from the installed packages.

Could move the logic from build_tools/github_actions/determine_version.py to this script, though then we might want to copy/paste the same logic for JAX and other packages...

# |
# |--- devrocm sorts older than rocm

parsed_version = parse(rocm_version)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does parse() do here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://packaging.pypa.io/en/stable/version.html#packaging.version.parse

I use it pretty interchangeably with the Version() class constructor

ScottTodd added a commit that referenced this pull request Dec 4, 2025
…2394)

We've had a few developers trigger "nightly" releases that should have
been "dev" releases and we're still working on putting more technical
barriers in place to prevent that. See for example
#2227 (comment) and
#2280 (comment).

Until then, this adds documentation showing the canonical triggering
patterns for "dev" releases and adds stronger language to workflow input
descriptions guiding developers towards the safe defaults.

See also #2392 that will make
misconfigured release builds cause fewer issues.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: TODO

Development

Successfully merging this pull request may close these issues.

3 participants