Import/develop/ROCm amdsmi/pr 843 #2203
Motivation
The amd-smi reset -l command was failing on MI300X/MI300A systems with an AttributeError when trying to clean local GPU data. This prevented users from using the process isolation feature properly, which is critical for security in data center environments where GPUs are shared between different users.
Technical Details
Fixed the set_gpu() function in amdsmi_commands.py by replacing direct attribute access with getattr() calls, so that a missing attribute yields a default value instead of raising AttributeError. For attributes that hold tuples or integers, the value is now read once into a local variable before use, so each code block works from a single, consistent value. This defensive approach prevents crashes when different subcommands (such as reset) pass args objects with different attribute sets.
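The pattern can be illustrated with a minimal, self-contained sketch. It is not the actual amd-smi code: the parser layout, the `--fan` and `--clock` options, and the `set_gpu_sketch()` helper are hypothetical stand-ins. It shows why an args object built by one subparser (here `reset`) lacks attributes defined only on another (`set`), and how `getattr()` with a default plus local extraction of a sequence value avoids the AttributeError.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI layout: only the "set" subparser defines --fan and --clock,
    # so a Namespace produced by "reset" will not carry those attributes at all.
    parser = argparse.ArgumentParser(prog="tool")
    sub = parser.add_subparsers(dest="command")
    set_p = sub.add_parser("set")
    set_p.add_argument("--fan", type=int)
    set_p.add_argument("--clock", type=int, nargs=2)  # parsed as [level, freq]
    reset_p = sub.add_parser("reset")
    reset_p.add_argument("-l", dest="clean_local", action="store_true")
    return parser


def set_gpu_sketch(args: argparse.Namespace) -> List[str]:
    """Hypothetical stand-in for set_gpu(): collect the requested actions."""
    actions: List[str] = []
    # getattr() with a default returns None instead of raising AttributeError
    # when the subcommand that built `args` never defined the attribute.
    fan = getattr(args, "fan", None)
    if fan is not None:
        actions.append(f"fan={fan}")
    # Read the sequence value once into a local, then unpack it, so the whole
    # block uses one consistent value rather than repeated attribute lookups.
    clock = getattr(args, "clock", None)
    if clock is not None:
        level, freq = clock
        actions.append(f"clock[{level}]={freq}")
    return actions


if __name__ == "__main__":
    parser = build_parser()
    reset_args = parser.parse_args(["reset", "-l"])
    try:
        reset_args.fan  # direct access: the failure mode described above
    except AttributeError as exc:
        print("direct access fails:", exc)
    print(set_gpu_sketch(reset_args))
    print(set_gpu_sketch(parser.parse_args(["set", "--fan", "50", "--clock", "1", "1200"])))
```

With the `reset` namespace the sketch returns an empty list instead of crashing, while the `set` namespace still produces both actions, mirroring the backward-compatibility goal described in this PR.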
JIRA ID
Resolves SWDEV-498649
Test Plan
Tested all reset command variations locally (-l, -c, -f, etc.) and verified they work without throwing AttributeError. Also tested set commands to ensure the changes don't break existing functionality. The fix is particularly important for MI300X/MI300A systems with partition features, though testing was done on consumer GPUs to verify general correctness.
Test Result
All tested commands execute successfully without errors. The amd-smi reset -l -g <gpu_id> command now properly cleans local GPU data instead of crashing. Other commands like amd-smi static and amd-smi set continue to work as expected, confirming backward compatibility.
Submission Checklist