Skip to content
Open
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 24 additions & 3 deletions mlir/utils/performance/tuningRunner.py
Original file line number Diff line number Diff line change
Expand Up @@ -74,7 +74,10 @@ def verify_kernel_with_perfconfig(perfconfig, config, paths: Paths, options: Opt
] + mlir_cpu_runner_args

if options.debug:
print('Running commands:', file=sys.stderr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is only in verify_kernel_with_perfconfig(), what if verify="none"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If verify="none" then we don't verify, so there is no command to print

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are some commands to print, the tuning commands. rocmlir-gen and tuning-driver

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find myself adding a print for those when something fails in tuning changes, it'd be nice to have them if -debug is passed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you mean in tune_mlir_kernels right? I can add additional prints there, I end up doing the same, adding prints there if something fails

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

print(rocmlir_gen_command, file=sys.stderr)
print(rocmlir_driver_command, file=sys.stderr)
print(profiler_command, file=sys.stderr)

prevdir = os.getcwd()
with tempfile.TemporaryDirectory() as tmpdir:
Expand Down Expand Up @@ -123,7 +126,9 @@ def get_winning_config(tuning_output, test_vector, config, all_data, paths: Path
max_tflops = -np.inf
min_ns = np.inf
winning_config = "None"
for i, result in enumerate(tuning_output):
num_tuning_outputs = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just fail here if "len(tuning_output.splitlines())==0"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why but I tried that and it didn't work

for i, result in enumerate(tuning_output.splitlines()):
num_tuning_outputs += 1
Comment on lines +129 to +131
Copy link

Copilot AI Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The manual counter num_tuning_outputs is redundant since enumerate already provides the count. After the loop completes, 'i + 1' would give the total count (or check if the loop never executed). Consider using 'len(tuning_output.splitlines())' before the loop or checking if the loop executed by tracking whether winning_config was updated.

Copilot uses AI. Check for mistakes.
result = result.decode('utf-8').strip()
if not options.quiet and not options.compact_print and i > 0 and i % 100 == 0:
print(
Expand Down Expand Up @@ -165,6 +170,8 @@ def get_winning_config(tuning_output, test_vector, config, all_data, paths: Path
f"Tested {i} configs, best perf {max_tflops} TFlops {min_ns} ns on perf_config {winning_config}",
file=sys.stderr)

if (num_tuning_outputs == 0):
Copy link

Copilot AI Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Unnecessary parentheses around the condition. Python style guidelines recommend removing parentheses from simple if conditions.

Suggested change
if (num_tuning_outputs == 0):
if num_tuning_outputs == 0:

Copilot uses AI. Check for mistakes.
raise RuntimeError('tuning output is empty')
return winning_config, max_tflops
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to print the command lines here, instead of the verify function



Expand Down Expand Up @@ -194,7 +201,16 @@ def tune_mlir_kernels(configs, conf_class, paths: Paths, options: Options):
stdin=kernel_gen.stdout,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
kernel_gen.stdout.close()

# Wait for both processes to finish.
tuning_loop_stdout, _ = tuning_loop.communicate()
kernel_gen.communicate()
Comment on lines +211 to +214
Copy link

Copilot AI Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kernel_gen.stdout is closed before waiting for kernel_gen to complete. Line 204 was previously 'kernel_gen.stdout.close()' but is now removed. Without closing kernel_gen.stdout before calling communicate(), tuning_loop may not receive EOF and could hang. The original close() call should be restored before the communicate() calls.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be wrong but I think this incorrect. As far as I understand, communicate will read all data from stdout/stderr until EOF. So if we add the close before the communicate it would actually fail, because there is nothing to read (file descriptor was closed)


# Make sure both processes finished successfully.
if kernel_gen.returncode != 0:
raise RuntimeError(f'rocmlir-gen command failed: {kernel_gen_command}')
if tuning_loop.returncode != 0:
raise RuntimeError(f'rocmlir-tuning-driver command failed: {paths.mlir_paths.rocmlir_tuning_driver_path} {tuning_driver_args}')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we use exceptions, can we add a catch somewhere to print everything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My goal with this is to print just that error message, what else is there to print?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think letting RuntimeError without a catch generates ugly messages. I'd prefer something like "error: ..." but consider this a nit.

Copy link
Contributor Author

@pabloantoniom pabloantoniom Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the exception it prints the backtrace and a good error. This is what I get:

Traceback (most recent call last):
  File "/home/pamartin/rocMLIR/build/./bin/tuningRunner.py", line 531, in <module>
    sys.exit(main())
  File "/home/pamartin/rocMLIR/build/./bin/tuningRunner.py", line 494, in main
    winners, all_data = tune_mlir_kernels(configs, conf_class, paths, options)
  File "/home/pamartin/rocMLIR/build/./bin/tuningRunner.py", line 216, in tune_mlir_kernels
    raise RuntimeError(f'rocmlir-gen command failed: {kernel_gen_command}')
RuntimeError: rocmlir-gen command failed: /home/pamartin/rocMLIR/build/bin/rocmlir-gen -operation gemm -t f16 -out_datatype f16 --arch gfx950:sramecc+:xnack- --num_cu 256 -g 1 -m 1 -k 768 -n 768 -transA=False -transB=False --kernel-repeats 10 --perf_config= --device=0

else:
# pipe to rocmlir_gen --emit-tuning-key
tuning_key = subprocess.Popen(
Expand All @@ -211,8 +227,13 @@ def tune_mlir_kernels(configs, conf_class, paths: Paths, options: Options):
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)

# Wait and make sure the process finished successfully.
tuning_loop_stdout, _ = tuning_loop.communicate()
if tuning_loop.returncode != 0:
raise RuntimeError(f'rocmlir-tuning-driver command failed: {paths.mlir_paths.rocmlir_tuning_driver_path} {tuning_driver_args}')

# Tune, printing progress as we go to avoid CI timeouts
winning_config, max_tflops = get_winning_config(tuning_loop.stdout, test_vector, config,
winning_config, max_tflops = get_winning_config(tuning_loop_stdout, test_vector, config,
all_data, paths, options)

if options.verify_mode != "none":
Expand Down
Loading