Open
Labels
bug: Something isn't working
Description
The region parameter seems to be hardcoded in the RayConfigGenerator, which makes it impossible to use the Ray backend in regions other than us-east-1.
To reproduce
Run the example from the tutorial in any region other than us-east-1.
from autogluon.cloud import TabularCloudPredictor
import pandas as pd
train_data = pd.read_csv('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
predictor_init_args = {"label": "class"} # init args you would pass to AG TabularPredictor
predictor_fit_args = {"train_data": train_data, "time_limit": 120} # fit args you would pass to AG TabularPredictor
cloud_predictor = TabularCloudPredictor(
    cloud_output_path=BUCKET, backend="ray_aws",
)
cloud_predictor.fit(
    predictor_init_args=predictor_init_args,
    predictor_fit_args=predictor_fit_args,
    instance_type="ml.m5.2xlarge",  # Check out supported instance and pricing here: https://aws.amazon.com/sagemaker/pricing/
    wait=True,  # Set this to False to make it an unblocking call and immediately return
)
This crashes because the login credentials are retrieved for us-east-1 (the hard-coded default), while the image pull is attempted from eu-west-1 (the current region):
2025-02-20 13:24:03,364 VINFO command_runner.py:371 -- Running `aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com`
...
2025-02-20 13:24:06,605 VINFO command_runner.py:371 -- Running `docker pull 763104351884.dkr.ecr.eu-west-1.amazonaws.com/autogluon-training:1.2.0-cpu-py311`
Error response from daemon: Head "https://763104351884.dkr.ecr.eu-west-1.amazonaws.com/v2/autogluon-training/manifests/1.2.0-cpu-py311": no basic auth credentials
Shared connection to 34.247.195.30 closed.
2025-02-20 13:24:08,218 ERR updater.py:164 -- New status: update-failed
2025-02-20 13:24:08,218 ERR updater.py:166 -- !!!
2025-02-20 13:24:08,218 VERR updater.py:176 -- Exception details: {'show_color': None, 'message': 'SSH command failed.'}
2025-02-20 13:24:08,220 ERR updater.py:178 -- Full traceback: Traceback (most recent call last):
  File "/local/home/shchuro/uv_envs/cloud/lib/python3.11/site-packages/ray/autoscaler/_private/updater.py", line 159, in run
    self.do_update()
  File "/local/home/shchuro/uv_envs/cloud/lib/python3.11/site-packages/ray/autoscaler/_private/updater.py", line 451, in do_update
    self.cmd_runner.run_init(
  File "/local/home/shchuro/uv_envs/cloud/lib/python3.11/site-packages/ray/autoscaler/_private/command_runner.py", line 722, in run_init
    self.run(
  File "/local/home/shchuro/uv_envs/cloud/lib/python3.11/site-packages/ray/autoscaler/_private/command_runner.py", line 493, in run
    return self.ssh_command_runner.run(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local/home/shchuro/uv_envs/cloud/lib/python3.11/site-packages/ray/autoscaler/_private/command_runner.py", line 379, in run
    return self._run_helper(
           ^^^^^^^^^^^^^^^^^
  File "/local/home/shchuro/uv_envs/cloud/lib/python3.11/site-packages/ray/autoscaler/_private/command_runner.py", line 298, in _run_helper
    raise click.ClickException(fail_msg) from None
click.exceptions.ClickException: SSH command failed.
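One possible direction for a fix, as a minimal sketch: derive the login region from the image URI itself, so that the `docker login` and `docker pull` steps always target the same registry. The helper name `ecr_login_command` is hypothetical and is not part of autogluon.cloud or Ray; the command string simply mirrors the one shown in the log above, and the regex assumes the standard `<account>.dkr.ecr.<region>.amazonaws.com` host format.

```python
import re

# Assumed host format for private ECR registries:
#   <12-digit account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
_ECR_HOST = re.compile(
    r"^(?P<registry>\d{12}\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com)/"
)


def ecr_login_command(image_uri: str) -> str:
    """Build a docker login command whose region matches the image's registry.

    Hypothetical helper for illustration only.
    """
    match = _ECR_HOST.match(image_uri)
    if match is None:
        raise ValueError(f"Not an ECR image URI: {image_uri!r}")
    region = match["region"]
    registry = match["registry"]
    return (
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}"
    )


# Image URI taken from the failing `docker pull` in the log above.
image = "763104351884.dkr.ecr.eu-west-1.amazonaws.com/autogluon-training:1.2.0-cpu-py311"
print(ecr_login_command(image))
```

With the eu-west-1 image from the log, the generated command logs in to the eu-west-1 registry instead of us-east-1, which is the mismatch behind the "no basic auth credentials" error.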