Backend "ray_aws" cannot be used in regions other than us-east-1 #178

@shchur

Description

The region parameter appears to be hard-coded in `RayConfigGenerator`, which makes it impossible to use the `ray_aws` backend in any region other than us-east-1.

To reproduce

Run the example from the tutorial in any region other than us-east-1.

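import pandas as pd
from autogluon.cloud import TabularCloudPredictor

# Placeholder: S3 output path in a bucket you own, in a region other than us-east-1
BUCKET = "s3://<your-bucket>/<prefix>"
train_data = pd.read_csv('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')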
predictor_init_args = {"label": "class"}  # init args you would pass to AG TabularPredictor
predictor_fit_args = {"train_data": train_data, "time_limit": 120}  # fit args you would pass to AG TabularPredictor
cloud_predictor = TabularCloudPredictor(
    cloud_output_path=BUCKET, backend="ray_aws",
)
cloud_predictor.fit(
    predictor_init_args=predictor_init_args,
    predictor_fit_args=predictor_fit_args,
    instance_type="ml.m5.2xlarge",  # Check out supported instance and pricing here: https://aws.amazon.com/sagemaker/pricing/
    wait=True,  # Set this to False to make it an unblocking call and immediately return
)

This crashes because the ECR login credentials are retrieved for us-east-1 (the hard-coded default), while the image is pulled from the registry in eu-west-1 (the current region).

2025-02-20 13:24:03,364	VINFO command_runner.py:371 -- Running `aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com`
...
2025-02-20 13:24:06,605	VINFO command_runner.py:371 -- Running `docker pull 763104351884.dkr.ecr.eu-west-1.amazonaws.com/autogluon-training:1.2.0-cpu-py311`
Error response from daemon: Head "https://763104351884.dkr.ecr.eu-west-1.amazonaws.com/v2/autogluon-training/manifests/1.2.0-cpu-py311": no basic auth credentials
Shared connection to 34.247.195.30 closed.
2025-02-20 13:24:08,218	ERR updater.py:164 -- New status: update-failed
2025-02-20 13:24:08,218	ERR updater.py:166 -- !!!
2025-02-20 13:24:08,218	VERR updater.py:176 -- Exception details: {'show_color': None, 'message': 'SSH command failed.'}
2025-02-20 13:24:08,220	ERR updater.py:178 -- Full traceback: Traceback (most recent call last):
  File "/local/home/shchuro/uv_envs/cloud/lib/python3.11/site-packages/ray/autoscaler/_private/updater.py", line 159, in run
    self.do_update()
  File "/local/home/shchuro/uv_envs/cloud/lib/python3.11/site-packages/ray/autoscaler/_private/updater.py", line 451, in do_update
    self.cmd_runner.run_init(
  File "/local/home/shchuro/uv_envs/cloud/lib/python3.11/site-packages/ray/autoscaler/_private/command_runner.py", line 722, in run_init
    self.run(
  File "/local/home/shchuro/uv_envs/cloud/lib/python3.11/site-packages/ray/autoscaler/_private/command_runner.py", line 493, in run
    return self.ssh_command_runner.run(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local/home/shchuro/uv_envs/cloud/lib/python3.11/site-packages/ray/autoscaler/_private/command_runner.py", line 379, in run
    return self._run_helper(
           ^^^^^^^^^^^^^^^^^
  File "/local/home/shchuro/uv_envs/cloud/lib/python3.11/site-packages/ray/autoscaler/_private/command_runner.py", line 298, in _run_helper
    raise click.ClickException(fail_msg) from None
click.exceptions.ClickException: SSH command failed.
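
To illustrate the mismatch in the log above, here is a minimal sketch of building both Docker setup commands from a single region value, so that the login registry and the pull registry always agree. The helper names `ecr_registry` and `docker_setup_commands` are hypothetical and are not part of autogluon-cloud or Ray; the account ID and image tag are copied from the log.

```python
# Hypothetical helpers for illustration only; not autogluon-cloud or Ray APIs.
ECR_ACCOUNT_ID = "763104351884"  # registry account ID seen in the log
DEFAULT_IMAGE = "autogluon-training:1.2.0-cpu-py311"  # image tag seen in the log


def ecr_registry(region: str, account_id: str = ECR_ACCOUNT_ID) -> str:
    """Return the ECR registry hostname for the given region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def docker_setup_commands(region: str, image: str = DEFAULT_IMAGE) -> list[str]:
    """Build the login and pull commands from one region value.

    Both commands use the same registry hostname, so credentials obtained
    by the login step match the registry targeted by the pull step.
    """
    registry = ecr_registry(region)
    login = (
        f"aws ecr get-login-password --region {region} "
        f"| docker login --username AWS --password-stdin {registry}"
    )
    pull = f"docker pull {registry}/{image}"
    return [login, pull]


if __name__ == "__main__":
    for cmd in docker_setup_commands("eu-west-1"):
        print(cmd)
```

With `region="eu-west-1"`, both commands point at `763104351884.dkr.ecr.eu-west-1.amazonaws.com`, which is the consistency the failing run lacks. Where the region value should come from inside `RayConfigGenerator` is left open here.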

Labels

bug: Something isn't working