
GPU provisioning respected by SkyPilot job, but not isolated in SSH #8116

@vim-hjk

Environment

Server infra: Minikube Kubernetes
SkyPilot component tested: API Server deployment + Job execution
GPU hardware on node: NVIDIA RTX 4090 × 2

Description

When testing with the following SkyPilot task config, the job behaves as expected at the scheduling level:

  • Job runs on exactly 1 GPU
  • Kubernetes Pod resource limit is also set to 1 GPU (nvidia.com/gpu: 1)

Task Config Used (Quickstart example):

resources:
  # Optional; if left out, automatically pick the cheapest cloud.
  infra: aws
  accelerators: RTX4090:1

# Working directory (optional) containing the project codebase.
# Its contents are synced to ~/sky_workdir/ on the cluster.
workdir: .

# Typical use: pip install -r requirements.txt
# Invoked under the workdir (i.e., can use its files).
setup: |
  echo "Running setup."

# Typical use: make use of resources, such as running training.
# Invoked under the workdir (i.e., can use its files).
run: |
  echo "Hello, SkyPilot!"
  conda env list
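
For reference, this is roughly how the task is launched and how the Pod-level GPU limit can be confirmed. The cluster and pod names below are placeholders; substitute the actual ones reported by sky status and kubectl get pods:

# Launch the task on a named cluster (cluster name is a placeholder).
sky launch -c mycluster task.yaml

# Confirm the GPU limit on the provisioned Pod (pod name is a placeholder).
kubectl get pods
kubectl describe pod <skypilot-pod-name> | grep -A 3 'Limits'
# The limits should include nvidia.com/gpu: 1, matching the accelerators request.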

Problem

Although job scheduling respects the GPU provisioning limit, isolation is not enforced at the user session or container runtime level:

  • After connecting to the SkyPilot cluster via SSH, running nvidia-smi shows both GPUs (GPU 0 and GPU 1).
  • When running a Python process inside the provisioned Pod container over that SSH session, DDP (Distributed Data Parallel) and other multi-GPU workloads end up using both GPUs, not just the single GPU provisioned and limited by SkyPilot; see the checks sketched after this list. (Is this related to the NVIDIA GPU Operator?)
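
A minimal sketch of those checks, assuming the placeholder cluster name from the launch command above, the standard NVIDIA container-runtime environment variables, and that PyTorch is installed in the Pod's environment:

# SkyPilot sets up an SSH alias for the cluster; the name is a placeholder.
ssh mycluster

# Inside the Pod container:
nvidia-smi -L                     # lists both GPU 0 and GPU 1
echo "$NVIDIA_VISIBLE_DEVICES"    # device visibility granted by the container runtime
echo "$CUDA_VISIBLE_DEVICES"      # not pinned to a single GPU in this SSH session
python -c "import torch; print(torch.cuda.device_count())"   # reports 2, so DDP spans both GPUs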

Has anyone encountered the same or a similar issue during local Kubernetes testing (e.g., Minikube)?
