Skip to content

Workflow runs stuck in queued state for far too long #4287

@yaxiongzhao-genesis

Description

@yaxiongzhao-genesis

Checks

Controller Version

0.13.0

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

1. Install action runner controller with helm:

[27/10/25 2:10:52] ➜  playground git:(main) cat actions-runner-controler/values-arc.yml 
githubConfigSecret: arc-github-pat-secret%
helm upgrade --install arc \
    --values actions-runner-controler/values-arc.yml \
    oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller

2. Install runner scale set using helm:

[27/10/25 3:05:00] ➜  playground git:(main) cat actions-runner-controler/values-runner-scale-set.yml 

githubConfigSecret: arc-github-pat-secret
githubConfigUrl: https://github.com/yaxiongzhao-genesis/gs-core
runnerScaleSetName: gpu-runners

# Optional: Configure minimum and maximum runners
minRunners: 1
# maxRunners: 10

# Optional: Configure runner timeout
runnerTimeout: 10800 # 3 hours

template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        # GPU resource requests and limits
        resources:
          requests:
            nvidia.com/gpu: 1
          limits:
            nvidia.com/gpu: 1

helm diff upgrade --install gpu-runners \
oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set \
--values actions-runner-controler/values-runner-scale-set.yml

3. Launch a demo workflow by manually dispatching the workflow on github:
 action.yml
[27/10/25 3:07:32] ➜  gs-core git:(main) cat .github/workflows/demo-self-hosted-runner.yml 
name: Demo - self-hosted runner

# Trigger the workflow on push to main branch or manual trigger
on:
  workflow_dispatch:
  push:
    branches:
      - main
jobs:
  hello-world:
    # Run on CoreWeave ARC runner with GPU support
    runs-on: gpu-runners

    steps:
      - uses: actions/checkout@v4
      - name: Print system information
        run: |
          echo "🚀 Hello World from CoreWeave!"
          echo "================================"
          echo "Runner: $(hostname)"
          echo "User: $(whoami)"
          echo "Date: $(date)"
          echo "PWD: $(pwd)"
          echo "OS: $(uname -a)"

      - name: Check GPU availability
        run: |
          echo "🔍 Checking GPU availability..."
          if command -v nvidia-smi &> /dev/null; then
            echo "✅ nvidia-smi found - GPU support available!"
            nvidia-smi
          else
            echo "❌ nvidia-smi not found - checking for GPU devices..."
            ls -la /dev/nvidia* 2>/dev/null || echo "No NVIDIA devices found"
          fi

4. Watch the job to be picked up, and it hasn't been picked up for up to 30 minutes

Describe the bug

The workflow runs not picked up by action runners

Describe the expected behavior

The action runners pick up the workflow runs

arc-gha-rs-controller-6865f855c9-vhjhx-logs.txt

gpu-runners-7c4844cc-listener-logs.txt

Additional Context

N/A

Controller Logs

https://gist.github.com/yaxiongzhao-genesis/8cb38e74945d99c48c6c468e5381629b

Runner Pod Logs

https://gist.github.com/yaxiongzhao-genesis/2acff0fb2404faebc40d350b6213bb06

https://gist.github.com/yaxiongzhao-genesis/bb77ddf0fdee1d206c81362cadba59d7

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workinggha-runner-scale-setRelated to the gha-runner-scale-set modeneeds triageRequires review from the maintainers

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions