@Edwinhr716

What type of PR is this?

/kind feature

What this PR does / why we need it:

Adds autoKueue, for automated deployment of the resources required by Topology Aware Scheduling (TAS).

Which issue(s) this PR fixes:

Fixes #7347

Special notes for your reviewer:

Does this PR introduce a user-facing change?


@k8s-ci-robot

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 23, 2025
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Edwinhr716
Once this PR has been reviewed and has the lgtm label, please assign gabesaba for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 23, 2025
@netlify

netlify bot commented Oct 23, 2025

Deploy Preview for kubernetes-sigs-kueue canceled.

| Name | Link |
|------|------|
| 🔨 Latest commit | 37fb1e9 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/68fa8544d3ab8e000877caf4 |

@k8s-ci-robot

@Edwinhr716: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|-----------|--------|---------|----------|---------------|
| pull-kueue-verify-main | 37fb1e9 | link | true | /test pull-kueue-verify-main |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

@kannon92 kannon92 left a comment

What is the expectation from Kueue team on this "feature"?

Ideally we would have tests on this. Otherwise I worry this will regress over time as TAS features get worked on.

"helm.sh/hook-delete-policy": before-hook-creation
data:
  resources.yaml: |-
    apiVersion: kueue.x-k8s.io/v1beta1

Suggested change
-    apiVersion: kueue.x-k8s.io/v1beta1
+    apiVersion: kueue.x-k8s.io/v1beta2

nit: I think this should already work, and it seems better to use the newer API version going forward.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: {{ .Release.Name }}-kueue-hook-crb

Suggested change
-  name: {{ .Release.Name }}-kueue-hook-crb
+  name: {{ .Release.Name }}-autokueue-hook-crb

And for the role, I would suggest the name suffix -autokueue-hook-clusterrole.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: {{ .Release.Name }}-kueue-hook-clusterrole

Suggested change
-  name: {{ .Release.Name }}-kueue-hook-clusterrole
+  name: {{ .Release.Name }}-autokueue-hook-clusterrole

wdyt?
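
To illustrate the renaming suggested in the two comments above, here is a minimal sketch of how the binding could reference the role once both carry the `autokueue` prefix. The service account rename and the use of `.Release.Namespace` are assumptions made for illustration; the PR as shown uses `-kueue-hook-sa`, and its namespace handling is not visible in this thread.

```yaml
# Sketch only: the two resource names follow the review suggestions; the
# service account name and namespace are assumptions, not taken from the PR.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: {{ .Release.Name }}-autokueue-hook-crb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  # Must match the ClusterRole's metadata.name exactly.
  name: {{ .Release.Name }}-autokueue-hook-clusterrole
subjects:
  - kind: ServiceAccount
    name: {{ .Release.Name }}-autokueue-hook-sa  # hypothetical rename
    namespace: {{ .Release.Namespace }}
```

If only the role or only the binding is renamed, the `roleRef.name` has to be updated in the same change, otherwise the hook's service account loses its permissions.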

serviceAccountName: {{ .Release.Name }}-kueue-hook-sa
containers:
  - name: kubectl-apply
    image: bitnami/kubectl:latest

Can we use an image that is hosted on the Kubernetes registry instead?

You can browse what is available at https://explore.ggcr.dev/?repo=registry.k8s.io

There is a registry.k8s.io/kubectl image hosted there.
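
As a rough sketch of what that could look like in the hook Job, the container below swaps the image and pins the tag through a chart value instead of using `:latest`. The `autoKueue.kubectlTag` value is hypothetical and does not exist in the PR; the tag itself should be checked against the registry before use.

```yaml
# Sketch only: autoKueue.kubectlTag is a hypothetical chart value.
serviceAccountName: {{ .Release.Name }}-kueue-hook-sa
containers:
  - name: kubectl-apply
    # registry.k8s.io/kubectl is the image mentioned in the comment above.
    # Pinning a tag avoids the drift that comes with :latest.
    image: "registry.k8s.io/kubectl:{{ .Values.autoKueue.kubectlTag }}"
```

If the hook wraps kubectl in a shell script, it is worth verifying that the replacement image ships the shell the script expects.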

|-----|------|---------|-------------|
| autoKueue.tasLevels | list | `[{name: cloud.provider.com/topology-block}]` | Defines the TAS levels |
| autoKueue.nodeLabel | object | `{cloud.provider.com/node-group: "tas-group"}` | Sets the Resource flavor node label |
| autoKueue.clusterQueueName | string | `cq` | The name of the cluster queue that will be created |

Suggested change
-| autoKueue.clusterQueueName | string | `cq` | The name of the cluster queue that will be created |
+| autoKueue.clusterQueueName | string | `default` | The name of the cluster queue that will be created |

wdyt?
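
For reference, the three documented defaults would correspond to a values block roughly like the one below. This is a sketch reconstructed from the README table rows in the diff, not a copy of the chart's actual `values.yaml`.

```yaml
# values.yaml (sketch): defaults copied from the README table rows above.
autoKueue:
  # Topology levels used by TAS, outermost first.
  tasLevels:
    - name: cloud.provider.com/topology-block
  # Node label applied to the ResourceFlavor.
  nodeLabel:
    cloud.provider.com/node-group: "tas-group"
  # "cq" is the default in the PR; the review suggests "default" instead.
  clusterQueueName: cq
```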

kind: ResourceFlavor
apiVersion: kueue.x-k8s.io/v1beta1
metadata:
  name: "tas-flavor"

Suggested change
-  name: "tas-flavor"
+  name: "tas-gpu-default"

Since we may need other flavors for other accelerators, wdyt?
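
For context, a ResourceFlavor used with TAS pairs a node label with a reference to a Topology object. A minimal sketch with the suggested name, assuming the default `autoKueue.nodeLabel` from the chart README and a hypothetical Topology named `default` (the PR's actual spec is not shown in this hunk):

```yaml
kind: ResourceFlavor
apiVersion: kueue.x-k8s.io/v1beta1
metadata:
  name: "tas-gpu-default"
spec:
  # Assumed to be rendered from autoKueue.nodeLabel.
  nodeLabels:
    cloud.provider.com/node-group: "tas-group"
  # Hypothetical Topology name; not taken from the PR.
  topologyName: "default"
```

A name scoped to the accelerator type leaves room for sibling flavors later without renaming this one.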

@mimowo

mimowo commented Oct 24, 2025

> What is the expectation from Kueue team on this "feature"?

I think this is a useful feature, but a demo and discussion at wg-batch in two weeks would be useful, wdyt?

> Ideally we would have tests on this. Otherwise I worry this will regress over time as TAS features get worked on.

Indeed, we need some tests. Unit tests are the bare minimum, as we already have for Helm.

Ideally I would like to do #5145, wdyt @tenzen-y?
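
To make the unit-test request concrete, a Helm unit test for the hook RBAC might look roughly like the sketch below. It assumes the helm-unittest plugin's suite format; the test file path, the template path, and the document index are hypothetical and would need to match the chart's real layout.

```yaml
# charts/kueue/tests/autokueue_test.yaml (hypothetical path)
suite: autoKueue hook resources
templates:
  - templates/autokueue/hook-rbac.yaml  # hypothetical template file
release:
  name: kueue
tests:
  - it: names the ClusterRoleBinding after the release
    documentIndex: 0  # assumes the binding is the first document rendered
    asserts:
      - isKind:
          of: ClusterRoleBinding
      - equal:
          path: metadata.name
          # Rendered from "{{ .Release.Name }}-kueue-hook-crb" as in the PR.
          value: kueue-kueue-hook-crb
```

A test like this would catch accidental renames, but not behavioural regressions in TAS itself, which is where an end-to-end check would still be needed.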

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add helm chart to deploy resources required by TAS

4 participants