Labels
kind/bug: Categorizes issue or PR as related to a bug.
Description
What happened:
When the waitForPodsReady feature is enabled and a batch Job is submitted, the jobframework JobReconciler does not update the workload's PodsReady condition in a timely manner after a status update fails. The error message is as follows:
{"level":"error","ts":"2025-10-23T09:54:18.123509681Z","caller":"jobframework/reconciler.go:523","msg":"Updating workload status","controller":"job","controllerGroup":"batch","controllerKind":"Job","Job":{"name":"k8s-job-20m-mxy","namespace":"default"},"namespace":"default","name":"k8s-job-20m-mxy","reconcileID":"23c12b4b-786f-42f5-bd58-a10213f56e19","job":"default/k8s-job-20m-mxy","gvk":"batch/v1, Kind=Job","error":"Internal error occurred: failed calling webhook \"vworkload.kb.io\": Post \"https://kueue-webhook-service.kueue-system.svc:443/validate-kueue-x-k8s-io-v1beta1-workload?timeout=10s\": dial tcp 169.169.109.130:443: connect: connection refused","stacktrace":"sigs.k8s.io/kueue/pkg/controller/jobframework.(*JobReconciler).ReconcileGenericJob\n\t/Users/didi/ml/kueue/pkg/controller/jobframework/reconciler.go:523\nsigs.k8s.io/kueue/pkg/controller/jobframework.(*genericReconciler).Reconcile\n\t/Users/didi/ml/kueue/pkg/controller/jobframework/reconciler.go:1522\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/Users/didi/ml/kueue/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/Users/didi/ml/kueue/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:340\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/Users/didi/ml/kueue/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:300\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.1\n\t/Users/didi/ml/kueue/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:202"}
What you expected to happen:
The PodsReady condition should be set to True in a timely manner, even when a workload status update fails.
How to reproduce it (as minimally and precisely as possible):
For example, in the situation above, calls to the webhook fail intermittently: sometimes they succeed and sometimes the connection is refused.
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
- Kueue version: master
- Cloud provider or hardware configuration:
- OS (e.g. cat /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others: