[SPARK-54173][K8S] Add support for Deployment API on K8s #52867
base: master
Conversation
cc @sunchao also @holdenk @dongjoon-hyun
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
Thanks @dongjoon-hyun for the review!
Thanks @ForVic! Looks mostly good to me (already reviewed internally). Just one question around the new conf.
```scala
val KUBERNETES_EXECUTOR_POD_DELETION_COST =
  ConfigBuilder("spark.kubernetes.executor.podDeletionCost")
    .doc("Value to set for the controller.kubernetes.io/pod-deletion-cost" +
```
nit: we can emphasize that this must be set when dynamic allocation is enabled and the deployment-based allocator is used.
Also, do we need this configuration at all? Can we just set it to some large number, such as 100, when the deployment-based allocator and dynamic allocation are enabled?
What changes were proposed in this pull request?
Adds support for the K8s Deployment API to allocate executor pods.
Why are the changes needed?
Allocating individual pods is not ideal; we can allocate with higher-level APIs. #33508 helps by adding an interface for arbitrary allocators and adding a StatefulSet allocator, but with it dynamic allocation only works if you have implemented a PodDisruptionBudget associated with the decommission label. Since a Deployment uses a ReplicaSet, which supports the pod-deletion-cost annotation, we can avoid creating a separate PDB resource and still allow dynamic allocation (with shuffle tracking) by adding a low deletion cost to executors we are scaling down. When we scale the Deployment down, it chooses to remove the pods with the low deletion cost first, as in the sketch below.
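To make that mechanism concrete, here is a minimal sketch using the fabric8 Kubernetes client (which the Spark K8s module builds on). The namespace, pod name, and Deployment name are hypothetical, and this is not the PR's actual allocator code:

```scala
import io.fabric8.kubernetes.api.model.PodBuilder
import io.fabric8.kubernetes.client.KubernetesClientBuilder

val client = new KubernetesClientBuilder().build()

// Mark an executor we intend to remove as cheap to delete; the ReplicaSet behind
// the Deployment evicts the lowest-cost pods first when the replica count shrinks.
val pod = client.pods().inNamespace("spark").withName("spark-exec-1").get()
val cheapToDelete = new PodBuilder(pod)
  .editMetadata()
    .addToAnnotations("controller.kubernetes.io/pod-deletion-cost", "-1")
  .endMetadata()
  .build()
client.pods().inNamespace("spark").resource(cheapToDelete).update()

// Scaling the Deployment down by one now removes the annotated executor,
// without needing a PodDisruptionBudget tied to the decommission label.
val executors = client.apps().deployments()
  .inNamespace("spark").withName("spark-exec-deployment")
executors.scale(executors.get().getSpec.getReplicas - 1)
```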
Does this PR introduce any user-facing change?
Yes, it adds user-facing configs.
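For illustration, a user opting in might combine the new deletion-cost config with dynamic allocation and shuffle tracking roughly as follows. The "deployment" allocator value is an assumption inferred from the description (extending the existing spark.kubernetes.allocation.pods.allocator setting); only spark.kubernetes.executor.podDeletionCost appears in the diff above:

```scala
import org.apache.spark.SparkConf

// Sketch only: "deployment" as an allocator value is assumed from this PR's
// description; the podDeletionCost key comes from the Config.scala diff above.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  .set("spark.kubernetes.allocation.pods.allocator", "deployment")
  .set("spark.kubernetes.executor.podDeletionCost", "100")
```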
How was this patch tested?
New unit tests, passing existing unit tests, and testing in a cluster with shuffle tracking and dynamic allocation enabled.
Was this patch authored or co-authored using generative AI tooling?
No