docs: add labels/requirements improvements design #1245
base: main
Conversation
| * Choose **Option B (Requirements)** if the label is static now but may theoretically be expanded/relaxed to be non-static/multi-valued in the future.
|   For example `topology.kubernetes.io/region` - today we do not support cross-region node provisioning but we theoretically could.
|
| **TODO**: Not sure the above is that well-defined
Do folks agree with the above... are the rules reasonable enough?
| Labels for which there is one correct value for the whole cluster, but that value may change over time as the cluster topology changes (usually driven by changes to the AKS ManagedCluster data model).
| Examples: `kubernetes.azure.com/ebpf-dataplane`, `kubernetes.azure.com/network-name`
|
| **TODO**: Fill this in... answer for ebpf-dataplane may need to be special...?
Calling out TODO to myself, need to fix
| | node.kubernetes.io/instance-type | ✅ | ✅ | ✅ | ✅ | |
| | topology.kubernetes.io/region | ✅ | ✅ | ✅ | ✅ | |
| | topology.kubernetes.io/zone | ✅ | ✅ | ✅ | ✅ | |
| | kubernetes.azure.com/cluster | ✅ | ✅ | ❌ | ✅ | TODO: Today user can overwrite what AKS sets, we need to block |
I'm tracking this now, so I can remove this TODO.
| **Note**: Well-known requirements that aren't directly about the instance SKU selection and are instead about provisioning details such as OS or zone require special handling in the code to ensure
| that provisioning is done considering them. This is often accomplished via [createOfferings](https://github.com/Azure/karpenter-provider-azure/blob/main/pkg/providers/instancetype/instancetypes.go#L215).
|
| **TODO: Do we agree on this? Is this a principled enough stance?**
Thoughts?
|
| We will allow these labels to be set on the NodePool/NodeClaim (in addition to the labels we already support as schedulable outlined in the tables above):
| * kubernetes.azure.com/ebpf-dataplane
| * kubernetes.azure.com/cluster-health-monitor-checker-synthetic (due to its high usage...) **TODO:** Learn what this is
Calling out TODO to myself
| * kubernetes.azure.com/node-image-version
| * kubernetes.azure.com/fips_enabled
| * kubernetes.azure.com/os-sku
| * kubernetes.azure.com/mode
Actually, we may not need to add this: if we just require users to always set it on their pool, then we can schedule against it anyway.
Though if we don't make this a requirement, and user pods request mode: user, we will set mode: user but won't be able to correctly simulate that their pods will land on the pool, so it may be worth making it schedulable anyway...
|
| ### Decision 5: How will we get labels that Karpenter doesn't currently write set on nodes?
|
| #### Option A: Wait for Machine API migration
Just FYI, the public Machine API doc should be published soon. Feel free to note something like "see more details in the Machine API doc".
Capturing comments I have so far
| ## Overview
|
| This document examines Karpenter and AKS managed labels and discusses what is needed to bring what we currently do more
| in line with AKS does.
Suggested change:
- in line with AKS does.
+ in line with what AKS does.
(and maybe single line?)
|
| There are 2 main topics:
| * Label parity with AKS, setting labels that we don't currently set.
| * Support for scheduling simulation on labels we don't currently support, but should to enable easier migrations from cluster autoscaler to Karpenter.
Suggested change:
- * Support for scheduling simulation on labels we don't currently support, but should to enable easier migrations from cluster autoscaler to Karpenter.
+ * Support for scheduling simulation on labels we don't currently support, but should; to enable easier migrations from cluster autoscaler to Karpenter.
| * Can be set on Pod when not set on NodePool/NodeClaim
| * Can be set on NodePool/NodeClaim
| * May trigger behavior other than just setting a label (inform VM size, zone, etc)
| 2. Labels written to the nodes but that Karpenter doesn't understand (can't schedule)
Is it worth distinguishing between those passed to kubelet by the provider vs. those set via other mechanisms?
|
| Today, Karpenter has the following categories of labels:
|
| 1. Labels Karpenter understands and can schedule (Requirements)
Maybe rather "target for scheduling"?
| than _just_ information about the instance type itself. For example information about the OS and the instances deployment circumstances (zone, capacity reservation, etc)
| also make sense as well-known requirements.
|
| We currently shy away from including things that aren't directly related to the instance type, OS, or how it is provisioned in the set of well-known requirements.
An example of such a thing would be helpful, maybe imageFamily?
| * Historical reasons (including following what upstream/AWS and other providers do).
| * Avoiding having too many default requirements which can cause cpu/memory burden.
|
| **Note**: Well-known requirements that aren't directly about the instance SKU selection and are instead about provisioning details such as OS or zone require special handling in the code to ensure
"instance SKU" -> "instance type" or "VM SKU"
| also make sense as well-known requirements.
|
| We currently shy away from including things that aren't directly related to the instance type, OS, or how it is provisioned in the set of well-known requirements.
| This is because:
It seems to me the primary reason is still conceptual: requirements (in NodePool or workload) are primarily a filter / selector (a collection of constraints on the existing set), whereas other things (e.g. settings in NodeClass) contribute to the desired state for the chosen alternative. Zone, Capacity Type, and OS can be thought of as part of this filter / selector. (Possibly to a slightly different extent in different providers, e.g. I think on AWS capacity type and maybe even platform/OS are part of the internal capacity pool definition. It looks like both Azure and AWS have instance types that only support one OS...)
This separation is also useful / easier to reason about from a UX perspective, and explains why requirements (=filter/selector) generally target simple-valued intrinsic instance type properties (though they could match multiple values, and support negation), whereas other things (=settings) can have complex/structured values.
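As a concrete (and purely illustrative) restatement of this filter/selector view, a minimal Go sketch might look like the following; the `Offering` type and its fields are made up for the example and are not actual Karpenter APIs:

```go
// Illustrative only: requirements act as a filter over existing, simple-valued
// capacity alternatives, while settings (e.g. NodeClass fields, not modeled
// here) describe desired state for whichever alternative is chosen.
package main

import "fmt"

type Offering struct {
	InstanceType string
	Zone         string
	CapacityType string
	OS           string
}

// filterByZone narrows the existing set, the way a requirement such as
// `topology.kubernetes.io/zone In [...]` would.
func filterByZone(offerings []Offering, allowed map[string]bool) []Offering {
	var out []Offering
	for _, o := range offerings {
		if allowed[o.Zone] {
			out = append(out, o)
		}
	}
	return out
}

func main() {
	offerings := []Offering{
		{InstanceType: "Standard_D2s_v3", Zone: "eastus-1", CapacityType: "on-demand", OS: "linux"},
		{InstanceType: "Standard_D2s_v3", Zone: "eastus-2", CapacityType: "spot", OS: "linux"},
	}
	fmt.Println(filterByZone(offerings, map[string]bool{"eastus-2": true}))
}
```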
Capturing more comments/questions
|
| * Karpenter core writes all NodeClaim labels to the node, see: [syncNode](https://github.com/kubernetes-sigs/karpenter/blob/38e728c99b530f660f685d53d01f2e9ec9696668/pkg/controllers/nodeclaim/lifecycle/registration.go#L128).
| * Karpenter core translates most requirements from the NodeClaim spec into labels, see [PopulateNodeClaimDetails](https://github.com/kubernetes-sigs/karpenter/blob/38e728c99b530f660f685d53d01f2e9ec9696668/pkg/controllers/nodeclaim/lifecycle/launch.go#L129)
| * Note that `.Labels()` does exclude multi-valued requirements and labels defined in `WellKnownLabels` (registered by the cloud provider). See [Labels()](https://github.com/kubernetes-sigs/karpenter/blob/38e728c99b530f660f685d53d01f2e9ec9696668/pkg/scheduling/requirements.go#L270).
".Labels() does exclude multi-valued requirements" - is that true? It excludes those that have .Any() produce an empty value, which would only exclude DoesNotExist? It looks like requirement with "In" would use the first value (even if there are multiple), and - IMO the most surprisingly from user perspective - generate random integer label values for "Exists/Gt/Lt" ... So when it does appear to skip multi-valued it is probably due to WellKnownLabels etc.?
".Labels() does exclude ... labels defined in WellKnownLabels (registered by the cloud provider)" - probably should say "including those registered by cloud provider", some come from core. It also excludes labels under RestrictedLabelDomains ("prohibited by kubelet or reserved by karpenter", e.g. "kubernetes.io", "karpenter.sh", "karpenter.azure.com", ...) - unless under LabelDomainExceptions ("not used in a context where they may be passed as argument to kubelet", e.g. "node.kubernetes.io").
| ### Where can I see more about how AKS manages labels?
|
| https://github.com/Azure/karpenter-poc/issues/1517#issuecomment-3470039930 discusses this further.
Shouldn't we avoid linking to a private repo from a public one?
| WellKnownLabels (like: `kubernetes.azure.com/cluster`): Infinite nodes if the label doesn't match the expected value - otherwise one node.
| Non-WellKnownLabels that we don't write (like: `kubernetes.azure.com/network-name`): Results in `"incompatible requirements, label \"kubernetes.azure.com/network-name\" does not have known values`
| Non-WellKnownLabels that we do write (like `kubernetes.azure.com/ebpf-dataplane`): Results in `incompatible requirements, label \"kubernetes.azure.com/kubelet-identity-clientid\" does not have known values"` - because the simulation engine in Karpenter doesn't know we're going to write it.
A bulleted list would probably work better here?
Suggested change:
- WellKnownLabels (like: `kubernetes.azure.com/cluster`): Infinite nodes if the label doesn't match the expected value - otherwise one node.
- Non-WellKnownLabels that we don't write (like: `kubernetes.azure.com/network-name`): Results in `"incompatible requirements, label \"kubernetes.azure.com/network-name\" does not have known values`
- Non-WellKnownLabels that we do write (like `kubernetes.azure.com/ebpf-dataplane`): Results in `incompatible requirements, label \"kubernetes.azure.com/kubelet-identity-clientid\" does not have known values"` - because the simulation engine in Karpenter doesn't know we're going to write it.
+ * WellKnownLabels (like: `kubernetes.azure.com/cluster`): Infinite nodes if the label doesn't match the expected value - otherwise one node.
+ * Non-WellKnownLabels that we don't write (like: `kubernetes.azure.com/network-name`): Results in `"incompatible requirements, label \"kubernetes.azure.com/network-name\" does not have known values`
+ * Non-WellKnownLabels that we do write (like `kubernetes.azure.com/ebpf-dataplane`): Results in `incompatible requirements, label \"kubernetes.azure.com/kubelet-identity-clientid\" does not have known values"` - because the simulation engine in Karpenter doesn't know we're going to write it.
| * Note that `WellKnownLabels` seems to serve two purposes:
| 1. It prevents Karpenter core from automatically including those labels on the node object (though note that AKS Karpenter provider ends up including most of them on the node object anyway).
| 2. It allows pods to ask for that label when the NodeClaim/requirements don't actually have it, and still have scheduling take place (I don't fully understand where/why this is needed but I can see the code for it [here](https://github.com/kubernetes-sigs/karpenter/blob/main/pkg/scheduling/requirements.go#L175))
| * Requirements come from InstanceType via [computeRequirements](https://github.com/Azure/karpenter-provider-azure/blob/1f327c6d7ac62b3a3a1ad83c0f14f95001ab4ae8/pkg/providers/instancetype/instancetype.go#L135)
Narrative-wise, one would expect this to be about NodeClaim spec requirements - which come from multiple sources (including NodePool requirements and labels) and don't include requirements derived from the instance type? (Those get translated into labels by the cloud provider...)
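For point 2 in the quoted block above, a minimal hypothetical sketch of the idea (a pod requirement on a well-known label is tolerated even when the NodeClaim requirements don't define that key, because the label is expected to exist on the node later) might look like this; the function name, data, and logic are illustrative only, not the real requirements code linked above:

```go
// Hypothetical illustration of "purpose 2": missing keys are only tolerated
// when they are registered as well-known. Not the real Karpenter code.
package main

import "fmt"

func compatible(nodeReqs map[string][]string, podKey string, podVals []string, wellKnown map[string]bool) bool {
	nodeVals, ok := nodeReqs[podKey]
	if !ok {
		// Key absent from the NodeClaim requirements: tolerate only keys the
		// cloud provider (or core) has registered as well-known.
		return wellKnown[podKey]
	}
	// Otherwise require at least one overlapping value.
	for _, pv := range podVals {
		for _, nv := range nodeVals {
			if pv == nv {
				return true
			}
		}
	}
	return false
}

func main() {
	nodeReqs := map[string][]string{"node.kubernetes.io/instance-type": {"Standard_D2s_v3"}}
	wellKnown := map[string]bool{"topology.kubernetes.io/zone": true}

	// Pod asks for a zone; the NodeClaim requirements don't mention zones,
	// but the key is well-known, so scheduling can still proceed.
	fmt.Println(compatible(nodeReqs, "topology.kubernetes.io/zone", []string{"eastus-1"}, wellKnown))

	// Pod asks for an arbitrary unknown label: incompatible.
	fmt.Println(compatible(nodeReqs, "example.com/team", []string{"infra"}, wellKnown))
}
```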
| * Karpenter core writes all NodeClaim labels to the node, see: [syncNode](https://github.com/kubernetes-sigs/karpenter/blob/38e728c99b530f660f685d53d01f2e9ec9696668/pkg/controllers/nodeclaim/lifecycle/registration.go#L128).
| * Karpenter core translates most requirements from the NodeClaim spec into labels, see [PopulateNodeClaimDetails](https://github.com/kubernetes-sigs/karpenter/blob/38e728c99b530f660f685d53d01f2e9ec9696668/pkg/controllers/nodeclaim/lifecycle/launch.go#L129)
| * Note that `.Labels()` does exclude multi-valued requirements and labels defined in `WellKnownLabels` (registered by the cloud provider). See [Labels()](https://github.com/kubernetes-sigs/karpenter/blob/38e728c99b530f660f685d53d01f2e9ec9696668/pkg/scheduling/requirements.go#L270).
| * Labels that get skipped by core (either due to `WellKnownLabels` or having multiple values) are added to the NodeClaim by AKS Karpenter provider in [vmInstanceToNodeClaim](https://github.com/Azure/karpenter-provider-azure/blob/b9c8c82edb289ac5b281c85b0851b5a0c45bc4bb/pkg/cloudprovider/cloudprovider.go#L476).
This happens before core NodeClaim controllers update labels, right? (The wording may be confusing regarding order ...) These labels are populated on Create by the provider from single-valued instance type requirements, and by special logic for multi-valued ones. The associated requirements don't get added to NodeClaim as requirements, and thus don't get observed on NodeClaim by NodeClaim controllers, right?
|
| ## FAQ
|
| ### How do the labels get onto the Node?
The following describes the sequence in the reverse order, which makes sense, but it's probably worth noting that explicitly.