
Conversation

@matthchr
Member

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

Release Note

NONE

* Choose **Option B (Requirements)** if the label is static now but may theoretically be expanded/relaxed to be non-static/multi-valued in the future.
For example `topology.kubernetes.io/region` - today we do not support cross-region node provisioning but we theoretically could.

**TODO**: Not sure the above is that well-defined
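
For illustration, here is a minimal sketch of what an Option B requirement looks like in this case (plain Go with simplified stand-in types, not the karpenter scheduling package; the region value is hypothetical). The requirement is single-valued today, but its shape already allows multiple values later without any API change:

```go
package main

import "fmt"

// Requirement is a simplified stand-in for a scheduling requirement:
// a label key plus the set of values a node is allowed to have.
type Requirement struct {
	Key    string
	Values []string
}

func main() {
	// Today: effectively static, one region per cluster.
	region := Requirement{
		Key:    "topology.kubernetes.io/region",
		Values: []string{"westus2"},
	}

	// If cross-region provisioning were ever supported, the same requirement
	// would simply carry more values; nothing about its shape changes.
	// region.Values = append(region.Values, "eastus2")

	fmt.Printf("%s in %v\n", region.Key, region.Values)
}
```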
Member Author

Do folks agree with the above... are these rules reasonable enough?

Labels for which there is one correct value for the whole cluster, but that value may change over time as the cluster topology changes (usually driven by changes to the AKS ManagedCluster data model).
Examples: `kubernetes.azure.com/ebpf-dataplane`, `kubernetes.azure.com/network-name`

**TODO**: Fill this in... answer for ebpf-dataplane may need to be special...?
Member Author

Calling out TODO to myself, need to fix

| node.kubernetes.io/instance-type | ✅ | ✅ | ✅ | ✅ | |
| topology.kubernetes.io/region | ✅ | ✅ | ✅ | ✅ | |
| topology.kubernetes.io/zone | ✅ | ✅ | ✅ | ✅ | |
| kubernetes.azure.com/cluster | ✅ | ✅ | ❌ | ✅ | TODO: Today user can overwrite what AKS sets, we need to block |
Member Author

I'm tracking this now, so I can remove this TODO.

**Note**: Well-known requirements that aren't directly about the instance SKU selection and are instead about provisioning details such as OS or zone require special handling in the code to ensure
that provisioning is done considering them. This is often accomplished via [createOfferings](https://github.com/Azure/karpenter-provider-azure/blob/main/pkg/providers/instancetype/instancetypes.go#L215).
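
For a rough picture of what that special handling looks like (a simplified sketch, not the actual `createOfferings` code; the `Offering` type and values below are illustrative stand-ins), provisioning-level requirements such as zone or capacity type are typically surfaced as per-offering requirements, so the scheduler evaluates them against concrete launch options rather than only against intrinsic SKU properties:

```go
package main

import "fmt"

// Offering is a simplified stand-in for an instance type offering:
// one (zone, capacity type) combination a VM size can be launched with.
type Offering struct {
	Zone         string
	CapacityType string // e.g. "on-demand" or "spot"
	Available    bool
}

// offeringsFor sketches createOfferings-style logic: expand a VM size into
// one offering per zone/capacity-type combination, so requirements on zone
// or capacity type can be evaluated against concrete launch options.
func offeringsFor(zones, capacityTypes []string) []Offering {
	var out []Offering
	for _, z := range zones {
		for _, ct := range capacityTypes {
			out = append(out, Offering{Zone: z, CapacityType: ct, Available: true})
		}
	}
	return out
}

func main() {
	offerings := offeringsFor([]string{"westus2-1", "westus2-2"}, []string{"on-demand", "spot"})
	for _, o := range offerings {
		fmt.Printf("zone=%s capacityType=%s available=%v\n", o.Zone, o.CapacityType, o.Available)
	}
}
```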

**TODO: Do we agree on this? Is this a principled enough stance?**
Member Author

Thoughts?


We will allow these labels to be set on the NodePool/NodeClaim (in addition to the labels we already support as schedulable outlined in the tables above):
* kubernetes.azure.com/ebpf-dataplane
* kubernetes.azure.com/cluster-health-monitor-checker-synthetic (due to its high usage...) **TODO:** Learn what this is
Member Author

Calling out TODO to myself

* kubernetes.azure.com/node-image-version
* kubernetes.azure.com/fips_enabled
* kubernetes.azure.com/os-sku
* kubernetes.azure.com/mode
Member Author

Actually, we may not need to add this: if we just require that users always set it on their pool, then we can schedule against it anyway.

Member Author

Though if we don't make this a requirement and user pods request `mode: user`, we will set `mode: user` but won't be able to correctly simulate that their pods will land on the pool, so it may be worth making it schedulable anyway...
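
To make the simulation gap concrete, here is a hypothetical sketch (plain Go; the maps below are illustrative, not the provider's actual data structures): a pod that selects `kubernetes.azure.com/mode: user` can only be matched during provisioning simulation if that label is represented as a requirement the scheduler knows about, even when the provider would have written the label at registration time anyway.

```go
package main

import "fmt"

func main() {
	// Requirements the scheduler can simulate against (label -> allowed values).
	nodePoolRequirements := map[string][]string{
		"kubernetes.io/os": {"linux"},
	}

	// Labels the provider will write to the node at registration time,
	// but which are not requirements and so are invisible to simulation.
	writtenAtRegistration := map[string]string{
		"kubernetes.azure.com/mode": "user",
	}

	// A pod node selector like `kubernetes.azure.com/mode: user`.
	podSelector := map[string]string{"kubernetes.azure.com/mode": "user"}

	for key, want := range podSelector {
		if allowed, ok := nodePoolRequirements[key]; ok {
			fmt.Printf("simulation can match %s against %v\n", key, allowed)
		} else if got, ok := writtenAtRegistration[key]; ok {
			// The node would end up with the right label (mode=user), but the
			// scheduler cannot prove that ahead of time, so the pod looks
			// unschedulable against this NodePool during simulation.
			fmt.Printf("label %s=%s will exist on the node, but simulation cannot see it (pod wanted %s)\n", key, got, want)
		}
	}
}
```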


### Decision 5: How will we get labels that Karpenter doesn't currently write set on nodes?

#### Option A: Wait for Machine API migration
Collaborator

Just FYI, the public Machine API doc should be published soon. Feel free to note something like "see more details in the Machine API doc".

Collaborator

@tallaxes left a comment

Capturing comments I have so far

## Overview

This document examines Karpenter and AKS managed labels and discusses what is needed to bring what we currently do more
in line with AKS does.
Collaborator

@tallaxes Oct 31, 2025

Suggested change:
- in line with AKS does.
+ in line with what AKS does.

(and maybe single line?)


There are 2 main topics:
* Label parity with AKS, setting labels that we don't currently set.
* Support for scheduling simulation on labels we don't currently support, but should to enable easier migrations from cluster autoscaler to Karpenter.
Collaborator

Suggested change:
- * Support for scheduling simulation on labels we don't currently support, but should to enable easier migrations from cluster autoscaler to Karpenter.
+ * Support for scheduling simulation on labels we don't currently support, but should; to enable easier migrations from cluster autoscaler to Karpenter.

* Can be set on Pod when not set on NodePool/NodeClaim
* Can be set on NodePool/NodeClaim
* May trigger behavior other than just setting a label (inform VM size, zone, etc)
2. Labels written to the nodes but that Karpenter doesn't understand (can't schedule)
Collaborator

Is it worth distinguishing between those passed to kubelet by the provider vs. those set via other mechanisms?


Today, Karpenter has the following categories of labels:

1. Labels Karpenter understands and can schedule (Requirements)
Collaborator

Maybe rather "target for scheduling"?

than _just_ information about the instance type itself. For example, information about the OS and the instance's deployment circumstances (zone, capacity reservation, etc.)
also make sense as well-known requirements.

We currently shy away from including things that aren't directly related to the instance type, OS, or how it is provisioned in the set of well-known requirements.
Collaborator

An example of such a thing would be helpful; maybe imageFamily?

* Historical reasons (including following what upstream/AWS and other providers do).
* Avoiding having too many default requirements, which can cause CPU/memory burden.

**Note**: Well-known requirements that aren't directly about the instance SKU selection and are instead about provisioning details such as OS or zone require special handling in the code to ensure
Collaborator

"instance SKU" -> "instance type" or "VM SKU"

also make sense as well-known requirements.

We currently shy away from including things that aren't directly related to the instance type, OS, or how it is provisioned in the set of well-known requirements.
This is because:
Collaborator

@tallaxes Oct 31, 2025

It seems to me the primary reason is still conceptual: requirements (in NodePool or workload) are primarily a filter / selector (a collection of constraints on the existing set), whereas other things (e.g. settings in NodeClass) contribute to the desired state for the chosen alternative. Zone, Capacity Type, and OS can be thought of as part of this filter / selector. (Possibly to a slightly different extent in different providers, e.g. I think on AWS capacity type and maybe even platform/OS are part of the internal capacity pool definition. It looks like both Azure and AWS have instance types that only support one OS...)

This separation is also useful / easier to reason about from a UX perspective, and it explains why requirements (=filter/selector) generally target simple-valued intrinsic instance type properties (though they could match multiple values, and support negation), whereas other things (=settings) can have complex/structured values.
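
One way to picture the filter-vs-settings split described above (purely illustrative Go; the types, fields, and values are hypothetical simplifications, not the actual NodePool/NodeClass APIs):

```go
package main

import "fmt"

// InstanceType carries intrinsic, simple-valued properties of an existing
// instance type; requirements select over these.
type InstanceType struct {
	Name  string
	Arch  string
	Zones []string
}

// NodeClassSettings describes desired state applied to whatever instance
// type the filter selects; it is not something you "match" against.
type NodeClassSettings struct {
	ImageFamily  string
	OSDiskSizeGB int
}

func main() {
	candidates := []InstanceType{
		{Name: "Standard_D4s_v5", Arch: "amd64", Zones: []string{"1", "2", "3"}},
		{Name: "Standard_D4ps_v5", Arch: "arm64", Zones: []string{"1", "2", "3"}},
	}

	// Requirement as filter: constrain the existing set (here, arch=arm64).
	var selected []InstanceType
	for _, it := range candidates {
		if it.Arch == "arm64" {
			selected = append(selected, it)
		}
	}

	// Settings as desired state: applied to the chosen alternative,
	// potentially with complex/structured values.
	settings := NodeClassSettings{ImageFamily: "Ubuntu", OSDiskSizeGB: 128}

	fmt.Printf("selected=%v settings=%+v\n", selected, settings)
}
```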

Collaborator

@tallaxes left a comment

Capturing more comments/questions


* Karpenter core writes all NodeClaim labels to the node, see: [syncNode](https://github.com/kubernetes-sigs/karpenter/blob/38e728c99b530f660f685d53d01f2e9ec9696668/pkg/controllers/nodeclaim/lifecycle/registration.go#L128).
* Karpenter core translates most requirements from the NodeClaim spec into labels, see [PopulateNodeClaimDetails](https://github.com/kubernetes-sigs/karpenter/blob/38e728c99b530f660f685d53d01f2e9ec9696668/pkg/controllers/nodeclaim/lifecycle/launch.go#L129)
* Note that `.Labels()` does exclude multi-valued requirements and labels defined in `WellKnownLabels` (registered by the cloud provider). See [Labels()](https://github.com/kubernetes-sigs/karpenter/blob/38e728c99b530f660f685d53d01f2e9ec9696668/pkg/scheduling/requirements.go#L270).
Collaborator

".Labels() does exclude multi-valued requirements" - is that true? It excludes those that have .Any() produce an empty value, which would only exclude DoesNotExist? It looks like a requirement with "In" would use the first value (even if there are multiple), and - IMO most surprising from a user perspective - generate random integer label values for "Exists/Gt/Lt"... So when it does appear to skip multi-valued requirements, it is probably due to WellKnownLabels etc.?

".Labels() does exclude ... labels defined in WellKnownLabels (registered by the cloud provider)" - this probably should say "including those registered by the cloud provider", since some come from core. It also excludes labels under RestrictedLabelDomains ("prohibited by kubelet or reserved by karpenter", e.g. "kubernetes.io", "karpenter.sh", "karpenter.azure.com", ...) - unless they fall under LabelDomainExceptions ("not used in a context where they may be passed as argument to kubelet", e.g. "node.kubernetes.io").
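
A rough model of the filtering being discussed (this is not the actual `Labels()` code; it only sketches the rules described above: skip well-known keys and restricted domains, skip requirements with no usable value, and otherwise pick a single value for the label; `LabelDomainExceptions` is omitted for brevity):

```go
package main

import (
	"fmt"
	"strings"
)

// requirement is a simplified stand-in: a key plus candidate values
// (an empty value slice approximates operators like DoesNotExist).
type requirement struct {
	key    string
	values []string
}

// deriveLabels sketches the filtering described above: well-known keys and
// restricted domains are skipped, requirements with no usable value are
// skipped, and otherwise a single value is picked for the label.
func deriveLabels(reqs []requirement, wellKnown map[string]bool, restrictedDomains []string) map[string]string {
	labels := map[string]string{}
	for _, r := range reqs {
		if wellKnown[r.key] {
			continue
		}
		restricted := false
		for _, d := range restrictedDomains {
			if strings.HasPrefix(r.key, d) {
				restricted = true
				break
			}
		}
		if restricted || len(r.values) == 0 {
			continue
		}
		labels[r.key] = r.values[0] // pick one value, as described for "In"
	}
	return labels
}

func main() {
	reqs := []requirement{
		{key: "karpenter.sh/capacity-type", values: []string{"on-demand"}},
		{key: "example.com/team", values: []string{"payments"}},
		{key: "example.com/absent", values: nil}, // DoesNotExist-like
	}
	wellKnown := map[string]bool{"karpenter.sh/capacity-type": true}
	fmt.Println(deriveLabels(reqs, wellKnown, []string{"kubernetes.io/"}))
}
```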

Comment on lines +287 to +289
### Where can I see more about how AKS manages labels?

https://github.com/Azure/karpenter-poc/issues/1517#issuecomment-3470039930 discusses this further.
Collaborator

Should we not link to a private repo from a public one?

Comment on lines +283 to +285
WellKnownLabels (like: `kubernetes.azure.com/cluster`): Infinite nodes if the label doesn't match the expected value - otherwise one node.
Non-WellKnownLabels that we don't write (like: `kubernetes.azure.com/network-name`): Results in `"incompatible requirements, label \"kubernetes.azure.com/network-name\" does not have known values`
Non-WellKnownLabels that we do write (like `kubernetes.azure.com/ebpf-dataplane`): Results in `incompatible requirements, label \"kubernetes.azure.com/kubelet-identity-clientid\" does not have known values"` - because the simulation engine in Karpenter doesn't know we're going to write it.
Collaborator

A bulleted list would probably work better here?

Suggested change:
- WellKnownLabels (like: `kubernetes.azure.com/cluster`): Infinite nodes if the label doesn't match the expected value - otherwise one node.
- Non-WellKnownLabels that we don't write (like: `kubernetes.azure.com/network-name`): Results in `"incompatible requirements, label \"kubernetes.azure.com/network-name\" does not have known values`
- Non-WellKnownLabels that we do write (like `kubernetes.azure.com/ebpf-dataplane`): Results in `incompatible requirements, label \"kubernetes.azure.com/kubelet-identity-clientid\" does not have known values"` - because the simulation engine in Karpenter doesn't know we're going to write it.
+ * WellKnownLabels (like: `kubernetes.azure.com/cluster`): Infinite nodes if the label doesn't match the expected value - otherwise one node.
+ * Non-WellKnownLabels that we don't write (like: `kubernetes.azure.com/network-name`): Results in `"incompatible requirements, label \"kubernetes.azure.com/network-name\" does not have known values`
+ * Non-WellKnownLabels that we do write (like `kubernetes.azure.com/ebpf-dataplane`): Results in `incompatible requirements, label \"kubernetes.azure.com/kubelet-identity-clientid\" does not have known values"` - because the simulation engine in Karpenter doesn't know we're going to write it.

* Note that `WellKnownLabels` seems to serve two purposes:
1. It prevents Karpenter core from automatically including those labels on the node object (though note that AKS Karpenter provider ends up including most of them on the node object anyway).
  2. It allows pods to ask for that label when the NodeClaim/requirements don't actually have it, and still have scheduling take place (I don't fully understand where/why this is needed, but I can see the code for it [here](https://github.com/kubernetes-sigs/karpenter/blob/main/pkg/scheduling/requirements.go#L175)); see the sketch after this list.
* Requirements come from InstanceType via [computeRequirements](https://github.com/Azure/karpenter-provider-azure/blob/1f327c6d7ac62b3a3a1ad83c0f14f95001ab4ae8/pkg/providers/instancetype/instancetype.go#L135)
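
A minimal sketch of that second purpose (the function shape and names below are illustrative, not the actual `requirements.go` logic): when checking whether a pod's requirement is compatible with a NodeClaim's requirements, keys in an allowed-undefined set are treated as compatible even though the NodeClaim defines no values for them.

```go
package main

import "fmt"

// compatible sketches an "allow undefined" compatibility check: if the
// NodeClaim has no requirement for a key, the pod's requirement on that key
// is still considered compatible when the key is in allowedUndefined
// (e.g. a well-known label the provider is trusted to resolve later).
func compatible(nodeClaimReqs map[string][]string, podKey, podValue string, allowedUndefined map[string]bool) bool {
	values, defined := nodeClaimReqs[podKey]
	if !defined {
		return allowedUndefined[podKey]
	}
	for _, v := range values {
		if v == podValue {
			return true
		}
	}
	return false
}

func main() {
	nodeClaimReqs := map[string][]string{
		"kubernetes.io/arch": {"amd64"},
	}
	allowedUndefined := map[string]bool{
		"topology.kubernetes.io/zone": true, // well-known: resolved at launch
	}

	// Pod asks for a zone the NodeClaim doesn't spell out: still compatible.
	fmt.Println(compatible(nodeClaimReqs, "topology.kubernetes.io/zone", "westus2-1", allowedUndefined)) // true

	// Pod asks for an unknown, non-well-known label: not compatible.
	fmt.Println(compatible(nodeClaimReqs, "example.com/team", "payments", allowedUndefined)) // false
}
```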
Collaborator

Narrative-wise, one would expect this to be about NodeClaim spec requirements - which come from multiple sources (including NodePool requirements and labels) and don't include requirements derived from instance type? (Those get translated into labels by cloud provider ...)

* Karpenter core writes all NodeClaim labels to the node, see: [syncNode](https://github.com/kubernetes-sigs/karpenter/blob/38e728c99b530f660f685d53d01f2e9ec9696668/pkg/controllers/nodeclaim/lifecycle/registration.go#L128).
* Karpenter core translates most requirements from the NodeClaim spec into labels, see [PopulateNodeClaimDetails](https://github.com/kubernetes-sigs/karpenter/blob/38e728c99b530f660f685d53d01f2e9ec9696668/pkg/controllers/nodeclaim/lifecycle/launch.go#L129)
* Note that `.Labels()` does exclude multi-valued requirements and labels defined in `WellKnownLabels` (registered by the cloud provider). See [Labels()](https://github.com/kubernetes-sigs/karpenter/blob/38e728c99b530f660f685d53d01f2e9ec9696668/pkg/scheduling/requirements.go#L270).
* Labels that get skipped by core (either due to `WellKnownLabels` or having multiple values) are added to the NodeClaim by AKS Karpenter provider in [vmInstanceToNodeClaim](https://github.com/Azure/karpenter-provider-azure/blob/b9c8c82edb289ac5b281c85b0851b5a0c45bc4bb/pkg/cloudprovider/cloudprovider.go#L476).
Collaborator

This happens before core NodeClaim controllers update labels, right? (The wording may be confusing regarding order ...) These labels are populated on Create by the provider from single-valued instance type requirements, and by special logic for multi-valued ones. The associated requirements don't get added to NodeClaim as requirements, and thus don't get observed on NodeClaim by NodeClaim controllers, right?


## FAQ

### How do the labels get onto the Node?
Collaborator

The following describes the sequence in reverse order, which makes sense, but it is probably worth noting that explicitly.
