Skip to content

Conversation

@frank-spano
Copy link
Contributor

@frank-spano frank-spano commented Sep 17, 2025

What does this PR do?

Pulls latest release version of kubernetes and kind to use in test suite when "latest" is passed as kubernetes version.

Which scenarios this will impact?

Right now only TestKindSuite, invoked by the datadog-agent repo

Motivation

Outage detailed in https://datadoghq.atlassian.net/browse/CONTINT-4708

Additional Notes

Passing job: https://gitlab.ddbuild.io/DataDog/datadog-agent/-/jobs/1129993702

CONTINT-4708: Starting dynamic resolution of 'latest' Kubernetes version
CONTINT-4708: Fetching Kubernetes versions from Docker Hub API: https://hub.docker.com/v2/repositories/kindest/node/tags?page_size=100
CONTINT-4708: Successfully fetched 100 tags from Docker Hub
CONTINT-4708: Found 99 valid Kubernetes versions, latest is: 1.34.0
CONTINT-4708: Selected Kind version v0.30.0 for Kubernetes 1.34.0
CONTINT-4708: Final resolved config - KindVersion: v0.30.0, NodeImage: v1.34.0@sha256:7416a61b42b1662ca6ca89f02028ac133a309a2a30ba309614e8ec94d976dc5a, UsePublicRegistry: true
Using AMI ami-090c309e8ced8ecc2
 for stack ci-1129993702-4670-e2e-kindsuite-2f0369522723a4c3
CONTINT-4708: Starting dynamic resolution of 'latest' Kubernetes version
CONTINT-4708: Fetching Kubernetes versions from Docker Hub API: https://hub.docker.com/v2/repositories/kindest/node/tags?page_size=100
CONTINT-4708: Successfully fetched 100 tags from Docker Hub
CONTINT-4708: Found 99 valid Kubernetes versions, latest is: 1.34.0
CONTINT-4708: Selected Kind version v0.30.0 for Kubernetes 1.34.0
CONTINT-4708: Final resolved config - KindVersion: v0.30.0, NodeImage: v1.34.0@sha256:7416a61b42b1662ca6ca89f02028ac133a309a2a30ba309614e8ec94d976dc5a, UsePublicRegistry: true

@frank-spano frank-spano requested a review from a team as a code owner September 17, 2025 18:23
@frank-spano frank-spano changed the title Frank.spano/contint 4708 Add dynamic resolution of kind version Sep 30, 2025
@frank-spano frank-spano added the enhancement New feature or request label Sep 30, 2025
client := &http.Client{Timeout: 30 * time.Second}

// Fetch releases from GitHub API
githubURL := "https://api.github.com/repos/kubernetes-sigs/kind/releases"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have you evaluate this link option ? possibly it results in some alpha version or what and so we still need to check whole list, but may be it solves our needs ? 🤔

https://github.com/kubernetes-sigs/kind/releases/latest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm it does redirect to the latest version, which I should think probably be a proper release version? On the other hand, I don't think iterating through the list of releases is that big of a deal. Let me think about it, maybe we can the link you provided, and if it doesn't resolve to a proper release version we can call the api for all releases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that would be a good in-between solution.

Copy link
Member

@chouetz chouetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: is this new process something that will be required in the CI? It's adding 3 new external dependencies (so 3 additional possible reasons to fail) in an already fragile process. I don't know if it's an option to ignore all possible errors in this process and just raise warnings and fallback to using the map and the internal mirror?


nodeImage := fmt.Sprintf("%s/%s:%s", env.InternalDockerhubMirror(), kindNodeImageName, kindVersionConfig.NodeImageVersion)
var nodeImage string
if kindVersionConfig.UsePublicRegistry {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you confirm this is always false when undefined (which is the case for all versions defined in the KindConfig map? I would like to prevent an external dependency if it's not necessary

func GetKindVersionConfig(kubeVersion string) (*KindConfig, error) {
// Handle "latest" as a special case
if kubeVersion == "latest" {
return getLatestKindVersionConfig()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we catch the error here, and just raise a warning in case of failure in the child methods?

Copy link
Member

@KevinFairise2 KevinFairise2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think supporting pulling the latest automatically is a good option but only for local development.
In automated tests running in the CI we do not want this kind of "magic". Because if a new Kubernetes version break the tests, then we will end up with a CI breaking because of an external reason. It will also break all the oldest branches that are maybe not even supposed to support these newest Kind version.
The proper way to do that would be to follow what Steven mentioned here: https://datadoghq.atlassian.net/browse/CONTINT-4708?focusedCommentId=2395613
We want to know that we something is broken with newest version of Kind, but we do not want that to happen randomly in the CI. We want to catch it on the PR that introduce testing on that new version

var nodeImage string
if kindVersionConfig.UsePublicRegistry {
// Use public Docker Hub for latest/dynamic versions not available in internal mirror
nodeImage = fmt.Sprintf("docker.io/%s:%s", kindNodeImageName, kindVersionConfig.NodeImageVersion)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the internal mirror should be transparent, you can target an arbitrary dockerhub image through the mirror. If it is not there yet it should be pulled.
In any case using docker.io is not an option because we will get rate limited very soon. If that's for local development only, this is acceptable, not in the CI

// getKindVersionForKubernetes determines the appropriate Kind version for a given Kubernetes version
// Based on Kind release compatibility: https://github.com/kubernetes-sigs/kind/releases
// Used as fallback if dynamic resolution fails
func getKindVersionForKubernetes(kubeVersion *semver.Version) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not using the map just above that was exactly made to do that AFAIU?

@frank-spano
Copy link
Contributor Author

@KevinFairise2 I agree with your comments that introducing the latest magic could be problematic. In your mind how would updating the matrix Steven mentioned look? A background job? Or each time the CI runs? Would we also want to auto update this file to test the latest version https://github.com/DataDog/datadog-agent/blob/main/.gitlab/e2e/e2e.yml#L237 ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants