✨ Support Node Auto Placement and Node AF/AAF #3655
base: main
Conversation
- Bump VMOP including Node AF/AAF support
- Add NodeAutoPlacement Feature Gate

(cherry picked from commit 700c8ae)
(cherry picked from commit cfeb862)
Removes the extra cases for VMG creation, such that VMG is created for:
1. Multiple zones, multiple MDs with no failureDomain
2. Multiple zones, multiple MDs with failureDomain
3. Single zone, existing cluster with no failureDomain MDs

Signed-off-by: Sagar Muchhal <[email protected]>
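The three cases in the commit message above can be condensed into a single decision. The sketch below is an illustration only, assuming simplified stand-in types (`machineDeployment`, `needsVMG`) rather than the actual CAPV implementation:

```go
package main

import "fmt"

// machineDeployment is a simplified stand-in for the CAPI MachineDeployment
// fields that matter for the rule sketched in the commit message.
type machineDeployment struct {
	name          string
	failureDomain string // empty when the user did not pin a zone
}

// needsVMG sketches the simplified rule: a VirtualMachineGroup is created for
// multi-zone clusters (with or without explicit failureDomains), and for
// single-zone clusters whose existing MachineDeployments have no failureDomain.
func needsVMG(zoneCount int, mds []machineDeployment) bool {
	if zoneCount > 1 {
		return true // cases 1 and 2: multiple zones, MDs with or without failureDomain
	}
	// case 3: single zone, existing cluster with no failureDomain MDs
	for _, md := range mds {
		if md.failureDomain != "" {
			return false
		}
	}
	return len(mds) > 0
}

func main() {
	mds := []machineDeployment{{name: "md-0"}, {name: "md-1"}}
	fmt.Println(needsVMG(3, mds)) // multiple zones
	fmt.Println(needsVMG(1, mds)) // single zone, no failureDomains
}
```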
- Updates VMOP API dependency
- Misc VMG fixes:
  - Use namingStrategy to calculate VM names
  - Use MachineDeployment names for VMG placement label
  - Include all MachineDeployments to generate node-pool -> zone mapping
- Fixes VMG webhook validation error:
  - Adds cluster-name label to AF/AAF spec
  - Re-adds zone topology key back to anti-affinity spec

Signed-off-by: Sagar Muchhal <[email protected]>
Signed-off-by: Sagar Muchhal <[email protected]>
Signed-off-by: Sagar Muchhal <[email protected]>
…gs#71)
* Refine VMG controller when generating per-MD zone labels
  - Skip legacy already-placed VMs which do not have placement info
  - Skip VMs which do not have zone info
* Apply suggestions from code review

Co-authored-by: Sagar Muchhal <[email protected]>
- Sync VSphereMachines during day-2 operations in VMG controller
- Only wait for all intended VSphereMachines during initial Cluster creation
- Use annotations in VMG for per-MD zone info

Signed-off-by: Gong Zhang <[email protected]>
- Add VMG reconciler unit tests
- Bump VMOP due to API change
- Filter out VSphereMachine events except create/delete events

Signed-off-by: Gong Zhang <[email protected]>
Signed-off-by: Gong Zhang <[email protected]>
Signed-off-by: Gong Zhang <[email protected]>
Force-pushed 1160ce9 to 1cd61f9
/test ?
@zhanggbj: The following commands are available to trigger required jobs and optional jobs.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/test pull-cluster-api-provider-vsphere-e2e-govmomi-blocking-main
Force-pushed 22604bf to 7c7c1a7
Hey @fabriziopandini @sbueringer, for the concern about race conditions, I have implemented checks to mitigate race conditions during initial and post-placement, and also have unit tests to cover the scenarios.
Force-pushed 3ed2e93 to 9b1754a
Pushed a small change and checking UT failures.
Signed-off-by: Gong Zhang <[email protected]>
Force-pushed 9b1754a to 2ac47d4
/test pull-cluster-api-provider-vsphere-e2e-supervisor-blocking-main
/test pull-cluster-api-provider-vsphere-e2e-govmomi-blocking-main
Since Encryption Class requires the API in this newer version, bump the vm-operator package

Signed-off-by: Gong Zhang <[email protected]>
Force-pushed fd72f09 to c41abeb
Signed-off-by: Gong Zhang <[email protected]>
Signed-off-by: Gong Zhang <[email protected]>
Force-pushed b8c569e to 08b8304
```go
	continue
}
if member.Placement != nil && member.Placement.Zone != "" {
	log.Info(fmt.Sprintf("MachineDeployment %s has been placed to failure domanin %s", md.Name, member.Placement.Zone), "MachineDeployment", klog.KObj(&md))
```
nit:

```suggestion
log.Info(fmt.Sprintf("MachineDeployment %s has been placed to failure domain %s", md.Name, member.Placement.Zone), "MachineDeployment", klog.KObj(&md))
```
```go
	want bool
}{
	{
		name: "Should create a VMG if all the expected VSphereMachines exists",
```
Let's add a unit test ensuring that a deleting vSphereMachine blocks VMG creation.
I would suggest using this test case as a setup and adding an additional deleting machine.
@fabriziopandini
Trying to understand this scenario: does it mean a deleting vSphereMachine won't be added to the VMGroup member list?
If yes, this case should cover it: name: "Should create a VMG if all the expected VSphereMachines exists, deleting MD should be ignored"
a deleting machine should not be ignored, it should block creation of a VMG
I cannot recall that we discussed a deleting machine blocking the creation of a VMG.
Is this the scaling-down case? IIRC, we agreed that scaling down an MD is allowed during any phase (before VMs are created, or after the VMG is created but still during placement).
This is what is in the current implementation, and it makes sense to me because it makes sure that we start a placement decision only when the list of vSphereMachines is stable.
Let's please add a unit test and do not change the code
@fabriziopandini Oh, I got your point: we won't create a VMG if the expected VSphereMachines don't all exist yet (e.g. a VM is deleting). I misread it.
Added case:
name: "Should not create a VMG if the expected VSphereMachines do not exist yet"
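The rule settled on in this thread can be sketched as a single readiness check. This is an illustrative simplification only; the names (`vsphereMachine`, `readyForPlacement`) are stand-ins, not the actual controller code:

```go
package main

import "fmt"

// vsphereMachine is a minimal stand-in for the real VSphereMachine object.
type vsphereMachine struct {
	name     string
	deleting bool // i.e. its deletionTimestamp is set
}

// readyForPlacement sketches the agreed behavior: only start the placement
// decision (i.e. create the VMG) once every intended VSphereMachine exists
// and none of them is being deleted, so the member list is stable.
func readyForPlacement(expected int, machines []vsphereMachine) bool {
	if len(machines) < expected {
		return false // some intended VSphereMachines do not exist yet
	}
	for _, m := range machines {
		if m.deleting {
			return false // a deleting machine blocks VMG creation
		}
	}
	return true
}

func main() {
	fmt.Println(readyForPlacement(2, []vsphereMachine{{name: "a"}, {name: "b"}}))
	fmt.Println(readyForPlacement(2, []vsphereMachine{{name: "a"}, {name: "b", deleting: true}}))
}
```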
```go
	clusterv1.ClusterNameLabel: cluster.Name,
},
Annotations: map[string]string{
	fmt.Sprintf("%s/%s", vmoperator.ZoneAnnotationPrefix, "md3"): "zone1", // failureDomain for md3 is explicitly set by the user
```
Let's add an annotation without the ZoneAnnotationPrefix and check it is preserved
Updated.
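The preservation behavior requested above can be sketched as a merge that only rewrites prefixed keys. The prefix value and the helper name below are assumptions for illustration, not the actual `vmoperator.ZoneAnnotationPrefix` constant:

```go
package main

import (
	"fmt"
	"strings"
)

// zoneAnnotationPrefix stands in for vmoperator.ZoneAnnotationPrefix from the
// diff; the exact value here is an assumption for illustration.
const zoneAnnotationPrefix = "zone.cluster.x-k8s.io"

// mergeZoneAnnotations sketches the reviewed behavior: per-MD zone
// annotations ("<prefix>/<md-name>: <zone>") are rewritten from the current
// mapping, while annotations without the prefix are preserved untouched.
func mergeZoneAnnotations(existing, perMDZone map[string]string) map[string]string {
	out := map[string]string{}
	for k, v := range existing {
		if !strings.HasPrefix(k, zoneAnnotationPrefix+"/") {
			out[k] = v // preserve unrelated user annotations
		}
	}
	for md, zone := range perMDZone {
		out[fmt.Sprintf("%s/%s", zoneAnnotationPrefix, md)] = zone
	}
	return out
}

func main() {
	got := mergeZoneAnnotations(
		map[string]string{"user-annotation": "keep-me"},
		map[string]string{"md3": "zone1"},
	)
	fmt.Println(got["user-annotation"], got[zoneAnnotationPrefix+"/md3"])
}
```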
```go
Namespace: cluster.Namespace,
Name:      cluster.Name,
UID:       types.UID("uid"),
Labels: map[string]string{
```
Let's add a label different from the ClusterNameLabel and check it is preserved.
Updated.
```go
	Reason:  infrav1.VSphereMachineVirtualMachineWaitingForVirtualMachineGroupV1Beta2Reason,
	Message: fmt.Sprintf("Waiting for VSphereMachine's VirtualMachineGroup %s to exist", key),
})
log.V(4).Info(fmt.Sprintf("Waiting for VirtualMachineGroup %s, requeueing", key.Name), "VirtualMachineGroup", klog.KRef(key.Namespace, key.Name))
```
```suggestion
log.V(4).Info(fmt.Sprintf("Waiting for VirtualMachineGroup %s to exist, requeueing", key.Name), "VirtualMachineGroup", klog.KRef(key.Namespace, key.Name))
```
Updated.
```go
}

// Set the zone label using the annotation of the per-md zone mapping from VirtualMachineGroup.
// This is for new VMs created during day-2 operations when Node Auto Placement is enabled.
```
```suggestion
// This is for new VMs created after initial placement decision/with a failureDomain defined by the user.
```
Updated.
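The lookup that the comment describes can be sketched as reading the per-MD annotation back off the VirtualMachineGroup. The prefix value and helper name are assumptions for illustration:

```go
package main

import "fmt"

// zoneAnnotationPrefix stands in for vmoperator.ZoneAnnotationPrefix; the
// exact value is an assumption for illustration.
const zoneAnnotationPrefix = "zone.cluster.x-k8s.io"

// zoneForMD sketches the described behavior: a VM created after the initial
// placement decision reads its zone from the per-MD annotation on the
// VirtualMachineGroup, so day-2 VMs land in the zone chosen at placement
// time (or in the failureDomain defined by the user).
func zoneForMD(vmgAnnotations map[string]string, mdName string) (string, bool) {
	zone, ok := vmgAnnotations[fmt.Sprintf("%s/%s", zoneAnnotationPrefix, mdName)]
	return zone, ok
}

func main() {
	ann := map[string]string{zoneAnnotationPrefix + "/md-0": "zone-a"}
	if zone, ok := zoneForMD(ann, "md-0"); ok {
		fmt.Println(zone) // used as the new VM's zone label
	}
}
```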
```go
	client.MatchingLabels{clusterv1.ClusterNameLabel: supervisorMachineCtx.Cluster.Name}); err != nil {
	return false, err
}
othermMDNames := []string{}
```
nit:

```suggestion
otherMDNames := []string{}
```
Good eye:)
```go
})

// control plane machine is the machine with the control plane label set
Specify("Reconcile valid control plane Machine", func() {
```
Let's add a check that the CP machine is placed in the expected zone
Added an additional check for the CP machine: the CP machine has vmopVM.Labels[corev1.LabelTopologyZone], which is the contract between CAPV and VMOP.
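For context, `corev1.LabelTopologyZone` is the upstream well-known label `topology.kubernetes.io/zone`. A minimal sketch of the contract check described above, using a local constant and helper as stand-ins for the real test code:

```go
package main

import "fmt"

// labelTopologyZone mirrors the value of corev1.LabelTopologyZone. This
// well-known label is the CAPV/VMOP contract referenced above: VM Operator
// sets it on a placed VM, and CAPV reads it back as the placement zone.
const labelTopologyZone = "topology.kubernetes.io/zone"

// placedZone sketches the assertion added to the test: a placed control
// plane VM must carry the topology zone label.
func placedZone(vmLabels map[string]string) (string, bool) {
	zone, ok := vmLabels[labelTopologyZone]
	return zone, ok
}

func main() {
	vmLabels := map[string]string{labelTopologyZone: "zone-1"}
	zone, ok := placedZone(vmLabels)
	fmt.Println(zone, ok)
}
```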
```go
	vmGroup *vmoprv1.VirtualMachineGroup
)

BeforeEach(func() {
```
Looks like we are missing a test for VMG not created yet
Added a new case to test
"requeue the Machine waiting for VirtualMachineGroup creation"
```go
machineNotMember.Spec.Bootstrap.DataSecretName = &secretName

By("VirtualMachine is not created")
```
Let's check that the VM actually does not exist, not only the result.
Updated.
/test pull-cluster-api-provider-vsphere-e2e-vcsim-govmomi-main
@fabriziopandini @sbueringer I've addressed the other comments, could you please take a look at the others first? Thanks!

I have answered the comment, please fix it first.
Force-pushed 68f5b16 to ea10895
- Add missed test cases
- Miscellaneous

Signed-off-by: Gong Zhang <[email protected]>
Force-pushed ea10895 to 6d76fbd
@zhanggbj You have to use `/test` (not `/ test`): /test pull-cluster-api-provider-vsphere-e2e-supervisor-blocking-main
PR needs rebase.
What this PR does / why we need it:
Support Node Auto Placement and Node AF/AAF
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #