Description
I'm having problems installing Calico on Windows nodes as a NetworkPolicy provider only (not as a CNI plugin). The installation uses the Tigera Operator Helm chart v3.30.3 on an Amazon EKS v1.32 cluster with mixed nodes (Windows Server 2022 and Amazon Bottlerocket).
The installation configuration
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  # ...
  name: default
spec:
  calicoNetwork:
    bgp: Disabled
    linuxDataplane: Iptables
    nodeAddressAutodetectionV4:
      canReach: 8.8.8.8
    windowsDataplane: HNS
  cni:
    ipam:
      type: AmazonVPC
    type: AmazonVPC
  controlPlaneNodeSelector:
    kubernetes.io/os: linux
  controlPlaneReplicas: 2
  flexVolumePath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
  kubeletVolumePluginPath: /var/lib/kubelet
  kubernetesProvider: EKS
  nonPrivileged: Disabled
  serviceCIDRs:
  - 172.20.0.0/16
  variant: Calico
  windowsNodes:
    cniBinDir: /Program Files/Amazon/EKS/cni
    cniConfigDir: /ProgramData/Amazon/EKS/cni/config
    cniLogDir: /var/log/calico/cni
status:
  calicoVersion: v3.30.3
  computed:
    calicoNetwork:
      bgp: Disabled
      linuxDataplane: Iptables
      nodeAddressAutodetectionV4:
        canReach: 8.8.8.8
      windowsDataplane: HNS
    cni:
      ipam:
        type: AmazonVPC
      type: AmazonVPC
    controlPlaneNodeSelector:
      kubernetes.io/os: linux
    controlPlaneReplicas: 2
    flexVolumePath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
    kubeletVolumePluginPath: /var/lib/kubelet
    kubernetesProvider: EKS
    nonPrivileged: Disabled
    serviceCIDRs:
    - 172.20.0.0/16
    variant: Calico
    windowsNodes:
      cniBinDir: /Program Files/Amazon/EKS/cni
      cniConfigDir: /Program Files/Amazon/EKS/cni/config
      cniLogDir: /var/log/calico/cni
  conditions:
  - lastTransitionTime: "2025-10-01T14:38:04Z"
    message: All Objects Available
    observedGeneration: 5
    reason: AllObjectsAvailable
    status: "False"
    type: Degraded
  - lastTransitionTime: "2025-10-01T14:38:04Z"
    message: All objects available
    observedGeneration: 5
    reason: AllObjectsAvailable
    status: "True"
    type: Ready
  - lastTransitionTime: "2025-10-01T14:38:04Z"
    message: All Objects Available
    observedGeneration: 5
    reason: AllObjectsAvailable
    status: "False"
    type: Progressing
  mtu: 9001
  variant: Calico

The calico-node DaemonSet on Linux is deployed correctly; calico-node-windows, however, is not. The calico-node-windows DaemonSet starts by deleting the kube-proxy service from the node through its init container, uninstall-calico, as shown in the following logs:
/host/etc/cni/net.d dir does not exist, skipping Calico CNI config cleanup
/host/opt/cni/bin dir does not exist, skipping Calico CNI binary cleanup
Stopping and removing Calico services if they are present...
Stopping and removing kube-proxy service if it is present...
It is recommended to run kube-proxy as kubernetes daemonset instead
Logging containerd CNI bin and conf dir paths:
bin_dir = "C:\\Program Files\\Amazon\\EKS\\cni"
conf_dir = "C:\\ProgramData\\Amazon\\EKS\\cni\\config"
Done.
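The conf_dir logged here matches spec.windowsNodes.cniConfigDir, while status.computed in the Installation resource reports a different value for the same field. Whether that difference matters is not established; the following is only a minimal sketch for spotting such differences, with the windowsNodes values copied by hand from the resource above. The function name diff_paths is made up for illustration.

```python
def diff_paths(spec, computed, prefix=""):
    """Yield (dotted_key, spec_value, computed_value) for every leaf that differs."""
    for key in sorted(set(spec) | set(computed)):
        path = f"{prefix}.{key}" if prefix else key
        a, b = spec.get(key), computed.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            # Both sides are mappings: compare their children.
            yield from diff_paths(a, b, path)
        elif a != b:
            yield path, a, b


# Values copied from the Installation resource (only windowsNodes shown).
spec = {
    "windowsNodes": {
        "cniBinDir": "/Program Files/Amazon/EKS/cni",
        "cniConfigDir": "/ProgramData/Amazon/EKS/cni/config",
        "cniLogDir": "/var/log/calico/cni",
    }
}
computed = {
    "windowsNodes": {
        "cniBinDir": "/Program Files/Amazon/EKS/cni",
        "cniConfigDir": "/Program Files/Amazon/EKS/cni/config",
        "cniLogDir": "/var/log/calico/cni",
    }
}

differences = list(diff_paths(spec, computed))
for path, spec_value, computed_value in differences:
    print(f"{path}: spec={spec_value!r} computed={computed_value!r}")
```

With the values above, the only field reported is windowsNodes.cniConfigDir.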
The main container of the DaemonSet, node, logs errors and appears to be stuck:
Setting environment variables if not set...
Environment variable KUBE_NETWORK is already set: vpc.*
Environment variable CALICO_NETWORKING_BACKEND is already set: none
Environment variable DNS_SEARCH is not set. Setting it to the default value: svc.cluster.local
Environment variable VXLAN_VNI is already set: 4096
Environment variable VXLAN_MAC_PREFIX is not set. Setting it to the default value: 0E-2A
Environment variable VXLAN_ADAPTER is not set. Setting it to the default value:
hostname : The term 'hostname' is not recognized as the name of a cmdlet, function, script file, or operable program.
Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At C:\hpc\CalicoWindows\config.ps1:52 char:52
+ Set-EnvVarIfNotSet -var "NODENAME" -defaultValue $(hostname).ToLower( ...
+ ~~~~~~~~
+ CategoryInfo : ObjectNotFound: (hostname:String) [], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException
You cannot call a method on a null-valued expression.
At C:\hpc\CalicoWindows\config.ps1:52 char:52
+ Set-EnvVarIfNotSet -var "NODENAME" -defaultValue $(hostname).ToLower( ...
+ ~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull
Environment variable CALICO_K8S_NODE_REF is not set. Setting it to the default value: ip-10-77-45-160.ca-central-1.compute.internal
Environment variable STARTUP_VALID_IP_TIMEOUT is not set. Setting it to the default value: 90
Environment variable IP is already set: autodetect
Environment variable FELIX_LOGSEVERITYFILE is not set. Setting it to the default value: none
Environment variable FELIX_LOGSEVERITYSYS is not set. Setting it to the default value: none
StoredLastBootTime 9/30/2025 7:57:22 PM, CurrentLastBootTime 9/30/2025 7:57:22 PM
Stored new lastBootTime 9/30/2025 7:57:22 PM
Kubelet has (re)started, (re)initialising the node...
2025-09-30 20:06:52.706 [INFO][5068] startup/startup.go 437: Early log level set to info
2025-09-30 20:06:52.706 [INFO][5068] startup/utils.go 125: Using NODENAME environment for node name ip-10-77-45-160.ca-central-1.compute.internal
2025-09-30 20:06:52.706 [INFO][5068] startup/utils.go 137: Determined node name: ip-10-77-45-160.ca-central-1.compute.internal
2025-09-30 20:06:52.706 [INFO][5068] startup/startup.go 93: Starting node ip-10-77-45-160.ca-central-1.compute.internal with version v3.30.3
2025-09-30 20:06:52.708 [WARNING][5068] startup/winutils.go 168: Ignoring kubeconfig configs for Windows HostProcess container. Using the inClusterConfig.
2025-09-30 20:06:52.710 [INFO][5068] startup/startup.go 442: Checking datastore connection
2025-09-30 20:06:52.771 [INFO][5068] startup/startup.go 466: Datastore connection verified
2025-09-30 20:06:52.771 [INFO][5068] startup/startup.go 103: Datastore is ready
2025-09-30 20:06:52.771 [WARNING][5068] startup/winutils.go 168: Ignoring kubeconfig configs for Windows HostProcess container. Using the inClusterConfig.
2025-09-30 20:06:52.811 [WARNING][5068] startup/winutils.go 144: Ignoring kubeconfig path for Windows HostProcess container. Using the inClusterConfig.
2025-09-30 20:06:52.863 [INFO][5068] startup/startup.go 834: Selected default IP pool is '192.168.0.0/16'
2025-09-30 20:06:52.921 [INFO][5068] startup/startup.go 213: Using node name: ip-10-77-45-160.ca-central-1.compute.internal
2025-09-30 20:06:52.921 [INFO][5068] startup/startup_windows.go 55: Backend networking is none, no network setup needed.
2025-09-30 20:06:52.921 [INFO][5068] startup/startup_windows.go 113: Ensure network is done.
2025-09-30 20:06:52.921 [INFO][5068] startup/utils.go 93: removed shutdown timestamp timestamp="2025-09-30T20:06:43Z"
Calico node initialisation succeeded; monitoring kubelet for restarts...
Starting Calico token refresher...
Calico token refresher running on PID 2952
2025-09-30 20:06:53.081 [WARNING][2952] cni-config-monitor/winutils.go 144: Ignoring kubeconfig path for Windows HostProcess container. Using the inClusterConfig.
2025-09-30 20:06:53.187 [INFO][2952] cni-config-monitor/token_watch.go 240: Update of CNI kubeconfig triggered based on elapsed time.
2025-09-30 20:06:53.187 [WARNING][2952] cni-config-monitor/winutils.go 144: Ignoring kubeconfig path for Windows HostProcess container. Using the inClusterConfig.
2025-09-30 20:06:53.191 [ERROR][2952] cni-config-monitor/token_watch.go 292: Failed to write CNI plugin kubeconfig file error=open C:/hpc/host/etc/cni/net.d/calico-kubeconfig: The system cannot find the path specified.
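Most of the output above is INFO and WARNING noise, so it can help to filter it down to the lines that matter. The following is a rough sketch, assuming the `timestamp [LEVEL][pid] source line: message` layout seen in the log; the helper name errors_only is invented here. It does not catch the PowerShell errors (the hostname failure), which use a different format.

```python
import re

# Matches lines such as:
# 2025-09-30 20:06:53.191 [ERROR][2952] cni-config-monitor/token_watch.go 292: message
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<level>[A-Z]+)\]\[(?P<pid>\d+)\] "
    r"(?P<source>\S+ \d+): (?P<msg>.*)$"
)


def errors_only(lines, levels=("ERROR", "FATAL")):
    """Return (source, message) pairs for log lines at the given levels."""
    found = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in levels:
            found.append((match.group("source"), match.group("msg")))
    return found


# Three lines copied from the log above.
sample = [
    "2025-09-30 20:06:52.771 [INFO][5068] startup/startup.go 103: Datastore is ready",
    "2025-09-30 20:06:53.187 [WARNING][2952] cni-config-monitor/winutils.go 144: "
    "Ignoring kubeconfig path for Windows HostProcess container. Using the inClusterConfig.",
    "2025-09-30 20:06:53.191 [ERROR][2952] cni-config-monitor/token_watch.go 292: "
    "Failed to write CNI plugin kubeconfig file error=open "
    "C:/hpc/host/etc/cni/net.d/calico-kubeconfig: The system cannot find the path specified.",
]

errors = errors_only(sample)
for source, message in errors:
    print(source, "->", message)
```

Applied to the full log, this leaves only the token_watch.go failure to write C:/hpc/host/etc/cni/net.d/calico-kubeconfig.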
The installation documentation (step 7) talks about a second init container named install-cni, but it is not present in the manifest deployed by Tigera Operator:
calico-node-windows manifest
apiVersion: v1
kind: Pod
metadata:
  labels:
    app.kubernetes.io/name: calico-node-windows
    k8s-app: calico-node-windows
  name: calico-node-windows-9lpm9
  namespace: calico-system
  # ...
spec:
  containers:
  - args:
    - $env:CONTAINER_SANDBOX_MOUNT_POINT/CalicoWindows/felix-service.ps1
    env:
    - name: DATASTORE_TYPE
      value: kubernetes
    - name: WAIT_FOR_DATASTORE
      value: "true"
    - name: CLUSTER_TYPE
      value: k8s,operator,ecs,windows
    - name: CALICO_DISABLE_FILE_LOGGING
      value: "false"
    - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
      value: ACCEPT
    - name: FELIX_HEALTHENABLED
      value: "true"
    - name: NODENAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    - name: NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: FELIX_TYPHAK8SNAMESPACE
      value: calico-system
    - name: FELIX_TYPHAK8SSERVICENAME
      value: calico-typha
    - name: FELIX_TYPHACAFILE
      value: /etc/pki/tls/certs/tigera-ca-bundle.crt
    - name: FELIX_TYPHACERTFILE
      value: /node-certs/tls.crt
    - name: FELIX_TYPHAKEYFILE
      value: /node-certs/tls.key
    - name: VXLAN_VNI
      value: "4096"
    - name: VXLAN_ADAPTER
    - name: FELIX_TYPHACN
      value: typha-server
    - name: CALICO_MANAGE_CNI
      value: "false"
    - name: FELIX_BPFEXTTOSERVICECONNMARK
      value: "0x80"
    - name: KUBE_NETWORK
      value: vpc.*
    - name: CALICO_NETWORKING_BACKEND
      value: none
    - name: IP
      value: autodetect
    - name: IP_AUTODETECTION_METHOD
      value: can-reach=8.8.8.8
    - name: IP6
      value: none
    - name: FELIX_IPV6SUPPORT
      value: "false"
    - name: FELIX_ROUTESOURCE
      value: WorkloadIPs
    - name: KUBERNETES_SERVICE_HOST
      value: REDACTED.gr7.ca-central-1.eks.amazonaws.com
    - name: KUBERNETES_SERVICE_PORT
      value: "443"
    image: docker.io/calico/node-windows:v3.30.3
    imagePullPolicy: IfNotPresent
    lifecycle:
      preStop:
        exec:
          command:
          - $env:CONTAINER_SANDBOX_MOUNT_POINT/CalicoWindows/calico-node.exe
          - -shutdown
    livenessProbe:
      exec:
        command:
        - $env:CONTAINER_SANDBOX_MOUNT_POINT/CalicoWindows/calico-node.exe
        - -felix-live
      # ...
    name: felix
    readinessProbe:
      exec:
        command:
        - $env:CONTAINER_SANDBOX_MOUNT_POINT/CalicoWindows/calico-node.exe
        - -felix-ready
      # ...
    resources: {}
    securityContext:
      windowsOptions:
        hostProcess: true
        runAsUserName: NT AUTHORITY\system
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: c:/node-certs
      name: node-certs
      readOnly: true
    - mountPath: c:/etc/pki/tls/certs
      name: tigera-ca-bundle
      readOnly: true
    - mountPath: /var/lib/calico
      name: var-lib-calico
    - mountPath: /var/run/calico
      name: var-run-calico
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-h2hdg
      readOnly: true
    workingDir: $env:CONTAINER_SANDBOX_MOUNT_POINT/CalicoWindows/
  - args:
    - $env:CONTAINER_SANDBOX_MOUNT_POINT/CalicoWindows/node-service.ps1
    env:
    - name: DATASTORE_TYPE
      value: kubernetes
    - name: WAIT_FOR_DATASTORE
      value: "true"
    - name: CLUSTER_TYPE
      value: k8s,operator,ecs,windows
    - name: CALICO_DISABLE_FILE_LOGGING
      value: "false"
    - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
      value: ACCEPT
    - name: FELIX_HEALTHENABLED
      value: "true"
    - name: NODENAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    - name: NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: FELIX_TYPHAK8SNAMESPACE
      value: calico-system
    - name: FELIX_TYPHAK8SSERVICENAME
      value: calico-typha
    - name: FELIX_TYPHACAFILE
      value: /etc/pki/tls/certs/tigera-ca-bundle.crt
    - name: FELIX_TYPHACERTFILE
      value: /node-certs/tls.crt
    - name: FELIX_TYPHAKEYFILE
      value: /node-certs/tls.key
    - name: VXLAN_VNI
      value: "4096"
    - name: VXLAN_ADAPTER
    - name: FELIX_TYPHACN
      value: typha-server
    - name: CALICO_MANAGE_CNI
      value: "false"
    - name: FELIX_BPFEXTTOSERVICECONNMARK
      value: "0x80"
    - name: KUBE_NETWORK
      value: vpc.*
    - name: CALICO_NETWORKING_BACKEND
      value: none
    - name: IP
      value: autodetect
    - name: IP_AUTODETECTION_METHOD
      value: can-reach=8.8.8.8
    - name: IP6
      value: none
    - name: FELIX_IPV6SUPPORT
      value: "false"
    - name: FELIX_ROUTESOURCE
      value: WorkloadIPs
    - name: KUBERNETES_SERVICE_HOST
      value: REDACTED.gr7.ca-central-1.eks.amazonaws.com
    - name: KUBERNETES_SERVICE_PORT
      value: "443"
    image: docker.io/calico/node-windows:v3.30.3
    imagePullPolicy: IfNotPresent
    name: node
    resources: {}
    securityContext:
      windowsOptions:
        hostProcess: true
        runAsUserName: NT AUTHORITY\system
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: c:/node-certs
      name: node-certs
      readOnly: true
    - mountPath: c:/etc/pki/tls/certs
      name: tigera-ca-bundle
      readOnly: true
    - mountPath: /var/lib/calico
      name: var-lib-calico
    - mountPath: /var/run/calico
      name: var-run-calico
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-h2hdg
      readOnly: true
    workingDir: $env:CONTAINER_SANDBOX_MOUNT_POINT/CalicoWindows/
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  initContainers:
  - args:
    - $env:CONTAINER_SANDBOX_MOUNT_POINT/uninstall-calico.ps1
    env:
    - name: SLEEP
      value: "false"
    - name: CNI_BIN_DIR
      value: /host/opt/cni/bin
    - name: CNI_CONF_NAME
      value: 10-calico.conflist
    - name: CNI_NET_DIR
      value: /host/etc/cni/net.d
    image: docker.io/calico/node-windows:v3.30.3
    imagePullPolicy: IfNotPresent
    name: uninstall-calico
    resources: {}
    securityContext:
      windowsOptions:
        hostProcess: true
        runAsUserName: NT AUTHORITY\system
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-h2hdg
      readOnly: true
  nodeName: ip-10-77-45-160.ca-central-1.compute.internal
  nodeSelector:
    kubernetes.io/os: windows
  preemptionPolicy: PreemptLowerPriority
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: calico-node
  serviceAccountName: calico-node
  terminationGracePeriodSeconds: 5
  tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
  - effect: NoSchedule
    operator: Exists
  - effect: NoExecute
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/pid-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/network-unavailable
    operator: Exists
  volumes:
  - hostPath:
      path: /lib/modules
      type: ""
    name: lib-modules
  - name: node-certs
    secret:
      defaultMode: 420
      secretName: node-certs
  - hostPath:
      path: /var/run/nodeagent
      type: DirectoryOrCreate
    name: policysync
  - configMap:
      defaultMode: 420
      name: tigera-ca-bundle
    name: tigera-ca-bundle
  - hostPath:
      path: /var/lib/calico
      type: DirectoryOrCreate
    name: var-lib-calico
  - hostPath:
      path: /var/run/calico
      type: DirectoryOrCreate
    name: var-run-calico
  - hostPath:
      path: /run/xtables.lock
      type: FileOrCreate
    name: xtables-lock
  - name: kube-api-access-h2hdg
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  # ...

The kube-proxy service is baked into the Windows Server 2022 AMI and usually works fine. Is there a way to avoid deleting it? The documentation recommends installing kube-proxy as a DaemonSet (step 6) and provides a manifest for it, but the resulting Pod gets stuck indefinitely waiting for a Calico HNS network to be created, as shown in the following logs:
WARNING: The names of some imported commands from the module 'hns' include unapproved verbs that might make them less
discoverable. To find the commands with unapproved verbs, run the Import-Module command again with the Verbose
parameter. For a list of approved verbs, type Get-Verb.
Running kub-proxy service.
Waiting for HNS network Calico to be created...
The only HNS network on the node is the default vpcbr* network.
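The calico-node-windows containers are configured with KUBE_NETWORK set to vpc.*, whereas the kube-proxy Pod logs that it is waiting for a network called Calico. As a minimal sketch, assuming both select the HNS network by matching its name against a pattern (an assumption, not something verified in the Calico scripts), the mismatch would look like this. The network name below is a made-up example, since only the vpcbr prefix is known.

```python
import re


def matching_networks(pattern, network_names):
    """Return the names that the pattern matches from the start of the string."""
    regex = re.compile(pattern)
    return [name for name in network_names if regex.match(name)]


# Hypothetical name: the node's only HNS network is known only as vpcbr*.
hns_networks = ["vpcbr0a1b2c3d4e5f"]

# Pattern used by the calico-node-windows containers (KUBE_NETWORK).
vpc_matches = matching_networks("vpc.*", hns_networks)

# Name the kube-proxy log says it is waiting for.
calico_matches = matching_networks("Calico", hns_networks)

print(vpc_matches)
print(calico_matches)
```

Under that assumption, the VPC CNI network satisfies the vpc.* pattern but nothing satisfies Calico, which would be consistent with the Pod waiting forever.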
How can I install Calico on Windows nodes as a NetworkPolicy provider only, with Amazon VPC CNI as the IPAM and network plugin? Am I missing something?
Amazon VPC CNI ConfigMap for more details
apiVersion: v1
kind: ConfigMap
metadata:
  name: amazon-vpc-cni
  namespace: kube-system
data:
  branch-eni-cooldown: "60"
  enable-network-policy-controller: "false"
  enable-windows-ipam: "true"
  enable-windows-prefix-delegation: "true"
  minimum-ip-target: "20"
  warm-ip-target: "3"
  warm-prefix-target: "0"