Installing Calico as a NetworkPolicies provider on Windows nodes #11109

@nader-dabbabi

I'm having problems installing Calico on Windows nodes as a NetworkPolicy provider only (not as a CNI plugin). The installation uses the Tigera Operator Helm chart v3.30.3 on an Amazon EKS v1.32 cluster with mixed nodes (Windows Server 2022 and Amazon Bottlerocket).

The installation configuration

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  # ...
  name: default
spec:
  calicoNetwork:
    bgp: Disabled
    linuxDataplane: Iptables
    nodeAddressAutodetectionV4:
      canReach: 8.8.8.8
    windowsDataplane: HNS
  cni:
    ipam:
      type: AmazonVPC
    type: AmazonVPC
  controlPlaneNodeSelector:
    kubernetes.io/os: linux
  controlPlaneReplicas: 2
  flexVolumePath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
  kubeletVolumePluginPath: /var/lib/kubelet
  kubernetesProvider: EKS
  nonPrivileged: Disabled
  serviceCIDRs:
  - 172.20.0.0/16
  variant: Calico
  windowsNodes:
    cniBinDir: /Program Files/Amazon/EKS/cni
    cniConfigDir: /ProgramData/Amazon/EKS/cni/config
    cniLogDir: /var/log/calico/cni
status:
  calicoVersion: v3.30.3
  computed:
    calicoNetwork:
      bgp: Disabled
      linuxDataplane: Iptables
      nodeAddressAutodetectionV4:
        canReach: 8.8.8.8
      windowsDataplane: HNS
    cni:
      ipam:
        type: AmazonVPC
      type: AmazonVPC
    controlPlaneNodeSelector:
      kubernetes.io/os: linux
    controlPlaneReplicas: 2
    flexVolumePath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
    kubeletVolumePluginPath: /var/lib/kubelet
    kubernetesProvider: EKS
    nonPrivileged: Disabled
    serviceCIDRs:
    - 172.20.0.0/16
    variant: Calico
    windowsNodes:
      cniBinDir: /Program Files/Amazon/EKS/cni
      cniConfigDir: /Program Files/Amazon/EKS/cni/config
      cniLogDir: /var/log/calico/cni
  conditions:
  - lastTransitionTime: "2025-10-01T14:38:04Z"
    message: All Objects Available
    observedGeneration: 5
    reason: AllObjectsAvailable
    status: "False"
    type: Degraded
  - lastTransitionTime: "2025-10-01T14:38:04Z"
    message: All objects available
    observedGeneration: 5
    reason: AllObjectsAvailable
    status: "True"
    type: Ready
  - lastTransitionTime: "2025-10-01T14:38:04Z"
    message: All Objects Available
    observedGeneration: 5
    reason: AllObjectsAvailable
    status: "False"
    type: Progressing
  mtu: 9001
  variant: Calico
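
When reading a resource like the one above, it can help to compare `spec` against `status.computed` field by field. The following is a minimal sketch, not part of the operator: the helper names `flatten` and `spec_vs_computed` are made up for illustration, and only the `windowsNodes` values are copied from the resource above. It reports which fields differ; it does not say which value is the right one or whether the difference is related to the problem.

```python
def flatten(node, prefix=""):
    """Flatten nested dicts into {'a.b.c': value} pairs."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def spec_vs_computed(spec, computed):
    """Return {path: (spec_value, computed_value)} for fields that differ."""
    a, b = flatten(spec), flatten(computed)
    return {
        path: (a.get(path), b.get(path))
        for path in sorted(a.keys() | b.keys())
        if a.get(path) != b.get(path)
    }


# Values copied from the Installation resource above (windowsNodes only).
spec = {
    "windowsNodes": {
        "cniBinDir": "/Program Files/Amazon/EKS/cni",
        "cniConfigDir": "/ProgramData/Amazon/EKS/cni/config",
        "cniLogDir": "/var/log/calico/cni",
    }
}
computed = {
    "windowsNodes": {
        "cniBinDir": "/Program Files/Amazon/EKS/cni",
        "cniConfigDir": "/Program Files/Amazon/EKS/cni/config",
        "cniLogDir": "/var/log/calico/cni",
    }
}

differences = spec_vs_computed(spec, computed)
for path, (wanted, effective) in differences.items():
    print(f"{path}: spec={wanted!r} computed={effective!r}")
```

With the values as pasted, the only field reported is `windowsNodes.cniConfigDir`.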

The calico-node DaemonSet on Linux deploys correctly, but calico-node-windows does not. The calico-node-windows DaemonSet starts by deleting the kube-proxy service from the node through its init container, uninstall-calico, as shown in the following logs:

/host/etc/cni/net.d dir does not exist, skipping Calico CNI config cleanup
/host/opt/cni/bin dir does not exist, skipping Calico CNI binary cleanup
Stopping and removing Calico services if they are present...
Stopping and removing kube-proxy service if it is present...
It is recommended to run kube-proxy as kubernetes daemonset instead
Logging containerd CNI bin and conf dir paths:

      bin_dir = "C:\\Program Files\\Amazon\\EKS\\cni"
      conf_dir = "C:\\ProgramData\\Amazon\\EKS\\cni\\config"
Done.

The main container of the DaemonSet, node, then logs errors and appears to be stuck:

Setting environment variables if not set...
Environment variable KUBE_NETWORK is already set: vpc.*
Environment variable CALICO_NETWORKING_BACKEND is already set: none
Environment variable DNS_SEARCH is not set. Setting it to the default value: svc.cluster.local
Environment variable VXLAN_VNI is already set: 4096
Environment variable VXLAN_MAC_PREFIX is not set. Setting it to the default value: 0E-2A
Environment variable VXLAN_ADAPTER is not set. Setting it to the default value: 
hostname : The term 'hostname' is not recognized as the name of a cmdlet, function, script file, or operable program. 
Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At C:\hpc\CalicoWindows\config.ps1:52 char:52
+ Set-EnvVarIfNotSet -var "NODENAME" -defaultValue $(hostname).ToLower( ...
+                                                    ~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (hostname:String) ], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException
 
You cannot call a method on a null-valued expression.
At C:\hpc\CalicoWindows\config.ps1:52 char:52
+ Set-EnvVarIfNotSet -var "NODENAME" -defaultValue $(hostname).ToLower( ...
+                                                    ~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) ], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull
 
Environment variable CALICO_K8S_NODE_REF is not set. Setting it to the default value: ip-10-77-45-160.ca-central-1.compute.internal
Environment variable STARTUP_VALID_IP_TIMEOUT is not set. Setting it to the default value: 90
Environment variable IP is already set: autodetect
Environment variable FELIX_LOGSEVERITYFILE is not set. Setting it to the default value: none
Environment variable FELIX_LOGSEVERITYSYS is not set. Setting it to the default value: none
StoredLastBootTime 9/30/2025 7:57:22 PM, CurrentLastBootTime 9/30/2025 7:57:22 PM
Stored new lastBootTime 9/30/2025 7:57:22 PM
Kubelet has (re)started, (re)initialising the node...
2025-09-30 20:06:52.706 [INFO][5068] startup/startup.go 437: Early log level set to info
2025-09-30 20:06:52.706 [INFO][5068] startup/utils.go 125: Using NODENAME environment for node name ip-10-77-45-160.ca-central-1.compute.internal
2025-09-30 20:06:52.706 [INFO][5068] startup/utils.go 137: Determined node name: ip-10-77-45-160.ca-central-1.compute.internal
2025-09-30 20:06:52.706 [INFO][5068] startup/startup.go 93: Starting node ip-10-77-45-160.ca-central-1.compute.internal with version v3.30.3
2025-09-30 20:06:52.708 [WARNING][5068] startup/winutils.go 168: Ignoring kubeconfig configs for Windows HostProcess container. Using the inClusterConfig.
2025-09-30 20:06:52.710 [INFO][5068] startup/startup.go 442: Checking datastore connection
2025-09-30 20:06:52.771 [INFO][5068] startup/startup.go 466: Datastore connection verified
2025-09-30 20:06:52.771 [INFO][5068] startup/startup.go 103: Datastore is ready
2025-09-30 20:06:52.771 [WARNING][5068] startup/winutils.go 168: Ignoring kubeconfig configs for Windows HostProcess container. Using the inClusterConfig.
2025-09-30 20:06:52.811 [WARNING][5068] startup/winutils.go 144: Ignoring kubeconfig path for Windows HostProcess container. Using the inClusterConfig.
2025-09-30 20:06:52.863 [INFO][5068] startup/startup.go 834: Selected default IP pool is '192.168.0.0/16'
2025-09-30 20:06:52.921 [INFO][5068] startup/startup.go 213: Using node name: ip-10-77-45-160.ca-central-1.compute.internal
2025-09-30 20:06:52.921 [INFO][5068] startup/startup_windows.go 55: Backend networking is none, no network setup needed.
2025-09-30 20:06:52.921 [INFO][5068] startup/startup_windows.go 113: Ensure network is done.
2025-09-30 20:06:52.921 [INFO][5068] startup/utils.go 93: removed shutdown timestamp timestamp="2025-09-30T20:06:43Z"
Calico node initialisation succeeded; monitoring kubelet for restarts...
Starting Calico token refresher...
Calico token refresher running on PID 2952
2025-09-30 20:06:53.081 [WARNING][2952] cni-config-monitor/winutils.go 144: Ignoring kubeconfig path for Windows HostProcess container. Using the inClusterConfig.
2025-09-30 20:06:53.187 [INFO][2952] cni-config-monitor/token_watch.go 240: Update of CNI kubeconfig triggered based on elapsed time.
2025-09-30 20:06:53.187 [WARNING][2952] cni-config-monitor/winutils.go 144: Ignoring kubeconfig path for Windows HostProcess container. Using the inClusterConfig.
2025-09-30 20:06:53.191 [ERROR][2952] cni-config-monitor/token_watch.go 292: Failed to write CNI plugin kubeconfig file error=open C:/hpc/host/etc/cni/net.d/calico-kubeconfig: The system cannot find the path specified.
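
To pull the warnings and errors out of a long log like the one above, a small filter is enough. This is a rough sketch that assumes the `timestamp [LEVEL][pid] source line: message` layout seen in the lines above; the regular expression and the `problems` helper are illustrative, not anything shipped with Calico. Lines printed by the PowerShell wrapper scripts do not follow that layout and are skipped.

```python
import re

# Layout assumed from the log above:
# 2025-09-30 20:06:53.191 [ERROR][2952] cni-config-monitor/token_watch.go 292: message
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<level>[A-Z]+)\]\[(?P<pid>\d+)\] "
    r"(?P<source>\S+ \d+): (?P<message>.*)$"
)


def problems(log_text, levels=("WARNING", "ERROR")):
    """Return (level, source, message) for structured lines at the given levels."""
    found = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in levels:
            found.append(
                (match.group("level"), match.group("source"), match.group("message"))
            )
    return found


# Three lines copied from the log above.
sample = """Starting Calico token refresher...
2025-09-30 20:06:53.187 [INFO][2952] cni-config-monitor/token_watch.go 240: Update of CNI kubeconfig triggered based on elapsed time.
2025-09-30 20:06:53.191 [ERROR][2952] cni-config-monitor/token_watch.go 292: Failed to write CNI plugin kubeconfig file error=open C:/hpc/host/etc/cni/net.d/calico-kubeconfig: The system cannot find the path specified.
"""

errors = problems(sample, levels=("ERROR",))
for level, source, message in errors:
    print(level, source, message)
```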

The installation documentation (step 7) mentions a second init container named install-cni, but it is not present in the manifest deployed by the Tigera Operator:

calico-node-windows manifest

apiVersion: v1
kind: Pod
metadata:
  labels:
    app.kubernetes.io/name: calico-node-windows
    k8s-app: calico-node-windows
  name: calico-node-windows-9lpm9
  namespace: calico-system
  # ...
spec:
  containers:
  - args:
    - $env:CONTAINER_SANDBOX_MOUNT_POINT/CalicoWindows/felix-service.ps1
    env:
    - name: DATASTORE_TYPE
      value: kubernetes
    - name: WAIT_FOR_DATASTORE
      value: "true"
    - name: CLUSTER_TYPE
      value: k8s,operator,ecs,windows
    - name: CALICO_DISABLE_FILE_LOGGING
      value: "false"
    - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
      value: ACCEPT
    - name: FELIX_HEALTHENABLED
      value: "true"
    - name: NODENAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    - name: NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: FELIX_TYPHAK8SNAMESPACE
      value: calico-system
    - name: FELIX_TYPHAK8SSERVICENAME
      value: calico-typha
    - name: FELIX_TYPHACAFILE
      value: /etc/pki/tls/certs/tigera-ca-bundle.crt
    - name: FELIX_TYPHACERTFILE
      value: /node-certs/tls.crt
    - name: FELIX_TYPHAKEYFILE
      value: /node-certs/tls.key
    - name: VXLAN_VNI
      value: "4096"
    - name: VXLAN_ADAPTER
    - name: FELIX_TYPHACN
      value: typha-server
    - name: CALICO_MANAGE_CNI
      value: "false"
    - name: FELIX_BPFEXTTOSERVICECONNMARK
      value: "0x80"
    - name: KUBE_NETWORK
      value: vpc.*
    - name: CALICO_NETWORKING_BACKEND
      value: none
    - name: IP
      value: autodetect
    - name: IP_AUTODETECTION_METHOD
      value: can-reach=8.8.8.8
    - name: IP6
      value: none
    - name: FELIX_IPV6SUPPORT
      value: "false"
    - name: FELIX_ROUTESOURCE
      value: WorkloadIPs
    - name: KUBERNETES_SERVICE_HOST
      value: REDACTED.gr7.ca-central-1.eks.amazonaws.com
    - name: KUBERNETES_SERVICE_PORT
      value: "443"
    image: docker.io/calico/node-windows:v3.30.3
    imagePullPolicy: IfNotPresent
    lifecycle:
      preStop:
        exec:
          command:
          - $env:CONTAINER_SANDBOX_MOUNT_POINT/CalicoWindows/calico-node.exe
          - -shutdown
    livenessProbe:
      exec:
        command:
        - $env:CONTAINER_SANDBOX_MOUNT_POINT/CalicoWindows/calico-node.exe
        - -felix-live
      # ...
    name: felix
    readinessProbe:
      exec:
        command:
        - $env:CONTAINER_SANDBOX_MOUNT_POINT/CalicoWindows/calico-node.exe
        - -felix-ready
      # ...
    resources: {}
    securityContext:
      windowsOptions:
        hostProcess: true
        runAsUserName: NT AUTHORITY\system
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: c:/node-certs
      name: node-certs
      readOnly: true
    - mountPath: c:/etc/pki/tls/certs
      name: tigera-ca-bundle
      readOnly: true
    - mountPath: /var/lib/calico
      name: var-lib-calico
    - mountPath: /var/run/calico
      name: var-run-calico
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-h2hdg
      readOnly: true
    workingDir: $env:CONTAINER_SANDBOX_MOUNT_POINT/CalicoWindows/
  - args:
    - $env:CONTAINER_SANDBOX_MOUNT_POINT/CalicoWindows/node-service.ps1
    env:
    - name: DATASTORE_TYPE
      value: kubernetes
    - name: WAIT_FOR_DATASTORE
      value: "true"
    - name: CLUSTER_TYPE
      value: k8s,operator,ecs,windows
    - name: CALICO_DISABLE_FILE_LOGGING
      value: "false"
    - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
      value: ACCEPT
    - name: FELIX_HEALTHENABLED
      value: "true"
    - name: NODENAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    - name: NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: FELIX_TYPHAK8SNAMESPACE
      value: calico-system
    - name: FELIX_TYPHAK8SSERVICENAME
      value: calico-typha
    - name: FELIX_TYPHACAFILE
      value: /etc/pki/tls/certs/tigera-ca-bundle.crt
    - name: FELIX_TYPHACERTFILE
      value: /node-certs/tls.crt
    - name: FELIX_TYPHAKEYFILE
      value: /node-certs/tls.key
    - name: VXLAN_VNI
      value: "4096"
    - name: VXLAN_ADAPTER
    - name: FELIX_TYPHACN
      value: typha-server
    - name: CALICO_MANAGE_CNI
      value: "false"
    - name: FELIX_BPFEXTTOSERVICECONNMARK
      value: "0x80"
    - name: KUBE_NETWORK
      value: vpc.*
    - name: CALICO_NETWORKING_BACKEND
      value: none
    - name: IP
      value: autodetect
    - name: IP_AUTODETECTION_METHOD
      value: can-reach=8.8.8.8
    - name: IP6
      value: none
    - name: FELIX_IPV6SUPPORT
      value: "false"
    - name: FELIX_ROUTESOURCE
      value: WorkloadIPs
    - name: KUBERNETES_SERVICE_HOST
      value: REDACTED.gr7.ca-central-1.eks.amazonaws.com
    - name: KUBERNETES_SERVICE_PORT
      value: "443"
    image: docker.io/calico/node-windows:v3.30.3
    imagePullPolicy: IfNotPresent
    name: node
    resources: {}
    securityContext:
      windowsOptions:
        hostProcess: true
        runAsUserName: NT AUTHORITY\system
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: c:/node-certs
      name: node-certs
      readOnly: true
    - mountPath: c:/etc/pki/tls/certs
      name: tigera-ca-bundle
      readOnly: true
    - mountPath: /var/lib/calico
      name: var-lib-calico
    - mountPath: /var/run/calico
      name: var-run-calico
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-h2hdg
      readOnly: true
    workingDir: $env:CONTAINER_SANDBOX_MOUNT_POINT/CalicoWindows/
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  initContainers:
  - args:
    - $env:CONTAINER_SANDBOX_MOUNT_POINT/uninstall-calico.ps1
    env:
    - name: SLEEP
      value: "false"
    - name: CNI_BIN_DIR
      value: /host/opt/cni/bin
    - name: CNI_CONF_NAME
      value: 10-calico.conflist
    - name: CNI_NET_DIR
      value: /host/etc/cni/net.d
    image: docker.io/calico/node-windows:v3.30.3
    imagePullPolicy: IfNotPresent
    name: uninstall-calico
    resources: {}
    securityContext:
      windowsOptions:
        hostProcess: true
        runAsUserName: NT AUTHORITY\system
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-h2hdg
      readOnly: true
  nodeName: ip-10-77-45-160.ca-central-1.compute.internal
  nodeSelector:
    kubernetes.io/os: windows
  preemptionPolicy: PreemptLowerPriority
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: calico-node
  serviceAccountName: calico-node
  terminationGracePeriodSeconds: 5
  tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
  - effect: NoSchedule
    operator: Exists
  - effect: NoExecute
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/pid-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/network-unavailable
    operator: Exists
  volumes:
  - hostPath:
      path: /lib/modules
      type: ""
    name: lib-modules
  - name: node-certs
    secret:
      defaultMode: 420
      secretName: node-certs
  - hostPath:
      path: /var/run/nodeagent
      type: DirectoryOrCreate
    name: policysync
  - configMap:
      defaultMode: 420
      name: tigera-ca-bundle
    name: tigera-ca-bundle
  - hostPath:
      path: /var/lib/calico
      type: DirectoryOrCreate
    name: var-lib-calico
  - hostPath:
      path: /var/run/calico
      type: DirectoryOrCreate
    name: var-run-calico
  - hostPath:
      path: /run/xtables.lock
      type: FileOrCreate
    name: xtables-lock
  - name: kube-api-access-h2hdg
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
# ...
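
The init container check can also be scripted against the Pod in JSON form (for example from `kubectl get pod ... -o json`). The sketch below assumes a heavily trimmed copy of the manifest above, reduced to container names only; `init_container_names` is a hypothetical helper.

```python
import json


def init_container_names(pod):
    """Return the names of a Pod's init containers, in manifest order."""
    return [c["name"] for c in pod.get("spec", {}).get("initContainers", [])]


# Trimmed copy of the manifest above: container names only.
pod_json = """
{
  "spec": {
    "initContainers": [{"name": "uninstall-calico"}],
    "containers": [{"name": "felix"}, {"name": "node"}]
  }
}
"""

pod = json.loads(pod_json)
names = init_container_names(pod)
# install-cni is the init container the documentation mentions.
missing = [n for n in ("uninstall-calico", "install-cni") if n not in names]

print("init containers:", names)
print("missing:", missing)
```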

The kube-proxy service is baked into the Windows Server 2022 AMI and usually works fine. Is there a way to keep the init container from deleting it? The documentation recommends running kube-proxy as a DaemonSet (step 6) and provides a manifest to install it, but with that manifest the Pod gets stuck forever waiting for a Calico HNS network to be created, as shown in the following logs:

WARNING: The names of some imported commands from the module 'hns' include unapproved verbs that might make them less 
discoverable. To find the commands with unapproved verbs, run the Import-Module command again with the Verbose
parameter. For a list of approved verbs, type Get-Verb.
Running kub-proxy service.
Waiting for HNS network Calico to be created...

The only HNS network on the node is the default vpcbr* network.
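
The mismatch can be illustrated with a few lines of Python. This is a sketch under assumptions: it treats the network name as a case-insensitive regular expression searched against the HNS network names (how the kube-proxy script actually matches is not shown in this report), and `vpcbr-example` is a made-up stand-in for the real network name, which is only given as vpcbr* here.

```python
import re


def matching_networks(names, pattern):
    """Return the network names that the pattern matches anywhere (case-insensitive)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [name for name in names if rx.search(name)]


# Hypothetical name; the real one on the node is only known as vpcbr*.
hns_names = ["vpcbr-example"]

# KUBE_NETWORK as set on the calico-node-windows containers.
vpc_matches = matching_networks(hns_names, r"vpc.*")
# The name the kube-proxy Pod reports waiting for.
calico_matches = matching_networks(hns_names, r"Calico")

print("vpc.* ->", vpc_matches)
print("Calico ->", calico_matches)
```

Under these assumptions the `vpc.*` pattern finds the existing network, while a wait on a network named Calico never completes because no such network exists.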

How can I effectively install Calico on Windows nodes as a NetworkPolicy provider, with Amazon VPC CNI as the IPAM and network plugin? Am I missing something?

Amazon VPC CNI ConfigMap for more details

apiVersion: v1
kind: ConfigMap
metadata:
  name: amazon-vpc-cni
  namespace: kube-system
data:
  branch-eni-cooldown: "60"
  enable-network-policy-controller: "false"
  enable-windows-ipam: "true"
  enable-windows-prefix-delegation: "true"
  minimum-ip-target: "20"
  warm-ip-target: "3"
  warm-prefix-target: "0"
