Network Connectivity Issues with NodePort/LoadBalancer Services in Large EKS Cluster Using Calico (Tigera Operator, VXLAN Mode) #11243

We are running into network connectivity issues in our EKS cluster and would appreciate your suggestions or tuning recommendations.

### Cluster Overview:
- **EKS cluster size:** 1000 worker nodes
- **Node networking:** Worker nodes are launched in a secondary, non-routable private CIDR
- **Calico deployment:** Installed via the Tigera operator, with provider: eks and VXLAN mode always enabled

### Workload Pattern:
We have a specific deployment/service with:
- **Replicas:** 575 pods (one pod per node)
- **Pod churn:** Every 15 minutes, ~30% of the pods (about 172) are terminated and replaced with new pods, completing a full rotation of all 575 pods within the 15-minute window.
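
To put rough numbers on the churn, here is a small Python sketch of the arithmetic. The 575 replicas and the 15-minute window are the figures above; the function name `churn_rates` and the model of two pod events (one delete, one create) per replacement are illustrative assumptions, not measurements from the cluster. It covers both readings of the churn figure: about 30% of the pods replaced per window, and all 575 replaced within one window.

```python
# Back-of-the-envelope churn arithmetic for the workload described above.
# The 575-replica and 15-minute figures come from this report; the function
# name and the "two events per replacement" model are illustrative assumptions.

def churn_rates(replaced_per_window: int, window_minutes: float = 15.0) -> dict:
    """Return pod replacement and pod event rates for one churn window."""
    replacements_per_minute = replaced_per_window / window_minutes
    # Each replacement is modelled as one pod deletion plus one pod creation.
    events_per_window = 2 * replaced_per_window
    return {
        "replacements_per_minute": replacements_per_minute,
        "events_per_window": events_per_window,
        "events_per_minute": events_per_window / window_minutes,
    }

REPLICAS = 575

# Reading 1: ~30% of the pods are replaced per 15-minute window.
partial = churn_rates(int(REPLICAS * 0.30))

# Reading 2: all 575 pods are replaced within one 15-minute window.
full = churn_rates(REPLICAS)

for label, rates in (("~30% per window", partial), ("full rotation", full)):
    print(label, {k: round(v, 1) for k, v in rates.items()})
```

Either way, these are only lower bounds on the pod-level changes the cluster has to absorb per window; they say nothing about how many downstream dataplane updates each change triggers.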

### Issue Observed:
We exposed the NGINX ingress controller via a NodePort Service and the Istio ingress via a LoadBalancer Service, and we see frequent network connection issues:
- Most EC2 nodes are marked unhealthy by the load balancer; only a few remain healthy at any given time.
- This results in significant connectivity and availability problems for our service.

### Questions/Request for Suggestions:
- Are there Calico or Kubernetes configurations we can tune to improve network stability and performance under these conditions?
- Are there known limitations or best practices for Calico when running at this scale and with such high pod churn?
- Are there any VXLAN-specific tuning options or EKS-specific considerations that might help?

### Additional Context:
- **EKS version:** `1.31`
- **Calico version:** `v3.29.1`
- **Tigera operator version:** `v3.29.1`
- **AMI:** `EKS Optimized Amazon Linux 2023`
