# Data Center Interconnect
ExaBGP enables Data Center Interconnect (DCI) using EVPN, VPLS, and Layer 2 extension for multi-site deployments, disaster recovery, and workload mobility.
- Overview
- DCI Architectures
- EVPN for DCI
- VPLS for DCI
- Configuration Examples
- Troubleshooting
## Overview

Data Center Interconnect solutions need:
- Layer 2 extension: Stretch VLANs across sites for VM mobility
- Layer 3 routing: Efficient inter-site routing and traffic engineering
- Multi-tenancy: Isolated networks per customer/application
- Disaster recovery: Seamless failover between sites
- Workload mobility: Live migration of VMs/containers across sites
- Geographic distribution: Active-active or active-standby deployments
ExaBGP supports multiple DCI approaches:
| Technology | Layer | Use Case | Complexity |
|---|---|---|---|
| EVPN | L2/L3 | Modern DCI, VXLAN-based | Medium |
| VPLS | L2 | Legacy DCI, MPLS-based | Medium |
| L3VPN | L3 | IP-only DCI, no L2 extension | Low |
| Stretched Anycast | L3 | Service availability, no state | Low |
ExaBGP enables DCI by:
- Route exchange: Advertise endpoints between data centers via BGP
- Topology discovery: Collect site information via BGP-LS
- Traffic engineering: Control inter-site traffic flow
- Automation: Dynamic provisioning via API
Important: ExaBGP exchanges routes but does NOT create tunnels or manipulate forwarding tables. Your application must configure VXLAN/GRE tunnels and install routes.
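For illustration, here is a minimal sketch of that application-side work on a Linux VTEP using iproute2, assuming one VXLAN device per VNI (device names and addresses are illustrative, not part of ExaBGP):

```python
#!/usr/bin/env python3
# Sketch: program a local Linux VTEP for endpoints learned via EVPN.
# Assumes iproute2 is available; idempotence and error recovery are omitted.
import subprocess


def create_vtep(vni: int, local_ip: str) -> None:
    """Create a VXLAN device for the given VNI (fails if it already exists)."""
    subprocess.run(
        ["ip", "link", "add", f"vxlan{vni}", "type", "vxlan",
         "id", str(vni), "local", local_ip, "dstport", "4789", "nolearning"],
        check=True)
    subprocess.run(["ip", "link", "set", f"vxlan{vni}", "up"], check=True)


def program_remote_mac(vni: int, mac: str, remote_vtep: str) -> None:
    """Point a remote MAC at its VTEP in the VXLAN forwarding database."""
    subprocess.run(
        ["bridge", "fdb", "replace", mac, "dev", f"vxlan{vni}",
         "dst", remote_vtep],
        check=True)


if __name__ == "__main__":
    create_vtep(10000, "10.1.0.1")
    program_remote_mac(10000, "00:aa:bb:cc:dd:ee", "10.2.0.1")
```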
## DCI Architectures

### Active-Active

Both data centers serve traffic simultaneously:
```
[DC1: VTEP 10.1.0.1] <--EVPN--> [DC2: VTEP 10.2.0.1]
         |                                |
  [VMs: VNI 10000]                [VMs: VNI 10000]
         |                                |
[IP: 192.168.1.0/24]            [IP: 192.168.1.0/24]
```
Benefits:
- Load distribution across sites
- No idle capacity
- Optimal resource utilization
Challenges:
- Requires stretched Layer 2
- Potential for MAC flapping
- Need for loop prevention
### Active-Standby

One site is active while the other stands by for disaster recovery:
```
[DC1: Active] <--EVPN--> [DC2: Standby]
      |                        |
  [Services]          [Services (inactive)]
```
Benefits:
- Simple failover
- No split-brain issues
- Clear ownership
Challenges:
- Wasted standby capacity
- Slower failover than active-active
### Multi-Site Mesh

Multiple data centers in a full mesh:
```
        [DC1]
       /     \
   [DC2]-----[DC3]
       \     /
        [DC4]
```
Benefits:
- Geographic distribution
- Resilience to site failures
- Flexible workload placement
Challenges:
- Complex routing
- Higher WAN bandwidth needs
- Increased latency
## EVPN for DCI

Advertise MAC/IP endpoints (EVPN Type 2 routes) across data centers:
```python
#!/usr/bin/env python3
import sys

# Data center configuration
DC_ID = 1
VTEP_IP = f"10.{DC_ID}.0.1"
VNI = 10000


def announce_endpoint(mac, ip):
    """Announce an endpoint to the remote DC."""
    rd = f"{VTEP_IP}:1"
    rt = f"65000:{VNI}"
    print(f"announce route-distinguisher {rd} "
          f"mac {mac} ip {ip} "
          f"label {VNI} "
          f"route-target {rt}", flush=True)


# Announce local endpoints
announce_endpoint("00:11:22:33:44:55", "192.168.1.10")
announce_endpoint("00:11:22:33:44:56", "192.168.1.11")

# Keep the process alive until ExaBGP closes stdin
while True:
    line = sys.stdin.readline()
    if not line:
        break
```
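The controller also needs to consume what the remote site advertises. With `encoder json` (as in the configuration example below), ExaBGP writes one JSON document per line to the process's stdin. The exact message layout varies between ExaBGP versions, so this sketch only dispatches updates to a placeholder handler you would implement:

```python
#!/usr/bin/env python3
# Sketch: consume ExaBGP JSON messages and hand updates to a handler.
# Inspect real messages first; the JSON schema differs across versions.
import json
import sys


def handle_update(message: dict) -> None:
    """Placeholder: extract MAC/IP routes and program the local VTEP here."""
    print(f"received update: {json.dumps(message)[:200]}", file=sys.stderr)


for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    try:
        message = json.loads(line)
    except json.JSONDecodeError:
        continue  # ignore anything that is not a JSON document
    if message.get("type") == "update":
        handle_update(message)
```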
For Layer 3 DCI without L2 extension, announce IP prefixes instead:

```python
#!/usr/bin/env python3
import sys

DC_ID = 1
VTEP_IP = f"10.{DC_ID}.0.1"
VNI = 10000


def announce_prefix(prefix):
    """Announce an IP prefix to the remote DC (EVPN Type 5)."""
    rt = f"65000:{VNI}"
    # EVPN Type 5 via the text API
    print(f"announce route {prefix} "
          f"next-hop {VTEP_IP} "
          f"route-target {rt} "
          f"label {VNI}", flush=True)


# Announce DC1 prefixes to DC2
announce_prefix("192.168.1.0/24")
announce_prefix("192.168.2.0/24")

# Keep the process alive until ExaBGP closes stdin
while True:
    line = sys.stdin.readline()
    if not line:
        break
```
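Remember that ExaBGP only exchanges these prefixes; the receiving process must install them itself. A minimal sketch using iproute2 follows; a real EVPN deployment would normally point the route at a VRF or VXLAN interface rather than a bare underlay next-hop:

```python
#!/usr/bin/env python3
# Sketch: install a prefix learned from the remote DC with iproute2.
# The bare next-hop is a simplification; production setups target a VRF.
import subprocess


def install_route(prefix: str, next_hop: str) -> None:
    """Install or update a kernel route for a learned prefix."""
    subprocess.run(
        ["ip", "route", "replace", prefix, "via", next_hop],
        check=True)


def remove_route(prefix: str) -> None:
    """Remove the route when the prefix is withdrawn."""
    subprocess.run(["ip", "route", "del", prefix], check=False)


if __name__ == "__main__":
    install_route("192.168.3.0/24", "10.2.0.1")
```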
Use EVPN for gateway failover:

```python
#!/usr/bin/env python3
import time

import requests

DC_ID = 1
VTEP_IP = f"10.{DC_ID}.0.1"
GATEWAY_IP = "192.168.1.1"
GATEWAY_MAC = "00:11:22:33:44:01"
VNI = 10000


def check_gateway_health():
    """Check whether the local gateway is healthy."""
    try:
        response = requests.get(f"http://{GATEWAY_IP}/health", timeout=2)
        return response.status_code == 200
    except requests.RequestException:
        return False


def announce_gateway(mac, ip):
    """Announce the gateway IP to take over."""
    rd = f"{VTEP_IP}:1"
    rt = f"65000:{VNI}"
    print(f"announce route-distinguisher {rd} "
          f"mac {mac} ip {ip} "
          f"label {VNI} "
          f"route-target {rt}", flush=True)


def withdraw_gateway(mac, ip):
    """Withdraw the gateway announcement."""
    rd = f"{VTEP_IP}:1"
    print(f"withdraw route-distinguisher {rd} "
          f"mac {mac} ip {ip}", flush=True)


# Monitor the gateway and announce/withdraw accordingly
announced = False
while True:
    healthy = check_gateway_health()
    if healthy and not announced:
        announce_gateway(GATEWAY_MAC, GATEWAY_IP)
        announced = True
    elif not healthy and announced:
        withdraw_gateway(GATEWAY_MAC, GATEWAY_IP)
        announced = False
    time.sleep(10)
```
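With the 10-second poll interval above, failover can take up to 10 seconds plus BGP propagation time. Shortening the interval reduces that window, but it should be paired with hysteresis (for example, requiring several consecutive failed checks) so a flapping health endpoint does not bounce the gateway MAC between sites.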
## VPLS for DCI

VPLS (Virtual Private LAN Service) provides Layer 2 VPNs over MPLS:

```python
#!/usr/bin/env python3
import sys

# VPLS configuration
VPLS_ID = 100
RD = "10.1.0.1:100"
RT = "65000:100"
PE_ADDRESS = "10.1.0.1"


def announce_vpls():
    """Announce the local VPLS endpoint (VPLS via BGP, RFC 4761)."""
    print(f"announce route-distinguisher {RD} "
          f"vpls {VPLS_ID} "
          f"endpoint {PE_ADDRESS} "
          f"route-target {RT}", flush=True)


announce_vpls()

# Keep the process alive until ExaBGP closes stdin
while True:
    line = sys.stdin.readline()
    if not line:
        break
```
Connect multiple data centers via VPLS:

```python
#!/usr/bin/env python3
import sys

# Multi-site VPLS
SITES = {
    'dc1': {'pe': '10.1.0.1', 'vpls_id': 100},
    'dc2': {'pe': '10.2.0.1', 'vpls_id': 100},
    'dc3': {'pe': '10.3.0.1', 'vpls_id': 100},
}

SITE_NAME = 'dc1'  # Current site
RD_BASE = SITES[SITE_NAME]['pe']
VPLS_ID = SITES[SITE_NAME]['vpls_id']
RT = f"65000:{VPLS_ID}"


def announce_vpls_endpoint():
    """Announce the VPLS endpoint for this site."""
    rd = f"{RD_BASE}:{VPLS_ID}"
    pe = SITES[SITE_NAME]['pe']
    print(f"announce route-distinguisher {rd} "
          f"vpls {VPLS_ID} "
          f"endpoint {pe} "
          f"route-target {RT}", flush=True)


announce_vpls_endpoint()

# Keep the process alive until ExaBGP closes stdin
while True:
    line = sys.stdin.readline()
    if not line:
        break
```
## Configuration Examples

Configuration (`/etc/exabgp/dci-evpn.conf`):

```
process dci-controller {
    run python3 /etc/exabgp/dci-evpn.py;
    encoder json;
}

# Connection to local spine switches
neighbor 10.1.0.254 {
    router-id 10.1.0.1;
    local-address 10.1.0.1;
    local-as 65001;
    peer-as 65001;

    family {
        l2vpn evpn;
        ipv4 unicast;
    }

    api {
        processes [ dci-controller ];
    }
}

# Connection to the remote DC (via the DCI circuit)
neighbor 10.2.0.1 {
    router-id 10.1.0.1;
    local-address 10.1.0.1;
    local-as 65001;
    peer-as 65002;

    family {
        l2vpn evpn;
    }

    api {
        processes [ dci-controller ];
    }
}
```
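For testing, ExaBGP can be run in the foreground against this file (`exabgp.daemon.daemonize` is one of ExaBGP's standard environment overrides):

```
env exabgp.daemon.daemonize=false exabgp /etc/exabgp/dci-evpn.conf
```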
Configuration (`/etc/exabgp/dci-vpls.conf`):

```
process vpls-controller {
    run python3 /etc/exabgp/vpls-announce.py;
    encoder text;
}

neighbor 10.1.0.254 {
    router-id 10.1.0.1;
    local-address 10.1.0.1;
    local-as 65001;
    peer-as 65001;

    family {
        l2vpn vpls;
    }

    api {
        processes [ vpls-controller ];
    }
}

neighbor 10.2.0.1 {
    router-id 10.1.0.1;
    local-address 10.1.0.1;
    local-as 65001;
    peer-as 65002;

    family {
        l2vpn vpls;
    }

    api {
        processes [ vpls-controller ];
    }
}
```
Simple anycast without L2 extension:

```python
#!/usr/bin/env python3
import time

import requests

# Anycast service IP
ANYCAST_IP = "1.2.3.4/32"
NEXT_HOP = "10.1.0.1"


def check_service_health():
    """Check local service health."""
    try:
        response = requests.get("http://127.0.0.1:8080/health", timeout=2)
        return response.status_code == 200
    except requests.RequestException:
        return False


announced = False
while True:
    healthy = check_service_health()
    if healthy and not announced:
        print(f"announce route {ANYCAST_IP} next-hop {NEXT_HOP}", flush=True)
        announced = True
    elif not healthy and announced:
        print(f"withdraw route {ANYCAST_IP} next-hop {NEXT_HOP}", flush=True)
        announced = False
    time.sleep(10)
```
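Each data center runs the same script, announcing the same /32 with its own next-hop; normal BGP path selection then steers clients to the nearest healthy site, and a failed site simply withdraws the route.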
## Troubleshooting

### MAC Flapping

Problem: the same MAC address appears in multiple locations.

Debugging:

```
# Check for a duplicate MAC (on the switch/router CLI)
show bgp l2vpn evpn route-type 2 | grep "00:11:22:33:44:55"

# Verify Route Distinguishers are unique
exabgpcli show neighbor advertised-routes
```
Solution:
- Ensure unique RD per VTEP
- Check for VM migration loops
- Verify no duplicate MAC addresses assigned
### High Inter-Site Latency

Problem: slow inter-site communication.
Debugging:

```
# Check underlay latency
ping 10.2.0.1

# Verify tunnel MTU
ip link show vxlan10

# Check for packet fragmentation
tcpdump -i eth0 -vv udp port 4789
```
Solution:
- Optimize underlay routing
- Lower the tenant-facing MTU to absorb VXLAN overhead (50 bytes per packet)
- Use jumbo frames on the underlay if supported
### Split-Brain

Problem: both sites think they are primary.
Debugging:

```
# Check gateway announcements from both sites
show bgp l2vpn evpn | grep "192.168.1.1"
```
Solution:
- Implement proper health checking
- Use consensus mechanism (e.g., Raft, etcd)
- Add manual override capability
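As a minimal illustration of a tie-breaker (deliberately simpler than real consensus; use etcd or similar in production), each site can defer to the lower site ID when both report healthy. The peer health endpoint below is hypothetical:

```python
#!/usr/bin/env python3
# Sketch: deterministic tie-breaker, not a consensus protocol.
# When both sites are healthy, only the lower DC ID announces the gateway.
import requests

DC_ID = 1
PEER_DC_ID = 2
PEER_HEALTH_URL = "http://10.2.0.1:8080/health"  # hypothetical endpoint


def peer_is_healthy():
    try:
        return requests.get(PEER_HEALTH_URL, timeout=2).status_code == 200
    except requests.RequestException:
        return False


def should_announce(local_healthy):
    """Return True if this site should own the gateway announcement."""
    if not local_healthy:
        return False
    # Both healthy: the lower DC ID wins; peer down: take over regardless.
    return DC_ID < PEER_DC_ID or not peer_is_healthy()
```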
### Monitoring

Monitor DCI health:
```python
#!/usr/bin/env python3
import json
import sys

from prometheus_client import Gauge, start_http_server

# Metrics
dci_link_status = Gauge('dci_link_status', 'DCI link status', ['site'])
dci_endpoints = Gauge('dci_endpoints_total', 'Total endpoints', ['site'])
dci_latency = Gauge('dci_latency_ms', 'Inter-site latency', ['site'])

start_http_server(9100)

# Update metrics from ExaBGP JSON messages on stdin
while True:
    line = sys.stdin.readline()
    if not line:
        break
    try:
        data = json.loads(line)
        # Parse BGP updates here and update the metrics accordingly
    except json.JSONDecodeError:
        pass
```
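To attach the exporter, it would run as one more API process in the configuration (script path illustrative) and be added to the neighbor's `processes` list alongside the controller:

```
process dci-metrics {
    run python3 /etc/exabgp/dci-metrics.py;
    encoder json;
}
```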
### Bandwidth Planning

Consider DCI bandwidth requirements:
- Control plane: BGP updates (typically < 1 Mbps)
- Data plane: actual workload traffic (depends on applications)
- Overhead: VXLAN/GRE adds ~50 bytes per packet
- Burst handling: account for VM migration traffic
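As a worked example of the overhead: VXLAN encapsulation adds 14 bytes (outer Ethernet) + 20 bytes (outer IPv4) + 8 bytes (UDP) + 8 bytes (VXLAN header) = 50 bytes, so tenant interfaces behind a 1500-byte underlay should use an MTU of 1450, while a 9000-byte jumbo underlay leaves ample headroom.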
### Security

Secure DCI links:
- Encryption: Use IPsec or MACsec for DCI links
- Authentication: Use BGP MD5 or TCP-AO
- Route filtering: Accept only expected routes
- Rate limiting: Protect against BGP route storms
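For example, ExaBGP supports TCP MD5 per neighbor via `md5-password` (a sketch; the kernel must support TCP MD5 signatures):

```
neighbor 10.2.0.1 {
    # ... session settings as in the examples above ...
    md5-password "use-a-strong-secret";
}
```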