# High Availability Patterns
ExaBGP enables dynamic service advertisement and failover through BGP, providing high availability without traditional failover mechanisms like VRRP or Pacemaker.
Dynamic Service Advertisement: Nodes advertise their availability via BGP, announcing service IP addresses they can serve. The network automatically routes traffic to available nodes based on BGP path selection.
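Concretely, a health-check process drives this by writing ExaBGP's text API commands to standard output, as the scripts later in this page do:

```
announce route 2001:db8:30::1/128 next-hop self
withdraw route 2001:db8:30::1/128
```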
```
[Web Servers] ----BGP----> [Edge Routers]
   (ExaBGP)                (BGP Speakers)
```
Characteristics:
- Web servers run ExaBGP
- Direct BGP peering with edge routers
- Service IPs on loopback interfaces
- Health checks control announcements
Pros: Simple and direct.
Cons: N×M BGP sessions; router configuration changes for each new service.
```
[Web Servers] ----BGP----> [Route Servers] ----BGP----> [Edge Routers]
   (ExaBGP)                (BIRD/Quagga)                (BGP Speakers)
```
Characteristics:
- Intermediate route servers (BIRD/Quagga)
- Star topology instead of full mesh
- Route servers select best paths
- Separates routing decisions from the service processes
Pros: Scalable, clean separation, easier management.
Cons: An additional infrastructure component to run.
Allocate multiple virtual IPs per service:
```
Service A: 2001:db8:30::1, ::2, ::3
Service B: 2001:db8:40::1, ::2, ::3
```
Each node announces all IPs with different metrics for load distribution.
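For illustration, such an allocation can live in a small data structure shared by the health-check scripts (the names here are hypothetical):

```python
# Hypothetical VIP allocation consumed by the health-check scripts.
SERVICE_VIPS = {
    'service_a': ['2001:db8:30::1', '2001:db8:30::2', '2001:db8:30::3'],
    'service_b': ['2001:db8:40::1', '2001:db8:40::2', '2001:db8:40::3'],
}
```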
Configure service IPs on the loopback interface:

```sh
# Linux
ip addr add 2001:db8:30::1/128 dev lo
ip addr add 2001:db8:30::2/128 dev lo
ip addr add 2001:db8:30::3/128 dev lo
```

This prevents IP movement issues and enables anycast-style operation.
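Verify the addresses are configured:

```sh
ip -6 addr show dev lo
```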
Each node advertises routes with calculated metrics:
```
Node 1 (web1):
  2001:db8:30::1 - metric 100 (primary)
  2001:db8:30::2 - metric 101 (backup)
  2001:db8:30::3 - metric 102 (backup)

Node 2 (web2):
  2001:db8:30::1 - metric 101 (backup)
  2001:db8:30::2 - metric 100 (primary)
  2001:db8:30::3 - metric 102 (backup)

Node 3 (web3):
  2001:db8:30::1 - metric 102 (backup)
  2001:db8:30::2 - metric 101 (backup)
  2001:db8:30::3 - metric 100 (primary)
```
Result: Each IP is primarily served by a different node, achieving load distribution.
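For illustration, one way to generate a staggered plan of this shape is a simple rotation (the backup ordering differs slightly from the table above; node and IP names are from the example):

```python
# Each node is primary (metric 100) for "its" IP and backup for the rest.
nodes = ['web1', 'web2', 'web3']
ips = ['2001:db8:30::1', '2001:db8:30::2', '2001:db8:30::3']

for i, node in enumerate(nodes):
    print(f'{node}:')
    for j, ip in enumerate(ips):
        metric = 100 + ((j - i) % len(nodes))
        role = 'primary' if metric == 100 else 'backup'
        print(f'  {ip} - metric {metric} ({role})')
```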
When a service fails, increase metrics:
```
Healthy: metric 100-102
Failed:  metric 1000-1002
```
BGP automatically converges to healthy nodes within seconds.
```python
#!/usr/bin/env python3
import subprocess
import sys
from time import sleep


def check_service():
    """Check whether the local service responds in time."""
    try:
        result = subprocess.run(
            ['curl', '--fail', '--max-time', '2', 'http://localhost'],
            capture_output=True,
            timeout=3,
        )
        return result.returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        return False


# Configuration
service_ips = [
    ('2001:db8:30::1', 100),  # (IP, healthy metric)
    ('2001:db8:30::2', 101),
    ('2001:db8:30::3', 102),
]
failed_metric = 1000

while True:
    healthy = check_service()
    for ip, base_metric in service_ips:
        # Preserve the relative ordering between IPs even when failed.
        metric = base_metric if healthy else (failed_metric + base_metric - 100)
        sys.stdout.write(f'announce route {ip}/128 next-hop self med {metric}\n')
    sys.stdout.flush()
    sleep(10)
```

Multiple Service Checks:
```python
checks = [
    ('http', 'http://localhost:80'),
    ('https', 'https://localhost:443'),
    ('app', 'http://localhost:8080/health'),
]


def check_all_services():
    # check_url() probes a single URL, e.g. with curl as in check_service()
    return all(check_url(url) for name, url in checks)
```

Gradual Degradation:
```python
def calculate_metric(base_metric, health_score):
    """
    health_score: 0.0 (dead) to 1.0 (perfect).
    Returns the adjusted metric.
    """
    if health_score < 0.3:
        return 1000              # Remove from service
    elif health_score < 0.7:
        return base_metric + 50  # Reduced preference
    else:
        return base_metric       # Normal operation
```

Retry Logic:
```python
def check_with_retry(check_func, retries=3):
    for _ in range(retries):
        if check_func():
            return True
        sleep(1)
    return False
```

ExaBGP configuration wiring the health-check process into the BGP session:

```
neighbor 192.0.2.1 {
    router-id 10.0.0.1;
    local-address 192.0.2.2;
    local-as 65000;
    peer-as 65000;

    family {
        ipv4 unicast;
        ipv6 unicast;
    }

    api {
        processes [healthcheck];
    }
}

process healthcheck {
    run /usr/local/bin/healthcheck.py;
    encoder text;
}
```

Route server configuration (BIRD); the service prefixes are IPv6, so they are filtered in an ipv6 channel:

```
protocol bgp web1 {
    local as 65000;
    neighbor 10.0.0.1 as 65000;

    ipv6 {
        import filter {
            # Accept service announcements
            if net ~ [ 2001:db8:30::/48+ ] then accept;
            reject;
        };
        export none;
    };
}

protocol bgp edge_router {
    local as 65000;
    neighbor 10.0.1.1 as 65000;

    ipv6 {
        import none;
        export filter {
            # Send best paths to the edge routers
            if net ~ [ 2001:db8:30::/48+ ] then accept;
            reject;
        };
    };
}
```
Multiple DNS servers announce the same service IP:

```python
service_ip = '198.51.100.1'

while True:
    if dns_server_healthy():  # e.g. probe the local resolver
        sys.stdout.write(f'announce route {service_ip}/32 next-hop self\n')
    else:
        sys.stdout.write(f'withdraw route {service_ip}/32\n')
    sys.stdout.flush()
    sleep(5)
```

Benefits:
- Automatic failover
- Geographic load distribution
- Query latency reduction
Multiple web servers with health checks:

```python
service_ips = ['203.0.113.10', '203.0.113.11', '203.0.113.12']

healthy = check_web_service()
for ip in service_ips:
    if healthy:
        # Announce with a metric based on current load
        current_load = get_system_load()
        metric = int(100 + current_load * 10)
        sys.stdout.write(f'announce route {ip}/32 next-hop self med {metric}\n')
    else:
        sys.stdout.write(f'withdraw route {ip}/32\n')
```

Announce read replica availability:
```python
replica_ip = '198.51.100.50'

while True:
    # Check replication lag
    lag = get_replication_lag()
    if lag < 1.0:  # less than one second behind
        metric = int(100 + lag * 100)
        sys.stdout.write(
            f'announce route {replica_ip}/32 next-hop self med {metric}\n'
        )
    else:
        # Too far behind; remove from the pool
        sys.stdout.write(f'withdraw route {replica_ip}/32\n')
    sys.stdout.flush()
    sleep(5)
```

Content delivery nodes announce based on capacity:
```python
edge_ip = '203.0.113.100'

while True:
    available_bandwidth = get_available_bandwidth()  # Mbps
    cpu_usage = get_cpu_usage()                      # percent
    if cpu_usage < 80 and available_bandwidth > 100:
        # Metric grows with utilization
        metric = int(100 + cpu_usage)
        sys.stdout.write(f'announce route {edge_ip}/32 next-hop self med {metric}\n')
    else:
        # Over capacity
        sys.stdout.write(f'withdraw route {edge_ip}/32\n')
    sys.stdout.flush()
    sleep(10)
```

HAProxy integration:

```
[HAProxy] ----monitor----> [ExaBGP]
                              |
                              +---> Announces VIP based on backend health
```
HAProxy health check:

```python
import socket


def check_haproxy_backends():
    """Query the HAProxy stats socket and report whether any backend is up."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect('/var/run/haproxy.sock')
    s.send(b'show stat\n')
    stats = s.recv(65536).decode()
    s.close()
    # 'show stat' returns CSV; field 17 is the status column ('UP', 'DOWN', ...)
    healthy_backends_count = 0
    for line in stats.splitlines():
        fields = line.split(',')
        if len(fields) > 17 and fields[17].startswith('UP'):
            healthy_backends_count += 1
    return healthy_backends_count > 0
```

NGINX integration:

```
[NGINX] ----health_check----> [ExaBGP]
                                 |
                                 +---> Announces based on upstream health
```
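A minimal sketch of such a check, assuming NGINX exposes a health endpoint (the `/health` URL is an assumption, e.g. a location backed by a status module or application route):

```python
import urllib.request


def check_nginx_upstreams():
    """Probe an assumed NGINX health endpoint; True only on HTTP 200."""
    try:
        with urllib.request.urlopen('http://localhost/health', timeout=2) as r:
            return r.status == 200
    except OSError:
        # Covers connection errors, timeouts, and HTTP error statuses
        return False
```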
Use a maintenance-file trigger:

```python
import os

maintenance_file = '/etc/exabgp/maintenance'


def in_maintenance():
    return os.path.exists(maintenance_file)


# service_ips, service_healthy() and announce_routes() as defined elsewhere
while True:
    if in_maintenance():
        # Withdraw routes for maintenance
        for ip in service_ips:
            sys.stdout.write(f'withdraw route {ip}/32\n')
    elif service_healthy():
        # Normal operation
        announce_routes()
    sys.stdout.flush()
    sleep(5)
```

Gradually increase metrics to drain traffic:
```python
# announce_with_metric() and withdraw_all_routes() are placeholders
def drain_traffic(duration=300):  # seconds (5 minutes)
    steps = 10
    step_duration = duration / steps
    for i in range(steps):
        metric = 100 + (i * 100)  # 100 -> 1000
        announce_with_metric(metric)
        sleep(step_duration)
    # Final withdrawal
    withdraw_all_routes()
```

Metrics to monitor:

- BGP Session State: Ensure sessions stay established
- Route Announcements: Track active announcements per node
- Failover Events: Count and time failovers
- Health Check Results: Success/failure rates
- Convergence Time: Time to failover completion
Example Prometheus instrumentation:

```python
from prometheus_client import Counter, Gauge, Histogram

bgp_session_up = Gauge('bgp_session_up', 'BGP session status')
routes_announced = Gauge('routes_announced', 'Number of announced routes')
health_checks_total = Counter('health_checks_total', 'Health checks', ['status'])
failover_time = Histogram('failover_time_seconds', 'Failover duration')
```

Common issues:

1. Routes Not Appearing
- Check BGP session state
- Verify health check is passing
- Confirm service IPs on loopback
- Check routing policy filters
2. Slow Failover
- Reduce health check interval
- Tune BGP timers
- Verify route server configuration
- Check for delayed withdrawals
3. Flapping
- Implement health check dampening
- Add retry logic
- Increase check intervals during instability
- Use hysteresis (different thresholds for up/down); see the sketch below
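A minimal sketch of hysteresis on top of any check function (the thresholds are illustrative):

```python
class HysteresisCheck:
    """Require several consecutive results before flipping health state."""

    def __init__(self, check_func, up_threshold=3, down_threshold=2):
        self.check_func = check_func
        self.up_threshold = up_threshold      # successes needed to go UP
        self.down_threshold = down_threshold  # failures needed to go DOWN
        self.healthy = True
        self.streak = 0

    def poll(self):
        ok = self.check_func()
        if ok == self.healthy:
            # Result agrees with current state: reset the streak
            self.streak = 0
        else:
            self.streak += 1
            needed = self.down_threshold if self.healthy else self.up_threshold
            if self.streak >= needed:
                self.healthy = ok
                self.streak = 0
        return self.healthy
```

Wrap an existing check, e.g. `HysteresisCheck(check_service)`, and call `poll()` on each loop iteration instead of the raw check.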
Useful commands:

```sh
# Check ExaBGP status
exabgpcli show neighbor summary

# View announced routes
exabgpcli show adj-rib out

# Check BGP sessions on the router
show bgp summary
show bgp ipv4 unicast neighbors

# Monitor the health check script
tail -f /var/log/exabgp/healthcheck.log
```

Best practices:

- Use Route Servers: Simplifies management at scale
- Metric Strategy: Plan metrics carefully for load distribution
- Health Check Robustness: Multiple retries before failing
- Loopback IPs: Always configure service IPs on loopback
- Monitoring: Comprehensive monitoring of BGP and health checks
- Graceful Degradation: Use metrics for gradual failure, not binary
- Documentation: Document metric assignments and IP allocations
- Testing: Regularly test failover scenarios
- Logging: Log all health check state changes
- Automation: Automate deployment and configuration
Performance considerations:

- BGP Convergence: Typically 5-15 seconds
- Health Check Frequency: 5-10 seconds recommended
- Resource Usage: ExaBGP is lightweight (<50MB RAM typical)
- Scale: Can handle 100+ service IPs per node
Compared to OSPF-based service advertisement:

- Doesn't scale well
- Impacts entire network on misconfiguration
- Limited route filtering
- Restricts network topologies
- Better to keep OSPF for network devices only
Compared to VRRP:

- Active/passive only (no load distribution)
- Limited to L2 domains
- Manual metric configuration
- No application-level health checks
- Harder to integrate with modern orchestration
Compared to DNS-based failover:

- No health checking
- Client-side caching issues
- Long TTLs delay failover
- No real-time failover
Further reading:

- Vincent Bernat's blog post on high availability with ExaBGP
- RIPE Labs articles on ExaBGP
- ExaBGP GitHub wiki
- BGP (RFC 4271)
- BGP MED considerations (RFC 4451)