# High Availability Patterns
ExaBGP enables dynamic service advertisement and failover through BGP, providing high availability without traditional failover mechanisms like VRRP or Pacemaker.
Dynamic Service Advertisement: Nodes advertise their availability via BGP, announcing service IP addresses they can serve. The network automatically routes traffic to available nodes based on BGP path selection.
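Concretely, a health-check process drives this by writing ExaBGP's text API commands to standard output, as the scripts later in this page do:

```
announce route 2001:db8:30::1/128 next-hop self
withdraw route 2001:db8:30::1/128
```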
```
[Web Servers] ----BGP----> [Edge Routers]
   (ExaBGP)                (BGP Speakers)
```
Characteristics:
- Web servers run ExaBGP
- Direct BGP peering with edge routers
- Service IPs on loopback interfaces
- Health checks control announcements
Pros: Simple and direct.
Cons: N×M BGP sessions; router configuration changes for each new service.
```
[Web Servers] ----BGP----> [Route Servers] ----BGP----> [Edge Routers]
   (ExaBGP)                (BIRD/Quagga)                (BGP Speakers)
```
Characteristics:
- Intermediate route servers (BIRD/Quagga)
- Star topology instead of full mesh
- Route servers select best paths
- Separates routing decisions from the service processes
Pros: Scalable, clean separation, easier management.
Cons: An additional infrastructure component to run.
Allocate multiple virtual IPs per service:
```
Service A: 2001:db8:30::1, ::2, ::3
Service B: 2001:db8:40::1, ::2, ::3
```
Each node announces all IPs with different metrics for load distribution.
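For illustration, such an allocation can live in a small data structure shared by the health-check scripts (the names here are hypothetical):

```python
# Hypothetical VIP allocation consumed by the health-check scripts.
SERVICE_VIPS = {
    'service_a': ['2001:db8:30::1', '2001:db8:30::2', '2001:db8:30::3'],
    'service_b': ['2001:db8:40::1', '2001:db8:40::2', '2001:db8:40::3'],
}
```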
Configure service IPs on the loopback interface:

```sh
# Linux
ip addr add 2001:db8:30::1/128 dev lo
ip addr add 2001:db8:30::2/128 dev lo
ip addr add 2001:db8:30::3/128 dev lo
```

This prevents IP movement issues and enables anycast-style operation.
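Verify the addresses are configured:

```sh
ip -6 addr show dev lo
```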
Each node advertises routes with calculated metrics:
```
Node 1 (web1):
  2001:db8:30::1 - metric 100 (primary)
  2001:db8:30::2 - metric 101 (backup)
  2001:db8:30::3 - metric 102 (backup)

Node 2 (web2):
  2001:db8:30::1 - metric 101 (backup)
  2001:db8:30::2 - metric 100 (primary)
  2001:db8:30::3 - metric 102 (backup)

Node 3 (web3):
  2001:db8:30::1 - metric 102 (backup)
  2001:db8:30::2 - metric 101 (backup)
  2001:db8:30::3 - metric 100 (primary)
```
Result: Each IP is primarily served by a different node, achieving load distribution.
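For illustration, one way to generate a staggered plan of this shape is a simple rotation (the backup ordering differs slightly from the table above; node and IP names are from the example):

```python
# Each node is primary (metric 100) for "its" IP and backup for the rest.
nodes = ['web1', 'web2', 'web3']
ips = ['2001:db8:30::1', '2001:db8:30::2', '2001:db8:30::3']

for i, node in enumerate(nodes):
    print(f'{node}:')
    for j, ip in enumerate(ips):
        metric = 100 + ((j - i) % len(nodes))
        role = 'primary' if metric == 100 else 'backup'
        print(f'  {ip} - metric {metric} ({role})')
```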
When a service fails, increase metrics:
```
Healthy: metric 100-102
Failed:  metric 1000-1002
```
BGP automatically converges to healthy nodes within seconds.
```python
#!/usr/bin/env python3
import subprocess
import sys
from time import sleep


def check_service():
    """Check whether the local service responds in time."""
    try:
        result = subprocess.run(
            ['curl', '--fail', '--max-time', '2', 'http://localhost'],
            capture_output=True,
            timeout=3,
        )
        return result.returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        return False


# Configuration
service_ips = [
    ('2001:db8:30::1', 100),  # (IP, healthy metric)
    ('2001:db8:30::2', 101),
    ('2001:db8:30::3', 102),
]
failed_metric = 1000

while True:
    healthy = check_service()
    for ip, base_metric in service_ips:
        # Preserve the relative ordering between IPs even when failed.
        metric = base_metric if healthy else (failed_metric + base_metric - 100)
        sys.stdout.write(f'announce route {ip}/128 next-hop self med {metric}\n')
    sys.stdout.flush()
    sleep(10)
```

Multiple Service Checks:
```python
checks = [
    ('http', 'http://localhost:80'),
    ('https', 'https://localhost:443'),
    ('app', 'http://localhost:8080/health'),
]


def check_all_services():
    # check_url() probes a single URL, e.g. with curl as in check_service()
    return all(check_url(url) for name, url in checks)
```

Gradual Degradation:
```python
def calculate_metric(base_metric, health_score):
    """
    health_score: 0.0 (dead) to 1.0 (perfect).
    Returns the adjusted metric.
    """
    if health_score < 0.3:
        return 1000              # Remove from service
    elif health_score < 0.7:
        return base_metric + 50  # Reduced preference
    else:
        return base_metric       # Normal operation
```

Retry Logic:
```python
def check_with_retry(check_func, retries=3):
    for _ in range(retries):
        if check_func():
            return True
        sleep(1)
    return False
```

ExaBGP configuration wiring the health-check process into the BGP session:

```
neighbor 192.0.2.1 {
    router-id 10.0.0.1;
    local-address 192.0.2.2;
    local-as 65000;
    peer-as 65000;

    family {
        ipv4 unicast;
        ipv6 unicast;
    }

    api {
        processes [healthcheck];
    }
}

process healthcheck {
    run /usr/local/bin/healthcheck.py;
    encoder text;
}
```

Route server configuration (BIRD); the service prefixes are IPv6, so they are filtered in an ipv6 channel:

```
protocol bgp web1 {
    local as 65000;
    neighbor 10.0.0.1 as 65000;

    ipv6 {
        import filter {
            # Accept service announcements
            if net ~ [ 2001:db8:30::/48+ ] then accept;
            reject;
        };
        export none;
    };
}

protocol bgp edge_router {
    local as 65000;
    neighbor 10.0.1.1 as 65000;

    ipv6 {
        import none;
        export filter {
            # Send best paths to the edge routers
            if net ~ [ 2001:db8:30::/48+ ] then accept;
            reject;
        };
    };
}
```
Multiple DNS servers announce the same service IP:

```python
service_ip = '198.51.100.1'

while True:
    if dns_server_healthy():  # e.g. probe the local resolver
        sys.stdout.write(f'announce route {service_ip}/32 next-hop self\n')
    else:
        sys.stdout.write(f'withdraw route {service_ip}/32\n')
    sys.stdout.flush()
    sleep(5)
```

Benefits:
- Automatic failover
- Geographic load distribution
- Query latency reduction
Multiple web servers with health checks:

```python
service_ips = ['203.0.113.10', '203.0.113.11', '203.0.113.12']

healthy = check_web_service()
for ip in service_ips:
    if healthy:
        # Announce with a metric based on current load
        current_load = get_system_load()
        metric = int(100 + current_load * 10)
        sys.stdout.write(f'announce route {ip}/32 next-hop self med {metric}\n')
    else:
        sys.stdout.write(f'withdraw route {ip}/32\n')
```

Announce read replica availability:
```python
replica_ip = '198.51.100.50'

while True:
    # Check replication lag
    lag = get_replication_lag()
    if lag < 1.0:  # less than one second behind
        metric = int(100 + lag * 100)
        sys.stdout.write(
            f'announce route {replica_ip}/32 next-hop self med {metric}\n'
        )
    else:
        # Too far behind; remove from the pool
        sys.stdout.write(f'withdraw route {replica_ip}/32\n')
    sys.stdout.flush()
    sleep(5)
```

Content delivery nodes announce based on capacity:
```python
edge_ip = '203.0.113.100'

while True:
    available_bandwidth = get_available_bandwidth()  # Mbps
    cpu_usage = get_cpu_usage()                      # percent
    if cpu_usage < 80 and available_bandwidth > 100:
        # Metric grows with utilization
        metric = int(100 + cpu_usage)
        sys.stdout.write(f'announce route {edge_ip}/32 next-hop self med {metric}\n')
    else:
        # Over capacity
        sys.stdout.write(f'withdraw route {edge_ip}/32\n')
    sys.stdout.flush()
    sleep(10)
```

HAProxy integration:

```
[HAProxy] ----monitor----> [ExaBGP]
                              |
                              +---> Announces VIP based on backend health
```
HAProxy health check:

```python
import socket


def check_haproxy_backends():
    """Query the HAProxy stats socket and report whether any backend is up."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect('/var/run/haproxy.sock')
    s.send(b'show stat\n')
    stats = s.recv(65536).decode()
    s.close()
    # 'show stat' returns CSV; field 17 is the status column ('UP', 'DOWN', ...)
    healthy_backends_count = 0
    for line in stats.splitlines():
        fields = line.split(',')
        if len(fields) > 17 and fields[17].startswith('UP'):
            healthy_backends_count += 1
    return healthy_backends_count > 0
```

NGINX integration:

```
[NGINX] ----health_check----> [ExaBGP]
                                 |
                                 +---> Announces based on upstream health
```
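A minimal sketch of such a check, assuming NGINX exposes a health endpoint (the `/health` URL is an assumption, e.g. a location backed by a status module or application route):

```python
import urllib.request


def check_nginx_upstreams():
    """Probe an assumed NGINX health endpoint; True only on HTTP 200."""
    try:
        with urllib.request.urlopen('http://localhost/health', timeout=2) as r:
            return r.status == 200
    except OSError:
        # Covers connection errors, timeouts, and HTTP error statuses
        return False
```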
Use a maintenance-file trigger:

```python
import os

maintenance_file = '/etc/exabgp/maintenance'


def in_maintenance():
    return os.path.exists(maintenance_file)


# service_ips, service_healthy() and announce_routes() as defined elsewhere
while True:
    if in_maintenance():
        # Withdraw routes for maintenance
        for ip in service_ips:
            sys.stdout.write(f'withdraw route {ip}/32\n')
    elif service_healthy():
        # Normal operation
        announce_routes()
    sys.stdout.flush()
    sleep(5)
```

Gradually increase metrics to drain traffic:
```python
# announce_with_metric() and withdraw_all_routes() are placeholders
def drain_traffic(duration=300):  # seconds (5 minutes)
    steps = 10
    step_duration = duration / steps
    for i in range(steps):
        metric = 100 + (i * 100)  # 100 -> 1000
        announce_with_metric(metric)
        sleep(step_duration)
    # Final withdrawal
    withdraw_all_routes()
```

Metrics to monitor:

- BGP Session State: Ensure sessions stay established
- Route Announcements: Track active announcements per node
- Failover Events: Count and time failovers
- Health Check Results: Success/failure rates
- Convergence Time: Time to failover completion
Example Prometheus instrumentation:

```python
from prometheus_client import Counter, Gauge, Histogram

bgp_session_up = Gauge('bgp_session_up', 'BGP session status')
routes_announced = Gauge('routes_announced', 'Number of announced routes')
health_checks_total = Counter('health_checks_total', 'Health checks', ['status'])
failover_time = Histogram('failover_time_seconds', 'Failover duration')
```

Common issues:

1. Routes Not Appearing
- Check BGP session state
- Verify health check is passing
- Confirm service IPs on loopback
- Check routing policy filters
2. Slow Failover
- Reduce health check interval
- Tune BGP timers
- Verify route server configuration
- Check for delayed withdrawals
3. Flapping
- Implement health check dampening
- Add retry logic
- Increase check intervals during instability
- Use hysteresis (different thresholds for up/down); see the sketch below
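A minimal sketch of hysteresis on top of any check function (the thresholds are illustrative):

```python
class HysteresisCheck:
    """Require several consecutive results before flipping health state."""

    def __init__(self, check_func, up_threshold=3, down_threshold=2):
        self.check_func = check_func
        self.up_threshold = up_threshold      # successes needed to go UP
        self.down_threshold = down_threshold  # failures needed to go DOWN
        self.healthy = True
        self.streak = 0

    def poll(self):
        ok = self.check_func()
        if ok == self.healthy:
            # Result agrees with current state: reset the streak
            self.streak = 0
        else:
            self.streak += 1
            needed = self.down_threshold if self.healthy else self.up_threshold
            if self.streak >= needed:
                self.healthy = ok
                self.streak = 0
        return self.healthy
```

Wrap an existing check, e.g. `HysteresisCheck(check_service)`, and call `poll()` on each loop iteration instead of the raw check.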
Useful commands:

```sh
# Check ExaBGP status
exabgpcli show neighbor summary

# View announced routes
exabgpcli show adj-rib out

# Check BGP sessions on the router
show bgp summary
show bgp ipv4 unicast neighbors

# Monitor the health check script
tail -f /var/log/exabgp/healthcheck.log
```

Best practices:

- Use Route Servers: Simplifies management at scale
- Metric Strategy: Plan metrics carefully for load distribution
- Health Check Robustness: Multiple retries before failing
- Loopback IPs: Always configure service IPs on loopback
- Monitoring: Comprehensive monitoring of BGP and health checks
- Graceful Degradation: Use metrics for gradual failure, not binary
- Documentation: Document metric assignments and IP allocations
- Testing: Regularly test failover scenarios
- Logging: Log all health check state changes
- Automation: Automate deployment and configuration
Performance considerations:

- BGP Convergence: Typically 5-15 seconds
- Health Check Frequency: 5-10 seconds recommended
- Resource Usage: ExaBGP is lightweight (<50MB RAM typical)
- Scale: Can handle 100+ service IPs per node
Compared to OSPF-based service advertisement:

- Doesn't scale well
- Impacts entire network on misconfiguration
- Limited route filtering
- Restricts network topologies
- Better to keep OSPF for network devices only
Compared to VRRP:

- Active/passive only (no load distribution)
- Limited to L2 domains
- Manual metric configuration
- No application-level health checks
- Harder to integrate with modern orchestration
Compared to DNS-based failover:

- No health checking
- Client-side caching issues
- Long TTLs delay failover
- No real-time failover
Further reading:

- Vincent Bernat's blog post on high availability with ExaBGP
- RIPE Labs articles on ExaBGP
- ExaBGP GitHub wiki
- BGP (RFC 4271)
- BGP MED considerations (RFC 4451)