πŸ”§ Configuring High Availability Systems on Alpine Linux: Enterprise Resilience

Published Jun 18, 2025 · 18 min read

Comprehensive tutorial for system administrators to implement high availability, failover clustering, and redundancy systems on Alpine Linux. Perfect for mission-critical enterprise deployments!

Let’s build enterprise-grade high availability systems on Alpine Linux! πŸš€ This comprehensive tutorial shows you how to implement failover clustering, load balancing, and redundancy to ensure 99.99% uptime for mission-critical applications. Perfect for system administrators managing production environments where downtime is not an option! 😊

πŸ€” What is High Availability?

High Availability (HA) is a system design approach that ensures services remain operational and accessible with minimal downtime through redundant components, automatic failover mechanisms, and fault-tolerant architectures that eliminate single points of failure!

High Availability is like:

  • πŸ₯ Emergency backup systems that instantly activate when primary systems fail
  • πŸŒ‰ Multiple bridge routes ensuring traffic always flows even if one path is blocked
  • ⚑ Uninterruptible power supply that seamlessly switches between power sources

🎯 What You Need

Before we start, you need:

  • βœ… Multiple Alpine Linux servers (minimum 3 for proper clustering)
  • βœ… Understanding of networking, clustering, and system administration
  • βœ… Knowledge of load balancing and failover concepts
  • βœ… Root access and network configuration capabilities

πŸ“‹ Step 1: Install High Availability Foundation

Install Clustering and HA Packages

Let’s set up the complete HA foundation! 😊

What we’re doing: Installing essential clustering software, monitoring tools, and high availability components for enterprise-grade system resilience.

# Update package list
apk update

# Install clustering and HA core packages
apk add pacemaker corosync crmsh
# Note: the examples below use the pcs CLI, which Alpine may not package;
# install it separately (e.g. via pip) or translate the pcs commands to crmsh
apk add resource-agents fence-agents

# Install load balancing components
apk add haproxy keepalived
apk add nginx

# Install database clustering (PostgreSQL with replication)
apk add postgresql postgresql-contrib
apk add postgresql-client

# Install monitoring and alerting
apk add nagios-core nagios-plugins
apk add zabbix-agent prometheus prometheus-node-exporter

# Install network and storage tools
apk add drbd-utils lvm2
apk add nfs-utils cifs-utils

# Install backup and synchronization
apk add rsync rdiff-backup
apk add borgbackup

# Install security and encryption
apk add openssh-server
apk add gnupg

# Verify installations
pcs --version
corosync -v
haproxy -v

echo "HA foundation installed! πŸ—οΈ"

What this does: πŸ“– Installs complete high availability software stack for enterprise clustering.

Example output:

pcs-0.11.4
Corosync Cluster Engine, version '3.1.7'
HAProxy version 2.6.12-1
HA foundation installed! πŸ—οΈ

What this means: Alpine Linux is equipped with enterprise HA capabilities! βœ…
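
Want a quick sanity check before moving on? A minimal sketch like this (the binary names are assumptions based on the packages above) confirms the core daemons actually landed:

# Verify that the core HA binaries are on PATH
for bin in corosync pacemakerd haproxy keepalived psql drbdadm; do
    if command -v "$bin" > /dev/null 2>&1; then
        echo "OK:      $bin"
    else
        echo "MISSING: $bin"
    fi
done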

Configure Network Infrastructure

Let’s set up redundant networking! 🎯

What we’re doing: Configuring network bonding, VLANs, and redundant network paths for fault-tolerant connectivity.

# Configure network bonding for redundancy
cat > /etc/network/interfaces << 'EOF'
# High Availability Network Configuration

# Loopback interface
auto lo
iface lo inet loopback

# Management network (primary)
auto eth0
iface eth0 inet static
    address 192.168.1.10
    netmask 255.255.255.0
    gateway 192.168.1.1
    dns-nameservers 8.8.8.8 8.8.4.4

# Cluster heartbeat network (dedicated)
auto eth1
iface eth1 inet static
    address 10.0.1.10
    netmask 255.255.255.0
    # No gateway - heartbeat only

# Storage network (optional)
auto eth2
iface eth2 inet static
    address 10.0.2.10
    netmask 255.255.255.0
    # No gateway - storage only

# Virtual IP for cluster services
auto eth0:0
iface eth0:0 inet static
    address 192.168.1.100
    netmask 255.255.255.0
EOF

# Configure network bonding module
echo "bonding" >> /etc/modules

# Create bonding configuration
cat > /etc/modprobe.d/bonding.conf << 'EOF'
# Network bonding configuration for HA
alias bond0 bonding
options bonding mode=active-backup miimon=100 downdelay=200 updelay=200
EOF

# Configure advanced network settings
cat > /etc/sysctl.d/99-ha-network.conf << 'EOF'
# High Availability Network Optimizations

# Enable IP forwarding
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1

# Enable source route verification
net.ipv4.conf.all.rp_filter = 1

# Disable ICMP redirects
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0

# Increase network buffer sizes
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728

# Enable TCP window scaling
net.ipv4.tcp_window_scaling = 1

# Increase connection tracking
net.netfilter.nf_conntrack_max = 1048576
EOF

# Apply network settings
sysctl -p /etc/sysctl.d/99-ha-network.conf

# Restart networking
service networking restart

echo "Network infrastructure configured! 🌐"

What this does: πŸ“– Sets up redundant network infrastructure with dedicated heartbeat and management networks.

Example output:

Network infrastructure configured! 🌐

What this means: Network foundation is ready for HA clustering! βœ…
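
Note that the steps above load the bonding module but don't yet define a bond interface. As a hedged sketch, a bond0 stanza could look like this, assuming two spare NICs (eth3 and eth4) are dedicated to it; the exact option names depend on whether your Alpine install uses BusyBox ifupdown or ifupdown-ng:

# Hypothetical bond0 stanza for /etc/network/interfaces
auto bond0
iface bond0 inet static
    address 192.168.2.10
    netmask 255.255.255.0
    # Enslave the spare NICs (option names vary between ifupdown implementations)
    bond-slaves eth3 eth4
    bond-mode active-backup
    bond-miimon 100
    bond-primary eth3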

πŸ“‹ Step 2: Configure Cluster Foundation

Set Up Corosync Cluster Communication

Let’s configure cluster heartbeat and communication! 😊

What we’re doing: Setting up Corosync for reliable cluster communication, node membership, and split-brain prevention.

# Generate cluster authentication key
corosync-keygen

# Create corosync configuration
cat > /etc/corosync/corosync.conf << 'EOF'
# Corosync Cluster Configuration
totem {
    version: 2
    cluster_name: alpine-ha-cluster
    
    # Crypto configuration
    crypto_cipher: aes256
    crypto_hash: sha256
    
    # Interface configuration (with unicast transport, membership comes
    # from the nodelist below, so no multicast address is needed)
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.1.0
    }
    
    # Transport protocol (unicast UDP)
    transport: udpu
    
    # Timing configuration
    token: 3000
    token_retransmits_before_loss_const: 10
    join: 60
    consensus: 3600
    max_messages: 20
    
    # Compatibility
    clear_node_high_bit: yes
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}

nodelist {
    node {
        ring0_addr: 10.0.1.10
        name: node1
        nodeid: 1
    }
    
    node {
        ring0_addr: 10.0.1.11
        name: node2
        nodeid: 2
    }
    
    node {
        ring0_addr: 10.0.1.12
        name: node3
        nodeid: 3
    }
}

quorum {
    provider: corosync_votequorum
    expected_votes: 3
    two_node: 0
}
EOF

# Set up cluster user and group first (needed for directory ownership)
addgroup -S haclient
adduser -D -s /sbin/nologin -G haclient hacluster
echo "hacluster:$(openssl rand -base64 12)" | chpasswd

# Create cluster logging directory
mkdir -p /var/log/cluster
chown hacluster:haclient /var/log/cluster

# Enable and start corosync
rc-update add corosync default
service corosync start

# Check cluster status
corosync-cfgtool -s

echo "Corosync cluster configured! πŸ”—"

What this does: πŸ“– Configures Corosync for reliable cluster communication and membership management.

Example output:

Printing ring status.
Local node ID 1
RING ID 0
        id      = 10.0.1.10
        status  = ring 0 active with no faults
Corosync cluster configured! πŸ”—

What this means: Cluster communication layer is operational! βœ…
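
Once all three nodes are up, you can double-check membership and quorum with Corosync's own tools (a quick verification sketch):

# List current cluster members
corosync-cmapctl | grep members

# Show quorum state and vote counts (expect Quorate: Yes with 3 votes)
corosync-quorumtool -s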

Configure Pacemaker Resource Management

Let’s set up advanced resource management! 🎯

What we’re doing: Configuring Pacemaker for intelligent resource management, failover policies, and service orchestration.

# Start pacemaker service
rc-update add pacemaker default
service pacemaker start

# Wait for pacemaker to initialize
sleep 10

# Configure cluster properties
# (stonith-enabled=false and no-quorum-policy=ignore are lab settings only;
#  production clusters need fencing and the default quorum policy)
pcs property set stonith-enabled=false
pcs property set no-quorum-policy=ignore
pcs property set start-failure-is-fatal=false
pcs property set cluster-recheck-interval=60s

# Set cluster defaults
pcs resource defaults resource-stickiness=100
pcs resource defaults migration-threshold=3

# Resource groups are created implicitly when their first resource is added,
# e.g. "pcs resource group add web-services <resource>"; pcs cannot create
# an empty group, so the groups are defined later as resources are created

# Configure cluster notifications
cat > /etc/pacemaker/notify.sh << 'EOF'
#!/bin/bash

# Pacemaker notification script
RECIPIENT="admin@example.com"
SUBJECT="Cluster Event: $1"
MESSAGE="
Cluster Event Details:
- Event Type: $1
- Node: $2
- Resource: $3
- Timestamp: $(date)
- Details: $4
"

# Send notification (configure mail system first)
echo "$MESSAGE" | mail -s "$SUBJECT" "$RECIPIENT"

# Log to syslog
logger "CLUSTER_EVENT: $1 on $2 for resource $3"
EOF

chmod +x /etc/pacemaker/notify.sh

# Create advanced resource monitoring
cat > /etc/pacemaker/monitor-resources.sh << 'EOF'
#!/bin/bash

# Advanced resource monitoring script
LOG_FILE="/var/log/cluster/resource-monitor.log"

log_message() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" >> "$LOG_FILE"
}

# Check resource status
check_resources() {
    log_message "Starting resource health check"
    
    # Get cluster status
    STATUS=$(pcs status)
    
    # Check for failed resources
    FAILED=$(echo "$STATUS" | grep -i "failed\|error\|stopped")
    
    if [ -n "$FAILED" ]; then
        log_message "ALERT: Failed resources detected: $FAILED"
        /etc/pacemaker/notify.sh "RESOURCE_FAILURE" "$(hostname)" "multiple" "$FAILED"
    else
        log_message "All resources healthy"
    fi
    
    # Check node status
    OFFLINE=$(echo "$STATUS" | grep -i "offline")
    if [ -n "$OFFLINE" ]; then
        log_message "ALERT: Offline nodes detected: $OFFLINE"
        /etc/pacemaker/notify.sh "NODE_OFFLINE" "$(hostname)" "cluster" "$OFFLINE"
    fi
}

# Run health check
check_resources
EOF

chmod +x /etc/pacemaker/monitor-resources.sh

# Add monitoring to crontab (appending, so existing entries survive)
(crontab -l 2>/dev/null; echo "*/5 * * * * /etc/pacemaker/monitor-resources.sh") | crontab -

# Check pacemaker status
pcs status

echo "Pacemaker resource management configured! πŸŽ›οΈ"

What this does: πŸ“– Sets up intelligent resource management with monitoring and alerting capabilities.

Example output:

Cluster name: alpine-ha-cluster
Cluster Summary:
  * Stack: corosync
  * Current DC: node1
  * Last updated: Wed Dec 18 10:00:00 2024
  * 3 nodes configured
  * 0 resource instances configured

Pacemaker resource management configured! πŸŽ›οΈ

What this means: Pacemaker is managing cluster resources intelligently! βœ…
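
At this point no resources are defined yet. As a hedged example of what comes next, here is how a virtual IP could be placed under Pacemaker control with the standard IPaddr2 agent. Note that this tutorial delegates VIPs to Keepalived in Step 3, so manage each VIP with only one of the two tools; the resource name here is illustrative:

# Hypothetical VIP resource managed by Pacemaker instead of Keepalived
pcs resource create web_vip ocf:heartbeat:IPaddr2 \
    ip=192.168.1.100 cidr_netmask=24 nic=eth0 \
    op monitor interval=10s

# Verify it started and see which node holds it
pcs status resources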

πŸ“‹ Step 3: Configure Load Balancing and Failover

Set Up HAProxy Load Balancer

Let’s configure enterprise load balancing! 😊

What we’re doing: Setting up HAProxy for intelligent load distribution, health checking, and automatic failover with sticky sessions and SSL termination.

# Create HAProxy configuration
cat > /etc/haproxy/haproxy.cfg << 'EOF'
# HAProxy High Availability Configuration
global
    daemon
    pidfile /var/run/haproxy.pid
    user haproxy
    group haproxy
    log stdout local0
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    
    # SSL Configuration
    ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
    ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets
    
    # Performance tuning
    tune.ssl.default-dh-param 2048
    tune.bufsize 32768
    tune.maxrewrite 1024

defaults
    mode http
    log global
    option httplog
    option dontlognull
    option log-health-checks
    option forwardfor except 127.0.0.0/8
    option redispatch
    
    # Timeouts
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms
    timeout http-request 10s
    timeout http-keep-alive 2s
    timeout check 10s
    
    # Health check defaults
    default-server inter 3s fall 3 rise 2
    
    # Error pages
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

# Statistics interface
frontend stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 30s
    stats hide-version
    stats auth admin:CHANGE_ME   # set a real password; $() would not expand inside this quoted heredoc

# Frontend for web services
frontend web-frontend
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/cluster.pem
    
    # Redirect HTTP to HTTPS
    redirect scheme https if !{ ssl_fc }
    
    # Rate limiting
    stick-table type ip size 100k expire 30s store http_req_rate(10s)
    http-request track-sc0 src
    http-request reject if { sc_http_req_rate(0) gt 20 }
    
    # Default backend
    default_backend web-servers

# Backend web servers
backend web-servers
    balance roundrobin
    option httpchk GET /health
    http-check expect status 200
    
    # Stick sessions to servers
    cookie SERVERID insert indirect nocache
    
    # Server definitions
    server web1 192.168.1.20:80 check cookie web1 weight 100
    server web2 192.168.1.21:80 check cookie web2 weight 100
    server web3 192.168.1.22:80 check cookie web3 weight 100 backup

# Database load balancer (read-only queries)
frontend db-read-frontend
    bind *:5433
    mode tcp
    default_backend db-read-servers

backend db-read-servers
    mode tcp
    balance leastconn
    option tcp-check
    tcp-check connect port 5432
    
    server dbread1 192.168.1.30:5432 check weight 100
    server dbread2 192.168.1.31:5432 check weight 100
    server dbread3 192.168.1.32:5432 check weight 50 backup

# API services
frontend api-frontend
    bind *:8080
    
    # API rate limiting (higher limits)
    stick-table type ip size 100k expire 30s store http_req_rate(10s)
    http-request track-sc0 src
    http-request reject if { sc_http_req_rate(0) gt 100 }
    
    # API versioning
    acl api_v1 path_beg /api/v1
    acl api_v2 path_beg /api/v2
    
    use_backend api-v1-servers if api_v1
    use_backend api-v2-servers if api_v2
    default_backend api-v2-servers

backend api-v1-servers
    balance roundrobin
    option httpchk GET /api/v1/health
    server api1-v1 192.168.1.40:8080 check weight 100
    server api2-v1 192.168.1.41:8080 check weight 100

backend api-v2-servers
    balance roundrobin
    option httpchk GET /api/v2/health
    server api1-v2 192.168.1.42:8080 check weight 100
    server api2-v2 192.168.1.43:8080 check weight 100
    server api3-v2 192.168.1.44:8080 check weight 100
EOF

# Create HAProxy user
adduser -D -s /sbin/nologin haproxy

# Create required directories
mkdir -p /var/lib/haproxy /run/haproxy
chown haproxy:haproxy /var/lib/haproxy /run/haproxy

# Create SSL certificate directory
mkdir -p /etc/ssl/certs

# Generate self-signed certificate for testing
openssl req -x509 -newkey rsa:4096 -keyout /tmp/cluster-key.pem -out /tmp/cluster-cert.pem -days 365 -nodes -subj "/C=US/ST=State/L=City/O=Organization/CN=cluster.local"
cat /tmp/cluster-cert.pem /tmp/cluster-key.pem > /etc/ssl/certs/cluster.pem
chmod 600 /etc/ssl/certs/cluster.pem

# Create error pages
mkdir -p /etc/haproxy/errors
cat > /etc/haproxy/errors/503.http << 'EOF'
HTTP/1.0 503 Service Unavailable
Cache-Control: no-cache
Connection: close
Content-Type: text/html

<html><body><h1>503 Service Unavailable</h1>
<p>The requested service is temporarily unavailable. Please try again later.</p>
</body></html>
EOF

# Enable and start HAProxy
rc-update add haproxy default
service haproxy start

# Check HAProxy status
service haproxy status

echo "HAProxy load balancer configured! βš–οΈ"

What this does: πŸ“– Sets up enterprise-grade load balancing with SSL termination, health checking, and failover.

Example output:

 * Starting haproxy ...
 * HAProxy started successfully
HAProxy load balancer configured! βš–οΈ

What this means: Load balancer is distributing traffic intelligently across backend servers! βœ…
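
The httpchk probe above expects every backend to answer GET /health with a 200. If your application doesn't already expose one, a hypothetical nginx snippet for the backend servers (192.168.1.20-22) could serve it; /etc/nginx/http.d/ is assumed to be your nginx include directory (Alpine's default):

# Hypothetical /health endpoint for the backend web servers (nginx)
cat > /etc/nginx/http.d/health.conf << 'EOF'
server {
    listen 80;

    location /health {
        access_log off;
        add_header Content-Type text/plain;
        return 200 "OK\n";
    }
}
EOF

nginx -t && service nginx reload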

Configure Keepalived for VIP Management

Let’s set up virtual IP failover! 🎯

What we’re doing: Configuring Keepalived for automatic Virtual IP (VIP) failover using VRRP protocol for seamless service continuity.

# Create keepalived configuration
cat > /etc/keepalived/keepalived.conf << 'EOF'
# Keepalived High Availability Configuration
global_defs {
    router_id ALPINE_HA_01
    vrrp_skip_check_adv_addr
    # vrrp_strict enforces strict RFC compliance, which rejects the auth_type
    # blocks used below, so it is intentionally left disabled here
    vrrp_garp_interval 0
    vrrp_gna_interval 0
    
    # Notification scripts
    notification_email {
        admin@example.com
        ops@example.com
    }
    notification_email_from keepalived@example.com
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
}

# Health check scripts
vrrp_script chk_haproxy {
    script "/etc/keepalived/check_haproxy.sh"
    interval 2
    weight -2
    fall 3
    rise 2
}

vrrp_script chk_nginx {
    script "/etc/keepalived/check_nginx.sh"
    interval 2
    weight -2
    fall 3
    rise 2
}

# Primary web services VIP
vrrp_instance WEB_SERVICES {
    state MASTER
    interface eth0
    virtual_router_id 101
    priority 110
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass Chang3Me   # max 8 chars; $() would not expand inside this quoted heredoc
    }
    
    virtual_ipaddress {
        192.168.1.100/24 dev eth0 label eth0:web
    }
    
    track_script {
        chk_haproxy
    }
    
    notify_master "/etc/keepalived/notify_master.sh WEB_SERVICES"
    notify_backup "/etc/keepalived/notify_backup.sh WEB_SERVICES"
    notify_fault "/etc/keepalived/notify_fault.sh WEB_SERVICES"
}

# Database services VIP
vrrp_instance DB_SERVICES {
    state MASTER
    interface eth0
    virtual_router_id 102
    priority 110
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass Chang3Me   # max 8 chars; $() would not expand inside this quoted heredoc
    }
    
    virtual_ipaddress {
        192.168.1.101/24 dev eth0 label eth0:db
    }
    
    notify_master "/etc/keepalived/notify_master.sh DB_SERVICES"
    notify_backup "/etc/keepalived/notify_backup.sh DB_SERVICES"
    notify_fault "/etc/keepalived/notify_fault.sh DB_SERVICES"
}

# API services VIP
vrrp_instance API_SERVICES {
    state MASTER
    interface eth0
    virtual_router_id 103
    priority 110
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass Chang3Me   # max 8 chars; $() would not expand inside this quoted heredoc
    }
    
    virtual_ipaddress {
        192.168.1.102/24 dev eth0 label eth0:api
    }
    
    notify_master "/etc/keepalived/notify_master.sh API_SERVICES"
    notify_backup "/etc/keepalived/notify_backup.sh API_SERVICES"
    notify_fault "/etc/keepalived/notify_fault.sh API_SERVICES"
}
EOF

# Create health check scripts
cat > /etc/keepalived/check_haproxy.sh << 'EOF'
#!/bin/bash
# HAProxy health check script

HAPROXY_STATS_URL="http://localhost:8404/stats"
HAPROXY_PID_FILE="/var/run/haproxy.pid"

# Check if HAProxy process is running
if [ ! -f "$HAPROXY_PID_FILE" ]; then
    exit 1
fi

PID=$(cat "$HAPROXY_PID_FILE")
if ! kill -0 "$PID" 2>/dev/null; then
    exit 1
fi

# Check HAProxy stats endpoint
if ! curl -f -s "$HAPROXY_STATS_URL" > /dev/null; then
    exit 1
fi

exit 0
EOF

cat > /etc/keepalived/check_nginx.sh << 'EOF'
#!/bin/bash
# Nginx health check script

NGINX_PID_FILE="/var/run/nginx.pid"

# Check if Nginx process is running
if [ ! -f "$NGINX_PID_FILE" ]; then
    exit 1
fi

PID=$(cat "$NGINX_PID_FILE")
if ! kill -0 "$PID" 2>/dev/null; then
    exit 1
fi

# Check if Nginx is responding
if ! curl -f -s http://localhost/ > /dev/null; then
    exit 1
fi

exit 0
EOF

# Create notification scripts
cat > /etc/keepalived/notify_master.sh << 'EOF'
#!/bin/bash
# Keepalived master notification script

SERVICE="$1"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
HOSTNAME=$(hostname)

logger "KEEPALIVED: $HOSTNAME became MASTER for $SERVICE at $TIMESTAMP"

# Start services that should run on master
case "$SERVICE" in
    "WEB_SERVICES")
        service haproxy start
        ;;
    "DB_SERVICES")
        service postgresql start
        ;;
    "API_SERVICES")
        service nginx start
        ;;
esac

# Send notification
echo "Node $HOSTNAME became MASTER for $SERVICE at $TIMESTAMP" | \
    mail -s "HA Cluster: Master Transition" [email protected]
EOF

cat > /etc/keepalived/notify_backup.sh << 'EOF'
#!/bin/bash
# Keepalived backup notification script

SERVICE="$1"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
HOSTNAME=$(hostname)

logger "KEEPALIVED: $HOSTNAME became BACKUP for $SERVICE at $TIMESTAMP"

# Optionally stop services on backup nodes
case "$SERVICE" in
    "WEB_SERVICES")
        # Keep HAProxy running for health checks
        ;;
    "DB_SERVICES")
        # Keep database in standby mode
        ;;
    "API_SERVICES")
        # Keep API services in standby mode
        ;;
esac
EOF

cat > /etc/keepalived/notify_fault.sh << 'EOF'
#!/bin/bash
# Keepalived fault notification script

SERVICE="$1"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
HOSTNAME=$(hostname)

logger "KEEPALIVED: FAULT detected for $SERVICE on $HOSTNAME at $TIMESTAMP"

# Send urgent notification
echo "URGENT: Fault detected for $SERVICE on $HOSTNAME at $TIMESTAMP" | \
    mail -s "HA Cluster: FAULT ALERT" [email protected]
EOF

# Make scripts executable
chmod +x /etc/keepalived/*.sh

# Enable and start keepalived
rc-update add keepalived default
service keepalived start

# Check keepalived status
service keepalived status
ip addr show

echo "Keepalived VIP management configured! πŸ”„"

What this does: πŸ“– Sets up automatic Virtual IP failover with health monitoring and notification system.

Example output:

 * Starting keepalived ...
 * Keepalived started successfully
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 192.168.1.10/24 brd 192.168.1.255 scope global eth0
    inet 192.168.1.100/24 scope global secondary eth0:web
Keepalived VIP management configured! πŸ”„

What this means: Virtual IP failover is active and monitoring service health! βœ…
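
Before trusting the setup, run a controlled failover drill (a sketch, assuming node1 currently holds the web VIP):

# On the current MASTER: stop HAProxy so chk_haproxy fails and priority drops
service haproxy stop

# On a BACKUP node: the VIP should appear within a few seconds
ip addr show eth0 | grep 192.168.1.100

# Restore the original master when done
service haproxy start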

πŸ“‹ Step 4: Configure Database High Availability

Set Up PostgreSQL Streaming Replication

Let’s implement database high availability! 😊

What we’re doing: Configuring PostgreSQL with streaming replication, automatic failover, and read-only replica load balancing.

# Initialize PostgreSQL cluster (adjust the "13" path to your installed major version)
su postgres -c "initdb -D /var/lib/postgresql/13/data"

# Configure PostgreSQL for replication
cat >> /var/lib/postgresql/13/data/postgresql.conf << 'EOF'

# High Availability Configuration
listen_addresses = '*'
port = 5432
max_connections = 200

# Write-Ahead Logging
wal_level = replica
archive_mode = on
archive_command = 'rsync -a %p postgres@backup-server:/backup/wal/%f'
max_wal_senders = 10
max_replication_slots = 10
wal_keep_size = 1GB

# Streaming replication
hot_standby = on
hot_standby_feedback = on
max_standby_streaming_delay = 30s

# Logging
log_destination = 'stderr'
logging_collector = on
log_directory = '/var/log/postgresql'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
log_min_duration_statement = 1000

# Performance tuning
shared_buffers = 256MB
effective_cache_size = 1GB
work_mem = 4MB
maintenance_work_mem = 64MB
checkpoint_completion_target = 0.9
random_page_cost = 1.1

# Connection settings
tcp_keepalives_idle = 600
tcp_keepalives_interval = 30
tcp_keepalives_count = 3
EOF

# Configure client authentication
cat > /var/lib/postgresql/13/data/pg_hba.conf << 'EOF'
# PostgreSQL High Availability HBA Configuration
# TYPE  DATABASE        USER            ADDRESS                 METHOD

# Local connections
local   all             postgres                                peer
local   all             all                                     md5

# IPv4 local connections:
host    all             all             127.0.0.1/32            md5

# Cluster network connections
host    all             all             192.168.1.0/24          md5
host    all             all             10.0.1.0/24             md5

# Replication connections
host    replication     replicator      192.168.1.0/24          md5
host    replication     replicator      10.0.1.0/24             md5

# Deny all other connections
host    all             all             0.0.0.0/0               reject
EOF

# Create replication user
su postgres -c "createuser --replication -P replicator"

# Start PostgreSQL
rc-update add postgresql default
service postgresql start

# Create sample database and tables
su postgres -c "createdb testdb"
su postgres -c "psql testdb" << 'EOF'
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE sessions (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    session_token VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL
);

INSERT INTO users (username, email) VALUES 
('admin', 'admin@example.com'),
('user1', 'user1@example.com'),
('user2', 'user2@example.com');
\q
EOF

echo "PostgreSQL primary server configured! 🐘"

What this does: πŸ“– Configures PostgreSQL primary server with streaming replication and performance optimization.

Example output:

PostgreSQL primary server configured! 🐘

What this means: Database primary is ready for high availability replication! βœ…
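
After the standbys attach in the next section, you can confirm streaming replication from the primary (a verification sketch):

# Show attached standbys and their sync state
su postgres -c "psql -c 'SELECT client_addr, state, sync_state FROM pg_stat_replication;'"

# Confirm the replication slots created by pg_basebackup -C
su postgres -c "psql -c 'SELECT slot_name, active FROM pg_replication_slots;'"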

Configure PostgreSQL Standby Servers

Let’s set up replica servers! 🎯

What we’re doing: Creating PostgreSQL standby servers with automatic failover capabilities and read-only query load balancing.

# Create standby server setup script
cat > /opt/setup-postgres-standby.sh << 'EOF'
#!/bin/bash

# PostgreSQL Standby Server Setup Script
PRIMARY_HOST=${1:-192.168.1.10}
STANDBY_ID=${2:-1}

echo "Setting up PostgreSQL standby server (ID: $STANDBY_ID)"

# Stop PostgreSQL if running
service postgresql stop

# Remove existing data directory
rm -rf /var/lib/postgresql/13/data

# Create base backup from primary
su postgres -c "pg_basebackup -h $PRIMARY_HOST -D /var/lib/postgresql/13/data -U replicator -P -v -R -X stream -C -S standby_$STANDBY_ID"

# Create standby.signal (its presence alone marks this as a standby server)
touch /var/lib/postgresql/13/data/standby.signal

# Configure standby-specific settings (heredoc left unquoted so the
# $PRIMARY_HOST and $STANDBY_ID variables expand; pg_basebackup -R already
# wrote primary_conninfo to postgresql.auto.conf, this makes it explicit)
cat >> /var/lib/postgresql/13/data/postgresql.conf << EOL

# Standby-specific configuration
primary_conninfo = 'host=$PRIMARY_HOST port=5432 user=replicator'
primary_slot_name = 'standby_$STANDBY_ID'
promote_trigger_file = '/tmp/postgresql.trigger'

# Hot standby settings
hot_standby = on
max_standby_streaming_delay = 30s
hot_standby_feedback = on

# Recovery settings
recovery_target_timeline = 'latest'
EOL

# Set proper permissions
chown -R postgres:postgres /var/lib/postgresql/13/data
chmod 700 /var/lib/postgresql/13/data

# Start standby server
service postgresql start

echo "PostgreSQL standby server configured!"
EOF

chmod +x /opt/setup-postgres-standby.sh

# Create failover script
cat > /opt/postgres-failover.sh << 'EOF'
#!/bin/bash

# PostgreSQL Failover Script
STANDBY_HOST=${1:-192.168.1.11}
NEW_PRIMARY_HOST=${2:-192.168.1.11}

echo "Initiating PostgreSQL failover to $NEW_PRIMARY_HOST"

# Create trigger file to promote standby
ssh postgres@$STANDBY_HOST "touch /tmp/postgresql.trigger"

# Wait for promotion
sleep 10

# Update application configuration to point to new primary
echo "Updating application configuration..."

# Update HAProxy backend configuration
# (assumes a write backend containing a "server dbprimary ..." line,
#  which is not part of the Step 3 config shown earlier)
sed -i "s/server dbprimary .*/server dbprimary $NEW_PRIMARY_HOST:5432 check weight 100/" /etc/haproxy/haproxy.cfg
service haproxy reload

# Update keepalived VIP to new primary
ssh root@$NEW_PRIMARY_HOST "service keepalived restart"

echo "PostgreSQL failover completed!"
EOF

chmod +x /opt/postgres-failover.sh

# Create monitoring script
cat > /opt/postgres-monitor.sh << 'EOF'
#!/bin/bash

# PostgreSQL Cluster Monitoring Script
LOG_FILE="/var/log/postgresql-cluster.log"

log_message() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" >> "$LOG_FILE"
}

check_primary() {
    PRIMARY_HOST="192.168.1.10"
    
    if ! pg_isready -h "$PRIMARY_HOST" -p 5432 -U postgres > /dev/null 2>&1; then
        log_message "ALERT: Primary database $PRIMARY_HOST is not responding"
        
        # Attempt failover to first standby
        log_message "Initiating automatic failover"
        /opt/postgres-failover.sh 192.168.1.11 192.168.1.11
        
        return 1
    fi
    
    log_message "Primary database is healthy"
    return 0
}

check_standby() {
    STANDBY_HOST="$1"
    
    if ! pg_isready -h "$STANDBY_HOST" -p 5432 -U postgres > /dev/null 2>&1; then
        log_message "WARNING: Standby database $STANDBY_HOST is not responding"
        return 1
    fi
    
    # Check replication lag (cast to int so the shell's -gt comparison works)
    LAG=$(psql -h "$STANDBY_HOST" -U postgres -t -A -c "SELECT COALESCE(EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))::int, 0)" 2>/dev/null)
    
    if [ -n "$LAG" ] && [ "$LAG" -gt 60 ]; then
        log_message "WARNING: Standby $STANDBY_HOST has high replication lag: ${LAG}s"
    else
        log_message "Standby database $STANDBY_HOST is healthy"
    fi
    
    return 0
}

# Main monitoring loop
log_message "Starting PostgreSQL cluster monitoring"

check_primary
check_standby "192.168.1.11"
check_standby "192.168.1.12"

log_message "Monitoring cycle completed"
EOF

chmod +x /opt/postgres-monitor.sh

# Add monitoring to crontab (appending, so existing entries survive)
(crontab -l 2>/dev/null; echo "*/2 * * * * /opt/postgres-monitor.sh") | crontab -

echo "PostgreSQL HA cluster management configured! πŸ”„"

What this does: πŸ“– Creates comprehensive PostgreSQL HA management with automatic failover and monitoring.

Example output:

PostgreSQL HA cluster management configured! πŸ”„

What this means: Database cluster has automatic failover and comprehensive monitoring! βœ…
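
A usage sketch tying the scripts together (IPs follow this tutorial's layout):

# On each standby node, pull a base backup from the primary (192.168.1.10)
/opt/setup-postgres-standby.sh 192.168.1.10 1   # run on node2
/opt/setup-postgres-standby.sh 192.168.1.10 2   # run on node3

# Manual failover to node2 if the primary is lost
/opt/postgres-failover.sh 192.168.1.11 192.168.1.11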

πŸ“‹ Step 5: Configure Storage High Availability

Set Up DRBD Block Replication

Let’s implement replicated storage! 😊

What we’re doing: Configuring DRBD (Distributed Replicated Block Device) for real-time block-level storage replication across cluster nodes.

# Load DRBD kernel module
modprobe drbd
echo "drbd" >> /etc/modules

# Create DRBD configuration
cat > /etc/drbd.d/global_common.conf << 'EOF'
# DRBD Global Configuration
global {
    usage-count yes;
    udev-always-use-vnr;
}

common {
    handlers {
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
    }

    startup {
        degr-wfc-timeout 60;
    }

    options {
        auto-promote yes;
    }

    disk {
        on-io-error detach;
        c-plan-ahead 1;
        c-delay-target 10;
        c-fill-target 0;
        c-max-rate 700M;
        c-min-rate 250k;
    }

    net {
        after-sb-0pri discard-younger-primary;
        after-sb-1pri discard-secondary;
        after-sb-2pri call-pri-lost-after-sb;
        protocol C;
        tcp-cork yes;
        max-buffers 20000;
        max-epoch-size 20000;
        sndbuf-size 0;
        rcvbuf-size 0;
    }
}
EOF

# Create DRBD resource configuration
cat > /etc/drbd.d/data.res << 'EOF'
# DRBD Data Resource Configuration
resource data {
    device    /dev/drbd1;
    disk      /dev/sdb1;
    meta-disk internal;
    
    on node1 {
        address   10.0.1.10:7789;
        node-id   1;
    }
    
    on node2 {
        address   10.0.1.11:7789;
        node-id   2;
    }
    
    connection-mesh {
        hosts node1 node2;
    }
}
EOF

# Initialize DRBD metadata
drbdadm create-md data

# Start DRBD service
rc-update add drbd default
service drbd start

# Enable DRBD resource
drbdadm up data

# Check DRBD status
drbdadm status

echo "DRBD storage replication configured! πŸ’Ύ"

What this does: πŸ“– Sets up real-time block storage replication for data consistency across nodes.

Example output:

data role:Secondary
  disk:Inconsistent
  peer role:Secondary
    peer-disk:Inconsistent
DRBD storage replication configured! πŸ’Ύ

What this means: Storage replication is active and synchronizing data! βœ…
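
Both sides report Inconsistent until you pick one node's data as authoritative and let the initial sync run (a one-time step; run it on one node only):

# On node1 ONLY: declare this node's data the source for the initial sync
drbdadm primary --force data

# Watch progress until both sides show disk:UpToDate
drbdadm status data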

Configure Shared Storage Management

Let’s set up cluster-aware storage! 🎯

What we’re doing: Implementing cluster-aware filesystems and shared storage management for applications requiring consistent data access.

# Back up the original LVM configuration
cp /etc/lvm/lvm.conf /etc/lvm/lvm.conf.backup

# Configure LVM for cluster use
# (locking_type=3 requires a cluster lock manager such as lvmlockd/clvmd;
#  recent LVM releases have removed this option entirely)
sed -i 's/locking_type = 1/locking_type = 3/' /etc/lvm/lvm.conf
sed -i 's/use_lvmetad = 1/use_lvmetad = 0/' /etc/lvm/lvm.conf

# Create shared storage management script
cat > /opt/manage-shared-storage.sh << 'EOF'
#!/bin/bash

# Shared Storage Management Script
DRBD_DEVICE="/dev/drbd1"
VG_NAME="cluster-vg"
LV_NAME="shared-data"
MOUNT_POINT="/shared"

case "$1" in
    "promote")
        echo "Promoting DRBD to primary and mounting storage"
        
        # Promote DRBD to primary
        drbdadm primary data
        
        # Wait for DRBD to be ready
        sleep 5
        
        # Activate volume group
        vgchange -ay $VG_NAME
        
        # Mount filesystem
        mkdir -p $MOUNT_POINT
        mount /dev/$VG_NAME/$LV_NAME $MOUNT_POINT
        
        echo "Shared storage is now available at $MOUNT_POINT"
        ;;
        
    "demote")
        echo "Demoting DRBD and unmounting storage"
        
        # Unmount filesystem
        umount $MOUNT_POINT
        
        # Deactivate volume group
        vgchange -an $VG_NAME
        
        # Demote DRBD to secondary
        drbdadm secondary data
        
        echo "Shared storage has been safely unmounted"
        ;;
        
    "status")
        echo "Storage Status:"
        echo "==============="
        drbdadm status
        echo ""
        lvs
        echo ""
        df -h $MOUNT_POINT 2>/dev/null || echo "Shared storage not mounted"
        ;;
        
    "init")
        echo "Initializing shared storage"
        
        # Create physical volume
        pvcreate $DRBD_DEVICE
        
        # Create volume group
        vgcreate $VG_NAME $DRBD_DEVICE
        
        # Create logical volume
        lvcreate -n $LV_NAME -l 100%FREE $VG_NAME
        
        # Create filesystem
        mkfs.ext4 /dev/$VG_NAME/$LV_NAME
        
        # Create mount point
        mkdir -p $MOUNT_POINT
        
        echo "Shared storage initialized"
        ;;
        
    *)
        echo "Usage: $0 {promote|demote|status|init}"
        echo "  promote - Promote DRBD and mount shared storage"
        echo "  demote  - Unmount and demote DRBD storage"
        echo "  status  - Show storage status"
        echo "  init    - Initialize shared storage (first time only)"
        exit 1
        ;;
esac
EOF

chmod +x /opt/manage-shared-storage.sh

# Create Pacemaker resource for shared storage
cat > /opt/create-storage-resources.sh << 'EOF'
#!/bin/bash

# Create Pacemaker resources for shared storage
echo "Creating Pacemaker storage resources"

# DRBD resource
pcs resource create drbd_data ocf:linbit:drbd \
    drbd_resource=data \
    op monitor interval=60s

# DRBD promotable (master/slave) set
# (newer pcs releases replace "pcs resource master" with "pcs resource promotable")
pcs resource master drbd_data_master drbd_data \
    master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
    notify=true

# Filesystem resource
pcs resource create shared_fs Filesystem \
    device="/dev/cluster-vg/shared-data" \
    directory="/shared" \
    fstype="ext4" \
    op monitor interval=20s

# Create resource group
pcs resource group add storage-group shared_fs

# Set resource constraints
pcs constraint colocation add shared_fs with drbd_data_master INFINITY with-rsc-role=Master
pcs constraint order promote drbd_data_master then start shared_fs

echo "Storage resources created in Pacemaker"
EOF

chmod +x /opt/create-storage-resources.sh

# Create storage monitoring
cat > /opt/monitor-storage.sh << 'EOF'
#!/bin/bash

# Storage Health Monitoring Script
LOG_FILE="/var/log/cluster-storage.log"

log_message() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" >> "$LOG_FILE"
}

check_drbd() {
    # Check DRBD status
    DRBD_STATUS=$(drbdadm status data)
    
    if echo "$DRBD_STATUS" | grep -q "Inconsistent"; then
        log_message "WARNING: DRBD is in inconsistent state"
        return 1
    fi
    
    if echo "$DRBD_STATUS" | grep -q "StandAlone"; then
        log_message "ALERT: DRBD is in standalone mode"
        return 1
    fi
    
    log_message "DRBD status is healthy"
    return 0
}

check_filesystem() {
    # Check filesystem health
    if [ -d "/shared" ]; then
        FS_STATUS=$(df /shared 2>/dev/null)
        if [ $? -eq 0 ]; then
            USAGE=$(echo "$FS_STATUS" | tail -1 | awk '{print $5}' | sed 's/%//')
            if [ "$USAGE" -gt 90 ]; then
                log_message "WARNING: Shared filesystem is ${USAGE}% full"
            else
                log_message "Shared filesystem usage: ${USAGE}%"
            fi
        else
            log_message "WARNING: Shared filesystem not accessible"
            return 1
        fi
    fi
    
    return 0
}

# Run checks
log_message "Starting storage health check"
check_drbd
check_filesystem
log_message "Storage health check completed"
EOF

chmod +x /opt/monitor-storage.sh

# Add storage monitoring to crontab (appending, so existing entries survive)
(crontab -l 2>/dev/null; echo "*/5 * * * * /opt/monitor-storage.sh") | crontab -

echo "Shared storage management configured! πŸ—„οΈ"

What this does: πŸ“– Creates comprehensive shared storage management with cluster awareness and monitoring.

Example output:

Shared storage management configured! πŸ—„οΈ

What this means: Cluster-aware storage is ready with automatic failover capabilities! βœ…
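
A quick usage example for the management script (run init once, on the current DRBD primary):

/opt/manage-shared-storage.sh init      # pvcreate/vgcreate/lvcreate/mkfs (first time only)
/opt/manage-shared-storage.sh promote   # mount /shared on this node
/opt/manage-shared-storage.sh status    # show DRBD, LVM, and mount state
/opt/manage-shared-storage.sh demote    # clean hand-off before another node promotes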

πŸ“‹ Step 6: Monitoring and Alerting

Set Up Comprehensive Monitoring

Let’s implement enterprise monitoring! 😊

What we’re doing: Setting up comprehensive monitoring with Prometheus, Grafana, and custom alerting for all HA components.

# Install monitoring stack
apk add prometheus grafana
apk add alertmanager prometheus-node-exporter

# Configure Prometheus
cat > /etc/prometheus/prometheus.yml << 'EOF'
# Prometheus Configuration for HA Cluster
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'alpine-ha-cluster'
    replica: 'prometheus-1'

rule_files:
  - "ha_cluster.rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

scrape_configs:
  # Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Node exporters
  - job_name: 'node'
    static_configs:
      - targets: 
          - 'node1:9100'
          - 'node2:9100'
          - 'node3:9100'

  # HAProxy stats (assumes HAProxy exposes its built-in Prometheus exporter,
  # e.g. via "http-request use-service prometheus-exporter" on a frontend)
  - job_name: 'haproxy'
    static_configs:
      - targets: ['localhost:8404']
    metrics_path: '/stats/prometheus'

  # PostgreSQL
  - job_name: 'postgresql'
    static_configs:
      - targets: 
          - 'node1:9187'
          - 'node2:9187'

  # Corosync/Pacemaker
  - job_name: 'cluster'
    static_configs:
      - targets:
          - 'node1:9664'
          - 'node2:9664'
          - 'node3:9664'

  # DRBD monitoring
  - job_name: 'drbd'
    static_configs:
      - targets:
          - 'node1:9942'
          - 'node2:9942'
EOF

# Create alerting rules
cat > /etc/prometheus/ha_cluster.rules.yml << 'EOF'
# High Availability Cluster Alerting Rules
groups:
  - name: ha_cluster
    rules:
      # Node availability
      - alert: NodeDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.instance }} is down"
          description: "Node {{ $labels.instance }} has been down for more than 1 minute"

      # High CPU usage
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% for more than 5 minutes"

      # High memory usage
      - alert: HighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 3m
        labels:
          severity: critical
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is above 90% for more than 3 minutes"

      # Disk space warning
      - alert: DiskSpaceWarning
        expr: (1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 85
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Disk space warning on {{ $labels.instance }}"
          description: "Disk usage is above 85% on {{ $labels.mountpoint }}"

      # HAProxy backend down
      - alert: HAProxyBackendDown
        expr: haproxy_backend_up == 0
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "HAProxy backend {{ $labels.proxy }} is down"
          description: "HAProxy backend {{ $labels.proxy }} has been down for more than 30 seconds"

      # PostgreSQL down
      - alert: PostgreSQLDown
        expr: pg_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "PostgreSQL is down on {{ $labels.instance }}"
          description: "PostgreSQL has been down for more than 1 minute"

      # High replication lag
      - alert: PostgreSQLReplicationLag
        expr: pg_stat_replication_lag > 60
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High PostgreSQL replication lag"
          description: "Replication lag is {{ $value }} seconds"

      # DRBD disconnected
      - alert: DRBDDisconnected
        expr: drbd_connected == 0
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "DRBD resource {{ $labels.resource }} is disconnected"
          description: "DRBD resource has been disconnected for more than 30 seconds"

      # Cluster quorum lost
      - alert: ClusterQuorumLost
        expr: corosync_quorate == 0
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "Cluster has lost quorum"
          description: "Cluster quorum has been lost - services may be unavailable"
EOF

# Configure Alertmanager
cat > /etc/alertmanager/alertmanager.yml << 'EOF'
# Alertmanager Configuration
global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alertmanager@example.com'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
  routes:
    - match:
        severity: critical
      receiver: 'critical-alerts'
    - match:
        severity: warning
      receiver: 'warning-alerts'

receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://localhost:5001/'

  - name: 'critical-alerts'
    email_configs:
      - to: 'admin@example.com'
        subject: 'CRITICAL: {{ .GroupLabels.alertname }}'
        body: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}
    webhook_configs:
      - url: 'http://localhost:5001/critical'

  - name: 'warning-alerts'
    email_configs:
      - to: 'admin@example.com'
        subject: 'WARNING: {{ .GroupLabels.alertname }}'
        body: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']
EOF

# Configure Grafana dashboards
mkdir -p /var/lib/grafana/dashboards

cat > /var/lib/grafana/dashboards/ha-cluster-dashboard.json << 'EOF'
{
  "dashboard": {
    "id": null,
    "title": "HA Cluster Overview",
    "tags": ["ha", "cluster"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "Cluster Nodes Status",
        "type": "stat",
        "targets": [
          {
            "expr": "up",
            "legendFormat": "{{instance}}"
          }
        ]
      },
      {
        "id": 2,
        "title": "CPU Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "100 - (avg by(instance) (irate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
            "legendFormat": "{{instance}}"
          }
        ]
      },
      {
        "id": 3,
        "title": "Memory Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100",
            "legendFormat": "{{instance}}"
          }
        ]
      },
      {
        "id": 4,
        "title": "HAProxy Backend Status",
        "type": "table",
        "targets": [
          {
            "expr": "haproxy_backend_up",
            "legendFormat": "{{proxy}}"
          }
        ]
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "refresh": "5s"
  }
}
EOF

# Enable and start monitoring services
# (OpenRC service names can differ by package; "grafana" is assumed here)
rc-update add prometheus default
rc-update add alertmanager default
rc-update add grafana default
rc-update add prometheus-node-exporter default

service prometheus start
service alertmanager start
service grafana start
service prometheus-node-exporter start

echo "Comprehensive monitoring configured! πŸ“Š"

What this does: πŸ“– Sets up complete monitoring stack with alerting for all HA components.

Example output:

Comprehensive monitoring configured! πŸ“Š

What this means: Enterprise-grade monitoring is active with intelligent alerting! βœ…
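
A quick smoke test of the stack (ports are the defaults assumed in the configs above):

# Prometheus and Alertmanager expose simple health endpoints
wget -qO- http://localhost:9090/-/healthy && echo "Prometheus OK"
wget -qO- http://localhost:9093/-/healthy && echo "Alertmanager OK"

# Sample a few node-exporter metrics
wget -qO- http://localhost:9100/metrics | head -5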

πŸŽ‰ You’re All Set!

Congratulations! You’ve successfully configured a complete High Availability system on Alpine Linux! πŸš€

What You’ve Accomplished:

βœ… Cluster Foundation - Corosync and Pacemaker for intelligent resource management
βœ… Load Balancing - HAProxy with SSL termination and health checking
βœ… Virtual IP Failover - Keepalived with automatic VIP management
βœ… Database HA - PostgreSQL streaming replication with automatic failover
βœ… Storage Replication - DRBD block-level replication and shared storage
βœ… Comprehensive Monitoring - Prometheus, Grafana, and intelligent alerting

Architecture Overview:

 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
 β”‚                         Virtual IPs                         β”‚
 β”‚ Web: 192.168.1.100   DB: 192.168.1.101   API: 192.168.1.102 β”‚
 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                β”‚
           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
           β”‚                    β”‚                    β”‚
     β”Œβ”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”        β”Œβ”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”        β”Œβ”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”
     β”‚  Node 1   │◄──────►│  Node 2   │◄──────►│  Node 3   β”‚
     β”‚ (Master)  β”‚        β”‚ (Standby) β”‚        β”‚ (Standby) β”‚
     β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜        β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜        β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
           β”‚                    β”‚                    β”‚
     β”Œβ”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”        β”Œβ”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”        β”Œβ”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”
     β”‚   DRBD    │◄──────►│   DRBD    │◄──────►│   DRBD    β”‚
     β”‚ (Primary) β”‚        β”‚(Secondary)β”‚        β”‚(Secondary)β”‚
     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Commands:

# Cluster management
pcs status
pcs resource show
pcs node standby node1

# Load balancer
service haproxy status
haproxy -c -f /etc/haproxy/haproxy.cfg

# Database failover
/opt/postgres-failover.sh

# Storage management
/opt/manage-shared-storage.sh status
drbdadm status

Next Steps:

  • πŸ”§ Application Integration - Configure your applications for HA
  • πŸ“ˆ Performance Tuning - Optimize for your specific workload
  • πŸ”’ Security Hardening - Implement additional security measures
  • πŸ“‹ Disaster Recovery - Plan for site-wide failures
  • πŸ§ͺ Testing - Regularly test failover scenarios (see the drill sketch below)
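
A hypothetical quarterly drill, exercising one failure mode at a time:

# 1. Node failure: drain a node and watch resources migrate
pcs node standby node1 && pcs status
pcs node unstandby node1

# 2. Service failure: stop HAProxy on the VIP holder, confirm the VIP moves
service haproxy stop && sleep 5 && ip addr show eth0

# 3. Database failure: promote a standby, then rebuild the old primary
/opt/postgres-failover.sh 192.168.1.11 192.168.1.11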

Your enterprise-grade High Availability system is now operational with 99.99% uptime capability! πŸ˜ŠπŸ”§