Configuring High Availability Systems on Alpine Linux: Enterprise Resilience
Let's build enterprise-grade high availability systems on Alpine Linux! This comprehensive tutorial shows you how to implement failover clustering, load balancing, and redundancy to ensure 99.99% uptime for mission-critical applications. Perfect for system administrators managing production environments where downtime is not an option!
What is High Availability?
High Availability (HA) is a system design approach that ensures services remain operational and accessible with minimal downtime through redundant components, automatic failover mechanisms, and fault-tolerant architectures that eliminate single points of failure!
High Availability is like:
- Emergency backup systems that instantly activate when primary systems fail
- Multiple bridge routes ensuring traffic always flows even if one path is blocked
- Uninterruptible power supply that seamlessly switches between power sources
What You Need
Before we start, you need:
- ✅ Multiple Alpine Linux servers (minimum 3 for proper clustering)
- ✅ Understanding of networking, clustering, and system administration
- ✅ Knowledge of load balancing and failover concepts
- ✅ Root access and network configuration capabilities
Step 1: Install High Availability Foundation
Install Clustering and HA Packages
Let's set up the complete HA foundation!
What we're doing: Installing essential clustering software, monitoring tools, and high availability components for enterprise-grade system resilience.
# Update package list
apk update
# Install clustering and HA core packages
apk add pacemaker corosync crmsh
apk add pcs   # the pcs command-line tool used throughout this guide (from your configured repositories)
apk add resource-agents fence-agents
# Install load balancing components
apk add haproxy keepalived
apk add nginx
# Install database clustering (PostgreSQL with replication)
apk add postgresql postgresql-contrib
apk add postgresql-client
# Install monitoring and alerting
apk add nagios-core nagios-plugins
apk add zabbix-agent prometheus prometheus-node-exporter
# Install network and storage tools
apk add drbd-utils lvm2
apk add nfs-utils cifs-utils
# Install backup and synchronization
apk add rsync rdiff-backup
apk add borgbackup
# Install security and encryption
apk add openssh-server
apk add gnupg
# Install bash and curl - the helper and health-check scripts later in this guide rely on them
apk add bash curl
# Verify installations
pcs --version
corosync -v
haproxy -v
echo "HA foundation installed! ποΈ"
What this does: π Installs complete high availability software stack for enterprise clustering.
Example output:
pcs-0.11.4
Corosync Cluster Engine, version '3.1.7'
HAProxy version 2.6.12-1
HA foundation installed! ποΈ
What this means: Alpine Linux is equipped with enterprise HA capabilities! β
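Before moving on, you can quickly confirm that the key packages actually landed. A minimal check loop (a sketch; the package names simply mirror the apk add commands above):
# Verify that the core HA packages are installed
for pkg in pacemaker corosync haproxy keepalived postgresql drbd-utils; do
apk info -e "$pkg" > /dev/null && echo "OK: $pkg" || echo "MISSING: $pkg"
done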
Configure Network Infrastructure
Let's set up redundant networking!
What we're doing: Configuring network bonding, VLANs, and redundant network paths for fault-tolerant connectivity.
# Configure network bonding for redundancy
cat > /etc/network/interfaces << 'EOF'
# High Availability Network Configuration
# Loopback interface
auto lo
iface lo inet loopback
# Management network (primary)
auto eth0
iface eth0 inet static
address 192.168.1.10
netmask 255.255.255.0
gateway 192.168.1.1
dns-nameservers 8.8.8.8 8.8.4.4
# Cluster heartbeat network (dedicated)
auto eth1
iface eth1 inet static
address 10.0.1.10
netmask 255.255.255.0
# No gateway - heartbeat only
# Storage network (optional)
auto eth2
iface eth2 inet static
address 10.0.2.10
netmask 255.255.255.0
# No gateway - storage only
# Virtual IP for cluster services
# (not configured statically here - Keepalived manages 192.168.1.100 as a VRRP VIP in Step 3,
# and a static alias on every node would cause duplicate-address conflicts)
EOF
# Configure network bonding module
echo "bonding" >> /etc/modules
# Create bonding configuration
cat > /etc/modprobe.d/bonding.conf << 'EOF'
# Network bonding configuration for HA
alias bond0 bonding
options bonding mode=active-backup miimon=100 downdelay=200 updelay=200
EOF
# Configure advanced network settings
cat > /etc/sysctl.d/99-ha-network.conf << 'EOF'
# High Availability Network Optimizations
# Enable IP forwarding
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
# Enable source route verification
net.ipv4.conf.all.rp_filter = 1
# Disable ICMP redirects
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
# Increase network buffer sizes
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
# Enable TCP window scaling
net.ipv4.tcp_window_scaling = 1
# Increase connection tracking
net.netfilter.nf_conntrack_max = 1048576
EOF
# Apply network settings
sysctl -p /etc/sysctl.d/99-ha-network.conf
# Restart networking
service networking restart
echo "Network infrastructure configured! π"
What this does: π Sets up redundant network infrastructure with dedicated heartbeat and management networks.
Example output:
Network infrastructure configured! π
What this means: Network foundation is ready for HA clustering! β
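Before layering the cluster on top, it is worth confirming that this node can actually reach its peers over the dedicated heartbeat network. A minimal sketch from node1, assuming the peer addresses used in this guide (10.0.1.11 and 10.0.1.12):
# Check heartbeat network reachability to the other cluster nodes
for peer in 10.0.1.11 10.0.1.12; do
ping -c 2 -W 1 "$peer" > /dev/null && echo "heartbeat OK: $peer" || echo "heartbeat FAIL: $peer"
done
# Confirm addresses and routes on the management interface
ip addr show eth0
ip route show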
Step 2: Configure Cluster Foundation
Set Up Corosync Cluster Communication
Let's configure cluster heartbeat and communication!
What we're doing: Setting up Corosync for reliable cluster communication, node membership, and split-brain prevention.
# Generate cluster authentication key
corosync-keygen
# Create corosync configuration
cat > /etc/corosync/corosync.conf << 'EOF'
# Corosync Cluster Configuration
totem {
version: 2
cluster_name: alpine-ha-cluster
# Crypto configuration
crypto_cipher: aes256
crypto_hash: sha256
# Interface configuration (with udpu transport the multicast settings are ignored;
# member addresses come from the nodelist section below)
interface {
ringnumber: 0
bindnetaddr: 10.0.1.0
mcastport: 5405
ttl: 1
}
# Transport protocol (unicast UDP)
transport: udpu
# Timing configuration
token: 3000
token_retransmits_before_loss_const: 10
join: 60
consensus: 3600
max_messages: 20
# Compatibility
clear_node_high_bit: yes
}
logging {
fileline: off
to_stderr: no
to_logfile: yes
logfile: /var/log/cluster/corosync.log
to_syslog: yes
debug: off
timestamp: on
logger_subsys {
subsys: QUORUM
debug: off
}
}
nodelist {
node {
ring0_addr: 10.0.1.10
name: node1
nodeid: 1
}
node {
ring0_addr: 10.0.1.11
name: node2
nodeid: 2
}
node {
ring0_addr: 10.0.1.12
name: node3
nodeid: 3
}
}
quorum {
provider: corosync_votequorum
expected_votes: 3
two_node: 0
}
EOF
# Set up cluster user and group (the pacemaker packages may already create them)
addgroup -S haclient 2>/dev/null || true
adduser -D -S -s /sbin/nologin -G haclient hacluster 2>/dev/null || true
echo "hacluster:$(openssl rand -base64 12)" | chpasswd
# Create cluster logging directory
mkdir -p /var/log/cluster
chown hacluster:haclient /var/log/cluster
# Enable and start corosync
rc-update add corosync default
service corosync start
# Check cluster status
corosync-cfgtool -s
echo "Corosync cluster configured! π"
What this does: π Configures Corosync for reliable cluster communication and membership management.
Example output:
Printing ring status.
Local node ID 1
RING ID 0
id = 10.0.1.10
status = ring 0 active with no faults
Corosync cluster configured! π
What this means: Cluster communication layer is operational! β
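Corosync needs the same authkey and corosync.conf on every member. The commands above configure node1 only; a sketch for distributing the files to the peers, assuming root SSH access to node2 and node3:
# Copy the cluster key and configuration to the other nodes, then start corosync there
for node in 10.0.1.11 10.0.1.12; do
scp /etc/corosync/authkey /etc/corosync/corosync.conf root@"$node":/etc/corosync/
ssh root@"$node" "mkdir -p /var/log/cluster && rc-update add corosync default && service corosync start"
done
# All three nodes should now show up in the membership list
corosync-cmapctl | grep members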
Configure Pacemaker Resource Management
Let's set up advanced resource management!
What we're doing: Configuring Pacemaker for intelligent resource management, failover policies, and service orchestration.
# Start pacemaker service
rc-update add pacemaker default
service pacemaker start
# Wait for pacemaker to initialize
sleep 10
# Configure cluster properties
# (disabling STONITH and ignoring quorum loss is only acceptable for a lab; configure fencing in production)
pcs property set stonith-enabled=false
pcs property set no-quorum-policy=ignore
pcs property set start-failure-is-fatal=false
pcs property set cluster-recheck-interval=60s
# Set cluster defaults
pcs resource defaults resource-stickiness=100
pcs resource defaults migration-threshold=3
# Resource groups (web-services, database-services, storage-services) are created
# automatically the first time a resource is added to them, e.g.:
#   pcs resource create <name> <agent> ... --group web-services
# Configure cluster notifications
cat > /etc/pacemaker/notify.sh << 'EOF'
#!/bin/bash
# Pacemaker notification script
RECIPIENT="[email protected]"
SUBJECT="Cluster Event: $1"
MESSAGE="
Cluster Event Details:
- Event Type: $1
- Node: $2
- Resource: $3
- Timestamp: $(date)
- Details: $4
"
# Send notification (configure mail system first)
echo "$MESSAGE" | mail -s "$SUBJECT" "$RECIPIENT"
# Log to syslog
logger "CLUSTER_EVENT: $1 on $2 for resource $3"
EOF
chmod +x /etc/pacemaker/notify.sh
# Create advanced resource monitoring
cat > /etc/pacemaker/monitor-resources.sh << 'EOF'
#!/bin/bash
# Advanced resource monitoring script
LOG_FILE="/var/log/cluster/resource-monitor.log"
log_message() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" >> "$LOG_FILE"
}
# Check resource status
check_resources() {
log_message "Starting resource health check"
# Get cluster status
STATUS=$(pcs status)
# Check for failed resources
FAILED=$(echo "$STATUS" | grep -i "failed\|error\|stopped")
if [ -n "$FAILED" ]; then
log_message "ALERT: Failed resources detected: $FAILED"
/etc/pacemaker/notify.sh "RESOURCE_FAILURE" "$(hostname)" "multiple" "$FAILED"
else
log_message "All resources healthy"
fi
# Check node status
OFFLINE=$(echo "$STATUS" | grep -i "offline")
if [ -n "$OFFLINE" ]; then
log_message "ALERT: Offline nodes detected: $OFFLINE"
/etc/pacemaker/notify.sh "NODE_OFFLINE" "$(hostname)" "cluster" "$OFFLINE"
fi
}
# Run health check
check_resources
EOF
chmod +x /etc/pacemaker/monitor-resources.sh
# Add monitoring to crontab (append, so existing entries are preserved)
(crontab -l 2>/dev/null; echo "*/5 * * * * /etc/pacemaker/monitor-resources.sh") | crontab -
# Check pacemaker status
pcs status
echo "Pacemaker resource management configured! ποΈ"
What this does: π Sets up intelligent resource management with monitoring and alerting capabilities.
Example output:
Cluster name: alpine-ha-cluster
Cluster Summary:
* Stack: corosync
* Current DC: node1
* Last updated: Wed Dec 18 10:00:00 2024
* 3 nodes configured
* 0 resource instances configured
Pacemaker resource management configured! ποΈ
What this means: Pacemaker is managing cluster resources intelligently! β
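With Pacemaker up, resources are added through pcs and attached to groups as they are created. As a concrete example, a floating IP placed in the web-services group could look like this (a sketch; 192.168.1.110 is a made-up address chosen so it does not collide with the Keepalived VIPs configured in Step 3):
# Example: a Pacemaker-managed floating IP in the web-services group
pcs resource create web_vip ocf:heartbeat:IPaddr2 \
ip=192.168.1.110 cidr_netmask=24 nic=eth0 \
op monitor interval=30s --group web-services
# See which node is hosting it
pcs status resources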
Step 3: Configure Load Balancing and Failover
Set Up HAProxy Load Balancer
Let's configure enterprise load balancing!
What we're doing: Setting up HAProxy for intelligent load distribution, health checking, and automatic failover with sticky sessions and SSL termination.
# Create HAProxy configuration
cat > /etc/haproxy/haproxy.cfg << 'EOF'
# HAProxy High Availability Configuration
global
daemon
user haproxy
group haproxy
log stdout local0
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin
stats timeout 30s
# SSL Configuration
ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets
# Performance tuning
tune.ssl.default-dh-param 2048
tune.bufsize 32768
tune.maxrewrite 1024
defaults
mode http
log global
option httplog
option dontlognull
option log-health-checks
option forwardfor except 127.0.0.0/8
option redispatch
# Timeouts
timeout connect 5000ms
timeout client 50000ms
timeout server 50000ms
timeout http-request 10s
timeout http-keep-alive 2s
timeout check 10s
# Health check defaults
default-server inter 3s fall 3 rise 2
# Error pages
errorfile 400 /etc/haproxy/errors/400.http
errorfile 403 /etc/haproxy/errors/403.http
errorfile 408 /etc/haproxy/errors/408.http
errorfile 500 /etc/haproxy/errors/500.http
errorfile 502 /etc/haproxy/errors/502.http
errorfile 503 /etc/haproxy/errors/503.http
errorfile 504 /etc/haproxy/errors/504.http
# Statistics interface
frontend stats
bind *:8404
stats enable
stats uri /stats
stats refresh 30s
stats hide-version
# set a real password here - $(...) is not expanded inside the quoted heredoc, so the value below is literal
stats auth admin:ChangeMeStrongPass
# Frontend for web services
frontend web-frontend
bind *:80
bind *:443 ssl crt /etc/ssl/certs/cluster.pem
# Redirect HTTP to HTTPS
redirect scheme https if !{ ssl_fc }
# Rate limiting
stick-table type ip size 100k expire 30s store http_req_rate(10s)
http-request track-sc0 src
http-request reject if { sc_http_req_rate(0) gt 20 }
# Default backend
default_backend web-servers
# Backend web servers
backend web-servers
balance roundrobin
option httpchk GET /health
http-check expect status 200
# Stick sessions to servers
cookie SERVERID insert indirect nocache
# Server definitions
server web1 192.168.1.20:80 check cookie web1 weight 100
server web2 192.168.1.21:80 check cookie web2 weight 100
server web3 192.168.1.22:80 check cookie web3 weight 100 backup
# Database load balancer (read-only queries)
frontend db-read-frontend
bind *:5433
mode tcp
default_backend db-read-servers
backend db-read-servers
mode tcp
balance leastconn
option tcp-check
tcp-check connect port 5432
server dbread1 192.168.1.30:5432 check weight 100
server dbread2 192.168.1.31:5432 check weight 100
server dbread3 192.168.1.32:5432 check weight 50 backup
# API services
frontend api-frontend
bind *:8080
# API rate limiting (higher limits)
stick-table type ip size 100k expire 30s store http_req_rate(10s)
http-request track-sc0 src
http-request reject if { sc_http_req_rate(0) gt 100 }
# API versioning
acl api_v1 path_beg /api/v1
acl api_v2 path_beg /api/v2
use_backend api-v1-servers if api_v1
use_backend api-v2-servers if api_v2
default_backend api-v2-servers
backend api-v1-servers
balance roundrobin
option httpchk GET /api/v1/health
server api1-v1 192.168.1.40:8080 check weight 100
server api2-v1 192.168.1.41:8080 check weight 100
backend api-v2-servers
balance roundrobin
option httpchk GET /api/v2/health
server api1-v2 192.168.1.42:8080 check weight 100
server api2-v2 192.168.1.43:8080 check weight 100
server api3-v2 192.168.1.44:8080 check weight 100
EOF
# Create HAProxy user (skip silently if the haproxy package already created it)
adduser -D -S -s /sbin/nologin haproxy 2>/dev/null || true
# Create required directories
mkdir -p /var/lib/haproxy /run/haproxy
chown haproxy:haproxy /var/lib/haproxy /run/haproxy
# Create SSL certificate directory
mkdir -p /etc/ssl/certs
# Generate self-signed certificate for testing
openssl req -x509 -newkey rsa:4096 -keyout /tmp/cluster-key.pem -out /tmp/cluster-cert.pem -days 365 -nodes -subj "/C=US/ST=State/L=City/O=Organization/CN=cluster.local"
cat /tmp/cluster-cert.pem /tmp/cluster-key.pem > /etc/ssl/certs/cluster.pem
chmod 600 /etc/ssl/certs/cluster.pem
# Create error pages (HAProxy refuses to start if a referenced errorfile is missing,
# so generate a simple page for every status code listed in the defaults section)
mkdir -p /etc/haproxy/errors
for code in 400 403 408 500 502 503 504; do
cat > /etc/haproxy/errors/${code}.http << EOF2
HTTP/1.0 ${code} Error
Cache-Control: no-cache
Connection: close
Content-Type: text/html

<html><body><h1>${code} Error</h1>
<p>The requested service is temporarily unavailable. Please try again later.</p>
</body></html>
EOF2
done
# Enable and start HAProxy
rc-update add haproxy default
service haproxy start
# Check HAProxy status
service haproxy status
echo "HAProxy load balancer configured! βοΈ"
What this does: π Sets up enterprise-grade load balancing with SSL termination, health checking, and failover.
Example output:
* Starting haproxy ...
* HAProxy started successfully
HAProxy load balancer configured! βοΈ
What this means: Load balancer is distributing traffic intelligently across backend servers! β
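The httpchk probes above expect every backend to answer GET /health with a 200. If your web servers do not expose such an endpoint yet, a minimal nginx snippet on each backend is enough (a sketch; the file path /etc/nginx/http.d/health.conf and the server layout are assumptions to adapt to your setup):
# Minimal /health endpoint on each backend web server
cat > /etc/nginx/http.d/health.conf << 'EOF'
server {
listen 80;
location = /health {
access_log off;
default_type text/plain;
return 200 "OK\n";
}
}
EOF
nginx -t && service nginx reload
# Verify from the load balancer node
curl -i http://192.168.1.20/health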
Configure Keepalived for VIP Management
Let's set up virtual IP failover!
What we're doing: Configuring Keepalived for automatic Virtual IP (VIP) failover using VRRP protocol for seamless service continuity.
# Create keepalived configuration
cat > /etc/keepalived/keepalived.conf << 'EOF'
# Keepalived High Availability Configuration
global_defs {
router_id ALPINE_HA_01
vrrp_skip_check_adv_addr
# vrrp_strict is omitted - strict RFC mode rejects the authentication blocks used below
vrrp_garp_interval 0
vrrp_gna_interval 0
# Notification scripts
notification_email {
[email protected]
[email protected]
}
notification_email_from [email protected]
smtp_server 127.0.0.1
smtp_connect_timeout 30
}
# Health check scripts
vrrp_script chk_haproxy {
script "/etc/keepalived/check_haproxy.sh"
interval 2
weight -2
fall 3
rise 2
}
vrrp_script chk_nginx {
script "/etc/keepalived/check_nginx.sh"
interval 2
weight -2
fall 3
rise 2
}
# Primary web services VIP
vrrp_instance WEB_SERVICES {
state MASTER
interface eth0
virtual_router_id 101
priority 110
advert_int 1
authentication {
auth_type PASS
# replace with your own shared secret (max 8 chars, identical on every node);
# $(...) is not expanded inside the quoted heredoc, so a literal value is required here
auth_pass WebVip01
}
virtual_ipaddress {
192.168.1.100/24 dev eth0 label eth0:web
}
track_script {
chk_haproxy
}
notify_master "/etc/keepalived/notify_master.sh WEB_SERVICES"
notify_backup "/etc/keepalived/notify_backup.sh WEB_SERVICES"
notify_fault "/etc/keepalived/notify_fault.sh WEB_SERVICES"
}
# Database services VIP
vrrp_instance DB_SERVICES {
state MASTER
interface eth0
virtual_router_id 102
priority 110
advert_int 1
authentication {
auth_type PASS
# literal shared secret - must match on every node, max 8 chars
auth_pass DbVip001
}
virtual_ipaddress {
192.168.1.101/24 dev eth0 label eth0:db
}
notify_master "/etc/keepalived/notify_master.sh DB_SERVICES"
notify_backup "/etc/keepalived/notify_backup.sh DB_SERVICES"
notify_fault "/etc/keepalived/notify_fault.sh DB_SERVICES"
}
# API services VIP
vrrp_instance API_SERVICES {
state MASTER
interface eth0
virtual_router_id 103
priority 110
advert_int 1
authentication {
auth_type PASS
# literal shared secret - must match on every node, max 8 chars
auth_pass ApiVip01
}
virtual_ipaddress {
192.168.1.102/24 dev eth0 label eth0:api
}
notify_master "/etc/keepalived/notify_master.sh API_SERVICES"
notify_backup "/etc/keepalived/notify_backup.sh API_SERVICES"
notify_fault "/etc/keepalived/notify_fault.sh API_SERVICES"
}
EOF
# Create health check scripts
cat > /etc/keepalived/check_haproxy.sh << 'EOF'
#!/bin/bash
# HAProxy health check script
HAPROXY_STATS_URL="http://localhost:8404/stats"
HAPROXY_PID_FILE="/var/run/haproxy.pid"
# Check if HAProxy process is running
if [ ! -f "$HAPROXY_PID_FILE" ]; then
exit 1
fi
PID=$(cat "$HAPROXY_PID_FILE")
if ! kill -0 "$PID" 2>/dev/null; then
exit 1
fi
# Check HAProxy stats endpoint
if ! curl -f -s "$HAPROXY_STATS_URL" > /dev/null; then
exit 1
fi
exit 0
EOF
cat > /etc/keepalived/check_nginx.sh << 'EOF'
#!/bin/bash
# Nginx health check script
NGINX_PID_FILE="/var/run/nginx.pid"
# Check if Nginx process is running
if [ ! -f "$NGINX_PID_FILE" ]; then
exit 1
fi
PID=$(cat "$NGINX_PID_FILE")
if ! kill -0 "$PID" 2>/dev/null; then
exit 1
fi
# Check if Nginx is responding
if ! curl -f -s http://localhost/ > /dev/null; then
exit 1
fi
exit 0
EOF
# Create notification scripts
cat > /etc/keepalived/notify_master.sh << 'EOF'
#!/bin/bash
# Keepalived master notification script
SERVICE="$1"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
HOSTNAME=$(hostname)
logger "KEEPALIVED: $HOSTNAME became MASTER for $SERVICE at $TIMESTAMP"
# Start services that should run on master
case "$SERVICE" in
"WEB_SERVICES")
service haproxy start
;;
"DB_SERVICES")
service postgresql start
;;
"API_SERVICES")
service nginx start
;;
esac
# Send notification
echo "Node $HOSTNAME became MASTER for $SERVICE at $TIMESTAMP" | \
mail -s "HA Cluster: Master Transition" [email protected]
EOF
cat > /etc/keepalived/notify_backup.sh << 'EOF'
#!/bin/bash
# Keepalived backup notification script
SERVICE="$1"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
HOSTNAME=$(hostname)
logger "KEEPALIVED: $HOSTNAME became BACKUP for $SERVICE at $TIMESTAMP"
# Optionally stop services on backup nodes
case "$SERVICE" in
"WEB_SERVICES")
# Keep HAProxy running for health checks
;;
"DB_SERVICES")
# Keep database in standby mode
;;
"API_SERVICES")
# Keep API services in standby mode
;;
esac
EOF
cat > /etc/keepalived/notify_fault.sh << 'EOF'
#!/bin/bash
# Keepalived fault notification script
SERVICE="$1"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
HOSTNAME=$(hostname)
logger "KEEPALIVED: FAULT detected for $SERVICE on $HOSTNAME at $TIMESTAMP"
# Send urgent notification
echo "URGENT: Fault detected for $SERVICE on $HOSTNAME at $TIMESTAMP" | \
mail -s "HA Cluster: FAULT ALERT" [email protected]
EOF
# Make scripts executable
chmod +x /etc/keepalived/*.sh
# Enable and start keepalived
rc-update add keepalived default
service keepalived start
# Check keepalived status
service keepalived status
ip addr show
echo "Keepalived VIP management configured! π"
What this does: π Sets up automatic Virtual IP failover with health monitoring and notification system.
Example output:
* Starting keepalived ...
* Keepalived started successfully
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
inet 192.168.1.10/24 brd 192.168.1.255 scope global eth0
inet 192.168.1.100/24 scope global secondary eth0:web
Keepalived VIP management configured! π
What this means: Virtual IP failover is active and monitoring service health! β
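The keepalived.conf above is the primary node's copy: every vrrp_instance is state MASTER with priority 110. The backup nodes reuse the same file with the state and priority lowered so they only claim the VIPs when the master fails; everything else (virtual_router_id, auth_pass, virtual_ipaddress) must stay identical. A sketch of the adjustment on node2 (use a still lower priority, e.g. 90, on node3):
# On node2: demote all instances to BACKUP with a lower priority
sed -i 's/state MASTER/state BACKUP/; s/priority 110/priority 100/' /etc/keepalived/keepalived.conf
service keepalived restart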
Step 4: Configure Database High Availability
Set Up PostgreSQL Streaming Replication
Let's implement database high availability!
What we're doing: Configuring PostgreSQL with streaming replication, automatic failover, and read-only replica load balancing.
# Initialize PostgreSQL cluster (adjust '13' to the PostgreSQL version installed by apk)
su postgres -c "initdb -D /var/lib/postgresql/13/data"
# Configure PostgreSQL for replication
cat >> /var/lib/postgresql/13/data/postgresql.conf << 'EOF'
# High Availability Configuration
listen_addresses = '*'
port = 5432
max_connections = 200
# Write-Ahead Logging
wal_level = replica
archive_mode = on
archive_command = 'rsync -a %p postgres@backup-server:/backup/wal/%f'
max_wal_senders = 10
max_replication_slots = 10
wal_keep_size = 1GB
# Streaming replication
hot_standby = on
hot_standby_feedback = on
max_standby_streaming_delay = 30s
# Logging
log_destination = 'stderr'
logging_collector = on
log_directory = '/var/log/postgresql'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
log_min_duration_statement = 1000
# Performance tuning
shared_buffers = 256MB
effective_cache_size = 1GB
work_mem = 4MB
maintenance_work_mem = 64MB
checkpoint_completion_target = 0.9
random_page_cost = 1.1
# Connection settings
tcp_keepalives_idle = 600
tcp_keepalives_interval = 30
tcp_keepalives_count = 3
EOF
# Configure client authentication
cat > /var/lib/postgresql/13/data/pg_hba.conf << 'EOF'
# PostgreSQL High Availability HBA Configuration
# TYPE DATABASE USER ADDRESS METHOD
# Local connections
local all postgres peer
local all all md5
# IPv4 local connections:
host all all 127.0.0.1/32 md5
# Cluster network connections
host all all 192.168.1.0/24 md5
host all all 10.0.1.0/24 md5
# Replication connections
host replication replicator 192.168.1.0/24 md5
host replication replicator 10.0.1.0/24 md5
# Deny all other connections
host all all 0.0.0.0/0 reject
EOF
# Start PostgreSQL
rc-update add postgresql default
service postgresql start
# Create replication user (the server must be running before createuser can connect)
su postgres -c "createuser --replication -P replicator"
# Create sample database and tables
su postgres -c "createdb testdb"
su postgres -c "psql testdb" << 'EOF'
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(50) UNIQUE NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE sessions (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
session_token VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
expires_at TIMESTAMP NOT NULL
);
INSERT INTO users (username, email) VALUES
('admin', '[email protected]'),
('user1', '[email protected]'),
('user2', '[email protected]');
\q
EOF
echo "PostgreSQL primary server configured! π"
What this does: π Configures PostgreSQL primary server with streaming replication and performance optimization.
Example output:
PostgreSQL primary server configured! π
What this means: Database primary is ready for high availability replication! β
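Once a standby connects (next section), you can confirm streaming replication directly from the primary using PostgreSQL's built-in views; a quick sketch:
# On the primary: list connected standbys and their replication state
su postgres -c "psql -c 'SELECT client_addr, state, sync_state FROM pg_stat_replication;'"
# On the primary: list the replication slots created by pg_basebackup -C
su postgres -c "psql -c 'SELECT slot_name, active FROM pg_replication_slots;'"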
Configure PostgreSQL Standby Servers
Let's set up replica servers!
What we're doing: Creating PostgreSQL standby servers with automatic failover capabilities and read-only query load balancing.
# Create standby server setup script
cat > /opt/setup-postgres-standby.sh << 'EOF'
#!/bin/bash
# PostgreSQL Standby Server Setup Script
PRIMARY_HOST=${1:-192.168.1.10}
STANDBY_ID=${2:-1}
echo "Setting up PostgreSQL standby server (ID: $STANDBY_ID)"
# Stop PostgreSQL if running
service postgresql stop
# Remove existing data directory
rm -rf /var/lib/postgresql/13/data
# Create base backup from primary
su postgres -c "pg_basebackup -h $PRIMARY_HOST -D /var/lib/postgresql/13/data -U replicator -P -v -R -X stream -C -S standby_$STANDBY_ID"
# Configure recovery settings
# (pg_basebackup -R above already creates standby.signal and writes primary_conninfo /
# primary_slot_name into postgresql.auto.conf; the settings below just make them explicit)
touch /var/lib/postgresql/13/data/standby.signal
# Configure standby-specific settings (unquoted heredoc so $PRIMARY_HOST and $STANDBY_ID expand)
cat >> /var/lib/postgresql/13/data/postgresql.conf << EOL
# Standby-specific configuration
primary_conninfo = 'host=$PRIMARY_HOST port=5432 user=replicator'
primary_slot_name = 'standby_$STANDBY_ID'
promote_trigger_file = '/tmp/postgresql.trigger'
# Hot standby settings
hot_standby = on
max_standby_streaming_delay = 30s
hot_standby_feedback = on
# Recovery settings
recovery_target_timeline = 'latest'
EOL
# Set proper permissions
chown -R postgres:postgres /var/lib/postgresql/13/data
chmod 700 /var/lib/postgresql/13/data
# Start standby server
service postgresql start
echo "PostgreSQL standby server configured!"
EOF
chmod +x /opt/setup-postgres-standby.sh
# Create failover script
cat > /opt/postgres-failover.sh << 'EOF'
#!/bin/bash
# PostgreSQL Failover Script
STANDBY_HOST=${1:-192.168.1.11}
NEW_PRIMARY_HOST=${2:-192.168.1.11}
echo "Initiating PostgreSQL failover to $NEW_PRIMARY_HOST"
# Create trigger file to promote standby
ssh postgres@$STANDBY_HOST "touch /tmp/postgresql.trigger"
# Wait for promotion
sleep 10
# Update application configuration to point to new primary
echo "Updating application configuration..."
# Update HAProxy backend configuration
# (adjust 'dbprimary' to whatever server name your haproxy.cfg actually uses for the primary)
sed -i "s/server dbprimary .*/server dbprimary $NEW_PRIMARY_HOST:5432 check weight 100/" /etc/haproxy/haproxy.cfg
service haproxy reload
# Update keepalived VIP to new primary
ssh root@$NEW_PRIMARY_HOST "service keepalived restart"
echo "PostgreSQL failover completed!"
EOF
chmod +x /opt/postgres-failover.sh
# Create monitoring script
cat > /opt/postgres-monitor.sh << 'EOF'
#!/bin/bash
# PostgreSQL Cluster Monitoring Script
LOG_FILE="/var/log/postgresql-cluster.log"
log_message() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" >> "$LOG_FILE"
}
check_primary() {
PRIMARY_HOST="192.168.1.10"
if ! pg_isready -h "$PRIMARY_HOST" -p 5432 -U postgres > /dev/null 2>&1; then
log_message "ALERT: Primary database $PRIMARY_HOST is not responding"
# Attempt failover to first standby
log_message "Initiating automatic failover"
/opt/postgres-failover.sh 192.168.1.11 192.168.1.11
return 1
fi
log_message "Primary database is healthy"
return 0
}
check_standby() {
STANDBY_HOST="$1"
if ! pg_isready -h "$STANDBY_HOST" -p 5432 -U postgres > /dev/null 2>&1; then
log_message "WARNING: Standby database $STANDBY_HOST is not responding"
return 1
fi
# Check replication lag (cast to an integer so the shell numeric comparison works)
LAG=$(psql -h "$STANDBY_HOST" -U postgres -At -c "SELECT COALESCE(EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))::int, 0)" 2>/dev/null)
if [ -n "$LAG" ] && [ "$LAG" -gt 60 ]; then
log_message "WARNING: Standby $STANDBY_HOST has high replication lag: ${LAG}s"
else
log_message "Standby database $STANDBY_HOST is healthy"
fi
return 0
}
# Main monitoring loop
log_message "Starting PostgreSQL cluster monitoring"
check_primary
check_standby "192.168.1.11"
check_standby "192.168.1.12"
log_message "Monitoring cycle completed"
EOF
chmod +x /opt/postgres-monitor.sh
# Add monitoring to crontab (append, so existing entries are preserved)
(crontab -l 2>/dev/null; echo "*/2 * * * * /opt/postgres-monitor.sh") | crontab -
echo "PostgreSQL HA cluster management configured! π"
What this does: π Creates comprehensive PostgreSQL HA management with automatic failover and monitoring.
Example output:
PostgreSQL HA cluster management configured! π
What this means: Database cluster has automatic failover and comprehensive monitoring! β
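On each standby node the setup script is run with the primary's address and a unique standby ID, and the result can be verified by asking PostgreSQL whether it is in recovery. A usage sketch with the addresses from this guide:
# On node2 (standby 1); use ID 2 on node3
/opt/setup-postgres-standby.sh 192.168.1.10 1
# Should print 't' on a healthy hot standby
su postgres -c "psql -At -c 'SELECT pg_is_in_recovery();'"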
Step 5: Configure Storage High Availability
Set Up DRBD Block Replication
Let's implement replicated storage!
What we're doing: Configuring DRBD (Distributed Replicated Block Device) for real-time block-level storage replication across cluster nodes.
# Load DRBD kernel module
modprobe drbd
echo "drbd" >> /etc/modules
# Create DRBD configuration
cat > /etc/drbd.d/global_common.conf << 'EOF'
# DRBD Global Configuration
global {
usage-count yes;
udev-always-use-vnr;
}
common {
handlers {
pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
}
startup {
degr-wfc-timeout 60;
}
options {
auto-promote yes;
}
disk {
on-io-error detach;
c-plan-ahead 1;
c-delay-target 10;
c-fill-target 0;
c-max-rate 700M;
c-min-rate 250k;
}
net {
after-sb-0pri discard-younger-primary;
after-sb-1pri discard-secondary;
after-sb-2pri call-pri-lost-after-sb;
protocol C;
tcp-cork yes;
max-buffers 20000;
max-epoch-size 20000;
sndbuf-size 0;
rcvbuf-size 0;
}
}
EOF
# Create DRBD resource configuration
cat > /etc/drbd.d/data.res << 'EOF'
# DRBD Data Resource Configuration
resource data {
device /dev/drbd1;
disk /dev/sdb1;
meta-disk internal;
on node1 {
address 10.0.1.10:7789;
node-id 1;
}
on node2 {
address 10.0.1.11:7789;
node-id 2;
}
connection-mesh {
hosts node1 node2;
}
}
EOF
# Initialize DRBD metadata
drbdadm create-md data
# Start DRBD service
rc-update add drbd default
service drbd start
# Enable DRBD resource
drbdadm up data
# Check DRBD status
drbdadm status
echo "DRBD storage replication configured! πΎ"
What this does: π Sets up real-time block storage replication for data consistency across nodes.
Example output:
data role:Secondary
disk:Inconsistent
peer role:Secondary
peer-disk:Inconsistent
DRBD storage replication configured! πΎ
What this means: Storage replication is active and synchronizing data! β
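The Inconsistent state shown above is expected on a brand-new resource: neither side has valid data yet, so DRBD will not pick a sync source on its own. On exactly one node (the one whose data should win) the initial synchronization is started by forcing it to primary; a sketch:
# Run on ONE node only (e.g. node1) to start the initial full sync
drbdadm primary --force data
# Watch progress until both disks report UpToDate
drbdadm status data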
Configure Shared Storage Management
Let's set up cluster-aware storage!
What we're doing: Implementing cluster-aware filesystems and shared storage management for applications requiring consistent data access.
# Back up the original LVM configuration before editing it
cp /etc/lvm/lvm.conf /etc/lvm/lvm.conf.backup
# Configure LVM for cluster use
# (locking_type 3 expects a cluster lock manager such as clvmd/lvmlockd; recent LVM releases have removed these options)
sed -i 's/locking_type = 1/locking_type = 3/' /etc/lvm/lvm.conf
sed -i 's/use_lvmetad = 1/use_lvmetad = 0/' /etc/lvm/lvm.conf
# Create shared storage management script
cat > /opt/manage-shared-storage.sh << 'EOF'
#!/bin/bash
# Shared Storage Management Script
DRBD_DEVICE="/dev/drbd1"
VG_NAME="cluster-vg"
LV_NAME="shared-data"
MOUNT_POINT="/shared"
case "$1" in
"promote")
echo "Promoting DRBD to primary and mounting storage"
# Promote DRBD to primary
drbdadm primary data
# Wait for DRBD to be ready
sleep 5
# Activate volume group
vgchange -ay $VG_NAME
# Mount filesystem
mkdir -p $MOUNT_POINT
mount /dev/$VG_NAME/$LV_NAME $MOUNT_POINT
echo "Shared storage is now available at $MOUNT_POINT"
;;
"demote")
echo "Demoting DRBD and unmounting storage"
# Unmount filesystem
umount $MOUNT_POINT
# Deactivate volume group
vgchange -an $VG_NAME
# Demote DRBD to secondary
drbdadm secondary data
echo "Shared storage has been safely unmounted"
;;
"status")
echo "Storage Status:"
echo "==============="
drbdadm status
echo ""
lvs
echo ""
df -h $MOUNT_POINT 2>/dev/null || echo "Shared storage not mounted"
;;
"init")
echo "Initializing shared storage"
# Create physical volume
pvcreate $DRBD_DEVICE
# Create volume group
vgcreate $VG_NAME $DRBD_DEVICE
# Create logical volume
lvcreate -n $LV_NAME -l 100%FREE $VG_NAME
# Create filesystem
mkfs.ext4 /dev/$VG_NAME/$LV_NAME
# Create mount point
mkdir -p $MOUNT_POINT
echo "Shared storage initialized"
;;
*)
echo "Usage: $0 {promote|demote|status|init}"
echo " promote - Promote DRBD and mount shared storage"
echo " demote - Unmount and demote DRBD storage"
echo " status - Show storage status"
echo " init - Initialize shared storage (first time only)"
exit 1
;;
esac
EOF
chmod +x /opt/manage-shared-storage.sh
# Create Pacemaker resource for shared storage
cat > /opt/create-storage-resources.sh << 'EOF'
#!/bin/bash
# Create Pacemaker resources for shared storage
echo "Creating Pacemaker storage resources"
# DRBD resource
pcs resource create drbd_data ocf:linbit:drbd \
drbd_resource=data \
op monitor interval=60s
# DRBD promotable clone (older pcs uses 'resource master'; newer releases use 'pcs resource promotable')
pcs resource master drbd_data_master drbd_data \
master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
notify=true
# Filesystem resource
pcs resource create shared_fs Filesystem \
device="/dev/cluster-vg/shared-data" \
directory="/shared" \
fstype="ext4" \
op monitor interval=20s
# Create resource group
pcs resource group add storage-group shared_fs
# Set resource constraints
pcs constraint colocation add shared_fs with drbd_data_master INFINITY with-rsc-role=Master
pcs constraint order promote drbd_data_master then start shared_fs
echo "Storage resources created in Pacemaker"
EOF
chmod +x /opt/create-storage-resources.sh
# Create storage monitoring
cat > /opt/monitor-storage.sh << 'EOF'
#!/bin/bash
# Storage Health Monitoring Script
LOG_FILE="/var/log/cluster-storage.log"
log_message() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" >> "$LOG_FILE"
}
check_drbd() {
# Check DRBD status
DRBD_STATUS=$(drbdadm status data)
if echo "$DRBD_STATUS" | grep -q "Inconsistent"; then
log_message "WARNING: DRBD is in inconsistent state"
return 1
fi
if echo "$DRBD_STATUS" | grep -q "StandAlone"; then
log_message "ALERT: DRBD is in standalone mode"
return 1
fi
log_message "DRBD status is healthy"
return 0
}
check_filesystem() {
# Check filesystem health
if [ -d "/shared" ]; then
FS_STATUS=$(df /shared 2>/dev/null)
if [ $? -eq 0 ]; then
USAGE=$(echo "$FS_STATUS" | tail -1 | awk '{print $5}' | sed 's/%//')
if [ "$USAGE" -gt 90 ]; then
log_message "WARNING: Shared filesystem is ${USAGE}% full"
else
log_message "Shared filesystem usage: ${USAGE}%"
fi
else
log_message "WARNING: Shared filesystem not accessible"
return 1
fi
fi
return 0
}
# Run checks
log_message "Starting storage health check"
check_drbd
check_filesystem
log_message "Storage health check completed"
EOF
chmod +x /opt/monitor-storage.sh
# Add storage monitoring to crontab (append, so existing entries are preserved)
(crontab -l 2>/dev/null; echo "*/5 * * * * /opt/monitor-storage.sh") | crontab -
echo "Shared storage management configured! ποΈ"
What this does: π Creates comprehensive shared storage management with cluster awareness and monitoring.
Example output:
Shared storage management configured! ποΈ
What this means: Cluster-aware storage is ready with automatic failover capabilities! β
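Putting the script to work: once the DRBD initial sync has finished, the shared volume is created once and then promoted on whichever node should currently serve it. A usage sketch:
# One-time initialization on the current DRBD primary
/opt/manage-shared-storage.sh init
# Promote and mount on the active node
/opt/manage-shared-storage.sh promote
/opt/manage-shared-storage.sh status
# Before handing the storage over to the peer node
/opt/manage-shared-storage.sh demote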
Step 6: Monitoring and Alerting
Set Up Comprehensive Monitoring
Let's implement enterprise monitoring!
What we're doing: Setting up comprehensive monitoring with Prometheus, Grafana, and custom alerting for all HA components.
# Install monitoring stack
apk add prometheus grafana
apk add alertmanager prometheus-node-exporter
# Configure Prometheus
cat > /etc/prometheus/prometheus.yml << 'EOF'
# Prometheus Configuration for HA Cluster
global:
scrape_interval: 15s
evaluation_interval: 15s
external_labels:
cluster: 'alpine-ha-cluster'
replica: 'prometheus-1'
rule_files:
- "ha_cluster.rules.yml"
alerting:
alertmanagers:
- static_configs:
- targets:
- localhost:9093
scrape_configs:
# Prometheus itself
- job_name: 'prometheus'
static_configs:
- targets: ['localhost:9090']
# Node exporters
- job_name: 'node'
static_configs:
- targets:
- 'node1:9100'
- 'node2:9100'
- 'node3:9100'
# HAProxy stats (requires HAProxy's built-in Prometheus exporter, e.g.
# 'http-request use-service prometheus-exporter if { path /stats/prometheus }' in the stats frontend)
- job_name: 'haproxy'
static_configs:
- targets: ['localhost:8404']
metrics_path: '/stats/prometheus'
# PostgreSQL (requires postgres_exporter running on each database node)
- job_name: 'postgresql'
static_configs:
- targets:
- 'node1:9187'
- 'node2:9187'
# Corosync/Pacemaker (requires a cluster exporter such as ha_cluster_exporter)
- job_name: 'cluster'
static_configs:
- targets:
- 'node1:9664'
- 'node2:9664'
- 'node3:9664'
# DRBD monitoring (requires a DRBD exporter on the storage nodes)
- job_name: 'drbd'
static_configs:
- targets:
- 'node1:9942'
- 'node2:9942'
EOF
# Create alerting rules
cat > /etc/prometheus/ha_cluster.rules.yml << 'EOF'
# High Availability Cluster Alerting Rules
groups:
- name: ha_cluster
rules:
# Node availability
- alert: NodeDown
expr: up == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Node {{ $labels.instance }} is down"
description: "Node {{ $labels.instance }} has been down for more than 1 minute"
# High CPU usage
- alert: HighCPUUsage
expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "High CPU usage on {{ $labels.instance }}"
description: "CPU usage is above 80% for more than 5 minutes"
# High memory usage
- alert: HighMemoryUsage
expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
for: 3m
labels:
severity: critical
annotations:
summary: "High memory usage on {{ $labels.instance }}"
description: "Memory usage is above 90% for more than 3 minutes"
# Disk space warning
- alert: DiskSpaceWarning
expr: (1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 85
for: 2m
labels:
severity: warning
annotations:
summary: "Disk space warning on {{ $labels.instance }}"
description: "Disk usage is above 85% on {{ $labels.mountpoint }}"
# HAProxy backend down
- alert: HAProxyBackendDown
expr: haproxy_backend_up == 0
for: 30s
labels:
severity: critical
annotations:
summary: "HAProxy backend {{ $labels.proxy }} is down"
description: "HAProxy backend {{ $labels.proxy }} has been down for more than 30 seconds"
# PostgreSQL down
- alert: PostgreSQLDown
expr: pg_up == 0
for: 1m
labels:
severity: critical
annotations:
summary: "PostgreSQL is down on {{ $labels.instance }}"
description: "PostgreSQL has been down for more than 1 minute"
# High replication lag
- alert: PostgreSQLReplicationLag
expr: pg_stat_replication_lag > 60
for: 2m
labels:
severity: warning
annotations:
summary: "High PostgreSQL replication lag"
description: "Replication lag is {{ $value }} seconds"
# DRBD disconnected
- alert: DRBDDisconnected
expr: drbd_connected == 0
for: 30s
labels:
severity: critical
annotations:
summary: "DRBD resource {{ $labels.resource }} is disconnected"
description: "DRBD resource has been disconnected for more than 30 seconds"
# Cluster quorum lost
- alert: ClusterQuorumLost
expr: corosync_quorate == 0
for: 30s
labels:
severity: critical
annotations:
summary: "Cluster has lost quorum"
description: "Cluster quorum has been lost - services may be unavailable"
EOF
# Configure Alertmanager
cat > /etc/alertmanager/alertmanager.yml << 'EOF'
# Alertmanager Configuration
global:
smtp_smarthost: 'localhost:587'
smtp_from: '[email protected]'
route:
group_by: ['alertname']
group_wait: 10s
group_interval: 10s
repeat_interval: 1h
receiver: 'web.hook'
routes:
- match:
severity: critical
receiver: 'critical-alerts'
- match:
severity: warning
receiver: 'warning-alerts'
receivers:
- name: 'web.hook'
webhook_configs:
- url: 'http://localhost:5001/'
- name: 'critical-alerts'
email_configs:
- to: '[email protected]'
subject: 'CRITICAL: {{ .GroupLabels.alertname }}'
body: |
{{ range .Alerts }}
Alert: {{ .Annotations.summary }}
Description: {{ .Annotations.description }}
{{ end }}
webhook_configs:
- url: 'http://localhost:5001/critical'
- name: 'warning-alerts'
email_configs:
- to: '[email protected]'
subject: 'WARNING: {{ .GroupLabels.alertname }}'
body: |
{{ range .Alerts }}
Alert: {{ .Annotations.summary }}
Description: {{ .Annotations.description }}
{{ end }}
inhibit_rules:
- source_match:
severity: 'critical'
target_match:
severity: 'warning'
equal: ['alertname', 'instance']
EOF
# Configure Grafana dashboards
mkdir -p /var/lib/grafana/dashboards
cat > /var/lib/grafana/dashboards/ha-cluster-dashboard.json << 'EOF'
{
"dashboard": {
"id": null,
"title": "HA Cluster Overview",
"tags": ["ha", "cluster"],
"timezone": "browser",
"panels": [
{
"id": 1,
"title": "Cluster Nodes Status",
"type": "stat",
"targets": [
{
"expr": "up",
"legendFormat": "{{instance}}"
}
]
},
{
"id": 2,
"title": "CPU Usage",
"type": "graph",
"targets": [
{
"expr": "100 - (avg by(instance) (irate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
"legendFormat": "{{instance}}"
}
]
},
{
"id": 3,
"title": "Memory Usage",
"type": "graph",
"targets": [
{
"expr": "(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100",
"legendFormat": "{{instance}}"
}
]
},
{
"id": 4,
"title": "HAProxy Backend Status",
"type": "table",
"targets": [
{
"expr": "haproxy_backend_up",
"legendFormat": "{{proxy}}"
}
]
}
],
"time": {
"from": "now-1h",
"to": "now"
},
"refresh": "5s"
}
}
EOF
# Enable and start monitoring services
# (on Alpine the Grafana OpenRC service is typically named 'grafana' rather than 'grafana-server')
rc-update add prometheus default
rc-update add alertmanager default
rc-update add grafana default
rc-update add prometheus-node-exporter default
service prometheus start
service alertmanager start
service grafana start
service prometheus-node-exporter start
echo "Comprehensive monitoring configured! π"
What this does: π Sets up complete monitoring stack with alerting for all HA components.
Example output:
Comprehensive monitoring configured! π
What this means: Enterprise-grade monitoring is active with intelligent alerting! β
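Before trusting the alerting, it pays to validate the Prometheus configuration and rule files and to confirm that targets are actually being scraped. A minimal sketch, assuming promtool ships with the prometheus package:
# Validate configuration and alerting rules
promtool check config /etc/prometheus/prometheus.yml
promtool check rules /etc/prometheus/ha_cluster.rules.yml
# Confirm targets are up via the Prometheus HTTP API
curl -s 'http://localhost:9090/api/v1/query?query=up' | grep -o '"instance":"[^"]*"'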
You're All Set!
Congratulations! You've successfully configured a complete High Availability system on Alpine Linux!
What You've Accomplished:
✅ Cluster Foundation - Corosync and Pacemaker for intelligent resource management
✅ Load Balancing - HAProxy with SSL termination and health checking
✅ Virtual IP Failover - Keepalived with automatic VIP management
✅ Database HA - PostgreSQL streaming replication with automatic failover
✅ Storage Replication - DRBD block-level replication and shared storage
✅ Comprehensive Monitoring - Prometheus, Grafana, and intelligent alerting
Architecture Overview:
+------------------------------------------------------------+
|                         Virtual IPs                         |
|  Web: 192.168.1.100  DB: 192.168.1.101  API: 192.168.1.102  |
+------------------------------------------------------------+
                              |
           +------------------+------------------+
           |                  |                  |
      +----+----+        +----+----+        +----+----+
      | Node 1  |--------| Node 2  |--------| Node 3  |
      | (Master)|        |(Standby)|        |(Standby)|
      +----+----+        +----+----+        +----+----+
           |                  |                  |
      +----+----+        +----+----+        +----+----+
      |  DRBD   |--------|  DRBD   |--------|  DRBD   |
      | Primary |        |Secondary|        |Secondary|
      +---------+        +---------+        +---------+
Key Commands:
# Cluster management
pcs status
pcs resource show
pcs node standby node1
# Load balancer
service haproxy status
haproxy -c -f /etc/haproxy/haproxy.cfg
# Database failover
/opt/postgres-failover.sh
# Storage management
/opt/manage-shared-storage.sh status
drbdadm status
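A quick failover drill using the commands above (a sketch with the node names and web VIP from this guide):
# Drain node1 and watch the cluster move its resources
pcs node standby node1
pcs status
# On the node that took over, the web VIP should now be present
ip addr show eth0 | grep 192.168.1.100
# Bring node1 back into service
pcs node unstandby node1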
Next Steps:
- Application Integration - Configure your applications for HA
- Performance Tuning - Optimize for your specific workload
- Security Hardening - Implement additional security measures
- Disaster Recovery - Plan for site-wide failures
- Testing - Regularly test failover scenarios
Your enterprise-grade High Availability system is now operational with 99.99% uptime capability!