📊 Installing Prometheus and Grafana on Alpine Linux: Complete Monitoring Guide
Let’s build a powerful monitoring and visualization stack with Prometheus and Grafana on Alpine Linux! 🚀 This comprehensive tutorial shows you how to set up complete infrastructure monitoring with metrics collection, alerting, and beautiful dashboards. Perfect for DevOps teams and system administrators! 😊
🤔 What are Prometheus and Grafana?
Prometheus is a powerful monitoring and alerting system that collects metrics from your infrastructure, while Grafana provides stunning visualizations and dashboards for your data!
This monitoring stack is like:
- 🔍 Smart surveillance systems that watch over your entire infrastructure
- 📈 Business intelligence dashboards that turn raw data into insights
- 🚨 Early warning systems that alert you before problems become critical
🎯 What You Need
Before we start, you need:
- ✅ Alpine Linux system with sufficient resources (4GB+ RAM recommended)
- ✅ Understanding of system monitoring concepts and metrics
- ✅ Basic knowledge of networking and service configuration
- ✅ Root access for system service installation
📋 Step 1: Install and Configure Prometheus
Install Prometheus Server
Let’s install Prometheus, the metrics collection and storage engine! 😊
What we’re doing: Installing Prometheus server for comprehensive metrics collection and monitoring.
# Update package list
apk update
# Install Prometheus server
apk add prometheus
# Install additional monitoring tools
apk add prometheus-node-exporter prometheus-alertmanager
# Check Prometheus version
prometheus --version
# Check installation paths
ls -la /etc/prometheus/
ls -la /var/lib/prometheus/
# Create Prometheus user and directories
adduser -D -s /bin/false prometheus 2>/dev/null || true
mkdir -p /var/lib/prometheus
mkdir -p /etc/prometheus/rules
mkdir -p /etc/prometheus/file_sd
chown -R prometheus:prometheus /var/lib/prometheus /etc/prometheus
# Start Prometheus service
rc-service prometheus start
# Enable Prometheus to start at boot
rc-update add prometheus default
# Test Prometheus web interface
echo "Prometheus should be available at: http://localhost:9090"
What this does: 📖 Installs Prometheus with all necessary components for monitoring.
Example output:
prometheus, version 2.45.0 (branch: HEAD, revision: 8b2f6b4)
build user: builduser@buildhost
build date: 20231124-14:56:23
go version: go1.21.5
What this means: Prometheus is installed and ready for configuration! ✅
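Before moving on, it's worth a quick sanity check against the Prometheus HTTP API. This is optional and assumes curl and jq are available (install them with apk if they are not):
# Install the query helpers used throughout this guide
apk add curl jq
# Ask Prometheus which targets are currently up
curl -s 'http://localhost:9090/api/v1/query?query=up' | jq '.data.result'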
Configure Prometheus for Production
Let’s create a comprehensive Prometheus configuration! 🎯
What we’re doing: Configuring Prometheus with targets, rules, and optimal settings for production monitoring.
# Backup original Prometheus configuration
cp /etc/prometheus/prometheus.yml /etc/prometheus/prometheus.yml.backup
# Create comprehensive Prometheus configuration
cat > /etc/prometheus/prometheus.yml << 'EOF'
# Prometheus Configuration for Complete Monitoring
global:
  scrape_interval: 15s       # How frequently to scrape targets
  evaluation_interval: 15s   # How frequently to evaluate rules
  external_labels:
    cluster: 'alpine-production'
    environment: 'production'

# Alerting configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

# Rules configuration
rule_files:
  - "/etc/prometheus/rules/*.yml"

# Scrape configurations
scrape_configs:
  # Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
    scrape_interval: 5s
    metrics_path: /metrics

  # Node Exporter for system metrics
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']
    scrape_interval: 10s
    metrics_path: /metrics

  # Alpine Linux specific monitoring
  - job_name: 'alpine-system'
    static_configs:
      - targets: ['localhost:9100']
    scrape_interval: 15s
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: 'node_(cpu|memory|disk|network).*'
        target_label: __name__
        replacement: 'alpine_${1}'

  # Application monitoring
  - job_name: 'application-metrics'
    file_sd_configs:
      - files:
          - '/etc/prometheus/file_sd/applications.yml'
    scrape_interval: 30s

  # Docker container monitoring (if Docker is installed)
  - job_name: 'docker-containers'
    static_configs:
      - targets: ['localhost:9323']
    scrape_interval: 30s
    metrics_path: /metrics

  # Custom service monitoring
  - job_name: 'custom-services'
    file_sd_configs:
      - files:
          - '/etc/prometheus/file_sd/services.yml'
    scrape_interval: 60s

# NOTE: data retention and web options (listen address, lifecycle/admin APIs) are
# command-line flags in Prometheus 2.x, not prometheus.yml settings -- they are
# set via the service's startup flags right after this file.
EOF
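# Retention and web options must be passed as command-line flags; promtool rejects
# 'storage:' and 'web:' blocks in prometheus.yml. The snippet below is a sketch:
# the variable name read by the OpenRC init script may differ on your Alpine
# version, so check /etc/init.d/prometheus and /etc/conf.d/prometheus before
# applying it. --web.enable-lifecycle is required for the /-/reload calls used
# later in this guide.
cat >> /etc/conf.d/prometheus << 'EOF'
prometheus_args="--config.file=/etc/prometheus/prometheus.yml \
 --storage.tsdb.path=/var/lib/prometheus/data \
 --storage.tsdb.retention.time=30d \
 --storage.tsdb.retention.size=10GB \
 --web.listen-address=0.0.0.0:9090 \
 --web.enable-lifecycle \
 --web.enable-admin-api"
EOF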
# Create application discovery configuration
cat > /etc/prometheus/file_sd/applications.yml << 'EOF'
# Application Service Discovery
- targets:
    - 'localhost:8080'
    - 'localhost:8081'
  labels:
    service: 'web-application'
    environment: 'production'
    team: 'backend'
- targets:
    - 'localhost:3000'
  labels:
    service: 'frontend-application'
    environment: 'production'
    team: 'frontend'
EOF
# Create services discovery configuration
cat > /etc/prometheus/file_sd/services.yml << 'EOF'
# Service Discovery for Custom Services
# NOTE: Redis and Memcached don't expose Prometheus metrics natively; scraping
# the service ports directly will fail until a matching exporter is in place.
# Point these targets at your exporters instead (redis_exporter defaults to 9121,
# memcached_exporter to 9150).
- targets:
    - 'localhost:6379'
  labels:
    service: 'redis'
    environment: 'production'
    type: 'database'
- targets:
    - 'localhost:11211'
  labels:
    service: 'memcached'
    environment: 'production'
    type: 'cache'
EOF
# Set proper ownership
chown -R prometheus:prometheus /etc/prometheus/
# Validate Prometheus configuration
promtool check config /etc/prometheus/prometheus.yml
# Restart Prometheus with new configuration
rc-service prometheus restart
echo "Prometheus configured for production monitoring! 📊"
What this creates: Production-ready Prometheus configuration with service discovery! ✅
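Because these jobs use file-based service discovery, you can add or remove targets without restarting Prometheus: the file_sd files are re-read automatically (every 5 minutes by default, tunable with refresh_interval). As an illustration, here is how you might register a hypothetical Redis exporter (port 9121 is the usual redis_exporter default; adapt it to whatever you actually run) and then check what Prometheus discovered:
# Append another target to the custom services file
cat >> /etc/prometheus/file_sd/services.yml << 'EOF'
- targets:
    - 'localhost:9121'
  labels:
    service: 'redis-exporter'
    environment: 'production'
    type: 'exporter'
EOF
# After the next file_sd refresh, list the discovered jobs
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[].labels.job' | sort | uniq -c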
Create Alerting Rules
Let’s set up intelligent alerting rules! 🚨
What we’re doing: Creating comprehensive alerting rules for system health, performance, and availability monitoring.
# Create alerting rules for system monitoring
cat > /etc/prometheus/rules/system-alerts.yml << 'EOF'
# System Alerting Rules for Alpine Linux
groups:
  - name: system.rules
    rules:
      # High CPU usage alert
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
          service: system
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage is above 80% for more than 5 minutes on {{ $labels.instance }}"

      # High memory usage alert
      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 85
        for: 5m
        labels:
          severity: warning
          service: system
        annotations:
          summary: "High memory usage detected"
          description: "Memory usage is above 85% on {{ $labels.instance }}"

      # Low disk space alert
      - alert: LowDiskSpace
        expr: (node_filesystem_size_bytes - node_filesystem_free_bytes) / node_filesystem_size_bytes * 100 > 90
        for: 10m
        labels:
          severity: critical
          service: system
        annotations:
          summary: "Low disk space warning"
          description: "Disk usage is above 90% on {{ $labels.instance }} {{ $labels.mountpoint }}"

      # System load alert
      - alert: HighSystemLoad
        expr: node_load15 > 2
        for: 10m
        labels:
          severity: warning
          service: system
        annotations:
          summary: "High system load detected"
          description: "15-minute load average is {{ $value }} on {{ $labels.instance }}"

      # Instance down alert
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
          service: monitoring
        annotations:
          summary: "Instance is down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute"

  - name: application.rules
    rules:
      # Application response time alert
      - alert: HighResponseTime
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
          service: application
        annotations:
          summary: "High application response time"
          description: "95th percentile response time is {{ $value }}s on {{ $labels.instance }}"

      # Error rate alert
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) * 100 > 5
        for: 5m
        labels:
          severity: critical
          service: application
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value }}% on {{ $labels.instance }}"

  - name: infrastructure.rules
    rules:
      # Redis connection alert
      - alert: RedisDown
        expr: up{job="redis"} == 0
        for: 1m
        labels:
          severity: critical
          service: redis
        annotations:
          summary: "Redis is down"
          description: "Redis service is not responding on {{ $labels.instance }}"

      # Network connectivity alert
      - alert: HighNetworkTraffic
        expr: rate(node_network_receive_bytes_total[5m]) > 100000000  # 100MB/s
        for: 10m
        labels:
          severity: warning
          service: network
        annotations:
          summary: "High network traffic detected"
          description: "Network receive traffic is {{ $value | humanize }}B/s on {{ $labels.instance }}"
EOF
# Create recording rules for performance optimization
cat > /etc/prometheus/rules/recording-rules.yml << 'EOF'
# Recording Rules for Performance Optimization
groups:
  - name: performance.rules
    interval: 30s
    rules:
      # CPU usage recording rule
      - record: instance:cpu_utilization:rate5m
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
        labels:
          metric_type: performance

      # Memory usage recording rule
      - record: instance:memory_utilization:ratio
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes
        labels:
          metric_type: performance

      # Disk I/O recording rule
      - record: instance:disk_io:rate5m
        expr: rate(node_disk_io_time_seconds_total[5m])
        labels:
          metric_type: performance

      # Network traffic recording rule
      - record: instance:network_traffic:rate5m
        expr: rate(node_network_receive_bytes_total[5m]) + rate(node_network_transmit_bytes_total[5m])
        labels:
          metric_type: performance

  - name: application.recording
    interval: 60s
    rules:
      # Request rate recording rule
      - record: application:request_rate:rate5m
        expr: rate(http_requests_total[5m])
        labels:
          metric_type: application

      # Error rate recording rule
      - record: application:error_rate:rate5m
        expr: rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m])
        labels:
          metric_type: application
EOF
# Validate alerting rules
promtool check rules /etc/prometheus/rules/*.yml
# Set proper ownership for rules
chown -R prometheus:prometheus /etc/prometheus/rules/
# Reload Prometheus configuration
curl -X POST http://localhost:9090/-/reload
echo "Prometheus alerting rules configured! 🚨"
What this creates: Comprehensive alerting system for proactive monitoring! 🌟
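You can also unit-test alert rules offline with promtool before they ever fire in production. The following is a minimal sketch that feeds a synthetic up == 0 series to the InstanceDown rule; the input series, timings, and expected labels are illustrative and should be adapted to your own rules:
# Write a promtool unit test file
cat > /tmp/alert-tests.yml << 'EOF'
rule_files:
  - /etc/prometheus/rules/system-alerts.yml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      - series: 'up{job="node-exporter", instance="localhost:9100"}'
        values: '0 0 0'
    alert_rule_test:
      - eval_time: 2m
        alertname: InstanceDown
        exp_alerts:
          - exp_labels:
              severity: critical
              service: monitoring
              job: node-exporter
              instance: localhost:9100
EOF
# Run the unit tests
promtool test rules /tmp/alert-tests.yml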
🛠️ Step 2: Install Node Exporter
Configure Node Exporter for System Metrics
Let’s set up Node Exporter to collect detailed system metrics! 😊
What we’re doing: Installing and configuring Node Exporter for comprehensive system monitoring.
# Start Node Exporter service
rc-service prometheus-node-exporter start
# Enable Node Exporter to start at boot
rc-update add prometheus-node-exporter default
# Create Node Exporter configuration
cat > /etc/conf.d/prometheus-node-exporter << 'EOF'
# Node Exporter Configuration
# NOTE: check /etc/init.d/prometheus-node-exporter for the exact variable name the
# init script reads. Newer node_exporter releases renamed the filesystem flags to
# --collector.filesystem.mount-points-exclude / --collector.filesystem.fs-types-exclude;
# run `node_exporter --help` to see which ones your installed version accepts.
NODE_EXPORTER_OPTS="--web.listen-address=0.0.0.0:9100 \
  --path.procfs=/proc \
  --path.sysfs=/sys \
  --path.rootfs=/ \
  --collector.filesystem.ignored-mount-points='^/(dev|proc|sys|var/lib/docker/.+)($|/)' \
  --collector.filesystem.ignored-fs-types='^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$' \
  --collector.textfile.directory=/var/lib/node_exporter/textfile_collector \
  --collector.cpu \
  --collector.diskstats \
  --collector.filesystem \
  --collector.loadavg \
  --collector.meminfo \
  --collector.netdev \
  --collector.netstat \
  --collector.stat \
  --collector.time \
  --collector.uname \
  --collector.vmstat"
EOF
# Create textfile collector directory
mkdir -p /var/lib/node_exporter/textfile_collector
chown prometheus:prometheus /var/lib/node_exporter/textfile_collector
# Create custom metrics collection script
cat > /usr/local/bin/custom-metrics-collector.sh << 'EOF'
#!/bin/sh
# Custom Metrics Collector for Alpine Linux
TEXTFILE_DIR="/var/lib/node_exporter/textfile_collector"
TEMP_FILE=$(mktemp)
# Collect Alpine package information
echo "# HELP alpine_packages_total Total number of installed packages" >> $TEMP_FILE
echo "# TYPE alpine_packages_total gauge" >> $TEMP_FILE
PACKAGE_COUNT=$(apk info | wc -l)
echo "alpine_packages_total $PACKAGE_COUNT" >> $TEMP_FILE
# Collect service status
echo "# HELP alpine_service_status Service status (1=running, 0=stopped)" >> $TEMP_FILE
echo "# TYPE alpine_service_status gauge" >> $TEMP_FILE
SERVICES="sshd chronyd syslog prometheus"
for service in $SERVICES; do
  if rc-service $service status >/dev/null 2>&1; then
    echo "alpine_service_status{service=\"$service\"} 1" >> $TEMP_FILE
  else
    echo "alpine_service_status{service=\"$service\"} 0" >> $TEMP_FILE
  fi
done
# Collect system uptime in seconds
echo "# HELP alpine_uptime_seconds System uptime in seconds" >> $TEMP_FILE
echo "# TYPE alpine_uptime_seconds gauge" >> $TEMP_FILE
UPTIME=$(awk '{print $1}' /proc/uptime)
echo "alpine_uptime_seconds $UPTIME" >> $TEMP_FILE
# Collect temperature if available
if [ -r /sys/class/thermal/thermal_zone0/temp ]; then
  echo "# HELP alpine_temperature_celsius CPU temperature in Celsius" >> $TEMP_FILE
  echo "# TYPE alpine_temperature_celsius gauge" >> $TEMP_FILE
  TEMP=$(cat /sys/class/thermal/thermal_zone0/temp)
  # Use awk instead of bc -- bc is not part of a default Alpine install
  TEMP_C=$(awk -v t="$TEMP" 'BEGIN { printf "%.1f", t / 1000 }')
  echo "alpine_temperature_celsius $TEMP_C" >> $TEMP_FILE
fi
# Atomically move the file to the textfile directory
mv $TEMP_FILE $TEXTFILE_DIR/custom_metrics.prom
EOF
chmod +x /usr/local/bin/custom-metrics-collector.sh
# Create cron job for custom metrics (busybox crond must be running for it to fire)
echo "*/1 * * * * /usr/local/bin/custom-metrics-collector.sh" | crontab -u prometheus -
rc-update add crond default
rc-service crond start
# Restart Node Exporter with new configuration
rc-service prometheus-node-exporter restart
# Test Node Exporter endpoint
echo "Testing Node Exporter metrics..."
curl -s http://localhost:9100/metrics | head -20
echo "Node Exporter configured for system monitoring! 📈"
What this does: Sets up comprehensive system metrics collection with custom Alpine-specific metrics! ✅
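Once the cron job has run at least once, the custom Alpine metrics appear next to the standard node metrics and can also be queried through Prometheus (this assumes curl and jq are installed, as earlier in the guide):
# Look at the raw textfile-collector metrics
curl -s http://localhost:9100/metrics | grep '^alpine_'
# Query one of them through Prometheus
curl -s 'http://localhost:9090/api/v1/query?query=alpine_service_status' | jq '.data.result'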
🎨 Step 3: Install and Configure Grafana
Install Grafana Visualization Platform
Let’s install Grafana for beautiful data visualization! 🎮
What we’re doing: Installing Grafana for creating stunning monitoring dashboards and visualizations.
# Install Grafana
apk add grafana
# Check Grafana version
grafana-server --version
# Create Grafana configuration directories (spelled out because busybox ash has no brace expansion)
mkdir -p /etc/grafana/provisioning/dashboards /etc/grafana/provisioning/datasources /etc/grafana/provisioning/notifiers
mkdir -p /var/lib/grafana/dashboards /var/lib/grafana/plugins
mkdir -p /var/log/grafana
# Set proper ownership
chown -R grafana:grafana /var/lib/grafana /var/log/grafana /etc/grafana
# Create Grafana configuration
cat > /etc/grafana/grafana.ini << 'EOF'
# Grafana Configuration for Alpine Linux Monitoring
[default]
# Instance name
instance_name = alpine-monitoring
[paths]
# Data directory
data = /var/lib/grafana
# Logs directory
logs = /var/log/grafana
# Plugins directory
plugins = /var/lib/grafana/plugins
# Provisioning directory
provisioning = /etc/grafana/provisioning
[server]
# Server settings
http_addr = 0.0.0.0
http_port = 3000
domain = localhost
root_url = http://localhost:3000/
serve_from_sub_path = false
router_logging = false
enable_gzip = true
[database]
# SQLite keeps things simple for a single-node install; no host/user/password needed
type = sqlite3
path = /var/lib/grafana/grafana.db
[session]
# Session configuration
provider = file
provider_config = sessions
cookie_name = grafana_sess
cookie_secure = false
session_life_time = 86400
[security]
# Security settings
admin_user = admin
admin_password = alpine_monitoring_2025
secret_key = alpine_grafana_secret_key_12345
login_remember_days = 7
cookie_username = grafana_user
cookie_remember_name = grafana_remember
disable_gravatar = true
[users]
# User management
allow_sign_up = false
allow_org_create = false
auto_assign_org = true
auto_assign_org_role = Viewer
default_theme = dark
[auth.anonymous]
# Anonymous access
enabled = false
[log]
# Logging configuration
mode = file
level = info
format = text
[log.file]
# File logging
log_rotate = true
max_lines = 1000000
max_size_shift = 28
daily_rotate = true
max_days = 7
[alerting]
# Legacy alerting must stay off when unified alerting is enabled
# (recent Grafana releases refuse to start with both turned on)
enabled = false
[unified_alerting]
# Unified alerting
enabled = true
[metrics]
# Metrics settings
enabled = true
interval_seconds = 10
[grafana_net]
url = https://grafana.net
[external_image_storage]
provider = local
[plugins]
# Plugin settings
enable_alpha = false
app_tls_skip_verify_insecure = false
EOF
# Start Grafana service
rc-service grafana start
# Enable Grafana to start at boot
rc-update add grafana default
echo "Grafana installed and configured! 🎨"
echo "Access Grafana at: http://localhost:3000"
echo "Default login: admin / alpine_monitoring_2025"
What this does: Installs and configures Grafana with optimal settings for Alpine Linux monitoring! 🌟
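A quick way to confirm the service actually came up is the health endpoint plus an authenticated API call using the admin credentials from grafana.ini (give the first start a few seconds to initialize the SQLite database):
# Check Grafana's health endpoint
curl -s http://localhost:3000/api/health | jq .
# Confirm the admin login works against the API
curl -s -u admin:alpine_monitoring_2025 http://localhost:3000/api/org | jq .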
Configure Grafana Data Sources
Let’s set up Prometheus as a data source in Grafana! 🔧
What we’re doing: Configuring Grafana to connect to Prometheus and other data sources automatically.
# Create Prometheus data source configuration
cat > /etc/grafana/provisioning/datasources/prometheus.yml << 'EOF'
# Grafana Data Sources Configuration
apiVersion: 1

datasources:
  # Primary Prometheus data source
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
    editable: true
    jsonData:
      httpMethod: POST
      queryTimeout: 60s
      timeInterval: 30s
      # Only useful if a tracing backend (e.g. Jaeger) is running on port 16686
      exemplarTraceIdDestinations:
        - name: traceID
          url: http://localhost:16686/trace/${__value.raw}
    secureJsonData: {}

  # NOTE: Node Exporter is scraped by Prometheus and has no query API of its own,
  # so it is not added as a separate data source here.

  # TestData for examples and testing
  - name: TestData
    type: testdata
    access: proxy
    isDefault: false
    editable: true
EOF
# Create dashboard provisioning configuration
cat > /etc/grafana/provisioning/dashboards/default.yml << 'EOF'
# Dashboard Provisioning Configuration
apiVersion: 1

providers:
  # System monitoring dashboards
  - name: 'alpine-system'
    orgId: 1
    folder: 'System Monitoring'
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /var/lib/grafana/dashboards/system

  # Application monitoring dashboards
  - name: 'alpine-apps'
    orgId: 1
    folder: 'Applications'
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /var/lib/grafana/dashboards/applications

  # Infrastructure monitoring dashboards
  - name: 'alpine-infrastructure'
    orgId: 1
    folder: 'Infrastructure'
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /var/lib/grafana/dashboards/infrastructure
EOF
# Create dashboard directories (no brace expansion in busybox ash)
mkdir -p /var/lib/grafana/dashboards/system /var/lib/grafana/dashboards/applications /var/lib/grafana/dashboards/infrastructure
# Set proper ownership
chown -R grafana:grafana /etc/grafana/provisioning /var/lib/grafana/dashboards
# Restart Grafana to load new configuration
rc-service grafana restart
echo "Grafana data sources configured! 🔗"
What this creates: Automatic data source configuration for seamless monitoring! ✅
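To confirm provisioning picked everything up, you can list the data sources over the API and run a test query straight against Prometheus (the per-datasource proxy endpoints vary between Grafana versions, so querying Prometheus directly is the portable check):
# List provisioned data sources
curl -s -u admin:alpine_monitoring_2025 http://localhost:3000/api/datasources | jq '.[].name'
# Make sure the backing Prometheus answers queries
curl -s 'http://localhost:9090/api/v1/query?query=up' | jq '.status'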
Create System Monitoring Dashboard
Let’s create a comprehensive system monitoring dashboard! 📊
What we’re doing: Creating a beautiful and functional dashboard for monitoring Alpine Linux system metrics.
# Create comprehensive system monitoring dashboard
cat > /var/lib/grafana/dashboards/system/alpine-system-overview.json << 'EOF'
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"id": null,
"links": [],
"panels": [
{
"datasource": "Prometheus",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"vis": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"expr": "100 - (avg by(instance) (irate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
"interval": "",
"legendFormat": "CPU Usage",
"refId": "A"
}
],
"title": "CPU Usage",
"type": "timeseries"
},
{
"datasource": "Prometheus",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"vis": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 0
},
"id": 2,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"expr": "(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100",
"interval": "",
"legendFormat": "Memory Usage",
"refId": "A"
}
],
"title": "Memory Usage",
"type": "timeseries"
},
{
"datasource": "Prometheus",
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 70
},
{
"color": "red",
"value": 90
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 6,
"x": 0,
"y": 8
},
"id": 3,
"options": {
"orientation": "auto",
"reduceOptions": {
"values": false,
"calcs": [
"lastNotNull"
],
"fields": ""
},
"showThresholdLabels": false,
"showThresholdMarkers": true,
"text": {}
},
"pluginVersion": "8.0.0",
"targets": [
{
"expr": "(node_filesystem_size_bytes{mountpoint=\"/\"} - node_filesystem_free_bytes{mountpoint=\"/\"}) / node_filesystem_size_bytes{mountpoint=\"/\"} * 100",
"interval": "",
"legendFormat": "Root Disk Usage",
"refId": "A"
}
],
"title": "Disk Usage",
"type": "gauge"
},
{
"datasource": "Prometheus",
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 1
},
{
"color": "red",
"value": 2
}
]
},
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 6,
"x": 6,
"y": 8
},
"id": 4,
"options": {
"orientation": "auto",
"reduceOptions": {
"values": false,
"calcs": [
"lastNotNull"
],
"fields": ""
},
"showThresholdLabels": false,
"showThresholdMarkers": true,
"text": {}
},
"pluginVersion": "8.0.0",
"targets": [
{
"expr": "node_load15",
"interval": "",
"legendFormat": "Load Average",
"refId": "A"
}
],
"title": "System Load",
"type": "gauge"
},
{
"datasource": "Prometheus",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"vis": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "binBps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 8
},
"id": 5,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"expr": "rate(node_network_receive_bytes_total[5m])",
"interval": "",
"legendFormat": "{{device}} - Receive",
"refId": "A"
},
{
"expr": "rate(node_network_transmit_bytes_total[5m])",
"interval": "",
"legendFormat": "{{device}} - Transmit",
"refId": "B"
}
],
"title": "Network Traffic",
"type": "timeseries"
}
],
"schemaVersion": 30,
"style": "dark",
"tags": ["alpine", "system", "monitoring"],
"templating": {
"list": []
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {},
"timezone": "",
"title": "Alpine Linux System Overview",
"uid": "alpine-system-overview",
"version": 1
}
EOF
# Set proper ownership
chown -R grafana:grafana /var/lib/grafana/dashboards/
# Restart Grafana to load dashboards
rc-service grafana restart
echo "System monitoring dashboard created! 📊"
What this creates: Beautiful system monitoring dashboard with key Alpine Linux metrics! 🌟
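If the dashboard doesn't show up in the UI, the search API is a quick way to confirm whether provisioning loaded it (the title should match the one provisioned above):
# Search for the provisioned dashboard via the Grafana API
curl -s -u admin:alpine_monitoring_2025 'http://localhost:3000/api/search?query=Alpine' | jq '.[].title'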
📊 Quick Monitoring Commands Table
| Command | Purpose | Result |
|---|---|---|
| 🔧 promtool query instant http://localhost:9090 up | Check target status | ✅ Service availability |
| 🔍 curl localhost:9090/api/v1/targets | View Prometheus targets | ✅ Monitoring endpoints |
| 🚀 grafana-cli admin reset-admin-password <new-password> | Reset Grafana admin password | ✅ Access recovery |
| 📋 curl -s localhost:9100/metrics \| grep cpu | View Node Exporter CPU metrics | ✅ System metrics |
🎮 Practice Time!
Let’s practice what you learned! Try these monitoring scenarios:
Example 1: Application Performance Monitoring 🟢
What we’re doing: Setting up comprehensive application performance monitoring with custom metrics and alerting.
# Create application monitoring setup
mkdir -p /opt/app-monitoring
cd /opt/app-monitoring
# Create sample application with metrics endpoint
cat > app-metrics-server.py << 'EOF'
#!/usr/bin/env python3
"""
Sample Application with Prometheus Metrics
"""
import time
import random
from http.server import HTTPServer, BaseHTTPRequestHandler
from urllib.parse import urlparse


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        path = urlparse(self.path).path
        if path == '/metrics':
            self.send_response(200)
            self.send_header('Content-type', 'text/plain')
            self.end_headers()
            # Generate sample metrics
            cpu_usage = random.uniform(10, 90)
            memory_usage = random.uniform(30, 80)
            request_count = random.randint(100, 1000)
            response_time = random.uniform(0.1, 2.0)
            metrics = f"""# HELP app_cpu_usage_percent Application CPU usage
# TYPE app_cpu_usage_percent gauge
app_cpu_usage_percent {cpu_usage:.2f}
# HELP app_memory_usage_percent Application memory usage
# TYPE app_memory_usage_percent gauge
app_memory_usage_percent {memory_usage:.2f}
# HELP app_requests_total Total application requests
# TYPE app_requests_total counter
app_requests_total {request_count}
# HELP app_response_time_seconds Application response time
# TYPE app_response_time_seconds histogram
app_response_time_seconds_bucket{{le="0.1"}} {random.randint(10, 50)}
app_response_time_seconds_bucket{{le="0.5"}} {random.randint(50, 150)}
app_response_time_seconds_bucket{{le="1.0"}} {random.randint(150, 300)}
app_response_time_seconds_bucket{{le="2.0"}} {random.randint(300, 500)}
app_response_time_seconds_bucket{{le="+Inf"}} {random.randint(500, 600)}
app_response_time_seconds_sum {response_time * request_count:.2f}
app_response_time_seconds_count {request_count}
# HELP app_errors_total Total application errors
# TYPE app_errors_total counter
app_errors_total {random.randint(0, 50)}
# HELP app_uptime_seconds Application uptime
# TYPE app_uptime_seconds gauge
app_uptime_seconds {time.time()}
"""
            self.wfile.write(metrics.encode())
        elif path == '/health':
            self.send_response(200)
            self.send_header('Content-type', 'text/plain')
            self.end_headers()
            self.wfile.write(b'OK')
        else:
            self.send_response(404)
            self.end_headers()
            self.wfile.write(b'Not Found')


if __name__ == '__main__':
    server = HTTPServer(('localhost', 8080), MetricsHandler)
    print("Application metrics server running on http://localhost:8080/metrics")
    server.serve_forever()
EOF
# Install Python if not available
apk add python3
# Make the script executable
chmod +x app-metrics-server.py
# Start the application in background
python3 app-metrics-server.py &
APP_PID=$!
# The sample app listens on localhost:8080, which the 'application-metrics'
# file_sd job configured earlier (applications.yml) already scrapes, so nothing
# needs to be appended to prometheus.yml. If you want a faster scrape interval
# for it, adjust that job's scrape_interval and reload Prometheus:
curl -X POST http://localhost:9090/-/reload
# Create application dashboard
cat > /var/lib/grafana/dashboards/applications/application-performance.json << 'EOF'
{
  "title": "Application Performance Monitoring",
  "uid": "app-performance",
  "schemaVersion": 30,
  "panels": [
    {
      "title": "Application CPU Usage",
      "type": "stat",
      "gridPos": { "h": 8, "w": 6, "x": 0, "y": 0 },
      "targets": [
        { "expr": "app_cpu_usage_percent", "legendFormat": "CPU Usage %", "refId": "A" }
      ]
    },
    {
      "title": "Application Memory Usage",
      "type": "stat",
      "gridPos": { "h": 8, "w": 6, "x": 6, "y": 0 },
      "targets": [
        { "expr": "app_memory_usage_percent", "legendFormat": "Memory Usage %", "refId": "A" }
      ]
    },
    {
      "title": "Request Rate",
      "type": "timeseries",
      "gridPos": { "h": 8, "w": 12, "x": 12, "y": 0 },
      "targets": [
        { "expr": "rate(app_requests_total[5m])", "legendFormat": "Requests/sec", "refId": "A" }
      ]
    },
    {
      "title": "Response Time",
      "type": "timeseries",
      "gridPos": { "h": 8, "w": 24, "x": 0, "y": 8 },
      "targets": [
        { "expr": "histogram_quantile(0.95, rate(app_response_time_seconds_bucket[5m]))", "legendFormat": "95th percentile", "refId": "A" }
      ]
    }
  ]
}
EOF
echo "Application performance monitoring configured! 🎯"
echo "Check metrics at: http://localhost:8080/metrics"
echo "Application PID: $APP_PID (kill with: kill $APP_PID)"
What this does: Shows you how to monitor application performance with custom metrics! 🎯
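Once Prometheus has scraped the sample app a couple of times, you can query its metrics directly. Using --data-urlencode keeps curl from mangling the brackets in the PromQL expression:
# Request rate and 95th percentile response time for the sample app
curl -s -G 'http://localhost:9090/api/v1/query' --data-urlencode 'query=rate(app_requests_total[5m])' | jq '.data.result'
curl -s -G 'http://localhost:9090/api/v1/query' --data-urlencode 'query=histogram_quantile(0.95, rate(app_response_time_seconds_bucket[5m]))' | jq '.data.result'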
Example 2: Infrastructure Alerting System 🟡
What we’re doing: Creating a comprehensive alerting system with multiple notification channels.
# Create advanced alerting configuration
mkdir -p /opt/alerting-system
cd /opt/alerting-system
# Install Alertmanager
apk add prometheus-alertmanager
# Create Alertmanager configuration
cat > /etc/prometheus/alertmanager.yml << 'EOF'
# Alertmanager Configuration for Infrastructure Monitoring
global:
  smtp_smarthost: 'localhost:587'
  smtp_from: '[email protected]'
  smtp_auth_username: 'alertmanager'
  smtp_auth_password: 'password'

templates:
  - '/etc/prometheus/templates/*.tmpl'

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'default'
  routes:
    # Critical alerts go to immediate notification
    - match:
        severity: critical
      receiver: 'critical-alerts'
      group_wait: 5s
      repeat_interval: 5m
    # Warning alerts go to standard notification
    - match:
        severity: warning
      receiver: 'warning-alerts'
      repeat_interval: 30m
    # System alerts
    - match:
        service: system
      receiver: 'system-alerts'

receivers:
  - name: 'default'
    webhook_configs:
      # The custom webhook receiver below listens on port 9095
      # (9093 is already used by Alertmanager itself)
      - url: 'http://localhost:9095/webhook'
        send_resolved: true

  - name: 'critical-alerts'
    email_configs:
      - to: '[email protected]'
        subject: '🚨 CRITICAL: {{ .GroupLabels.alertname }}'
        body: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Severity: {{ .Labels.severity }}
          Instance: {{ .Labels.instance }}
          Time: {{ .StartsAt }}
          {{ end }}
    webhook_configs:
      - url: 'http://localhost:9095/webhook/critical'
        send_resolved: true

  - name: 'warning-alerts'
    email_configs:
      - to: '[email protected]'
        subject: '⚠️ WARNING: {{ .GroupLabels.alertname }}'
        body: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Instance: {{ .Labels.instance }}
          {{ end }}

  - name: 'system-alerts'
    webhook_configs:
      - url: 'http://localhost:9095/webhook/system'
        send_resolved: true

inhibit_rules:
  # Inhibit warning alerts if critical alerts are firing
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']
EOF
# Create alert notification webhook receiver
cat > alert-webhook-receiver.py << 'EOF'
#!/usr/bin/env python3
"""
Alert Webhook Receiver for Custom Notifications
"""
import json
import time
from http.server import HTTPServer, BaseHTTPRequestHandler
from urllib.parse import urlparse


class AlertWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        path = urlparse(self.path).path
        content_length = int(self.headers['Content-Length'])
        post_data = self.rfile.read(content_length)
        try:
            alert_data = json.loads(post_data.decode('utf-8'))
            self.process_alert(alert_data, path)
            self.send_response(200)
            self.send_header('Content-type', 'application/json')
            self.end_headers()
            self.wfile.write(b'{"status": "ok"}')
        except Exception as e:
            print(f"Error processing alert: {e}")
            self.send_response(500)
            self.end_headers()

    def process_alert(self, alert_data, path):
        timestamp = time.strftime('%Y-%m-%d %H:%M:%S')
        print(f"\n{'='*50}")
        print(f"ALERT RECEIVED - {timestamp}")
        print(f"Webhook Path: {path}")
        print(f"{'='*50}")
        for alert in alert_data.get('alerts', []):
            status = alert.get('status', 'unknown')
            labels = alert.get('labels', {})
            annotations = alert.get('annotations', {})
            print(f"Status: {status}")
            print(f"Alert: {labels.get('alertname', 'Unknown')}")
            print(f"Severity: {labels.get('severity', 'Unknown')}")
            print(f"Instance: {labels.get('instance', 'Unknown')}")
            print(f"Summary: {annotations.get('summary', 'No summary')}")
            print(f"Description: {annotations.get('description', 'No description')}")
            if status == 'firing':
                print("🚨 ALERT IS FIRING!")
                self.log_to_file(alert, 'FIRING')
            elif status == 'resolved':
                print("✅ ALERT RESOLVED")
                self.log_to_file(alert, 'RESOLVED')
            print("-" * 30)

    def log_to_file(self, alert, status):
        timestamp = time.strftime('%Y-%m-%d %H:%M:%S')
        labels = alert.get('labels', {})
        annotations = alert.get('annotations', {})
        log_entry = {
            'timestamp': timestamp,
            'status': status,
            'alertname': labels.get('alertname'),
            'severity': labels.get('severity'),
            'instance': labels.get('instance'),
            'summary': annotations.get('summary'),
            'description': annotations.get('description')
        }
        with open('/var/log/alerts.log', 'a') as f:
            f.write(json.dumps(log_entry) + '\n')


if __name__ == '__main__':
    # Listen on 9095 so we don't collide with Alertmanager, which owns port 9093
    server = HTTPServer(('localhost', 9095), AlertWebhookHandler)
    print("Alert webhook receiver running on http://localhost:9095/webhook")
    print("Logs will be written to /var/log/alerts.log")
    server.serve_forever()
EOF
chmod +x alert-webhook-receiver.py
# Start Alertmanager
rc-service prometheus-alertmanager start
rc-update add prometheus-alertmanager default
# Start alert webhook receiver in background
python3 alert-webhook-receiver.py &
WEBHOOK_PID=$!
# Create alert testing script
cat > test-alerts.sh << 'EOF'
#!/bin/sh
echo "🧪 Testing Alert System"
# Fire a test alert via the Alertmanager v2 API (the v1 API was removed in newer releases)
echo "Sending test alert..."
curl -X POST http://localhost:9093/api/v2/alerts \
  -H "Content-Type: application/json" \
  -d '[
  {
    "labels": {
      "alertname": "TestAlert",
      "severity": "warning",
      "instance": "localhost:9090",
      "service": "test"
    },
    "annotations": {
      "summary": "Test alert for monitoring system",
      "description": "This is a test alert to verify the alerting system is working correctly"
    },
    "startsAt": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"
  }
]'
echo "Alert sent! Check webhook receiver output and /var/log/alerts.log"
EOF
chmod +x test-alerts.sh
echo "Advanced alerting system configured! 🚨"
echo "Webhook receiver PID: $WEBHOOK_PID (kill with: kill $WEBHOOK_PID)"
echo "Test alerts with: ./test-alerts.sh"
What this does: Demonstrates comprehensive alerting with custom notification handling! 🚨
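The alertmanager package normally ships amtool as well, which is handy for validating the configuration and inspecting what is currently firing without opening the web UI (treat this as optional if your package doesn't include it):
# Validate the Alertmanager configuration file
amtool check-config /etc/prometheus/alertmanager.yml
# List alerts currently held by Alertmanager
amtool alert query --alertmanager.url=http://localhost:9093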
🚨 Fix Common Problems
Problem 1: Prometheus targets down ❌
What happened: Prometheus cannot scrape metrics from targets. How to fix it: Check network connectivity and service configuration.
# Check Prometheus targets status (jq is used to parse API responses; install it if missing)
apk add jq
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health, lastError: .lastError}'
# Check service status
rc-service prometheus status
rc-service prometheus-node-exporter status
# Check network connectivity
netstat -tulpn | grep -E "(9090|9100)"
# Restart services if needed
rc-service prometheus restart
rc-service prometheus-node-exporter restart
Problem 2: Grafana dashboard not loading data ❌
What happened: Grafana cannot connect to Prometheus or display metrics. How to fix it: Verify data source configuration and queries.
# Test Prometheus connection from Grafana host
curl -s http://localhost:9090/api/v1/query?query=up
# Check Grafana logs
tail -f /var/log/grafana/grafana.log
# Restart Grafana service
rc-service grafana restart
# Test data source connectivity in Grafana UI
echo "Visit http://localhost:3000/datasources and test connections"
Don’t worry! Monitoring systems require fine-tuning - check connectivity and configurations systematically! 💪
💡 Simple Tips
- Start with basic metrics 📅 - Begin with CPU, memory, disk, and network monitoring
- Set meaningful alert thresholds 🌱 - Avoid alert fatigue with appropriate limits
- Create actionable dashboards 🤝 - Focus on metrics that help with decision making
- Regular maintenance 💪 - Monitor data retention and clean up old metrics (see the example below)
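Retention limits handle most cleanup automatically, but you can also drop specific series on demand through the TSDB admin API. This only works when Prometheus runs with --web.enable-admin-api (set via the command-line flags earlier), and the series selector here is just an example:
# Delete one custom series and compact away the tombstones
curl -s -g -X POST 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]=alpine_service_status'
curl -s -X POST 'http://localhost:9090/api/v1/admin/tsdb/clean_tombstones'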
✅ Check Everything Works
Let’s verify your monitoring stack is working perfectly:
# Complete monitoring system verification
cat > /usr/local/bin/monitoring-stack-check.sh << 'EOF'
#!/bin/sh
echo "=== Monitoring Stack System Check ==="
echo "1. Prometheus Server:"
if curl -s http://localhost:9090/api/v1/query?query=up >/dev/null; then
echo "✅ Prometheus is running and responding"
prometheus_version=$(curl -s http://localhost:9090/api/v1/status/buildinfo | jq -r '.data.version')
echo "Version: $prometheus_version"
targets_up=$(curl -s http://localhost:9090/api/v1/query?query=up | jq '.data.result | length')
echo "Active targets: $targets_up"
else
echo "❌ Prometheus is not responding"
fi
echo -e "\n2. Node Exporter:"
if curl -s http://localhost:9100/metrics >/dev/null; then
echo "✅ Node Exporter is running"
metrics_count=$(curl -s http://localhost:9100/metrics | wc -l)
echo "Metrics available: $metrics_count"
else
echo "❌ Node Exporter is not responding"
fi
echo -e "\n3. Grafana:"
if curl -s http://localhost:3000/api/health >/dev/null; then
echo "✅ Grafana is running"
grafana_version=$(curl -s http://localhost:3000/api/health | jq -r '.version')
echo "Version: $grafana_version"
else
echo "❌ Grafana is not responding"
fi
echo -e "\n4. Alertmanager:"
if curl -s http://localhost:9093/api/v2/status >/dev/null; then
echo "✅ Alertmanager is running"
alertmanager_version=$(curl -s http://localhost:9093/api/v2/status | jq -r '.versionInfo.version')
echo "Version: $alertmanager_version"
else
echo "❌ Alertmanager is not responding"
fi
echo -e "\n5. Data Collection Test:"
echo "Testing metric collection..."
# Test CPU metric (-g stops curl from globbing the {} and [] in the query)
cpu_metric=$(curl -s -g "http://localhost:9090/api/v1/query?query=100-(avg(irate(node_cpu_seconds_total{mode=\"idle\"}[5m]))*100)" | jq -r '.data.result[0].value[1]')
if [ "$cpu_metric" != "null" ]; then
echo "✅ CPU metrics: ${cpu_metric}%"
else
echo "❌ CPU metrics not available"
fi
# Test memory metric
memory_metric=$(curl -s "http://localhost:9090/api/v1/query?query=(node_memory_MemTotal_bytes-node_memory_MemAvailable_bytes)/node_memory_MemTotal_bytes*100" | jq -r '.data.result[0].value[1]')
if [ "$memory_metric" != "null" ]; then
echo "✅ Memory metrics: ${memory_metric}%"
else
echo "❌ Memory metrics not available"
fi
echo -e "\n6. Alert Rules:"
rules_count=$(curl -s http://localhost:9090/api/v1/rules | jq '.data.groups | length')
echo "Loaded rule groups: $rules_count"
echo -e "\n7. Dashboard Status:"
if [ -f "/var/lib/grafana/dashboards/system/alpine-system-overview.json" ]; then
echo "✅ System dashboard available"
else
echo "❌ System dashboard missing"
fi
echo -e "\n8. Access URLs:"
echo "🌐 Prometheus: http://localhost:9090"
echo "🌐 Grafana: http://localhost:3000 (admin/alpine_monitoring_2025)"
echo "🌐 Alertmanager: http://localhost:9093"
echo "🌐 Node Exporter: http://localhost:9100/metrics"
echo -e "\nMonitoring stack operational! ✅"
EOF
chmod +x /usr/local/bin/monitoring-stack-check.sh
/usr/local/bin/monitoring-stack-check.sh
Good output shows:
=== Monitoring Stack System Check ===
1. Prometheus Server:
✅ Prometheus is running and responding
Version: 2.45.0
Active targets: 3
2. Node Exporter:
✅ Node Exporter is running
Metrics available: 1247
3. Grafana:
✅ Grafana is running
Version: 9.5.2
Monitoring stack operational! ✅
🏆 What You Learned
Great job! Now you can:
- ✅ Install and configure Prometheus for comprehensive metrics collection
- ✅ Set up Node Exporter for detailed system monitoring
- ✅ Configure Grafana for beautiful data visualization
- ✅ Create custom dashboards and monitoring workflows
- ✅ Implement intelligent alerting with Alertmanager
- ✅ Set up service discovery and target management
- ✅ Create application performance monitoring solutions
- ✅ Build comprehensive infrastructure monitoring stacks
- ✅ Troubleshoot common monitoring issues and optimize performance
🎯 What’s Next?
Now you can try:
- 📚 Setting up distributed monitoring with multiple Prometheus instances
- 🛠️ Implementing custom exporters for specific applications
- 🤝 Integrating with external alerting systems (Slack, PagerDuty)
- 🌟 Exploring advanced Grafana features like annotations and variables!
Remember: Effective monitoring is the foundation of reliable systems! You’re now building world-class observability on Alpine Linux! 🎉
Keep monitoring and you’ll master infrastructure observability! 💫