Time-series databases have become essential for modern infrastructure monitoring, IoT data collection, and real-time analytics. This guide demonstrates deploying a production-ready InfluxDB 2.x server with Telegraf collectors on AlmaLinux, covering everything from installation to advanced optimization, load balancing, and the Enterprise clustering options for high availability.
Understanding Time-Series Data Architecture
Time-series databases are optimized for handling time-stamped data with these characteristics:
- High write throughput: Millions of data points per second
- Efficient compression: Specialized algorithms for time-series data
- Fast aggregation queries: Built-in functions for time-based analysis
- Automatic data retention: Age-based data lifecycle management
- Real-time processing: Stream processing capabilities
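As a concrete example, each record in InfluxDB's line protocol couples a measurement and tag set with one or more fields and a timestamp; the measurement, tags, and values below are illustrative:
# measurement,tag_set field_set timestamp (nanoseconds)
cpu,host=web01,region=us-east usage_idle=87.5,usage_user=9.2 1700000000000000000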
InfluxDB 2.0 Architecture
Key components include:
- Storage Engine: TSM (Time-Structured Merge Tree)
- Query Engine: Flux language for powerful data analysis
- Task Engine: Automated data processing
- API Layer: RESTful API and client libraries
- UI Dashboard: Built-in visualization and management
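As a quick taste of the query engine, a minimal Flux query reads recent points from a bucket and aggregates them (the metrics bucket referenced here is created during setup below):
// Mean CPU idle over the last five minutes
from(bucket: "metrics")
|> range(start: -5m)
|> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_idle")
|> mean()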
Prerequisites
Before deploying InfluxDB and Telegraf:
- AlmaLinux 9 server with 8GB+ RAM
- SSD storage (recommended for performance)
- Network connectivity for data sources
- Basic understanding of time-series concepts
- SSL certificates for secure endpoints
Installing InfluxDB 2.0
System Preparation
# Update system packages
sudo dnf update -y
# Install required dependencies
sudo dnf install -y wget curl gnupg2
# Configure system limits for InfluxDB
cat <<EOF | sudo tee /etc/security/limits.d/influxdb.conf
influxdb soft nofile 65536
influxdb hard nofile 65536
influxdb soft nproc 32768
influxdb hard nproc 32768
EOF
# Configure kernel parameters
cat <<EOF | sudo tee /etc/sysctl.d/99-influxdb.conf
vm.swappiness = 1
vm.max_map_count = 262144
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.core.rmem_default = 134217728
net.core.wmem_default = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
EOF
sudo sysctl -p /etc/sysctl.d/99-influxdb.conf
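A quick spot-check confirms the new kernel parameters are active:
# Verify selected kernel parameters
sysctl vm.swappiness vm.max_map_count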
InfluxDB Installation
# Add InfluxDB repository
cat <<EOF | sudo tee /etc/yum.repos.d/influxdb.repo
[influxdb]
name = InfluxDB Repository - RHEL
baseurl = https://repos.influxdata.com/rhel/9/x86_64/stable
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdata-archive_compat.key
EOF
# Install InfluxDB
sudo dnf install -y influxdb2 influxdb2-cli
# Enable and start service
sudo systemctl enable influxdb
sudo systemctl start influxdb
# Verify installation
influx version
Initial Setup and Configuration
# Setup InfluxDB with initial user and organization
influx setup \
--org my-org \
--bucket metrics \
--username admin \
--password SecurePassword123! \
--token my-super-secret-auth-token \
--force
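Before moving on, you can confirm the initial resources were created:
# Verify the organization and bucket
influx org list
influx bucket list --org my-org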
# Configure InfluxDB. Note that InfluxDB 2.x uses a flat key/value
# configuration file, not the 1.x [section]-style format shown in many
# older guides; memory sizes are given in bytes.
sudo tee /etc/influxdb/config.toml <<EOF
# Storage paths
bolt-path = "/var/lib/influxdb/influxd.bolt"
engine-path = "/var/lib/influxdb/engine"
# HTTP API
http-bind-address = ":8086"
# TLS (certificate generation is covered in the next section)
tls-cert = "/etc/influxdb/cert.pem"
tls-key = "/etc/influxdb/key.pem"
# Cache settings (2 GiB cache, 256 MiB snapshot threshold)
storage-cache-max-memory-size = 2147483648
storage-cache-snapshot-memory-size = 268435456
# Compaction settings
storage-compact-full-write-cold-duration = "4h"
storage-max-concurrent-compactions = 2
# Logging
log-level = "info"
EOF
# Restart with new configuration
sudo systemctl restart influxdb
Setting Up SSL/TLS Security
# Generate a self-signed certificate (for testing only; use a CA-signed
# certificate in production)
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
-keyout /etc/influxdb/key.pem \
-out /etc/influxdb/cert.pem \
-subj "/C=US/ST=State/L=City/O=Organization/CN=influxdb.example.com"
# Set proper permissions
sudo chown influxdb:influxdb /etc/influxdb/*.pem
sudo chmod 600 /etc/influxdb/key.pem
sudo chmod 644 /etc/influxdb/cert.pem
# Configure firewall
sudo firewall-cmd --permanent --add-port=8086/tcp
sudo firewall-cmd --reload
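With TLS enabled and the port open, a quick health check verifies the endpoint (-k accepts the self-signed certificate):
# Verify the HTTPS endpoint
curl -sk https://localhost:8086/health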
Installing and Configuring Telegraf
Telegraf Installation
# Install Telegraf from InfluxDB repository
sudo dnf install -y telegraf
# Generate a base configuration (tee is needed because /etc/telegraf is root-owned)
telegraf config | sudo tee /etc/telegraf/telegraf.conf > /dev/null
# Enable and start Telegraf
sudo systemctl enable telegraf
sudo systemctl start telegraf
Comprehensive Telegraf Configuration
# /etc/telegraf/telegraf.conf
[global_tags]
datacenter = "dc1"
environment = "production"
region = "us-east"
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 5000
metric_buffer_limit = 50000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = "0s"
hostname = ""
omit_hostname = false
debug = false
quiet = false
# Output to InfluxDB v2
[[outputs.influxdb_v2]]
urls = ["https://localhost:8086"]
token = "my-super-secret-auth-token"
organization = "my-org"
bucket = "metrics"
## TLS settings: the self-signed certificate generated earlier will not pass
## verification, so skip it here; set to false with a CA-signed certificate
insecure_skip_verify = true
## Timeout settings
timeout = "5s"
## Batching
metric_batch_size = 5000
metric_buffer_limit = 50000
# System metrics collection
[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
report_active = false
[[inputs.disk]]
ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
[[inputs.diskio]]
skip_serial_number = false
device_tags = ["ID_FS_TYPE", "ID_FS_USAGE"]
[[inputs.kernel]]
# no configuration
[[inputs.mem]]
# no configuration
[[inputs.net]]
interfaces = ["eth*", "eno*"]
ignore_protocol_stats = false
[[inputs.processes]]
# no configuration
[[inputs.swap]]
# no configuration
[[inputs.system]]
# no configuration
# Docker monitoring
[[inputs.docker]]
endpoint = "unix:///var/run/docker.sock"
gather_services = false
container_names = []
source_tag = false
container_name_include = []
container_name_exclude = []
timeout = "5s"
perdevice = true
total = false
# Network monitoring
[[inputs.ping]]
urls = ["google.com", "1.1.1.1", "8.8.8.8"]
count = 3
ping_interval = 1.0
timeout = 1.0
deadline = 10
# HTTP endpoint monitoring
[[inputs.http_response]]
urls = ["https://example.com"]
response_timeout = "5s"
method = "GET"
follow_redirects = false
response_status_code = 200
# Custom application metrics
[[inputs.prometheus]]
urls = ["http://localhost:9090/metrics"]
metric_version = 2
response_timeout = "5s"
# Log parsing
[[inputs.tail]]
files = ["/var/log/nginx/access.log"]
from_beginning = false
watch_method = "inotify"
# Grok parsing
grok_patterns = ["%{COMBINED_LOG_FORMAT}"]
data_format = "grok"
# Add custom tags
[inputs.tail.tags]
logtype = "nginx_access"
# SNMP monitoring
[[inputs.snmp]]
agents = ["udp://192.168.1.1:161"]
version = 2
community = "public"
[[inputs.snmp.field]]
name = "hostname"
oid = "RFC1213-MIB::sysName.0"
[[inputs.snmp.field]]
name = "uptime"
oid = "RFC1213-MIB::sysUpTime.0"
Creating Custom Telegraf Plugins
Custom Exec Plugin
# Create custom metric collection script
sudo tee /usr/local/bin/custom_metrics.sh <<'EOF'
#!/bin/bash
# Collect custom application metrics
app_connections=$(ss -tan | grep -c :8080)
# The [m]yapp bracket trick keeps awk from matching this script's own
# process; "sum+0" guarantees a numeric value even when nothing matches
app_memory=$(ps aux | awk '/[m]yapp/ {sum+=$6} END {print sum+0}')
app_cpu=$(ps aux | awk '/[m]yapp/ {sum+=$3} END {print sum+0}')
# Output in InfluxDB line protocol format
echo "custom_app,host=$(hostname) connections=${app_connections}i,memory=${app_memory}i,cpu=${app_cpu}"
EOF
sudo chmod +x /usr/local/bin/custom_metrics.sh
# Add to Telegraf configuration
cat <<EOF | sudo tee -a /etc/telegraf/telegraf.conf
[[inputs.exec]]
commands = ["/usr/local/bin/custom_metrics.sh"]
timeout = "5s"
data_format = "influx"
interval = "30s"
EOF
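The script's output can be sanity-checked directly; it should emit a single valid line of line protocol (the values below show the expected shape, and myapp is a placeholder process name):
/usr/local/bin/custom_metrics.sh
# e.g. custom_app,host=web01 connections=12i,memory=20480i,cpu=3.5
sudo systemctl restart telegraf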
Setting Up Data Retention Policies
Creating Retention Policies with Flux
# Create retention policy for different data types
influx bucket create \
--name metrics-1d \
--org my-org \
--retention 1d
influx bucket create \
--name metrics-7d \
--org my-org \
--retention 7d
influx bucket create \
--name metrics-30d \
--org my-org \
--retention 30d
influx bucket create \
--name metrics-1y \
--org my-org \
--retention 365d
Automated Downsampling Task
// Create downsampling task
option task = {
name: "downsample-cpu-metrics",
every: 1h,
}
from(bucket: "metrics")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "cpu")
|> aggregateWindow(every: 1m, fn: mean, createEmpty: false)
|> to(bucket: "metrics-7d", org: "my-org")
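To register the task, save the Flux above to a file (downsample.flux is an assumed name) and create it with the CLI:
# Create the task and confirm it is scheduled
influx task create --org my-org --file downsample.flux
influx task list --org my-org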
Advanced Flux Queries
Complex Analytics Query
// Calculate 95th percentile response time by service
from(bucket: "metrics")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "http_response_time")
|> group(columns: ["service"])
|> aggregateWindow(every: 5m, fn: (column, tables=<-) =>
tables
|> quantile(q: 0.95, column: column)
)
|> yield(name: "p95_response_time")
// Anomaly detection using a moving average. movingAverage() overwrites
// _value, so the smoothed stream is joined back onto the raw stream
// (the join assumes the "host" and "cpu" tags written by Telegraf).
raw = from(bucket: "metrics")
    |> range(start: -24h)
    |> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_idle")

ma = raw
    |> movingAverage(n: 10)
    |> rename(columns: {_value: "moving_average"})

join(tables: {raw: raw, ma: ma}, on: ["_time", "host", "cpu"])
    |> map(fn: (r) => ({r with anomaly: r._value < r.moving_average - 20.0}))
    |> filter(fn: (r) => r.anomaly)
Creating Alerts
// Alert task for high CPU usage. Flux has no built-in sendAlert();
// the http and json packages are used to post to a webhook instead.
import "http"
import "json"

option task = {
    name: "cpu-alert",
    every: 1m,
}

from(bucket: "metrics")
    |> range(start: -5m)
    |> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_idle")
    |> aggregateWindow(every: 1m, fn: mean)
    |> map(fn: (r) => ({r with _value: 100.0 - r._value}))
    |> filter(fn: (r) => r._value > 80.0)
    |> map(fn: (r) => ({r with sent: http.post(
        url: "https://webhook.site/your-webhook",
        headers: {"Content-Type": "application/json"},
        data: json.encode(v: {
            alert: "High CPU Usage",
            host: r.host,
            value: r._value,
            time: r._time,
        }),
    )}))
Performance Optimization
Storage Optimization
# Append additional storage tuning keys to the main configuration file
# (influxd only reads /etc/influxdb/config.toml, and 2.x uses flat keys)
cat <<EOF | sudo tee -a /etc/influxdb/config.toml
# WAL settings (max-concurrent-writes is available in recent 2.x releases)
storage-wal-fsync-delay = "100ms"
storage-wal-max-concurrent-writes = 10
# Compaction burst throughput (bytes; ~96 MiB)
storage-compact-throughput-burst = 100663296
# Cache settings
storage-cache-snapshot-write-cold-duration = "10m"
# Retention enforcement interval
storage-retention-check-interval = "30m"
EOF
# Note: the 1.x cardinality limits (max-series-per-database,
# max-values-per-tag) have no equivalent in InfluxDB 2.x OSS
sudo systemctl restart influxdb
# Monitor storage performance
cat <<'EOF' | sudo tee /usr/local/bin/monitor-influx-storage.sh
#!/bin/bash
# Check TSM file sizes (the 2.x engine stores TSM files under engine/data)
echo "TSM File Analysis:"
find /var/lib/influxdb/engine/data -name "*.tsm" -type f -printf '%s\n' | \
awk '{total+=$1; count++} END {printf "Total TSM files: %d, Total size: %.1f MiB\n", count, total/1048576}'
# Check compaction activity via the Prometheus /metrics endpoint
# (exact metric names vary between InfluxDB versions)
echo -e "\nCompaction Status:"
curl -sk https://localhost:8086/metrics | grep '^storage_compactions'
# Check write activity
echo -e "\nWrite Performance:"
curl -sk https://localhost:8086/metrics | grep '^storage_writer'
EOF
sudo chmod +x /usr/local/bin/monitor-influx-storage.sh
Query Performance Tuning
# Append query tuning keys to /etc/influxdb/config.toml
# (2.x flat keys; 0 = unlimited)
cat <<EOF | sudo tee -a /etc/influxdb/config.toml
# Concurrency and queueing
query-concurrency = 1024
query-queue-size = 1024
# Per-query and initial memory limits (bytes; 0 = unlimited)
query-max-memory-bytes = 0
query-memory-bytes = 0
# Log executed Flux queries (verbose, but useful when hunting slow queries;
# 2.x OSS has no direct query-timeout or slow-query-log settings)
flux-log-enabled = true
EOF
sudo systemctl restart influxdb
High Availability Setup
InfluxDB Clustering (Enterprise)
# influxdb-cluster.yaml
# Note: Clustering requires InfluxDB Enterprise (built on the 1.x line);
# this YAML sketches the topology and is not a literal Enterprise config file
cluster:
meta-nodes:
- influx-meta-1:8091
- influx-meta-2:8091
- influx-meta-3:8091
data-nodes:
- influx-data-1:8088
- influx-data-2:8088
- influx-data-3:8088
- influx-data-4:8088
replication-factor: 2
# Anti-entropy settings
anti-entropy:
enabled: true
check-interval: "30s"
max-fetch: 10
Load Balancing with HAProxy
# Install HAProxy
sudo dnf install -y haproxy
# Configure HAProxy for InfluxDB
sudo tee /etc/haproxy/haproxy.cfg <<EOF
global
log 127.0.0.1 local0
chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 4000
user haproxy
group haproxy
daemon
defaults
mode tcp
log global
option tcplog
option dontlognull
option redispatch
retries 3
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout check 10s
frontend influxdb_write
bind *:8086
default_backend influxdb_write_backend
backend influxdb_write_backend
balance roundrobin
option httpchk GET /health
# check-ssl is required because the InfluxDB backends serve TLS
server influx1 192.168.1.10:8086 check check-ssl verify none
server influx2 192.168.1.11:8086 check check-ssl verify none
server influx3 192.168.1.12:8086 check check-ssl verify none
frontend influxdb_query
bind *:8087
default_backend influxdb_query_backend
backend influxdb_query_backend
balance leastconn
option httpchk GET /health
server influx1 192.168.1.10:8086 check check-ssl verify none
server influx2 192.168.1.11:8086 check check-ssl verify none
server influx3 192.168.1.12:8086 check check-ssl verify none
EOF
# Start HAProxy
sudo systemctl enable haproxy
sudo systemctl start haproxy
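A quick smoke test from the load-balancer host confirms traffic reaches a healthy backend (TLS passes through untouched in tcp mode):
# Health check through the write frontend
curl -sk https://localhost:8086/health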
Integration with Visualization Tools
Grafana Integration
# Install Grafana
sudo dnf install -y grafana
# Configure Grafana data source
cat <<EOF | sudo tee /etc/grafana/provisioning/datasources/influxdb.yaml
apiVersion: 1
datasources:
- name: InfluxDB
type: influxdb
access: proxy
url: https://localhost:8086
jsonData:
version: Flux
organization: my-org
defaultBucket: metrics
tlsSkipVerify: true
secureJsonData:
token: my-super-secret-auth-token
EOF
# Start Grafana
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
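Grafana listens on port 3000 by default, so open it in the firewall before browsing to the UI:
sudo firewall-cmd --permanent --add-port=3000/tcp
sudo firewall-cmd --reload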
Creating Dashboards with Flux
{
"dashboard": {
"title": "System Metrics Dashboard",
"panels": [
{
"title": "CPU Usage",
"targets": [
{
"query": "from(bucket: \"metrics\")\n |> range(start: -1h)\n |> filter(fn: (r) => r._measurement == \"cpu\" and r._field == \"usage_idle\")\n |> map(fn: (r) => ({r with _value: 100.0 - r._value}))\n |> aggregateWindow(every: 1m, fn: mean)"
}
]
},
{
"title": "Memory Usage",
"targets": [
{
"query": "from(bucket: \"metrics\")\n |> range(start: -1h)\n |> filter(fn: (r) => r._measurement == \"mem\" and r._field == \"used_percent\")\n |> aggregateWindow(every: 1m, fn: mean)"
}
]
}
]
}
}
Monitoring and Maintenance
Automated Backup Script
#!/bin/bash
# /usr/local/bin/backup-influxdb.sh
BACKUP_DIR="/backup/influxdb"
RETENTION_DAYS=7
DATE=$(date +%Y%m%d_%H%M%S)
# Create backup directory
mkdir -p ${BACKUP_DIR}
# Backup InfluxDB
influx backup ${BACKUP_DIR}/backup_${DATE} \
--host https://localhost:8086 --skip-verify \
--token my-super-secret-auth-token
# Compress backup
tar -czf ${BACKUP_DIR}/backup_${DATE}.tar.gz \
-C ${BACKUP_DIR} backup_${DATE}
# Remove uncompressed backup
rm -rf ${BACKUP_DIR}/backup_${DATE}
# Clean old backups
find ${BACKUP_DIR} -name "backup_*.tar.gz" -mtime +${RETENTION_DAYS} -delete
# Verify backup
if [ -f ${BACKUP_DIR}/backup_${DATE}.tar.gz ]; then
echo "Backup completed successfully: backup_${DATE}.tar.gz"
else
echo "Backup failed!" >&2
exit 1
fi
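Make the script executable and schedule it, and keep the restore procedure in mind; the archive name below is an illustrative example:
sudo chmod +x /usr/local/bin/backup-influxdb.sh
# Nightly backup at 02:00 via /etc/cron.d
echo "0 2 * * * root /usr/local/bin/backup-influxdb.sh" | sudo tee /etc/cron.d/influxdb-backup
# Restore: unpack the archive, then restore with the operator token
tar -xzf /backup/influxdb/backup_20240101_020000.tar.gz -C /tmp
influx restore /tmp/backup_20240101_020000 --host https://localhost:8086 --skip-verify \
--token my-super-secret-auth-token --full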
Health Monitoring
# Create health check script
cat <<'EOF' | sudo tee /usr/local/bin/check-influxdb-health.sh
#!/bin/bash
# Check InfluxDB service
if ! systemctl is-active --quiet influxdb; then
echo "CRITICAL: InfluxDB service is not running"
exit 2
fi
# Check API endpoint
if ! curl -sk https://localhost:8086/health | grep -q '"status": *"pass"'; then
echo "CRITICAL: InfluxDB API health check failed"
exit 2
fi
# Check disk space
DISK_USAGE=$(df -h /var/lib/influxdb | awk 'NR==2 {print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -gt 80 ]; then
echo "WARNING: Disk usage is ${DISK_USAGE}%"
exit 1
fi
echo "OK: InfluxDB is healthy"
exit 0
EOF
sudo chmod +x /usr/local/bin/check-influxdb-health.sh
# Schedule the check via /etc/cron.d (piping into "crontab -" would
# replace root's entire crontab)
echo "*/5 * * * * root /usr/local/bin/check-influxdb-health.sh" | sudo tee /etc/cron.d/influxdb-health
Conclusion
Deploying InfluxDB and Telegraf on AlmaLinux creates a powerful time-series data platform capable of handling millions of metrics per second. With proper configuration, retention policies, and monitoring, this setup provides a robust foundation for infrastructure monitoring, IoT data collection, and real-time analytics.
The combination of InfluxDB’s efficient storage engine, Flux’s powerful query language, and Telegraf’s extensive plugin ecosystem makes this solution ideal for organizations requiring comprehensive time-series data management at scale.