Time-series databases have become essential for modern infrastructure monitoring, IoT data collection, and real-time analytics. This guide demonstrates deploying a production-ready InfluxDB 2.x server with Telegraf collectors on AlmaLinux, covering everything from installation to advanced optimization, load balancing, and the Enterprise clustering options for high availability.
Understanding Time-Series Data Architecture
Time-series databases are optimized for handling time-stamped data with these characteristics:
- High write throughput: Millions of data points per second
- Efficient compression: Specialized algorithms for time-series data
- Fast aggregation queries: Built-in functions for time-based analysis
- Automatic data retention: Age-based data lifecycle management
- Real-time processing: Stream processing capabilities
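As a concrete example, each record in InfluxDB's line protocol couples a measurement and tag set with one or more fields and a timestamp; the measurement, tags, and values below are illustrative:
# measurement,tag_set field_set timestamp (nanoseconds)
cpu,host=web01,region=us-east usage_idle=87.5,usage_user=9.2 1700000000000000000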
InfluxDB 2.0 Architecture
Key components include:
- Storage Engine: TSM (Time-Structured Merge Tree)
- Query Engine: Flux language for powerful data analysis
- Task Engine: Automated data processing
- API Layer: RESTful API and client libraries
- UI Dashboard: Built-in visualization and management
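As a quick taste of the query engine, a minimal Flux query reads recent points from a bucket and aggregates them (the metrics bucket referenced here is created during setup below):
// Mean CPU idle over the last five minutes
from(bucket: "metrics")
|> range(start: -5m)
|> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_idle")
|> mean()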
Prerequisites
Before deploying InfluxDB and Telegraf:
- AlmaLinux 9 server with 8GB+ RAM
- SSD storage (recommended for performance)
- Network connectivity for data sources
- Basic understanding of time-series concepts
- SSL certificates for secure endpoints
Installing InfluxDB 2.0
System Preparation
# Update system packages
sudo dnf update -y
# Install required dependencies
sudo dnf install -y wget curl gnupg2
# Configure system limits for InfluxDB
cat <<EOF | sudo tee /etc/security/limits.d/influxdb.conf
influxdb soft nofile 65536
influxdb hard nofile 65536
influxdb soft nproc 32768
influxdb hard nproc 32768
EOF
# Configure kernel parameters
cat <<EOF | sudo tee /etc/sysctl.d/99-influxdb.conf
vm.swappiness = 1
vm.max_map_count = 262144
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.core.rmem_default = 134217728
net.core.wmem_default = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
EOF
sudo sysctl -p /etc/sysctl.d/99-influxdb.conf
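A quick spot-check confirms the new kernel parameters are active:
# Verify selected kernel parameters
sysctl vm.swappiness vm.max_map_count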
InfluxDB Installation
# Add InfluxDB repository
cat <<EOF | sudo tee /etc/yum.repos.d/influxdb.repo
[influxdb]
name = InfluxDB Repository - RHEL
baseurl = https://repos.influxdata.com/rhel/9/x86_64/stable
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdata-archive_compat.key
EOF
# Install InfluxDB
sudo dnf install -y influxdb2 influxdb2-cli
# Enable and start service
sudo systemctl enable influxdb
sudo systemctl start influxdb
# Verify installation
influx version
Initial Setup and Configuration
# Setup InfluxDB with initial user and organization
influx setup \
--org my-org \
--bucket metrics \
--username admin \
--password SecurePassword123! \
--token my-super-secret-auth-token \
--force
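Before moving on, you can confirm the initial resources were created:
# Verify the organization and bucket
influx org list
influx bucket list --org my-org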
# Configure InfluxDB. Note that InfluxDB 2.x uses a flat key/value
# configuration file, not the 1.x [section]-style format shown in many
# older guides; memory sizes are given in bytes.
sudo tee /etc/influxdb/config.toml <<EOF
# Storage paths
bolt-path = "/var/lib/influxdb/influxd.bolt"
engine-path = "/var/lib/influxdb/engine"
# HTTP API
http-bind-address = ":8086"
# TLS (certificate generation is covered in the next section)
tls-cert = "/etc/influxdb/cert.pem"
tls-key = "/etc/influxdb/key.pem"
# Cache settings (2 GiB cache, 256 MiB snapshot threshold)
storage-cache-max-memory-size = 2147483648
storage-cache-snapshot-memory-size = 268435456
# Compaction settings
storage-compact-full-write-cold-duration = "4h"
storage-max-concurrent-compactions = 2
# Logging
log-level = "info"
EOF
# Restart with new configuration
sudo systemctl restart influxdb
Setting Up SSL/TLS Security
# Generate a self-signed certificate (for testing only; use a CA-signed
# certificate in production)
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
-keyout /etc/influxdb/key.pem \
-out /etc/influxdb/cert.pem \
-subj "/C=US/ST=State/L=City/O=Organization/CN=influxdb.example.com"
# Set proper permissions
sudo chown influxdb:influxdb /etc/influxdb/*.pem
sudo chmod 600 /etc/influxdb/key.pem
sudo chmod 644 /etc/influxdb/cert.pem
# Configure firewall
sudo firewall-cmd --permanent --add-port=8086/tcp
sudo firewall-cmd --reload
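With TLS enabled and the port open, a quick health check verifies the endpoint (-k accepts the self-signed certificate):
# Verify the HTTPS endpoint
curl -sk https://localhost:8086/health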
Installing and Configuring Telegraf
Telegraf Installation
# Install Telegraf from InfluxDB repository
sudo dnf install -y telegraf
# Generate a base configuration (tee is needed because /etc/telegraf is root-owned)
telegraf config | sudo tee /etc/telegraf/telegraf.conf > /dev/null
# Enable and start Telegraf
sudo systemctl enable telegraf
sudo systemctl start telegraf
Comprehensive Telegraf Configuration
# /etc/telegraf/telegraf.conf
[global_tags]
datacenter = "dc1"
environment = "production"
region = "us-east"
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 5000
metric_buffer_limit = 50000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = "0s"
hostname = ""
omit_hostname = false
debug = false
quiet = false
# Output to InfluxDB v2
[[outputs.influxdb_v2]]
urls = ["https://localhost:8086"]
token = "my-super-secret-auth-token"
organization = "my-org"
bucket = "metrics"
## TLS settings: the self-signed certificate generated earlier will not pass
## verification, so skip it here; set to false with a CA-signed certificate
insecure_skip_verify = true
## Timeout settings
timeout = "5s"
## Batching
metric_batch_size = 5000
metric_buffer_limit = 50000
# System metrics collection
[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
report_active = false
[[inputs.disk]]
ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
[[inputs.diskio]]
skip_serial_number = false
device_tags = ["ID_FS_TYPE", "ID_FS_USAGE"]
[[inputs.kernel]]
# no configuration
[[inputs.mem]]
# no configuration
[[inputs.net]]
interfaces = ["eth*", "eno*"]
ignore_protocol_stats = false
[[inputs.processes]]
# no configuration
[[inputs.swap]]
# no configuration
[[inputs.system]]
# no configuration
# Docker monitoring
[[inputs.docker]]
endpoint = "unix:///var/run/docker.sock"
gather_services = false
container_names = []
source_tag = false
container_name_include = []
container_name_exclude = []
timeout = "5s"
perdevice = true
total = false
# Network monitoring
[[inputs.ping]]
urls = ["google.com", "1.1.1.1", "8.8.8.8"]
count = 3
ping_interval = 1.0
timeout = 1.0
deadline = 10
# HTTP endpoint monitoring
[[inputs.http_response]]
urls = ["https://example.com"]
response_timeout = "5s"
method = "GET"
follow_redirects = false
response_status_code = 200
# Custom application metrics
[[inputs.prometheus]]
urls = ["http://localhost:9090/metrics"]
metric_version = 2
response_timeout = "5s"
# Log parsing
[[inputs.tail]]
files = ["/var/log/nginx/access.log"]
from_beginning = false
watch_method = "inotify"
# Grok parsing
grok_patterns = ["%{COMBINED_LOG_FORMAT}"]
data_format = "grok"
# Add custom tags
[inputs.tail.tags]
logtype = "nginx_access"
# SNMP monitoring
[[inputs.snmp]]
agents = ["udp://192.168.1.1:161"]
version = 2
community = "public"
[[inputs.snmp.field]]
name = "hostname"
oid = "RFC1213-MIB::sysName.0"
[[inputs.snmp.field]]
name = "uptime"
oid = "RFC1213-MIB::sysUpTime.0"
Creating Custom Telegraf Plugins
Custom Exec Plugin
# Create custom metric collection script
sudo tee /usr/local/bin/custom_metrics.sh <<'EOF'
#!/bin/bash
# Collect custom application metrics
app_connections=$(ss -tan | grep -c :8080)
# The [m]yapp bracket trick keeps awk from matching this script's own
# process; "sum+0" guarantees a numeric value even when nothing matches
app_memory=$(ps aux | awk '/[m]yapp/ {sum+=$6} END {print sum+0}')
app_cpu=$(ps aux | awk '/[m]yapp/ {sum+=$3} END {print sum+0}')
# Output in InfluxDB line protocol format
echo "custom_app,host=$(hostname) connections=${app_connections}i,memory=${app_memory}i,cpu=${app_cpu}"
EOF
sudo chmod +x /usr/local/bin/custom_metrics.sh
# Add to Telegraf configuration
cat <<EOF | sudo tee -a /etc/telegraf/telegraf.conf
[[inputs.exec]]
commands = ["/usr/local/bin/custom_metrics.sh"]
timeout = "5s"
data_format = "influx"
interval = "30s"
EOF
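The script's output can be sanity-checked directly; it should emit a single valid line of line protocol (the values below show the expected shape, and myapp is a placeholder process name):
/usr/local/bin/custom_metrics.sh
# e.g. custom_app,host=web01 connections=12i,memory=20480i,cpu=3.5
sudo systemctl restart telegraf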
Setting Up Data Retention Policies
Creating Retention Policies with Flux
# Create retention policy for different data types
influx bucket create \
--name metrics-1d \
--org my-org \
--retention 1d
influx bucket create \
--name metrics-7d \
--org my-org \
--retention 7d
influx bucket create \
--name metrics-30d \
--org my-org \
--retention 30d
influx bucket create \
--name metrics-1y \
--org my-org \
--retention 365d
Automated Downsampling Task
// Create downsampling task
option task = {
name: "downsample-cpu-metrics",
every: 1h,
}
from(bucket: "metrics")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "cpu")
|> aggregateWindow(every: 1m, fn: mean, createEmpty: false)
|> to(bucket: "metrics-7d", org: "my-org")
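To register the task, save the Flux above to a file (downsample.flux is an assumed name) and create it with the CLI:
# Create the task and confirm it is scheduled
influx task create --org my-org --file downsample.flux
influx task list --org my-org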
Advanced Flux Queries
Complex Analytics Query
// Calculate 95th percentile response time by service
from(bucket: "metrics")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "http_response_time")
|> group(columns: ["service"])
|> aggregateWindow(every: 5m, fn: (column, tables=<-) =>
tables
|> quantile(q: 0.95, column: column)
)
|> yield(name: "p95_response_time")
// Anomaly detection using a moving average. movingAverage() overwrites
// _value, so the smoothed stream is joined back onto the raw stream
// (the join assumes the "host" and "cpu" tags written by Telegraf).
raw = from(bucket: "metrics")
    |> range(start: -24h)
    |> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_idle")

ma = raw
    |> movingAverage(n: 10)
    |> rename(columns: {_value: "moving_average"})

join(tables: {raw: raw, ma: ma}, on: ["_time", "host", "cpu"])
    |> map(fn: (r) => ({r with anomaly: r._value < r.moving_average - 20.0}))
    |> filter(fn: (r) => r.anomaly)
Creating Alerts
// Alert task for high CPU usage. Flux has no built-in sendAlert();
// the http and json packages are used to post to a webhook instead.
import "http"
import "json"

option task = {
    name: "cpu-alert",
    every: 1m,
}

from(bucket: "metrics")
    |> range(start: -5m)
    |> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_idle")
    |> aggregateWindow(every: 1m, fn: mean)
    |> map(fn: (r) => ({r with _value: 100.0 - r._value}))
    |> filter(fn: (r) => r._value > 80.0)
    |> map(fn: (r) => ({r with sent: http.post(
        url: "https://webhook.site/your-webhook",
        headers: {"Content-Type": "application/json"},
        data: json.encode(v: {
            alert: "High CPU Usage",
            host: r.host,
            value: r._value,
            time: r._time,
        }),
    )}))
Performance Optimization
Storage Optimization
# Append additional storage tuning keys to the main configuration file
# (influxd only reads /etc/influxdb/config.toml, and 2.x uses flat keys)
cat <<EOF | sudo tee -a /etc/influxdb/config.toml
# WAL settings (max-concurrent-writes is available in recent 2.x releases)
storage-wal-fsync-delay = "100ms"
storage-wal-max-concurrent-writes = 10
# Compaction burst throughput (bytes; ~96 MiB)
storage-compact-throughput-burst = 100663296
# Cache settings
storage-cache-snapshot-write-cold-duration = "10m"
# Retention enforcement interval
storage-retention-check-interval = "30m"
EOF
# Note: the 1.x cardinality limits (max-series-per-database,
# max-values-per-tag) have no equivalent in InfluxDB 2.x OSS
sudo systemctl restart influxdb
# Monitor storage performance
cat <<'EOF' | sudo tee /usr/local/bin/monitor-influx-storage.sh
#!/bin/bash
# Check TSM file sizes (the 2.x engine stores TSM files under engine/data)
echo "TSM File Analysis:"
find /var/lib/influxdb/engine/data -name "*.tsm" -type f -printf '%s\n' | \
awk '{total+=$1; count++} END {printf "Total TSM files: %d, Total size: %.1f MiB\n", count, total/1048576}'
# Check compaction activity via the Prometheus /metrics endpoint
# (exact metric names vary between InfluxDB versions)
echo -e "\nCompaction Status:"
curl -sk https://localhost:8086/metrics | grep '^storage_compactions'
# Check write activity
echo -e "\nWrite Performance:"
curl -sk https://localhost:8086/metrics | grep '^storage_writer'
EOF
sudo chmod +x /usr/local/bin/monitor-influx-storage.sh
Query Performance Tuning
# Append query tuning keys to /etc/influxdb/config.toml
# (2.x flat keys; 0 = unlimited)
cat <<EOF | sudo tee -a /etc/influxdb/config.toml
# Concurrency and queueing
query-concurrency = 1024
query-queue-size = 1024
# Per-query and initial memory limits (bytes; 0 = unlimited)
query-max-memory-bytes = 0
query-memory-bytes = 0
# Log executed Flux queries (verbose, but useful when hunting slow queries;
# 2.x OSS has no direct query-timeout or slow-query-log settings)
flux-log-enabled = true
EOF
sudo systemctl restart influxdb
High Availability Setup
InfluxDB Clustering (Enterprise)
# influxdb-cluster.yaml
# Note: Clustering requires InfluxDB Enterprise (built on the 1.x line);
# this YAML sketches the topology and is not a literal Enterprise config file
cluster:
meta-nodes:
- influx-meta-1:8091
- influx-meta-2:8091
- influx-meta-3:8091
data-nodes:
- influx-data-1:8088
- influx-data-2:8088
- influx-data-3:8088
- influx-data-4:8088
replication-factor: 2
# Anti-entropy settings
anti-entropy:
enabled: true
check-interval: "30s"
max-fetch: 10
Load Balancing with HAProxy
# Install HAProxy
sudo dnf install -y haproxy
# Configure HAProxy for InfluxDB
sudo tee /etc/haproxy/haproxy.cfg <<EOF
global
log 127.0.0.1 local0
chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 4000
user haproxy
group haproxy
daemon
defaults
mode tcp
log global
option tcplog
option dontlognull
option redispatch
retries 3
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout check 10s
frontend influxdb_write
bind *:8086
default_backend influxdb_write_backend
backend influxdb_write_backend
balance roundrobin
option httpchk GET /health
# check-ssl is required because the InfluxDB backends serve TLS
server influx1 192.168.1.10:8086 check check-ssl verify none
server influx2 192.168.1.11:8086 check check-ssl verify none
server influx3 192.168.1.12:8086 check check-ssl verify none
frontend influxdb_query
bind *:8087
default_backend influxdb_query_backend
backend influxdb_query_backend
balance leastconn
option httpchk GET /health
server influx1 192.168.1.10:8086 check check-ssl verify none
server influx2 192.168.1.11:8086 check check-ssl verify none
server influx3 192.168.1.12:8086 check check-ssl verify none
EOF
# Start HAProxy
sudo systemctl enable haproxy
sudo systemctl start haproxy
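A quick smoke test from the load-balancer host confirms traffic reaches a healthy backend (TLS passes through untouched in tcp mode):
# Health check through the write frontend
curl -sk https://localhost:8086/health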
Integration with Visualization Tools
Grafana Integration
# Install Grafana
sudo dnf install -y grafana
# Configure Grafana data source
cat <<EOF | sudo tee /etc/grafana/provisioning/datasources/influxdb.yaml
apiVersion: 1
datasources:
- name: InfluxDB
type: influxdb
access: proxy
url: https://localhost:8086
jsonData:
version: Flux
organization: my-org
defaultBucket: metrics
tlsSkipVerify: true
secureJsonData:
token: my-super-secret-auth-token
EOF
# Start Grafana
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
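Grafana listens on port 3000 by default, so open it in the firewall before browsing to the UI:
sudo firewall-cmd --permanent --add-port=3000/tcp
sudo firewall-cmd --reload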
Creating Dashboards with Flux
{
"dashboard": {
"title": "System Metrics Dashboard",
"panels": [
{
"title": "CPU Usage",
"targets": [
{
"query": "from(bucket: \"metrics\")\n |> range(start: -1h)\n |> filter(fn: (r) => r._measurement == \"cpu\" and r._field == \"usage_idle\")\n |> map(fn: (r) => ({r with _value: 100.0 - r._value}))\n |> aggregateWindow(every: 1m, fn: mean)"
}
]
},
{
"title": "Memory Usage",
"targets": [
{
"query": "from(bucket: \"metrics\")\n |> range(start: -1h)\n |> filter(fn: (r) => r._measurement == \"mem\" and r._field == \"used_percent\")\n |> aggregateWindow(every: 1m, fn: mean)"
}
]
}
]
}
}
Monitoring and Maintenance
Automated Backup Script
#!/bin/bash
# /usr/local/bin/backup-influxdb.sh
BACKUP_DIR="/backup/influxdb"
RETENTION_DAYS=7
DATE=$(date +%Y%m%d_%H%M%S)
# Create backup directory
mkdir -p ${BACKUP_DIR}
# Backup InfluxDB
influx backup ${BACKUP_DIR}/backup_${DATE} \
--host https://localhost:8086 --skip-verify \
--token my-super-secret-auth-token
# Compress backup
tar -czf ${BACKUP_DIR}/backup_${DATE}.tar.gz \
-C ${BACKUP_DIR} backup_${DATE}
# Remove uncompressed backup
rm -rf ${BACKUP_DIR}/backup_${DATE}
# Clean old backups
find ${BACKUP_DIR} -name "backup_*.tar.gz" -mtime +${RETENTION_DAYS} -delete
# Verify backup
if [ -f ${BACKUP_DIR}/backup_${DATE}.tar.gz ]; then
echo "Backup completed successfully: backup_${DATE}.tar.gz"
else
echo "Backup failed!" >&2
exit 1
fi
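Make the script executable and schedule it, and keep the restore procedure in mind; the archive name below is an illustrative example:
sudo chmod +x /usr/local/bin/backup-influxdb.sh
# Nightly backup at 02:00 via /etc/cron.d
echo "0 2 * * * root /usr/local/bin/backup-influxdb.sh" | sudo tee /etc/cron.d/influxdb-backup
# Restore: unpack the archive, then restore with the operator token
tar -xzf /backup/influxdb/backup_20240101_020000.tar.gz -C /tmp
influx restore /tmp/backup_20240101_020000 --host https://localhost:8086 --skip-verify \
--token my-super-secret-auth-token --full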
Health Monitoring
# Create health check script
cat <<'EOF' | sudo tee /usr/local/bin/check-influxdb-health.sh
#!/bin/bash
# Check InfluxDB service
if ! systemctl is-active --quiet influxdb; then
echo "CRITICAL: InfluxDB service is not running"
exit 2
fi
# Check API endpoint
if ! curl -sk https://localhost:8086/health | grep -q '"status": *"pass"'; then
echo "CRITICAL: InfluxDB API health check failed"
exit 2
fi
# Check disk space
DISK_USAGE=$(df -h /var/lib/influxdb | awk 'NR==2 {print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -gt 80 ]; then
echo "WARNING: Disk usage is ${DISK_USAGE}%"
exit 1
fi
echo "OK: InfluxDB is healthy"
exit 0
EOF
sudo chmod +x /usr/local/bin/check-influxdb-health.sh
# Schedule the check via /etc/cron.d (piping into "crontab -" would
# replace root's entire crontab)
echo "*/5 * * * * root /usr/local/bin/check-influxdb-health.sh" | sudo tee /etc/cron.d/influxdb-health
Conclusion
Deploying InfluxDB and Telegraf on AlmaLinux creates a powerful time-series data platform capable of handling millions of metrics per second. With proper configuration, retention policies, and monitoring, this setup provides a robust foundation for infrastructure monitoring, IoT data collection, and real-time analytics.
The combination of InfluxDB’s efficient storage engine, Flux’s powerful query language, and Telegraf’s extensive plugin ecosystem makes this solution ideal for organizations requiring comprehensive time-series data management at scale.