The demand for AI and machine learning infrastructure has exploded in 2025, with organizations requiring powerful GPU-accelerated systems for training and inference. This comprehensive guide will walk you through building a production-ready AI/ML infrastructure on Rocky Linux with full NVIDIA GPU support, enabling you to run demanding deep learning workloads efficiently.
Understanding GPU-Accelerated Computing
GPU acceleration has become essential for AI/ML workloads due to:
- Parallel Processing: GPUs excel at matrix operations fundamental to neural networks
- Performance: 10-100x speedups for training deep learning models (see the quick benchmark sketch after this list)
- Cost Efficiency: Better performance per watt for AI workloads
- Framework Support: Native GPU support in TensorFlow, PyTorch, and other frameworks
- Scalability: Multi-GPU and distributed training capabilities
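To make the speedup claim concrete, here is a minimal benchmark sketch you can run once PyTorch is installed (covered later in this guide). It times a large matrix multiplication on CPU versus GPU; the exact ratio depends heavily on your hardware.
# benchmark_matmul.py — rough CPU vs GPU matmul timing (assumes PyTorch is installed)
import time
import torch

def bench(device, n=4096, repeats=10):
    x = torch.randn(n, n, device=device)
    y = torch.randn(n, n, device=device)
    torch.matmul(x, y)                      # warm-up
    if device == "cuda":
        torch.cuda.synchronize()            # wait for the warm-up kernel to finish
    start = time.time()
    for _ in range(repeats):
        torch.matmul(x, y)
    if device == "cuda":
        torch.cuda.synchronize()            # wait for all kernels before stopping the clock
    return (time.time() - start) / repeats

print(f"CPU: {bench('cpu'):.4f} s per 4096x4096 matmul")
if torch.cuda.is_available():
    print(f"GPU: {bench('cuda'):.4f} s per 4096x4096 matmul")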
Prerequisites
Before building your AI/ML infrastructure, ensure you have:
- Rocky Linux 9 server (fresh installation)
- NVIDIA GPU (Tesla, Quadro, or GeForce RTX series)
- Minimum 32GB RAM (64GB+ recommended)
- 500GB+ NVMe SSD storage
- Internet connection for package downloads
- Root or sudo access
System Preparation
Initial System Configuration
# Update system packages
sudo dnf update -y
# Install development tools
sudo dnf groupinstall -y "Development Tools"
sudo dnf install -y kernel-devel kernel-headers
# Install essential packages
sudo dnf install -y epel-release
sudo dnf install -y wget curl vim git htop nvme-cli pciutils
# Disable nouveau driver
cat << EOF | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF
# Regenerate initramfs
sudo dracut --force
# Set up performance tuning
sudo dnf install -y tuned
sudo tuned-adm profile throughput-performance
Verifying GPU Hardware
# Check for NVIDIA GPUs
lspci | grep -i nvidia
# Get detailed GPU information
sudo lshw -C display
# Check GPU topology
nvidia-smi topo -m # After driver installation
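If you prefer to inventory GPUs programmatically (for example from a provisioning script), a small sketch using the NVIDIA Management Library bindings works once the driver is installed; it assumes `pip install nvidia-ml-py`.
# gpu_inventory.py — list GPUs via NVML (assumes the NVIDIA driver and nvidia-ml-py are installed)
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(handle)
    if isinstance(name, bytes):             # older bindings return bytes
        name = name.decode()
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU {i}: {name}, {mem.total / 1024**3:.1f} GiB total memory")
pynvml.nvmlShutdown()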
Installing NVIDIA Drivers
Method 1: Using NVIDIA CUDA Repository
# Install NVIDIA CUDA repository
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
# Clean DNF cache
sudo dnf clean all
# Install NVIDIA driver and CUDA
sudo dnf install -y nvidia-driver nvidia-driver-cuda cuda
# Install additional NVIDIA tools
sudo dnf install -y nvidia-driver-cuda-libs nvidia-driver-devel nvidia-modprobe
# Reboot system
sudo reboot
Method 2: Manual Driver Installation
# Download NVIDIA driver
DRIVER_VERSION="535.129.03"
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/${DRIVER_VERSION}/NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run
# Make executable
chmod +x NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run
# Install driver
sudo ./NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run --silent --dkms
# Verify installation
nvidia-smi
Post-Installation Configuration
# Enable persistence mode
sudo nvidia-smi -pm 1
# Set GPU power limit (optional, adjust based on your GPU)
sudo nvidia-smi -pl 300
# Configure GPU fan speed (requires a running X server, so this is typically unavailable on headless servers)
sudo nvidia-settings -a "[gpu:0]/GPUFanControlState=1"
sudo nvidia-settings -a "[fan:0]/GPUTargetFanSpeed=80"
# Create NVIDIA device files
sudo nvidia-modprobe -u -c=0
# Set up the NVIDIA Container Toolkit repository (the legacy nvidia-docker repo is deprecated)
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf install -y nvidia-container-toolkit
# Docker itself is installed and configured in the Container Runtime Setup section below
Setting Up CUDA Environment
Installing CUDA Toolkit
# Install CUDA Toolkit
sudo dnf install -y cuda-toolkit-12-3
# Set up environment variables
cat << 'EOF' | sudo tee /etc/profile.d/cuda.sh
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda
EOF
# Source the environment
source /etc/profile.d/cuda.sh
# Verify CUDA installation
nvcc --version
nvidia-smi
Installing cuDNN
# Download cuDNN (requires NVIDIA Developer account)
# Visit: https://developer.nvidia.com/cudnn-downloads
# Extract and install cuDNN
tar -xvf cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz
sudo cp -P cudnn-linux-x86_64-8.9.7.29_cuda12-archive/include/cudnn*.h /usr/local/cuda/include
sudo cp -P cudnn-linux-x86_64-8.9.7.29_cuda12-archive/lib/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
Container Runtime Setup
Installing Docker with NVIDIA Support
# Install Docker
sudo dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf install -y docker-ce docker-ce-cli containerd.io
# Start and enable Docker
sudo systemctl enable --now docker
# Add user to docker group
sudo usermod -aG docker $USER
# Configure Docker daemon for NVIDIA
sudo tee /etc/docker/daemon.json <<EOF
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "10"
  },
  "storage-driver": "overlay2"
}
EOF
sudo systemctl restart docker
# Test NVIDIA Docker
docker run --rm --gpus all nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi
Installing Podman with NVIDIA Support
# Install Podman
sudo dnf install -y podman podman-docker
# Install NVIDIA Container Device Interface
sudo dnf install -y nvidia-container-toolkit-base
# Configure Podman for NVIDIA
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# Test Podman with GPU
podman run --rm --security-opt=label=disable --device nvidia.com/gpu=all ubuntu nvidia-smi  # label=disable avoids SELinux blocking the injected driver files
Python Environment Setup
Installing Miniconda
# Download and install Miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
# Initialize conda
$HOME/miniconda3/bin/conda init bash
source ~/.bashrc
# Update conda
conda update -n base -c defaults conda -y
Creating ML Environment
# Create environment with Python 3.10
conda create -n ml-gpu python=3.10 -y
conda activate ml-gpu
# Install essential packages
pip install --upgrade pip setuptools wheel
# Install CUDA-aware packages at the conda level (optional: the pip-installed frameworks
# below bundle their own CUDA runtime libraries, so this does not need to match the system toolkit)
conda install -c conda-forge cudatoolkit=11.8 cudnn=8.9.2 -y
# Install data science packages
pip install numpy pandas scikit-learn matplotlib seaborn jupyter jupyterlab
Installing Deep Learning Frameworks
TensorFlow with GPU Support
# Install TensorFlow
pip install tensorflow[and-cuda]
# Verify GPU support
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
# Test TensorFlow GPU
python << EOF
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
# Simple GPU test
with tf.device('/GPU:0'):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)
    print(c)
EOF
PyTorch with GPU Support
# Install PyTorch built against CUDA 11.8 (the wheels bundle their own CUDA runtime libraries)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Verify PyTorch GPU support
python -c "import torch; print(torch.cuda.is_available())"
python -c "import torch; print(torch.cuda.get_device_name(0))"
# Test PyTorch GPU
python << EOF
import torch
# Check CUDA availability
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA device count: {torch.cuda.device_count()}")
print(f"Current device: {torch.cuda.current_device()}")
print(f"Device name: {torch.cuda.get_device_name(0)}")
# Simple GPU computation
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
x = torch.randn(1000, 1000).to(device)
y = torch.randn(1000, 1000).to(device)
z = torch.matmul(x, y)
print(f"Result shape: {z.shape}")
EOF
JAX with GPU Support
# Install JAX with CUDA support
pip install --upgrade "jax[cuda11_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
# Test JAX GPU
python << EOF
import jax
import jax.numpy as jnp
print(f"JAX devices: {jax.devices()}")
# Simple GPU computation
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (1000, 1000))
y = jnp.dot(x, x.T)
print(f"Result shape: {y.shape}")
EOF
Setting Up Jupyter Lab
Installing and Configuring JupyterLab
# Install JupyterLab with extensions
pip install jupyterlab ipywidgets nbdime jupyterlab-git
# Install useful extensions
pip install jupyterlab_code_formatter black isort
# Generate the Jupyter config
jupyter lab --generate-config
# Configure JupyterLab (recent releases read ServerApp settings from jupyter_lab_config.py)
cat << EOF >> ~/.jupyter/jupyter_lab_config.py
c.ServerApp.ip = '0.0.0.0'
c.ServerApp.port = 8888
c.ServerApp.open_browser = False
c.ServerApp.allow_remote_access = True
# Set a password with "jupyter server password", or generate a hash as shown below
EOF
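Rather than running JupyterLab without authentication, you can generate a hashed password to use with your Jupyter configuration; a minimal sketch, assuming `jupyter_server` is installed (it ships with JupyterLab):
# generate_jupyter_password.py — print a password hash for the Jupyter config
from jupyter_server.auth import passwd

# Replace "change-me" with your own password, then paste the printed hash
# into the password setting appropriate for your Jupyter version
print(passwd("change-me"))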
# Install kernel for the ML environment
python -m ipykernel install --user --name ml-gpu --display-name "Python (ML-GPU)"
# Start JupyterLab
jupyter lab --no-browser --ip=0.0.0.0
Setting Up GPU Monitoring in Jupyter
# GPU monitoring notebook cell (requires: pip install gputil)
import time

import GPUtil
import matplotlib.pyplot as plt

def monitor_gpu(duration=60, interval=1):
    """Sample GPU load and memory utilization and plot them over time."""
    gpu_usage, memory_usage, timestamps = [], [], []
    start_time = time.time()
    while time.time() - start_time < duration:
        gpus = GPUtil.getGPUs()
        if gpus:
            gpu_usage.append(gpus[0].load * 100)
            memory_usage.append(gpus[0].memoryUtil * 100)
            timestamps.append(time.time() - start_time)
        time.sleep(interval)

    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 8))
    ax1.plot(timestamps, gpu_usage)
    ax1.set_ylabel('GPU Usage (%)')
    ax1.set_ylim(0, 100)
    ax1.grid(True)
    ax2.plot(timestamps, memory_usage)
    ax2.set_xlabel('Time (seconds)')
    ax2.set_ylabel('Memory Usage (%)')
    ax2.set_ylim(0, 100)
    ax2.grid(True)
    plt.tight_layout()
    plt.show()

# Example: monitor_gpu(duration=30, interval=1)
Distributed Training Setup
Setting Up Horovod
# Install MPI
sudo dnf install -y openmpi openmpi-devel
# Set up MPI environment
export PATH=/usr/lib64/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:$LD_LIBRARY_PATH
# Install Horovod
HOROVOD_GPU_OPERATIONS=NCCL pip install horovod[tensorflow,pytorch]
# Verify Horovod installation
horovodrun --check-build
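A minimal Horovod plus PyTorch sketch shows the usual pattern: initialize Horovod, pin each process to its local GPU, scale the learning rate by the worker count, and wrap the optimizer. The model and data here are toy placeholders.
# train_hvd.py — minimal Horovod + PyTorch pattern (model and data are placeholders)
import torch
import torch.nn as nn
import horovod.torch as hvd

hvd.init()
torch.cuda.set_device(hvd.local_rank())     # one GPU per process

model = nn.Linear(784, 10).cuda()
loss_fn = nn.CrossEntropyLoss()
# Scale the learning rate with the number of workers, as Horovod's docs suggest
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())

# Make sure every worker starts from the same weights and optimizer state
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

for step in range(100):
    data = torch.randn(32, 784).cuda()
    target = torch.randint(0, 10, (32,)).cuda()
    optimizer.zero_grad()
    loss = loss_fn(model(data), target)
    loss.backward()
    optimizer.step()

if hvd.rank() == 0:
    print(f"final loss: {loss.item():.4f}")
Launch it across local GPUs with, for example, horovodrun -np 2 python train_hvd.py.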
Multi-GPU Training Example
# train_multi_gpu.py
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
import torch.optim as optim
from torch.nn.parallel import DistributedDataParallel as DDP

def setup(rank, world_size):
    # Rendezvous settings for single-node multi-GPU training
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

def cleanup():
    dist.destroy_process_group()

def train(rank, world_size):
    setup(rank, world_size)

    # Create the model and move it to this process's GPU
    model = nn.Sequential(
        nn.Linear(784, 256),
        nn.ReLU(),
        nn.Linear(256, 10)
    ).to(rank)
    ddp_model = DDP(model, device_ids=[rank])

    loss_fn = nn.CrossEntropyLoss()
    optimizer = optim.SGD(ddp_model.parameters(), lr=0.001)

    # Training loop
    for epoch in range(10):
        # Your training code here (use a DistributedSampler for the DataLoader)
        pass

    cleanup()

def main():
    world_size = torch.cuda.device_count()
    mp.spawn(train, args=(world_size,), nprocs=world_size, join=True)

if __name__ == "__main__":
    main()
Model Serving Infrastructure
Setting Up TorchServe
# Install TorchServe
pip install torchserve torch-model-archiver torch-workflow-archiver
# Install dependencies (TorchServe requires Java)
sudo dnf install -y java-11-openjdk
# Create the model store and archive a model
mkdir -p model_store
torch-model-archiver --model-name resnet50 \
    --version 1.0 \
    --model-file model.py \
    --serialized-file resnet50.pth \
    --handler image_classifier \
    --export-path model_store
# Configure TorchServe for GPU before starting it
cat << EOF > config.properties
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
number_of_gpu=1
batch_size=8
max_batch_delay=100
EOF
# Start TorchServe with the GPU configuration
torchserve --start --ncs --ts-config config.properties --model-store model_store --models resnet50.mar
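Once the model is registered, you can exercise the inference endpoint from Python; a minimal sketch using requests, where kitten.jpg is a placeholder image path:
# query_torchserve.py — send an image to the TorchServe predictions API (kitten.jpg is a placeholder)
import requests

with open("kitten.jpg", "rb") as f:
    resp = requests.post("http://localhost:8080/predictions/resnet50", data=f)

print(resp.status_code)
print(resp.json())   # top predicted classes and probabilities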
Setting Up Triton Inference Server
# Pull Triton Docker image
docker pull nvcr.io/nvidia/tritonserver:23.10-py3
# Create model repository structure
mkdir -p models/resnet50/1
cp model.onnx models/resnet50/1/
# Create model configuration
cat << EOF > models/resnet50/config.pbtxt
name: "resnet50"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
EOF
# Run Triton server
docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v $(pwd)/models:/models \
nvcr.io/nvidia/tritonserver:23.10-py3 \
tritonserver --model-repository=/models
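To verify the deployment end to end, here is a small client sketch using the tritonclient HTTP API (install with pip install tritonclient[http]). The input and output names and shapes must match the config.pbtxt above, and the random tensor is only a stand-in for real preprocessed image data.
# query_triton.py — send a dummy request to the resnet50 model served by Triton
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Shape is [batch, 3, 224, 224] to match config.pbtxt (max_batch_size: 8)
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input", list(dummy.shape), "FP32")
infer_input.set_data_from_numpy(dummy)
requested = httpclient.InferRequestedOutput("output")

result = client.infer("resnet50", inputs=[infer_input], outputs=[requested])
print(result.as_numpy("output").shape)   # expected: (1, 1000)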
Monitoring and Management
Setting Up GPU Monitoring
# Install DCGM (Data Center GPU Manager)
sudo dnf install -y datacenter-gpu-manager
# Start DCGM service
sudo systemctl enable --now nvidia-dcgm
# Install DCGM exporter for Prometheus
docker run -d --gpus all --rm -p 9400:9400 \
nvidia/dcgm-exporter:3.1.7-3.1.4-ubuntu20.04
# Configure Prometheus to scrape DCGM metrics (assumes Prometheus is already installed; append under scrape_configs:)
cat << EOF | sudo tee -a /etc/prometheus/prometheus.yml
  - job_name: 'dcgm'
    static_configs:
      - targets: ['localhost:9400']
EOF
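A quick way to confirm the exporter is publishing GPU metrics before wiring it into Grafana is to scrape the endpoint yourself; a small sketch, assuming the exporter is listening on localhost:9400:
# check_dcgm_exporter.py — print a few GPU metrics from the DCGM exporter endpoint
import requests

resp = requests.get("http://localhost:9400/metrics", timeout=5)
for line in resp.text.splitlines():
    # GPU utilization and framebuffer memory used, one line per GPU
    if line.startswith(("DCGM_FI_DEV_GPU_UTIL", "DCGM_FI_DEV_FB_USED")):
        print(line)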
Creating GPU Dashboard
# gpu_dashboard.py
# Requires: pip install dash plotly gputil
import dash
from dash import dcc, html
from dash.dependencies import Input, Output
import plotly.graph_objs as go
import GPUtil

app = dash.Dash(__name__)

app.layout = html.Div([
    html.H1('GPU Monitoring Dashboard'),
    dcc.Graph(id='gpu-usage'),
    dcc.Graph(id='gpu-memory'),
    dcc.Graph(id='gpu-temperature'),
    dcc.Interval(id='interval-component', interval=2000)
])

@app.callback(
    [Output('gpu-usage', 'figure'),
     Output('gpu-memory', 'figure'),
     Output('gpu-temperature', 'figure')],
    [Input('interval-component', 'n_intervals')]
)
def update_graphs(n):
    gpus = GPUtil.getGPUs()
    if not gpus:
        return {}, {}, {}
    gpu = gpus[0]

    # GPU Usage
    usage_fig = {
        'data': [go.Bar(x=['GPU Usage'], y=[gpu.load * 100])],
        'layout': go.Layout(title='GPU Usage (%)', yaxis={'range': [0, 100]})
    }

    # GPU Memory
    memory_fig = {
        'data': [go.Bar(x=['Used', 'Free'],
                        y=[gpu.memoryUsed, gpu.memoryFree])],
        'layout': go.Layout(title='GPU Memory (MB)')
    }

    # GPU Temperature
    temp_fig = {
        'data': [go.Indicator(
            mode="gauge+number",
            value=gpu.temperature,
            title={'text': "GPU Temperature (°C)"},
            gauge={'axis': {'range': [None, 100]},
                   'bar': {'color': "darkblue"},
                   'steps': [
                       {'range': [0, 50], 'color': "lightgray"},
                       {'range': [50, 80], 'color': "yellow"},
                       {'range': [80, 100], 'color': "red"}],
                   'threshold': {'line': {'color': "red", 'width': 4},
                                 'thickness': 0.75, 'value': 85}}
        )],
        'layout': go.Layout(height=400)
    }

    return usage_fig, memory_fig, temp_fig

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=8050)
Performance Optimization
CUDA Optimization Tips
# optimize_cuda.py
# Placeholders: YourModel, loss_fn, dataset, batch_size, and num_epochs are assumed
# to be defined elsewhere in your training code.
import torch
from torch.cuda.amp import autocast, GradScaler
from torch.utils.data import DataLoader

# Enable TF32 on Ampere and newer GPUs
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Pin memory for faster host-to-device transfers
train_loader = DataLoader(
    dataset,
    batch_size=batch_size,
    pin_memory=True,
    num_workers=4
)

# Use automatic mixed precision
model = YourModel().cuda()
optimizer = torch.optim.Adam(model.parameters())
scaler = GradScaler()

for epoch in range(num_epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.cuda(), target.cuda()
        optimizer.zero_grad()

        # Forward pass and loss in mixed precision
        with autocast():
            output = model(data)
            loss = loss_fn(output, target)

        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

# Memory optimization: release cached blocks back to the driver
torch.cuda.empty_cache()
System-Level Optimization
# Set GPU persistence mode
sudo nvidia-smi -pm 1
# Set GPU compute mode to exclusive
sudo nvidia-smi -c 3
# Optimize PCIe settings
sudo setpci -s 01:00.0 68.w=5000 # Adjust based on your GPU
# CPU affinity for GPU processes
numactl --cpunodebind=0 --membind=0 python train.py
# Disable CPU frequency scaling
sudo cpupower frequency-set -g performance
Troubleshooting
Common Issues and Solutions
# Check NVIDIA driver status
systemctl status nvidia-persistenced
journalctl -u nvidia-persistenced
# Reset GPU
sudo nvidia-smi --gpu-reset -i 0  # specify the GPU index; the GPU must be idle
# Check CUDA installation
ldconfig -p | grep cuda
ldconfig -p | grep cudnn
# Fix library path issues
echo "/usr/local/cuda/lib64" | sudo tee /etc/ld.so.conf.d/cuda.conf
sudo ldconfig
# Debug GPU memory issues
nvidia-smi --query-gpu=memory.used,memory.free,memory.total --format=csv
# Monitor GPU processes
watch -n 1 nvidia-smi
# Check for thermal throttling
nvidia-smi -q -d PERFORMANCE
Performance Profiling
# profile_gpu.py
# Placeholders: YourModel and example_input are assumed to be defined by your code.
import torch
from torch.profiler import profile, record_function, ProfilerActivity

model = YourModel().cuda()

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
             record_shapes=True,
             profile_memory=True,
             with_stack=True) as prof:
    with record_function("model_inference"):
        model(example_input.cuda())

# Print the top operators by GPU time and export a Chrome trace
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
prof.export_chrome_trace("trace.json")
Security Considerations
Securing GPU Access
# Create GPU users group
sudo groupadd gpu-users
# Add users to GPU group
sudo usermod -a -G gpu-users username
# Set GPU device permissions
cat << EOF | sudo tee /etc/udev/rules.d/70-nvidia.rules
KERNEL=="nvidia*", GROUP="gpu-users", MODE="0660"
KERNEL=="nvidia-uvm*", GROUP="gpu-users", MODE="0660"
EOF
# Reload udev rules
sudo udevadm control --reload-rules
sudo udevadm trigger
# Note: per-container GPU limits are enforced by the container runtime
# (e.g. docker --gpus / Podman CDI device selection) rather than by a host-level cgroup tool
Best Practices
- Resource Management
  - Monitor GPU memory usage continuously
  - Implement proper error handling for OOM situations (see the sketch below)
  - Use mixed precision training when possible
- Performance
  - Profile your code regularly
  - Use appropriate batch sizes
  - Enable cuDNN benchmarking for optimal kernels (see the sketch below)
- Maintenance
  - Keep drivers and CUDA updated
  - Monitor the system regularly
  - Implement proper logging
- Security
  - Restrict GPU access to authorized users
  - Use containers for isolation
  - Apply security updates regularly
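As referenced in the list above, here is a small PyTorch sketch combining cuDNN autotuning with defensive handling of out-of-memory errors. torch.cuda.OutOfMemoryError requires a reasonably recent PyTorch release, and the model and batch are placeholders.
# best_practices_snippets.py — cuDNN autotuning and a simple OOM fallback (model/batch are placeholders)
import torch

# Let cuDNN benchmark and cache the fastest kernels when input shapes are stable
torch.backends.cudnn.benchmark = True

def safe_forward(model, batch):
    """Run a forward pass, splitting the batch in half on CUDA out-of-memory errors."""
    try:
        return model(batch)
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()
        half = batch.shape[0] // 2
        if half == 0:
            raise  # a single sample still does not fit; give up
        return torch.cat([safe_forward(model, batch[:half]),
                          safe_forward(model, batch[half:])])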
Conclusion
You’ve successfully built a comprehensive AI/ML infrastructure on Rocky Linux with full NVIDIA GPU support. This foundation provides you with a powerful platform for developing, training, and deploying machine learning models at scale. Remember to continuously monitor system performance and stay updated with the latest driver and framework releases to maintain optimal performance.
The combination of Rocky Linux’s stability and NVIDIA’s GPU acceleration creates a robust environment for your AI/ML workloads, whether you’re conducting research, developing products, or running production inference services.