The demand for AI and machine learning infrastructure has exploded in 2025, with organizations requiring powerful GPU-accelerated systems for training and inference. This comprehensive guide will walk you through building a production-ready AI/ML infrastructure on Rocky Linux with full NVIDIA GPU support, enabling you to run demanding deep learning workloads efficiently.
Understanding GPU-Accelerated Computing
GPU acceleration has become essential for AI/ML workloads due to:
- Parallel Processing: GPUs excel at matrix operations fundamental to neural networks
- Performance: 10-100x speedups for training deep learning models (see the quick benchmark sketch after this list)
- Cost Efficiency: Better performance per watt for AI workloads
- Framework Support: Native GPU support in TensorFlow, PyTorch, and other frameworks
- Scalability: Multi-GPU and distributed training capabilities
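To make the speedup claim concrete, here is a minimal benchmark sketch you can run once PyTorch is installed (covered later in this guide). It times a large matrix multiplication on CPU versus GPU; the exact ratio depends heavily on your hardware.
# benchmark_matmul.py — rough CPU vs GPU matmul timing (assumes PyTorch is installed)
import time
import torch

def bench(device, n=4096, repeats=10):
    x = torch.randn(n, n, device=device)
    y = torch.randn(n, n, device=device)
    torch.matmul(x, y)                      # warm-up
    if device == "cuda":
        torch.cuda.synchronize()            # wait for the warm-up kernel to finish
    start = time.time()
    for _ in range(repeats):
        torch.matmul(x, y)
    if device == "cuda":
        torch.cuda.synchronize()            # wait for all kernels before stopping the clock
    return (time.time() - start) / repeats

print(f"CPU: {bench('cpu'):.4f} s per 4096x4096 matmul")
if torch.cuda.is_available():
    print(f"GPU: {bench('cuda'):.4f} s per 4096x4096 matmul")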
Prerequisites
Before building your AI/ML infrastructure, ensure you have:
- Rocky Linux 9 server (fresh installation)
- NVIDIA GPU (Tesla, Quadro, or GeForce RTX series)
- Minimum 32GB RAM (64GB+ recommended)
- 500GB+ NVMe SSD storage
- Internet connection for package downloads
- Root or sudo access
System Preparation
Initial System Configuration
# Update system packages
sudo dnf update -y
# Install development tools
sudo dnf groupinstall -y "Development Tools"
sudo dnf install -y kernel-devel kernel-headers
# Install essential packages
sudo dnf install -y epel-release
sudo dnf install -y wget curl vim git htop nvme-cli pciutils
# Disable nouveau driver
cat << EOF | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF
# Regenerate initramfs
sudo dracut --force
# Set up performance tuning
sudo dnf install -y tuned
sudo tuned-adm profile throughput-performance
Verifying GPU Hardware
# Check for NVIDIA GPUs
lspci | grep -i nvidia
# Get detailed GPU information
sudo lshw -C display
# Check GPU topology
nvidia-smi topo -m # After driver installation
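If you prefer to inventory GPUs programmatically (for example from a provisioning script), a small sketch using the NVIDIA Management Library bindings works once the driver is installed; it assumes `pip install nvidia-ml-py`.
# gpu_inventory.py — list GPUs via NVML (assumes the NVIDIA driver and nvidia-ml-py are installed)
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(handle)
    if isinstance(name, bytes):             # older bindings return bytes
        name = name.decode()
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU {i}: {name}, {mem.total / 1024**3:.1f} GiB total memory")
pynvml.nvmlShutdown()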
Installing NVIDIA Drivers
Method 1: Using NVIDIA CUDA Repository
# Install NVIDIA CUDA repository
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
# Clean DNF cache
sudo dnf clean all
# Install NVIDIA driver and CUDA
sudo dnf install -y nvidia-driver nvidia-driver-cuda cuda
# Install additional NVIDIA tools
sudo dnf install -y nvidia-driver-cuda-libs nvidia-driver-devel nvidia-modprobe
# Reboot system
sudo reboot
Method 2: Manual Driver Installation
# Download NVIDIA driver
DRIVER_VERSION="535.129.03"
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/${DRIVER_VERSION}/NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run
# Make executable
chmod +x NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run
# Install driver
sudo ./NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run --silent --dkms
# Verify installation
nvidia-smi
Post-Installation Configuration
# Enable persistence mode
sudo nvidia-smi -pm 1
# Set GPU power limit (optional, adjust based on your GPU)
sudo nvidia-smi -pl 300
# Configure GPU fan speed (requires a running X server, so this is typically unavailable on headless servers)
sudo nvidia-settings -a "[gpu:0]/GPUFanControlState=1"
sudo nvidia-settings -a "[fan:0]/GPUTargetFanSpeed=80"
# Create NVIDIA device files
sudo nvidia-modprobe -u -c=0
# Set up the NVIDIA Container Toolkit repository (the legacy nvidia-docker repo is deprecated)
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf install -y nvidia-container-toolkit
# Docker itself is installed and configured in the Container Runtime Setup section below
Setting Up CUDA Environment
Installing CUDA Toolkit
# Install CUDA Toolkit
sudo dnf install -y cuda-toolkit-12-3
# Set up environment variables
cat << 'EOF' | sudo tee /etc/profile.d/cuda.sh
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda
EOF
# Source the environment
source /etc/profile.d/cuda.sh
# Verify CUDA installation
nvcc --version
nvidia-smi
Installing cuDNN
# Download cuDNN (requires NVIDIA Developer account)
# Visit: https://developer.nvidia.com/cudnn-downloads
# Extract and install cuDNN
tar -xvf cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz
sudo cp -P cudnn-linux-x86_64-8.9.7.29_cuda12-archive/include/cudnn*.h /usr/local/cuda/include
sudo cp -P cudnn-linux-x86_64-8.9.7.29_cuda12-archive/lib/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
Container Runtime Setup
Installing Docker with NVIDIA Support
# Install Docker
sudo dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf install -y docker-ce docker-ce-cli containerd.io
# Start and enable Docker
sudo systemctl enable --now docker
# Add user to docker group
sudo usermod -aG docker $USER
# Configure Docker daemon for NVIDIA
sudo tee /etc/docker/daemon.json <<EOF
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "10"
  },
  "storage-driver": "overlay2"
}
EOF
sudo systemctl restart docker
# Test NVIDIA Docker
docker run --rm --gpus all nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi
Installing Podman with NVIDIA Support
# Install Podman
sudo dnf install -y podman podman-docker
# Install NVIDIA Container Device Interface
sudo dnf install -y nvidia-container-toolkit-base
# Configure Podman for NVIDIA
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# Test Podman with GPU
podman run --rm --security-opt=label=disable --device nvidia.com/gpu=all ubuntu nvidia-smi  # label=disable avoids SELinux blocking the injected driver files
Python Environment Setup
Installing Miniconda
# Download and install Miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
# Initialize conda
$HOME/miniconda3/bin/conda init bash
source ~/.bashrc
# Update conda
conda update -n base -c defaults conda -y
Creating ML Environment
# Create environment with Python 3.10
conda create -n ml-gpu python=3.10 -y
conda activate ml-gpu
# Install essential packages
pip install --upgrade pip setuptools wheel
# Install CUDA-aware packages at the conda level (optional: the pip-installed frameworks
# below bundle their own CUDA runtime libraries, so this does not need to match the system toolkit)
conda install -c conda-forge cudatoolkit=11.8 cudnn=8.9.2 -y
# Install data science packages
pip install numpy pandas scikit-learn matplotlib seaborn jupyter jupyterlab
Installing Deep Learning Frameworks
TensorFlow with GPU Support
# Install TensorFlow
pip install tensorflow[and-cuda]
# Verify GPU support
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
# Test TensorFlow GPU
python << EOF
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
# Simple GPU test
with tf.device('/GPU:0'):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)
    print(c)
EOF
PyTorch with GPU Support
# Install PyTorch built against CUDA 11.8 (the wheels bundle their own CUDA runtime libraries)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Verify PyTorch GPU support
python -c "import torch; print(torch.cuda.is_available())"
python -c "import torch; print(torch.cuda.get_device_name(0))"
# Test PyTorch GPU
python << EOF
import torch
# Check CUDA availability
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA device count: {torch.cuda.device_count()}")
print(f"Current device: {torch.cuda.current_device()}")
print(f"Device name: {torch.cuda.get_device_name(0)}")
# Simple GPU computation
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
x = torch.randn(1000, 1000).to(device)
y = torch.randn(1000, 1000).to(device)
z = torch.matmul(x, y)
print(f"Result shape: {z.shape}")
EOF
JAX with GPU Support
# Install JAX with CUDA support
pip install --upgrade "jax[cuda11_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
# Test JAX GPU
python << EOF
import jax
import jax.numpy as jnp
print(f"JAX devices: {jax.devices()}")
# Simple GPU computation
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (1000, 1000))
y = jnp.dot(x, x.T)
print(f"Result shape: {y.shape}")
EOF
Setting Up Jupyter Lab
Installing and Configuring JupyterLab
# Install JupyterLab with extensions
pip install jupyterlab ipywidgets nbdime jupyterlab-git
# Install useful extensions
pip install jupyterlab_code_formatter black isort
# Generate the Jupyter config
jupyter lab --generate-config
# Configure JupyterLab (recent releases read ServerApp settings from jupyter_lab_config.py)
cat << EOF >> ~/.jupyter/jupyter_lab_config.py
c.ServerApp.ip = '0.0.0.0'
c.ServerApp.port = 8888
c.ServerApp.open_browser = False
c.ServerApp.allow_remote_access = True
# Set a password with "jupyter server password", or generate a hash as shown below
EOF
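Rather than running JupyterLab without authentication, you can generate a hashed password to use with your Jupyter configuration; a minimal sketch, assuming `jupyter_server` is installed (it ships with JupyterLab):
# generate_jupyter_password.py — print a password hash for the Jupyter config
from jupyter_server.auth import passwd

# Replace "change-me" with your own password, then paste the printed hash
# into the password setting appropriate for your Jupyter version
print(passwd("change-me"))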
# Install kernel for the ML environment
python -m ipykernel install --user --name ml-gpu --display-name "Python (ML-GPU)"
# Start JupyterLab
jupyter lab --no-browser --ip=0.0.0.0
Setting Up GPU Monitoring in Jupyter
# GPU monitoring notebook cell (requires: pip install gputil)
import time

import GPUtil
import matplotlib.pyplot as plt

def monitor_gpu(duration=60, interval=1):
    """Sample GPU load and memory utilization and plot them over time."""
    gpu_usage, memory_usage, timestamps = [], [], []
    start_time = time.time()
    while time.time() - start_time < duration:
        gpus = GPUtil.getGPUs()
        if gpus:
            gpu_usage.append(gpus[0].load * 100)
            memory_usage.append(gpus[0].memoryUtil * 100)
            timestamps.append(time.time() - start_time)
        time.sleep(interval)

    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 8))
    ax1.plot(timestamps, gpu_usage)
    ax1.set_ylabel('GPU Usage (%)')
    ax1.set_ylim(0, 100)
    ax1.grid(True)
    ax2.plot(timestamps, memory_usage)
    ax2.set_xlabel('Time (seconds)')
    ax2.set_ylabel('Memory Usage (%)')
    ax2.set_ylim(0, 100)
    ax2.grid(True)
    plt.tight_layout()
    plt.show()

# Example: monitor_gpu(duration=30, interval=1)
Distributed Training Setup
Setting Up Horovod
# Install MPI
sudo dnf install -y openmpi openmpi-devel
# Set up MPI environment
export PATH=/usr/lib64/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:$LD_LIBRARY_PATH
# Install Horovod
HOROVOD_GPU_OPERATIONS=NCCL pip install horovod[tensorflow,pytorch]
# Verify Horovod installation
horovodrun --check-build
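A minimal Horovod plus PyTorch sketch shows the usual pattern: initialize Horovod, pin each process to its local GPU, scale the learning rate by the worker count, and wrap the optimizer. The model and data here are toy placeholders.
# train_hvd.py — minimal Horovod + PyTorch pattern (model and data are placeholders)
import torch
import torch.nn as nn
import horovod.torch as hvd

hvd.init()
torch.cuda.set_device(hvd.local_rank())     # one GPU per process

model = nn.Linear(784, 10).cuda()
loss_fn = nn.CrossEntropyLoss()
# Scale the learning rate with the number of workers, as Horovod's docs suggest
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())

# Make sure every worker starts from the same weights and optimizer state
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

for step in range(100):
    data = torch.randn(32, 784).cuda()
    target = torch.randint(0, 10, (32,)).cuda()
    optimizer.zero_grad()
    loss = loss_fn(model(data), target)
    loss.backward()
    optimizer.step()

if hvd.rank() == 0:
    print(f"final loss: {loss.item():.4f}")
Launch it across local GPUs with, for example, horovodrun -np 2 python train_hvd.py.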
Multi-GPU Training Example
# train_multi_gpu.py
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
import torch.optim as optim
from torch.nn.parallel import DistributedDataParallel as DDP

def setup(rank, world_size):
    # Rendezvous settings for single-node multi-GPU training
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

def cleanup():
    dist.destroy_process_group()

def train(rank, world_size):
    setup(rank, world_size)

    # Create the model and move it to this process's GPU
    model = nn.Sequential(
        nn.Linear(784, 256),
        nn.ReLU(),
        nn.Linear(256, 10)
    ).to(rank)
    ddp_model = DDP(model, device_ids=[rank])

    loss_fn = nn.CrossEntropyLoss()
    optimizer = optim.SGD(ddp_model.parameters(), lr=0.001)

    # Training loop
    for epoch in range(10):
        # Your training code here (use a DistributedSampler for the DataLoader)
        pass

    cleanup()

def main():
    world_size = torch.cuda.device_count()
    mp.spawn(train, args=(world_size,), nprocs=world_size, join=True)

if __name__ == "__main__":
    main()
Model Serving Infrastructure
Setting Up TorchServe
# Install TorchServe
pip install torchserve torch-model-archiver torch-workflow-archiver
# Install dependencies (TorchServe requires Java)
sudo dnf install -y java-11-openjdk
# Create the model store and archive a model
mkdir -p model_store
torch-model-archiver --model-name resnet50 \
    --version 1.0 \
    --model-file model.py \
    --serialized-file resnet50.pth \
    --handler image_classifier \
    --export-path model_store
# Configure TorchServe for GPU before starting it
cat << EOF > config.properties
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
number_of_gpu=1
batch_size=8
max_batch_delay=100
EOF
# Start TorchServe with the GPU configuration
torchserve --start --ncs --ts-config config.properties --model-store model_store --models resnet50.mar
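Once the model is registered, you can exercise the inference endpoint from Python; a minimal sketch using requests, where kitten.jpg is a placeholder image path:
# query_torchserve.py — send an image to the TorchServe predictions API (kitten.jpg is a placeholder)
import requests

with open("kitten.jpg", "rb") as f:
    resp = requests.post("http://localhost:8080/predictions/resnet50", data=f)

print(resp.status_code)
print(resp.json())   # top predicted classes and probabilities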
Setting Up Triton Inference Server
# Pull Triton Docker image
docker pull nvcr.io/nvidia/tritonserver:23.10-py3
# Create model repository structure
mkdir -p models/resnet50/1
cp model.onnx models/resnet50/1/
# Create model configuration
cat << EOF > models/resnet50/config.pbtxt
name: "resnet50"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
EOF
# Run Triton server
docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v $(pwd)/models:/models \
nvcr.io/nvidia/tritonserver:23.10-py3 \
tritonserver --model-repository=/models
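To verify the deployment end to end, here is a small client sketch using the tritonclient HTTP API (install with pip install tritonclient[http]). The input and output names and shapes must match the config.pbtxt above, and the random tensor is only a stand-in for real preprocessed image data.
# query_triton.py — send a dummy request to the resnet50 model served by Triton
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Shape is [batch, 3, 224, 224] to match config.pbtxt (max_batch_size: 8)
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input", list(dummy.shape), "FP32")
infer_input.set_data_from_numpy(dummy)
requested = httpclient.InferRequestedOutput("output")

result = client.infer("resnet50", inputs=[infer_input], outputs=[requested])
print(result.as_numpy("output").shape)   # expected: (1, 1000)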
Monitoring and Management
Setting Up GPU Monitoring
# Install DCGM (Data Center GPU Manager)
sudo dnf install -y datacenter-gpu-manager
# Start DCGM service
sudo systemctl enable --now nvidia-dcgm
# Install DCGM exporter for Prometheus
docker run -d --gpus all --rm -p 9400:9400 \
nvidia/dcgm-exporter:3.1.7-3.1.4-ubuntu20.04
# Configure Prometheus to scrape DCGM metrics (assumes Prometheus is already installed; append under scrape_configs:)
cat << EOF | sudo tee -a /etc/prometheus/prometheus.yml
  - job_name: 'dcgm'
    static_configs:
      - targets: ['localhost:9400']
EOF
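A quick way to confirm the exporter is publishing GPU metrics before wiring it into Grafana is to scrape the endpoint yourself; a small sketch, assuming the exporter is listening on localhost:9400:
# check_dcgm_exporter.py — print a few GPU metrics from the DCGM exporter endpoint
import requests

resp = requests.get("http://localhost:9400/metrics", timeout=5)
for line in resp.text.splitlines():
    # GPU utilization and framebuffer memory used, one line per GPU
    if line.startswith(("DCGM_FI_DEV_GPU_UTIL", "DCGM_FI_DEV_FB_USED")):
        print(line)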
Creating GPU Dashboard
# gpu_dashboard.py
# Requires: pip install dash plotly gputil
import dash
from dash import dcc, html
from dash.dependencies import Input, Output
import plotly.graph_objs as go
import GPUtil

app = dash.Dash(__name__)

app.layout = html.Div([
    html.H1('GPU Monitoring Dashboard'),
    dcc.Graph(id='gpu-usage'),
    dcc.Graph(id='gpu-memory'),
    dcc.Graph(id='gpu-temperature'),
    dcc.Interval(id='interval-component', interval=2000)
])

@app.callback(
    [Output('gpu-usage', 'figure'),
     Output('gpu-memory', 'figure'),
     Output('gpu-temperature', 'figure')],
    [Input('interval-component', 'n_intervals')]
)
def update_graphs(n):
    gpus = GPUtil.getGPUs()
    if not gpus:
        return {}, {}, {}
    gpu = gpus[0]

    # GPU Usage
    usage_fig = {
        'data': [go.Bar(x=['GPU Usage'], y=[gpu.load * 100])],
        'layout': go.Layout(title='GPU Usage (%)', yaxis={'range': [0, 100]})
    }

    # GPU Memory
    memory_fig = {
        'data': [go.Bar(x=['Used', 'Free'],
                        y=[gpu.memoryUsed, gpu.memoryFree])],
        'layout': go.Layout(title='GPU Memory (MB)')
    }

    # GPU Temperature
    temp_fig = {
        'data': [go.Indicator(
            mode="gauge+number",
            value=gpu.temperature,
            title={'text': "GPU Temperature (°C)"},
            gauge={'axis': {'range': [None, 100]},
                   'bar': {'color': "darkblue"},
                   'steps': [
                       {'range': [0, 50], 'color': "lightgray"},
                       {'range': [50, 80], 'color': "yellow"},
                       {'range': [80, 100], 'color': "red"}],
                   'threshold': {'line': {'color': "red", 'width': 4},
                                 'thickness': 0.75, 'value': 85}}
        )],
        'layout': go.Layout(height=400)
    }

    return usage_fig, memory_fig, temp_fig

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=8050)
Performance Optimization
CUDA Optimization Tips
# optimize_cuda.py
# Placeholders: YourModel, loss_fn, dataset, batch_size, and num_epochs are assumed
# to be defined elsewhere in your training code.
import torch
from torch.cuda.amp import autocast, GradScaler
from torch.utils.data import DataLoader

# Enable TF32 on Ampere and newer GPUs
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Pin memory for faster host-to-device transfers
train_loader = DataLoader(
    dataset,
    batch_size=batch_size,
    pin_memory=True,
    num_workers=4
)

# Use automatic mixed precision
model = YourModel().cuda()
optimizer = torch.optim.Adam(model.parameters())
scaler = GradScaler()

for epoch in range(num_epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.cuda(), target.cuda()
        optimizer.zero_grad()

        # Forward pass and loss in mixed precision
        with autocast():
            output = model(data)
            loss = loss_fn(output, target)

        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

# Memory optimization: release cached blocks back to the driver
torch.cuda.empty_cache()
System-Level Optimization
# Set GPU persistence mode
sudo nvidia-smi -pm 1
# Set GPU compute mode to exclusive
sudo nvidia-smi -c 3
# Optimize PCIe settings
sudo setpci -s 01:00.0 68.w=5000 # Adjust based on your GPU
# CPU affinity for GPU processes
numactl --cpunodebind=0 --membind=0 python train.py
# Disable CPU frequency scaling
sudo cpupower frequency-set -g performance
Troubleshooting
Common Issues and Solutions
# Check NVIDIA driver status
systemctl status nvidia-persistenced
journalctl -u nvidia-persistenced
# Reset GPU
sudo nvidia-smi --gpu-reset -i 0  # specify the GPU index; the GPU must be idle
# Check CUDA installation
ldconfig -p | grep cuda
ldconfig -p | grep cudnn
# Fix library path issues
echo "/usr/local/cuda/lib64" | sudo tee /etc/ld.so.conf.d/cuda.conf
sudo ldconfig
# Debug GPU memory issues
nvidia-smi --query-gpu=memory.used,memory.free,memory.total --format=csv
# Monitor GPU processes
watch -n 1 nvidia-smi
# Check for thermal throttling
nvidia-smi -q -d PERFORMANCE
Performance Profiling
# profile_gpu.py
# Placeholders: YourModel and example_input are assumed to be defined by your code.
import torch
from torch.profiler import profile, record_function, ProfilerActivity

model = YourModel().cuda()

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
             record_shapes=True,
             profile_memory=True,
             with_stack=True) as prof:
    with record_function("model_inference"):
        model(example_input.cuda())

# Print the top operators by GPU time and export a Chrome trace
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
prof.export_chrome_trace("trace.json")
Security Considerations
Securing GPU Access
# Create GPU users group
sudo groupadd gpu-users
# Add users to GPU group
sudo usermod -a -G gpu-users username
# Set GPU device permissions
cat << EOF | sudo tee /etc/udev/rules.d/70-nvidia.rules
KERNEL=="nvidia*", GROUP="gpu-users", MODE="0660"
KERNEL=="nvidia-uvm*", GROUP="gpu-users", MODE="0660"
EOF
# Reload udev rules
sudo udevadm control --reload-rules
sudo udevadm trigger
# Note: per-container GPU limits are enforced by the container runtime
# (e.g. docker --gpus / Podman CDI device selection) rather than by a host-level cgroup tool
Best Practices
- Resource Management
  - Monitor GPU memory usage continuously
  - Implement proper error handling for OOM situations (see the sketch below)
  - Use mixed precision training when possible
- Performance
  - Profile your code regularly
  - Use appropriate batch sizes
  - Enable cuDNN benchmarking for optimal kernels (see the sketch below)
- Maintenance
  - Keep drivers and CUDA updated
  - Monitor the system regularly
  - Implement proper logging
- Security
  - Restrict GPU access to authorized users
  - Use containers for isolation
  - Apply security updates regularly
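As referenced in the list above, here is a small PyTorch sketch combining cuDNN autotuning with defensive handling of out-of-memory errors. torch.cuda.OutOfMemoryError requires a reasonably recent PyTorch release, and the model and batch are placeholders.
# best_practices_snippets.py — cuDNN autotuning and a simple OOM fallback (model/batch are placeholders)
import torch

# Let cuDNN benchmark and cache the fastest kernels when input shapes are stable
torch.backends.cudnn.benchmark = True

def safe_forward(model, batch):
    """Run a forward pass, splitting the batch in half on CUDA out-of-memory errors."""
    try:
        return model(batch)
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()
        half = batch.shape[0] // 2
        if half == 0:
            raise  # a single sample still does not fit; give up
        return torch.cat([safe_forward(model, batch[:half]),
                          safe_forward(model, batch[half:])])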
Conclusion
You’ve successfully built a comprehensive AI/ML infrastructure on Rocky Linux with full NVIDIA GPU support. This foundation provides you with a powerful platform for developing, training, and deploying machine learning models at scale. Remember to continuously monitor system performance and stay updated with the latest driver and framework releases to maintain optimal performance.
The combination of Rocky Linux’s stability and NVIDIA’s GPU acceleration creates a robust environment for your AI/ML workloads, whether you’re conducting research, developing products, or running production inference services.