Production-Ready Grafana LGTM Stack with Docker Compose

TL;DR: Complete guide to deploying Grafana, Loki, Mimir, and Alloy on EC2 with persistent storage, ECS log routing, and OpenTelemetry metrics collection.

Setting up a production observability stack requires careful consideration of data persistence, service orchestration, and integration with your existing infrastructure. This guide walks through deploying a Grafana LGTM-style stack on AWS EC2 with Docker Compose: Loki for logs, Grafana for visualization, and Mimir for metrics, with Alloy as the collector. Tempo, the tracing component of LGTM, is not deployed here.

Architecture Overview

The stack consists of four core services:

  • Grafana (v11.0.0) - Visualization and dashboards
  • Loki (v3.0.0) - Log aggregation
  • Mimir (v2.12.0) - Long-term metrics storage (Prometheus-compatible)
  • Alloy (v1.1.1) - OpenTelemetry collector and metrics processor

┌─────────────────┐     ┌─────────────────┐
│   ECS Tasks     │────▶│   Fluent Bit    │
│  (Applications) │     │   (Log Router)  │
└─────────────────┘     └────────┬────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────┐
│              Grafana Stack (EC2)             │
│  ┌─────────┐  ┌──────┐  ┌───────┐  ┌─────┐ │
│  │ Grafana │  │ Loki │  │ Mimir │  │Alloy│ │
│  │  :3000  │  │:3100 │  │ :9009 │  │:9999│ │
│  └─────────┘  └──────┘  └───────┘  └─────┘ │
│                    │                        │
│              EBS Volume (Persistent)        │
└─────────────────────────────────────────────┘

Infrastructure Setup

EC2 Instance Configuration

  1. AMI: Ubuntu LTS (22.04 or newer)
  2. Security Group Ingress Ports:
    • 22 - SSH
    • 3000 - Grafana UI
    • 3100 - Loki API
    • 9009 - Mimir API
    • 9999 - Alloy OTLP receiver
    • 12345 - Alloy UI
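
The ingress list above can be turned into AWS CLI calls. The sketch below only prints the commands so they can be reviewed before anything changes. The helper name `print_ingress_commands`, the security group ID, and the CIDR range are placeholders chosen for illustration, not values from this deployment.

```shell
# Ports from the list above: SSH, Grafana, Loki, Mimir, Alloy OTLP, Alloy UI.
STACK_PORTS="22 3000 3100 9009 9999 12345"

# print_ingress_commands <security-group-id> <cidr>   (hypothetical helper)
# Prints one `aws ec2 authorize-security-group-ingress` call per port.
# Nothing is executed here; pipe the output to `sh` once it looks right.
print_ingress_commands() {
  sg_id="$1"
  cidr="$2"
  for port in $STACK_PORTS; do
    printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port %s --cidr %s\n' \
      "$sg_id" "$port" "$cidr"
  done
}

# Example with placeholder values:
# print_ingress_commands sg-0123456789abcdef0 203.0.113.0/24
```

Restricting the CIDR to a known range rather than 0.0.0.0/0 matters here because several of these ports serve unauthenticated APIs.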

User Data Script

This script installs Docker Engine (with the Compose plugin) and the AWS CLI. The configuration files themselves are created in the sections that follow:

#!/bin/bash
# Update and install Docker
sudo apt-get update -y

# Remove old Docker installations
for pkg in docker.io docker-doc docker-compose podman-docker containerd runc; do
  sudo apt-get remove -y "$pkg"
done

# Add Docker's official GPG key
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add Docker repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker
sudo apt-get update -y
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

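# Install AWS CLI v2 (this is the x86_64 build; Arm instances need the aarch64 archive)
sudo apt-get install -y unzip
curl -fsSL "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o /tmp/awscliv2.zip
unzip -q /tmp/awscliv2.zip -d /tmp
sudo /tmp/aws/install

# Allow the default login user to manage Docker.
# User data runs as root, so $USER does not name the login user; on Ubuntu AMIs it is "ubuntu".
sudo usermod -aG docker ubuntu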
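
User data runs once at first boot, and its output on Ubuntu lands in /var/log/cloud-init-output.log. A small check like the one below can confirm that the tools ended up on the PATH. The function name `missing_tools` is a hypothetical helper for illustration.

```shell
# missing_tools <name>...   (hypothetical helper)
# Prints each command name that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example, run over SSH after the instance has finished booting:
# missing_tools docker aws unzip
# docker compose version   # confirms the Compose plugin is installed
```

An empty result means every listed tool was found.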

Docker Compose Configuration

Create the directory structure:

mkdir -p ~/grafana-stack/{loki,alloy,mimir}
cd ~/grafana-stack

docker-compose.yml

services:
  grafana:
    image: grafana/grafana:11.0.0
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - /mnt/ebs/grafana_storage:/var/lib/grafana
    environment:
      - GF_SECURITY_ALLOW_EMBEDDING=true
      - GF_AUTH_ANONYMOUS_ENABLED=true
    restart: always
    networks:
      - grafana
    logging:
      options:
        max-size: 200m
        max-file: '3'

  loki:
    image: grafana/loki:3.0.0
    container_name: loki
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/loki-local-config.yaml
    volumes:
      - /mnt/ebs/loki_storage:/tmp/loki
    configs:
      - source: loki-config
        target: /etc/loki/loki-local-config.yaml
    restart: always
    networks:
      - grafana
    logging:
      options:
        max-size: 200m
        max-file: '3'

  alloy:
    image: grafana/alloy:v1.1.1
    container_name: alloy
    ports:
      - "12345:12345"
      - "9999:9999"
    volumes:
      - ./alloy/config.alloy:/etc/alloy/config.alloy
      - /mnt/ebs/alloy_storage:/var/lib/alloy/data
    command: >
      run --server.http.listen-addr=0.0.0.0:12345 
      --storage.path=/var/lib/alloy/data 
      /etc/alloy/config.alloy
    restart: always
    networks:
      - grafana
    logging:
      options:
        max-size: 200m
        max-file: '3'
        
  mimir:
    image: grafana/mimir:2.12.0
    container_name: mimir
    ports:
      - "9009:9009"
    volumes:
      - ./mimir/config.yaml:/etc/mimir/config.yaml
      - /mnt/ebs/mimir_storage:/tmp/mimir
    command: --config.file=/etc/mimir/config.yaml
    restart: always
    networks:
      - grafana
    logging:
      options:
        max-size: 200m
        max-file: '3'

configs:
  loki-config:
    file: ./loki/loki-config.yaml

networks:
  grafana:

Service Configurations

Loki Configuration (loki/loki-config.yaml)

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

compactor:
  working_directory: /tmp/loki/compactor
  retention_enabled: true
  retention_delete_delay: 1h
  compaction_interval: 10m
  delete_request_store: filesystem  # keep delete requests on disk so they survive restarts

limits_config:
  retention_period: 720h  # 30 days

Alloy Configuration (alloy/config.alloy)

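Alloy acts as the OpenTelemetry collector. It receives OTLP metrics over HTTP on port 9999, batches them, converts them to Prometheus samples, and remote-writes them to Mimir. The basic_auth block only matters if an authenticating proxy sits in front of Mimir; the Mimir configuration below does not enforce credentials, so the block can be removed otherwise: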

logging {
  level  = "info"
  format = "logfmt"
}

prometheus.remote_write "default" {
  endpoint {
    url = "http://mimir:9009/api/v1/push"
    basic_auth {
      username = "admin"
      password = "your-secure-password"
    }
  }
}

// OTLP Receiver for metrics
otelcol.receiver.otlp "receiver" {
  http {
    endpoint = "0.0.0.0:9999"
  }
  
  output {
    metrics = [otelcol.processor.batch.default.input]
  }
}

otelcol.processor.batch "default" {
  output {
    metrics = [otelcol.exporter.prometheus.default.input]
  }
}

otelcol.exporter.prometheus "default" {
  forward_to = [prometheus.remote_write.default.receiver]
}
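
To exercise this pipeline without an instrumented application, a single gauge can be posted to the receiver by hand. The sketch below assumes the receiver accepts OTLP/HTTP with a JSON body at the default `/v1/metrics` path. The helper name `otlp_gauge_payload`, the metric name, and the `smoke-test` service name are illustrative.

```shell
# otlp_gauge_payload <metric-name> <value> <timestamp-ns>   (hypothetical helper)
# Prints a minimal OTLP/JSON metrics body with one gauge data point.
otlp_gauge_payload() {
  printf '{"resourceMetrics":[{"resource":{"attributes":['
  printf '{"key":"service.name","value":{"stringValue":"smoke-test"}}]},'
  printf '"scopeMetrics":[{"metrics":[{"name":"%s","gauge":{"dataPoints":[' "$1"
  printf '{"asDouble":%s,"timeUnixNano":"%s"}]}}]}]}]}\n' "$2" "$3"
}

# Example:
# curl -fsS -H 'Content-Type: application/json' \
#   -X POST http://localhost:9999/v1/metrics \
#   -d "$(otlp_gauge_payload smoke_test_gauge 1 "$(date +%s)000000000")"
```

The batch processor may hold the sample briefly before forwarding it, so allow a few seconds before querying Mimir.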

Mimir Configuration (mimir/config.yaml)

multitenancy_enabled: false

blocks_storage:
  backend: filesystem
  bucket_store:
    sync_dir: /tmp/mimir/tsdb-sync
  filesystem:
    dir: /tmp/mimir/data/tsdb
  tsdb:
    dir: /tmp/mimir/tsdb

compactor:
  data_dir: /tmp/mimir/compactor
  sharding_ring:
    kvstore:
      store: memberlist

limits:
  compactor_blocks_retention_period: 180d  # 6 months

distributor:
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: memberlist

ingester:
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: memberlist
    replication_factor: 1

ruler_storage:
  backend: filesystem
  filesystem:
    dir: /tmp/mimir/rules

server:
  http_listen_port: 9009
  log_level: error

store_gateway:
  sharding_ring:
    replication_factor: 1
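
Mimir serves the Prometheus query API under the `/prometheus` prefix on the port configured above. The sketch below wraps an instant query with curl. The helper name `mimir_instant_query` and the metric name in the example are placeholders.

```shell
# mimir_instant_query <base-url> <promql>   (hypothetical helper)
# Runs an instant query against Mimir's Prometheus-compatible API.
mimir_instant_query() {
  curl -fsS -G "$1/prometheus/api/v1/query" --data-urlencode "query=$2"
}

# Example (replace the metric name with one your applications emit):
# mimir_instant_query http://localhost:9009 'your_metric_name'
```

A response with `"status":"success"` and an empty result means Mimir is reachable but holds no samples for that metric yet.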

Persistent Storage Setup

Mount EBS Volume

# List block devices
lsblk

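# Format the volume (new volumes only: this destroys any existing data!)
# Confirm the device name with lsblk first; /dev/nvme1n1 is common on Nitro instances but not guaranteed.
sudo mkfs -t ext4 /dev/nvme1n1

# Create mount point
sudo mkdir -p /mnt/ebs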

# Mount the volume
sudo mount /dev/nvme1n1 /mnt/ebs

# Create storage directories
sudo mkdir -p /mnt/ebs/{grafana_storage,loki_storage,mimir_storage,alloy_storage}
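
Because formatting is destructive, it can help to guard it with a check for an existing filesystem. The sketch below uses `blkid` for that. The helper name `is_blank_device` is hypothetical, and the device path is the same example path used above.

```shell
# is_blank_device <device>   (hypothetical helper)
# Succeeds only when run as root AND blkid finds no filesystem type.
# Without root, blkid can report nothing even for a formatted disk,
# so the function refuses rather than risk a false "blank" answer.
is_blank_device() {
  [ "$(id -u)" -eq 0 ] || return 1
  [ -z "$(blkid -o value -s TYPE "$1" 2>/dev/null)" ]
}

# Example, in a root shell or in user data:
# if is_blank_device /dev/nvme1n1; then
#   mkfs -t ext4 /dev/nvme1n1
# else
#   echo "not formatting /dev/nvme1n1"
# fi
```

This makes the step safe to rerun when an instance is replaced and an existing data volume is reattached.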

Configure Auto-Mount on Reboot

# Get UUID
sudo blkid /dev/nvme1n1
# Output: /dev/nvme1n1: UUID="abc-123-def" TYPE="ext4"

# Add to /etc/fstab
echo 'UUID=abc-123-def /mnt/ebs ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab
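
The UUID in the example above is a placeholder. A sketch that reads the real UUID and builds the same fstab line, assuming the ext4 filesystem and mount options used in this guide (`fstab_entry` is a hypothetical helper name):

```shell
# fstab_entry <uuid> <mount-point>   (hypothetical helper)
# Prints an /etc/fstab line with the options used in this guide.
fstab_entry() {
  printf 'UUID=%s %s ext4 defaults,nofail 0 2\n' "$1" "$2"
}

# Example: append the entry only if it is not already present.
# uuid="$(sudo blkid -s UUID -o value /dev/nvme1n1)"
# grep -q "UUID=$uuid" /etc/fstab || fstab_entry "$uuid" /mnt/ebs | sudo tee -a /etc/fstab
# sudo mount -a   # mounts anything in fstab that is not mounted yet
```

The `nofail` option lets the instance finish booting even if the volume is detached.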

Set Directory Permissions

# Grafana runs as UID 472
sudo chown -R 472:0 /mnt/ebs/grafana_storage
sudo chmod -R 755 /mnt/ebs/grafana_storage

# Loki runs as UID 10001
sudo chown -R 10001:10001 /mnt/ebs/loki_storage
sudo chmod -R 755 /mnt/ebs/loki_storage

# Mimir runs as root
sudo chown -R 0:0 /mnt/ebs/mimir_storage
sudo chmod -R 755 /mnt/ebs/mimir_storage
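
Wrong ownership is a common reason for a container to exit right after start. The sketch below checks a directory against the expected owner. It assumes GNU `stat` as shipped on Ubuntu, and `owner_matches` is a hypothetical helper name.

```shell
# owner_matches <path> <uid:gid>   (hypothetical helper)
# Succeeds when the path is owned by the given numeric user and group.
owner_matches() {
  [ "$(stat -c '%u:%g' "$1")" = "$2" ]
}

# Example, using the IDs from the commands above:
# owner_matches /mnt/ebs/grafana_storage 472:0     || echo "grafana_storage: wrong owner"
# owner_matches /mnt/ebs/loki_storage 10001:10001  || echo "loki_storage: wrong owner"
# owner_matches /mnt/ebs/mimir_storage 0:0         || echo "mimir_storage: wrong owner"
```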

ECS Integration with Fluent Bit

To send logs from ECS tasks to Loki, add a Fluent Bit sidecar:

Task Definition Updates

Add the log configuration to your main container. Name selects the output plugin; the grafana/fluent-bit-plugin-loki image registers its plugin as grafana-loki, and the remaining options are passed to that plugin:

{
  "logConfiguration": {
    "logDriver": "awsfirelens",
    "options": {
      "Name": "grafana-loki",
      "Url": "http://your-grafana-host:3100/loki/api/v1/push",
      "Labels": "{service=\"your-service-name\"}",
      "LabelKeys": "container_name,ecs_task_definition,ecs_cluster",
      "RemoveKeys": "container_id,source,ecs_task_arn",
      "LineFormat": "key_value"
    }
  }
}

Add the Fluent Bit log router container:

{
  "name": "log_router",
  "image": "grafana/fluent-bit-plugin-loki:2.9.1",
  "cpu": 0,
  "memoryReservation": 50,
  "essential": true,
  "firelensConfiguration": {
    "type": "fluentbit",
    "options": {
      "enable-ecs-log-metadata": "true"
    }
  }
}
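
After the task is redeployed, its logs should be queryable in Loki under the `service` label set in the Labels option. The sketch below uses Loki's `query_range` endpoint, which by default covers the last hour. The helper name `loki_recent_logs` and the limit of five lines are arbitrary choices for illustration.

```shell
# loki_recent_logs <loki-base-url> <service-label-value>   (hypothetical helper)
# Fetches up to five recent log lines for one service label.
loki_recent_logs() {
  curl -fsS -G "$1/loki/api/v1/query_range" \
    --data-urlencode "query={service=\"$2\"}" \
    --data-urlencode "limit=5"
}

# Example:
# loki_recent_logs http://your-grafana-host:3100 your-service-name
```

An empty `result` array usually points to a network problem between the task and port 3100, or to a label mismatch.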

Grafana Data Sources

After deployment, configure these data sources in Grafana:

  1. Loki (for logs):
    • URL: http://loki:3100
  2. Prometheus/Mimir (for metrics):
    • URL: http://mimir:9009/prometheus
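
The same data sources can be created through Grafana's HTTP API (`POST /api/datasources`) instead of the UI. The sketch below builds the request body. The helper name `datasource_json` is hypothetical, and the `admin:admin` login in the example is an assumption that should be replaced with real admin credentials.

```shell
# datasource_json <name> <type> <url>   (hypothetical helper)
# Prints a minimal data source definition for Grafana's HTTP API.
datasource_json() {
  printf '{"name":"%s","type":"%s","access":"proxy","url":"%s"}\n' "$1" "$2" "$3"
}

# Example (Mimir is added with the "prometheus" data source type):
# curl -fsS -u admin:admin -H 'Content-Type: application/json' \
#   -X POST http://localhost:3000/api/datasources \
#   -d "$(datasource_json Loki loki http://loki:3100)"
# curl -fsS -u admin:admin -H 'Content-Type: application/json' \
#   -X POST http://localhost:3000/api/datasources \
#   -d "$(datasource_json Mimir prometheus http://mimir:9009/prometheus)"
```

The URLs use the Compose service names because Grafana reaches Loki and Mimir over the shared `grafana` network.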

Production Considerations

High Availability

For production environments, consider:

  • Running multiple Grafana instances behind a load balancer
  • Using an S3 backend for Loki instead of the local filesystem
  • Deploying Mimir in microservices mode with multiple replicas

Security

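  • Disable anonymous access to Grafana (GF_AUTH_ANONYMOUS_ENABLED=false); the Compose file above enables it
  • Use HTTPS with proper certificates
  • Restrict security group access to known IP ranges; Loki (3100) and Mimir (9009) accept unauthenticated requests in this configuration
  • Use AWS Secrets Manager for credentials rather than hardcoding them in configuration files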
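
As a sketch of the Secrets Manager point, the helper below reads a secret with the AWS CLI at start time so that it never lands in a file. The helper name `fetch_secret` and the secret ID `grafana/admin-password` are hypothetical, and the instance role is assumed to allow `secretsmanager:GetSecretValue`.

```shell
# fetch_secret <secret-id>   (hypothetical helper)
# Prints the secret's string value using the AWS CLI.
fetch_secret() {
  aws secretsmanager get-secret-value --secret-id "$1" \
    --query SecretString --output text
}

# Example: pass Grafana's admin password through the environment.
# This assumes the grafana service in docker-compose.yml lists
#   - GF_SECURITY_ADMIN_PASSWORD=${GF_SECURITY_ADMIN_PASSWORD}
# under its environment section.
# GF_SECURITY_ADMIN_PASSWORD="$(fetch_secret grafana/admin-password)" docker compose up -d
```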

Monitoring the Monitors

Create alerts for:

  • Disk usage on EBS volume
  • Loki ingestion rate drops
  • Mimir query latency spikes
  • Container health checks
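
Disk usage is the alert most likely to matter on a single-node, filesystem-backed setup. Until a proper alert rule exists, a cron-style check along these lines can serve as a stopgap. The helper names and the 80% threshold are assumptions for illustration.

```shell
# disk_usage_percent <path>   (hypothetical helper)
# Prints the used capacity of the filesystem holding <path> as an integer.
disk_usage_percent() {
  df -P "$1" | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

# usage_exceeds <path> <threshold-percent>   (hypothetical helper)
# Succeeds when usage is at or above the threshold.
usage_exceeds() {
  [ "$(disk_usage_percent "$1")" -ge "$2" ]
}

# Example:
# usage_exceeds /mnt/ebs 80 && echo "EBS volume is at least 80% full"
```

`df -P` is used because its POSIX output format keeps each filesystem on one line, which makes the awk field positions predictable.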

Conclusion

The Grafana LGTM stack provides a powerful, cost-effective observability solution. With persistent EBS storage and proper configuration, you get:

  • 30 days of log retention with Loki
  • 6 months of metrics retention with Mimir
  • Native OpenTelemetry support via Alloy
  • Seamless ECS integration with Fluent Bit

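Running on a single EC2 instance, this setup stays simple to operate and avoids the usage-based pricing of managed services such as CloudWatch or Datadog. Its capacity is bounded by the instance size and EBS throughput, so plan to move to object storage and multiple replicas, as outlined under High Availability, once log and metric volume grows.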

Acknowledgements
  • Srinivas — Due diligence of stack deployment