MongoDB Compact Operation Guide

TL;DR: Run MongoDB's compact command to reclaim disk space left behind by deleted or updated documents. Back up first, test in staging, and compact replica set members one at a time.

Overview

MongoDB’s compact command rewrites and defragments collection data and indexes, reclaiming disk space from deleted or updated documents.

When to Run Compact

  • After bulk delete operations
  • When disk usage is significantly higher than actual data size
  • When query performance has degraded and storage fragmentation is a plausible cause
  • Before increasing disk allocation

Prerequisites

1. Disk Space Assessment

Current Status:

  • Database size: 102.5 GB
  • Disk capacity: 116 GB
  • Available space: 13.5 GB (insufficient)

Requirement: The compact operation needs temporary space for data rewriting. Calculate:

Required space = Collection size × (1 + fragmentation ratio + safety margin)

Recommendation: Increase disk to at least 150 GB before proceeding.
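
The formula above can be sketched as a small helper. The fragmentation ratio (0.30) and safety margin (0.15) used here are illustrative assumptions, not measured values, and the whole 102.5 GB database is treated as the data to rewrite (a worst case). The function name is hypothetical.

```javascript
// Estimate the disk space compact may need, following the formula above.
// fragmentationRatio and safetyMargin are assumptions chosen for illustration.
function requiredSpaceGB(collectionSizeGB, fragmentationRatio, safetyMargin) {
  if (collectionSizeGB < 0 || fragmentationRatio < 0 || safetyMargin < 0) {
    throw new RangeError("all inputs must be non-negative");
  }
  return collectionSizeGB * (1 + fragmentationRatio + safetyMargin);
}

// Worst case: treat the whole 102.5 GB database as one collection to rewrite.
const estimateGB = requiredSpaceGB(102.5, 0.30, 0.15);
console.log("Estimated requirement: " + Math.ceil(estimateGB) + " GB");
```

With these assumed ratios the estimate rounds up to 149 GB, which is in line with the 150 GB recommendation. Measure the real fragmentation in a trial run before relying on any figure.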

2. Backup

Always create a backup before running compact:

# Using mongodump
mongodump --uri="mongodb://user:pass@host:port/db_name" \
  --out=/backup/pre-compact-$(date +%Y%m%d)

# Or create a snapshot if using a cloud provider

3. Trial Run in Non-Production

Execute the compact operation in a development/staging environment first and document:

  • Duration
  • Resource utilization
  • Any errors encountered

Running Compact

Basic Syntax

// Connect to MongoDB
use target_database

// Compact a collection
db.runCommand({ compact: "collection_name" })

With Force Option

// Force compact to run on a replica set primary (use cautiously)
db.runCommand({ 
    compact: "collection_name",
    force: true 
})
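
One way to avoid forcing compact on a primary by accident is to build the command from the node's role. This is a minimal sketch: buildCompactCommand and its allowPrimary switch are hypothetical names introduced here, and the hello argument is assumed to be the document returned by db.hello().

```javascript
// Build the compact command document for the node described by `hello`.
// `allowPrimary` is a hypothetical safety switch: force is only added when
// the caller explicitly accepts running on a primary.
function buildCompactCommand(collName, hello, allowPrimary) {
  const cmd = { compact: collName };
  if (hello.isWritablePrimary) {
    if (!allowPrimary) {
      throw new Error("node is primary; step down first or pass allowPrimary");
    }
    cmd.force = true;
  }
  return cmd;
}

// In mongosh:
//   db.runCommand(buildCompactCommand("collection_name", db.hello(), false))
const onSecondary = buildCompactCommand("collection_name", { isWritablePrimary: false }, false);
console.log(JSON.stringify(onSecondary));
```

On a secondary the helper returns the plain command without the force field.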

Compact All Collections

// Compact every regular collection (views are skipped because they cannot be compacted)
db.getCollectionInfos({ type: "collection" }).forEach(function(info) {
    print("Compacting: " + info.name);
    printjson(db.runCommand({ compact: info.name }));
});

Monitoring During Compact

Watch Progress

// Check current operations
db.currentOp({ "command.compact": { $exists: true } })

Monitor Disk Usage

# Watch disk space during operation
watch -n 5 'df -h /data/db'

Check Collection Stats

// Before compact
db.collection_name.stats()

// Note the size (uncompressed data), storageSize, and totalIndexSize fields
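
On WiredTiger, the stats document also reports how many bytes in the data file are free for reuse, which gives a rough idea of what compact might release. The sketch below assumes a WiredTiger collection; reclaimableSummary is a hypothetical helper and the sample numbers are made up.

```javascript
// Summarize how much space compact might return to the operating system.
// `stats` is assumed to be the document returned by db.collection_name.stats().
function reclaimableSummary(stats) {
  const blockManager = (stats.wiredTiger || {})["block-manager"] || {};
  const reusable = blockManager["file bytes available for reuse"] || 0;
  const toMB = (bytes) => (bytes / 1024 / 1024).toFixed(2) + " MB";
  return {
    storageSize: toMB(stats.storageSize),
    reusable: toMB(reusable),
    reusableRatio: stats.storageSize > 0 ? reusable / stats.storageSize : 0,
  };
}

// Hypothetical sample values; in mongosh pass db.collection_name.stats() instead.
const sampleCollStats = {
  size: 734003200,
  storageSize: 1073741824,
  wiredTiger: { "block-manager": { "file bytes available for reuse": 268435456 } },
};
console.log(reclaimableSummary(sampleCollStats));
```

A low reusable ratio suggests compact will free little space and may not be worth the maintenance window.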

Post-Compact Verification

Compare Storage Statistics

// Check improved storage utilization
var stats = db.collection_name.stats();

print("Data Size: " + (stats.size / 1024 / 1024).toFixed(2) + " MB");
print("Storage Size: " + (stats.storageSize / 1024 / 1024).toFixed(2) + " MB");
print("Index Size: " + (stats.totalIndexSize / 1024 / 1024).toFixed(2) + " MB");
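
To compare the numbers rather than read them side by side, save the stats document before compact and diff it against the one taken afterwards. A minimal sketch, assuming both arguments are stats documents; compareStats is a hypothetical name and the sample values are invented.

```javascript
// Compare two stats documents captured before and after compact.
function compareStats(before, after) {
  const freedBytes = before.storageSize - after.storageSize;
  return {
    freedBytes: freedBytes,
    freedPercent: before.storageSize > 0 ? (freedBytes / before.storageSize) * 100 : 0,
    // Logical data size should match only if no writes happened in between.
    dataSizeUnchanged: before.size === after.size,
  };
}

// Hypothetical sample values for illustration.
const statsBefore = { size: 734003200, storageSize: 1073741824 };
const statsAfter = { size: 734003200, storageSize: 805306368 };
console.log(compareStats(statsBefore, statsAfter));
```

In mongosh, assign `var before = db.collection_name.stats()` ahead of the compact run and pass a fresh stats() call as the second argument afterwards.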

Verify Query Performance

Run representative queries and compare execution times:

// Example query with explain
db.collection_name.find({ 
    status: "active",
    created_at: { $gte: ISODate("2023-01-01") }
}).explain("executionStats")
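
The explain output is large, so it helps to pull out the few fields worth comparing between runs. The helper below is a sketch with a hypothetical name; it assumes the standard executionStats section is present, and the sample document contains invented numbers.

```javascript
// Extract the comparable numbers from an explain("executionStats") result.
function summarizeExplain(explainOutput) {
  const s = explainOutput.executionStats;
  return {
    millis: s.executionTimeMillis,
    docsExamined: s.totalDocsExamined,
    keysExamined: s.totalKeysExamined,
    returned: s.nReturned,
    // Documents scanned per document returned; lower is better.
    docsPerReturned: s.nReturned > 0 ? s.totalDocsExamined / s.nReturned : null,
  };
}

// Hypothetical explain output, trimmed to the fields used above.
const sampleExplain = {
  executionStats: {
    executionTimeMillis: 42,
    totalDocsExamined: 1200,
    totalKeysExamined: 1200,
    nReturned: 300,
  },
};
console.log(summarizeExplain(sampleExplain));
```

Compact does not change query plans, so docsExamined and keysExamined should stay the same. Only the elapsed time is expected to move, and it also depends on cache state, so run each query several times before drawing conclusions.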

Impact Considerations

What Happens During Compact

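Aspect             Impact
Collection Access  MongoDB 4.4 and later: reads and writes continue, but metadata operations (dropping the collection, creating or dropping indexes) are blocked. Earlier versions: all operations on the database are blocked.
Replica Set        Other nodes remain available; compact is not replicated, so each node must be compacted separately
Disk I/O           High disk activity
Duration           Proportional to collection size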

Best Practices

  1. Schedule During Low Traffic

    • Run during maintenance windows
    • Notify stakeholders of potential latency
  2. Run on Secondaries First

    // On secondary node (connect to it directly)
    db.getMongo().setReadPref("secondaryPreferred")
    db.runCommand({ compact: "collection_name" })
    
  3. Monitor Replication Lag

    rs.printSecondaryReplicationInfo()
    

Automation Script

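#!/bin/bash
# compact_collection.sh: compact one collection and log its stats before and after
# Usage: ./compact_collection.sh <collection_name>
set -euo pipefail

MONGO_URI="mongodb://user:pass@host:port/dbname"
# Abort with a usage message if no collection name was given
COLLECTION="${1:?Usage: $0 <collection_name>}"
LOG_FILE="/var/log/mongo-compact-$(date +%Y%m%d).log"

echo "Starting compact for $COLLECTION at $(date)" >> "$LOG_FILE"

# Get pre-compact stats (getCollection also handles names that are not valid JS identifiers)
mongosh "$MONGO_URI" --quiet --eval "JSON.stringify(db.getCollection('$COLLECTION').stats())" >> "$LOG_FILE"

# Run compact and log the server's reply
mongosh "$MONGO_URI" --quiet --eval "db.runCommand({compact: '$COLLECTION'})" >> "$LOG_FILE"

# Get post-compact stats
mongosh "$MONGO_URI" --quiet --eval "JSON.stringify(db.getCollection('$COLLECTION').stats())" >> "$LOG_FILE"

echo "Completed compact for $COLLECTION at $(date)" >> "$LOG_FILE"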

Alternative: Rolling Compact for Replica Sets

For minimal downtime, perform rolling compaction:

  1. Compact each secondary node
  2. Step down primary
  3. Compact former primary (now secondary)

// Step down primary (on primary node)
rs.stepDown(300)  // 5-minute stepdown
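
The rolling order can be derived from the member list that rs.status() returns. This is a sketch under assumptions: rollingCompactOrder is a hypothetical helper, the host names are placeholders, and only the name and stateStr fields of each member are used.

```javascript
// Return member names in the order they should be compacted:
// secondaries first, the current primary last. Arbiters and members in
// other states are left out because they are not ready to be compacted.
function rollingCompactOrder(members) {
  const secondaries = members.filter((m) => m.stateStr === "SECONDARY");
  const primaries = members.filter((m) => m.stateStr === "PRIMARY");
  return secondaries.concat(primaries).map((m) => m.name);
}

// Hypothetical member list; in mongosh pass rs.status().members instead.
const sampleMembers = [
  { name: "db1.example.net:27017", stateStr: "PRIMARY" },
  { name: "db2.example.net:27017", stateStr: "SECONDARY" },
  { name: "db3.example.net:27017", stateStr: "SECONDARY" },
  { name: "arb.example.net:27017", stateStr: "ARBITER" },
];
console.log(rollingCompactOrder(sampleMembers));
```

The primary appears last in the list; step it down and wait for a new primary to be elected before compacting it.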

Troubleshooting

"Not enough disk space"

# Check actual disk usage
du -sh /data/db/*

# Temporarily clean up logs or old backups

Operation Running Too Long

// Check if compact is still running
db.currentOp({ "command.compact": { $exists: true } })

// If needed, kill the operation (replace <opid> with the opid value from the output above)
db.killOp(<opid>)
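
Before killing anything, it helps to list which compact operations have actually exceeded the time you budgeted. The sketch below filters a currentOp result; longRunningCompactOps is a hypothetical helper, the threshold is an assumption you should take from your trial run, and the sample data is invented.

```javascript
// List compact operations that have been running longer than maxSeconds.
// `currentOpResult` is assumed to be the document returned by db.currentOp().
function longRunningCompactOps(currentOpResult, maxSeconds) {
  return currentOpResult.inprog
    .filter((op) => op.command && op.command.compact !== undefined)
    .filter((op) => op.secs_running > maxSeconds)
    .map((op) => ({
      opid: op.opid,
      collection: op.command.compact,
      seconds: op.secs_running,
    }));
}

// Hypothetical currentOp output, trimmed to the fields used above.
const sampleCurrentOp = {
  inprog: [
    { opid: 101, secs_running: 5400, command: { compact: "orders" } },
    { opid: 102, secs_running: 30, command: { compact: "events" } },
    { opid: 103, secs_running: 9000, command: { find: "orders" } },
  ],
  ok: 1,
};
console.log(longRunningCompactOps(sampleCurrentOp, 3600));
```

In mongosh, call longRunningCompactOps(db.currentOp(), 3600) and pass the reported opid to db.killOp() only after confirming the operation is not simply working through a large collection.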

Replication Lag After Compact

Monitor and wait for secondaries to catch up before proceeding to the next node:

// Check replication lag
rs.printSecondaryReplicationInfo()
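
For a scripted check, the lag can also be computed from the optimeDate of each member in rs.status(). A minimal sketch: replicationLagSeconds is a hypothetical helper, the host names and timestamps are invented, and what counts as an acceptable lag is left to you.

```javascript
// Compute each secondary's lag behind the primary, in seconds.
// `members` is assumed to be rs.status().members, where optimeDate is a Date.
function replicationLagSeconds(members) {
  const primary = members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    throw new Error("no primary found in member list");
  }
  return members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      lagSeconds: (primary.optimeDate - m.optimeDate) / 1000,
    }));
}

// Hypothetical member list for illustration.
const lagSampleMembers = [
  { name: "db1.example.net:27017", stateStr: "PRIMARY", optimeDate: new Date("2024-01-01T00:00:10Z") },
  { name: "db2.example.net:27017", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T00:00:04Z") },
  { name: "db3.example.net:27017", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T00:00:10Z") },
];
console.log(replicationLagSeconds(lagSampleMembers));
```

In mongosh, call replicationLagSeconds(rs.status().members) and move on to the next node only once every reported lag is close to zero.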