Ghost on AWS: Backup and Disaster Recovery

Running Ghost in production requires more than high availability: you also need a backup strategy that preserves business continuity when disasters strike. This post details a production-ready backup implementation built on AWS Backup that provides automated backups, monitoring, and documented recovery procedures for both your database and content.

Understanding RPO and RTO

Before diving into implementation, it's crucial to understand two key disaster recovery metrics:

Recovery Point Objective (RPO): The maximum acceptable amount of data loss, measured in time. Our implementation achieves:
- 24-hour RPO with daily backups (standard)
- 1-hour RPO with continuous backups (enhanced)

Recovery Time Objective (RTO): The maximum acceptable time to restore service after a disaster. Our target:
- 4-hour RTO for full restoration

These targets balance cost with business requirements for a typical Ghost deployment.
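
As a rough illustration of how a backup interval maps onto these targets, the sketch below compares a schedule's worst-case data loss with an RPO. The helpers `worstCaseDataLossHours` and `meetsRpo` are hypothetical names introduced here, not part of the implementation, and the model assumes the failure happens immediately before the next backup would have run.

```typescript
// Hypothetical helpers for reasoning about RPO; not part of the CDK construct.

/** Worst-case data loss, in hours, when backups run every `intervalHours`. */
function worstCaseDataLossHours(intervalHours: number): number {
  // A failure just before the next backup loses everything since the last one.
  return intervalHours;
}

/** True when a schedule's worst-case loss stays within the RPO target. */
function meetsRpo(intervalHours: number, rpoHours: number): boolean {
  return worstCaseDataLossHours(intervalHours) <= rpoHours;
}

console.log(meetsRpo(24, 24)); // daily backups against the 24-hour RPO
console.log(meetsRpo(24, 1));  // daily backups against the 1-hour RPO
console.log(meetsRpo(1, 1));   // hourly backups against the 1-hour RPO
```

Daily backups alone satisfy only the 24-hour target, which is why the 1-hour target depends on the continuous tier.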

The Three-Tier Backup Strategy

The backup architecture uses a grandfather-father-son rotation (daily, weekly, and monthly tiers) with lifecycle management, plus a continuous tier for the database:

graph TD
    subgraph "Backup Tiers"
        Daily[Daily Backups<br/>30 days retention<br/>Warm storage]
        Weekly[Weekly Backups<br/>120 days retention<br/>Cold after 30 days]
        Monthly[Monthly Backups<br/>365 days retention<br/>Cold after 90 days]
        Continuous[Continuous Backups<br/>7 days retention<br/>Hourly snapshots]
    end

    subgraph "Protected Resources"
        Aurora[(Aurora Database<br/>Ghost & ActivityPub)]
        EFS[EFS File System<br/>Content & Images]
    end

    subgraph "Recovery Options"
        PITR[Point-in-Time Recovery<br/>Any moment within 7 days]
        Snapshot[Snapshot Recovery<br/>Specific backup points]
    end

    Aurora --> Daily
    Aurora --> Weekly
    Aurora --> Monthly
    Aurora --> Continuous
    EFS --> Daily
    EFS --> Weekly
    EFS --> Monthly

    Continuous --> PITR
    Daily --> Snapshot
    Weekly --> Snapshot
    Monthly --> Snapshot

Implementation with AWS CDK

The backup construct creates a backup vault and, when an alert email is supplied, an SNS topic that receives notifications for failed backup and restore jobs:

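import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as backup from 'aws-cdk-lib/aws-backup';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import * as efs from 'aws-cdk-lib/aws-efs';
import * as events from 'aws-cdk-lib/aws-events';
import * as rds from 'aws-cdk-lib/aws-rds';
import * as sns from 'aws-cdk-lib/aws-sns';
import * as snsSubscriptions from 'aws-cdk-lib/aws-sns-subscriptions';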

export interface GhostBackupProps {
  databaseCluster: rds.IDatabaseCluster;
  fileSystem: efs.IFileSystem;
  alertEmail?: string;
  enableVaultLock?: boolean;
  enableCrossRegionBackup?: boolean;
  crossRegionBackupDestination?: string;
  enableRestoreTesting?: boolean;
  continuousBackupEnabled?: boolean;
}

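export class GhostBackup extends Construct {
  // Resources exposed so that other constructs can reference them
  public readonly backupVault: backup.BackupVault;
  public readonly notificationTopic?: sns.Topic;
  public readonly continuousBackupPlan?: backup.BackupPlan;

  constructor(scope: Construct, id: string, props: GhostBackupProps) {
    super(scope, id);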

    // Create SNS topic for backup notifications
    if (props.alertEmail) {
      this.notificationTopic = new sns.Topic(this, 'NotificationTopic', {
        displayName: 'Ghost Backup Notifications',
      });

      this.notificationTopic.addSubscription(
        new snsSubscriptions.EmailSubscription(props.alertEmail),
      );
    }

    // Create backup vault with notifications
    this.backupVault = new backup.BackupVault(this, 'Vault', {
      backupVaultName: 'ghost-backup-vault',
      notificationTopic: this.notificationTopic,
      notificationEvents: [
        backup.BackupVaultEvents.BACKUP_JOB_FAILED,
        backup.BackupVaultEvents.RESTORE_JOB_FAILED,
      ],
    });
  }
}

Backup Scheduling and Retention

Each backup tier serves a specific recovery scenario:

Daily Backups

const dailyRule = new backup.BackupPlanRule({
  ruleName: 'DailyBackup',
  scheduleExpression: events.Schedule.cron({
    hour: '3',
    minute: '0',
  }),
  deleteAfter: cdk.Duration.days(30),
});

Daily backups provide operational recovery for recent issues like accidental deletions or corrupted data. The 30-day retention in warm storage ensures fast recovery without cold storage retrieval delays.

Weekly Backups

const weeklyRule = new backup.BackupPlanRule({
  ruleName: 'WeeklyBackup',
  scheduleExpression: events.Schedule.cron({
    weekDay: 'MON', // Monday (in AWS cron, numeric 1 means Sunday)
    hour: '4',
    minute: '0',
  }),
  deleteAfter: cdk.Duration.days(120),
  moveToColdStorageAfter: cdk.Duration.days(30),
});

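Weekly backups balance retention with cost by moving to cold storage after 30 days, which can reduce storage expenses by up to 90%. The transition only applies to resource types that support cold storage in AWS Backup: of the two resources protected here that means EFS, while Aurora snapshots stay in warm storage for their full retention.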
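
To make the schedule format concrete, the following sketch renders the three tier schedules as AWS cron strings, which use six fields: minute, hour, day of month, month, day of week, and year. `toAwsCron` is a hypothetical helper written for this illustration; it is not the CDK's `Schedule.cron`, although it is intended to follow the same field layout.

```typescript
// Hypothetical helper that mirrors how an AWS cron expression is laid out:
// cron(minute hour day-of-month month day-of-week year)
interface CronFields {
  minute?: string;
  hour?: string;
  day?: string;     // day of month
  month?: string;
  weekDay?: string; // SUN-SAT or 1-7, where 1 is Sunday
  year?: string;
}

function toAwsCron(f: CronFields): string {
  if (f.day !== undefined && f.weekDay !== undefined) {
    throw new Error("Specify either day or weekDay, not both");
  }
  // AWS requires '?' in whichever of day-of-month / day-of-week is unused.
  const day = f.weekDay !== undefined ? "?" : (f.day ?? "*");
  const weekDay = f.weekDay ?? "?";
  return `cron(${f.minute ?? "*"} ${f.hour ?? "*"} ${day} ${f.month ?? "*"} ${weekDay} ${f.year ?? "*"})`;
}

console.log(toAwsCron({ hour: "3", minute: "0" }));                 // cron(0 3 * * ? *)
console.log(toAwsCron({ weekDay: "MON", hour: "4", minute: "0" })); // cron(0 4 ? * MON *)
console.log(toAwsCron({ day: "1", hour: "5", minute: "0" }));       // cron(0 5 1 * ? *)
```

Using day names such as `MON` avoids the off-by-one trap in the numeric form, where 1 is Sunday. Schedule times are interpreted in UTC.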

Monthly Backups

const monthlyRule = new backup.BackupPlanRule({
  ruleName: 'MonthlyBackup',
  scheduleExpression: events.Schedule.cron({
    day: '1',
    hour: '5',
    minute: '0',
  }),
  deleteAfter: cdk.Duration.days(365),
  moveToColdStorageAfter: cdk.Duration.days(90),
});

Monthly backups provide long-term retention for compliance and historical recovery needs.
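
AWS Backup requires a recovery point to spend at least 90 days in cold storage, so `deleteAfter` must exceed `moveToColdStorageAfter` by 90 days or more. The sketch below checks the three tiers against that rule. The `Lifecycle` type and `lifecycleErrors` function are hypothetical names for this illustration, with the retention values copied from the rules above.

```typescript
// Hypothetical validator for the retention tiers described above.
interface Lifecycle {
  name: string;
  deleteAfterDays: number;
  moveToColdStorageAfterDays?: number;
}

const MIN_COLD_STORAGE_DAYS = 90; // AWS Backup minimum time in cold storage

function lifecycleErrors(rules: Lifecycle[]): string[] {
  const errors: string[] = [];
  for (const r of rules) {
    // Rules that never move to cold storage have nothing to check.
    if (r.moveToColdStorageAfterDays === undefined) continue;
    const daysInCold = r.deleteAfterDays - r.moveToColdStorageAfterDays;
    if (daysInCold < MIN_COLD_STORAGE_DAYS) {
      errors.push(`${r.name}: only ${daysInCold} days in cold storage, need ${MIN_COLD_STORAGE_DAYS}`);
    }
  }
  return errors;
}

const backupTiers: Lifecycle[] = [
  { name: "DailyBackup", deleteAfterDays: 30 },
  { name: "WeeklyBackup", deleteAfterDays: 120, moveToColdStorageAfterDays: 30 },
  { name: "MonthlyBackup", deleteAfterDays: 365, moveToColdStorageAfterDays: 90 },
];

console.log(lifecycleErrors(backupTiers)); // []
```

The weekly tier sits exactly on the limit (120 - 30 = 90), so shortening its retention would also require moving the cold storage transition earlier or dropping it.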

Continuous Backup for Minimal RPO

For production environments requiring minimal data loss, a second plan runs hourly and enables continuous backup, which supports point-in-time recovery for the database:

if (props.continuousBackupEnabled !== false) {
  this.continuousBackupPlan = new backup.BackupPlan(this, 'ContinuousPlan', {
    backupPlanName: 'ghost-continuous-backup',
  });

  this.continuousBackupPlan.addRule(
    new backup.BackupPlanRule({
      ruleName: 'HourlyBackup',
      scheduleExpression: events.Schedule.cron({
        minute: '0',
        hour: '*', // Every hour
      }),
      deleteAfter: cdk.Duration.days(7),
      enableContinuousBackup: true,
    }),
  );

  // Only database needs continuous backup
  this.continuousBackupPlan.addSelection('DatabaseContinuous', {
    resources: [
      backup.BackupResource.fromRdsDatabaseCluster(props.databaseCluster),
    ],
  });
}

This provides 1-hour RPO with the ability to restore to any point within the last 7 days.
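
Before starting a point-in-time restore, it can help to confirm that the requested timestamp is still inside the retention window. The sketch below is a simplified check under the assumption that the window is exactly the last 7 days up to the current time; in practice the latest restorable time lags slightly behind the present. `isWithinPitrWindow` is a hypothetical helper, not an AWS API.

```typescript
// Hypothetical check that a requested restore time is inside the PITR window.
const PITR_RETENTION_DAYS = 7; // matches deleteAfter on the hourly rule
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function isWithinPitrWindow(
  restoreTo: Date,
  now: Date,
  retentionDays: number = PITR_RETENTION_DAYS,
): boolean {
  const earliest = now.getTime() - retentionDays * MS_PER_DAY;
  const target = restoreTo.getTime();
  // The target must be neither older than the window nor in the future.
  return target >= earliest && target <= now.getTime();
}

const currentTime = new Date("2024-09-28T12:00:00.000Z");
console.log(isWithinPitrWindow(new Date("2024-09-25T10:30:00.000Z"), currentTime)); // true
console.log(isWithinPitrWindow(new Date("2024-09-20T10:30:00.000Z"), currentTime)); // false
```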

Monitoring and Alerting

Proactive monitoring ensures backup health and rapid issue detection:

// Alarm for backup job failures
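const backupFailureAlarm = new cloudwatch.Alarm(this, 'BackupFailureAlarm', {
  metric: new cloudwatch.Metric({
    namespace: 'AWS/Backup',
    metricName: 'NumberOfBackupJobsFailed',
    dimensionsMap: {
      BackupVaultName: this.backupVault.backupVaultName,
    },
    statistic: 'Sum', // count failures rather than averaging them
  }),
  threshold: 1,
  evaluationPeriods: 1,
  // No data points means no failed jobs, so gaps should not raise the alarm
  treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
  alarmDescription: 'Alert when backup jobs fail',
});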

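// Alarm for missing backups (no success in 25 hours)
const missingBackupAlarm = new cloudwatch.Alarm(this, 'MissingBackupAlarm', {
  metric: new cloudwatch.Metric({
    namespace: 'AWS/Backup',
    metricName: 'NumberOfBackupJobsCompleted',
    period: cdk.Duration.hours(25),
    statistic: 'Sum', // total completed jobs in the window
  }),
  threshold: 1,
  evaluationPeriods: 1, // required by cloudwatch.Alarm
  comparisonOperator: cloudwatch.ComparisonOperator.LESS_THAN_THRESHOLD,
  // If no jobs complete, CloudWatch may receive no data points at all,
  // so treat missing data as breaching instead of insufficient data
  treatMissingData: cloudwatch.TreatMissingData.BREACHING,
  alarmDescription: 'Alert when daily backups are missed',
});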
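
The logic behind the missing-backup alarm can be stated independently of CloudWatch: a backup counts as missing when no job has completed within the last 25 hours. The sketch below expresses that rule over a list of completion timestamps. `isBackupMissing` is a hypothetical helper for illustration, and the timestamps are made-up sample values.

```typescript
// Hypothetical, simplified version of the "missing backup" check.
const MAX_GAP_HOURS = 25; // daily schedule plus one hour of slack

function isBackupMissing(
  completedAt: Date[],
  now: Date,
  maxGapHours: number = MAX_GAP_HOURS,
): boolean {
  const cutoff = now.getTime() - maxGapHours * 60 * 60 * 1000;
  // Missing when no completion falls inside the look-back window.
  return !completedAt.some((d) => d.getTime() >= cutoff && d.getTime() <= now.getTime());
}

const checkTime = new Date("2024-09-26T06:00:00.000Z");
console.log(isBackupMissing([new Date("2024-09-26T03:10:00.000Z")], checkTime)); // false
console.log(isBackupMissing([new Date("2024-09-24T03:10:00.000Z")], checkTime)); // true
console.log(isBackupMissing([], checkTime));                                     // true
```

The empty-list case is the important one: an account where backups silently stopped produces no completions at all, and the check still has to report a problem.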

Vault Lock for Compliance

For production environments requiring regulatory compliance, vault lock provides Write-Once-Read-Many (WORM) protection:

if (props.enableVaultLock) {
  const cfnVault = this.backupVault.node.defaultChild as backup.CfnBackupVault;
  cfnVault.lockConfiguration = {
    minRetentionDays: 7,
    maxRetentionDays: 365,
  };
}

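Once enabled, vault lock prevents deletion of recovery points and rejects retention settings outside the configured range, protecting against accidental or malicious data loss. As configured here, without `changeableForDays`, the lock runs in governance mode and can still be removed by principals with sufficient IAM permissions. Setting `changeableForDays` selects compliance mode, which becomes immutable once that grace period ends.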
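
Because a locked vault rejects backup jobs whose retention falls outside the minimum and maximum, it is worth checking every rule against the lock range before enabling it. The sketch below does that for the retention values used in this post. `rulesOutsideLock` and `LockRange` are hypothetical names for this illustration.

```typescript
// Hypothetical pre-deployment check: every rule's retention must fall
// inside the vault lock range, otherwise its backup jobs would fail.
interface LockRange {
  minRetentionDays: number;
  maxRetentionDays: number;
}

function rulesOutsideLock(
  retentionDaysByRule: Record<string, number>,
  range: LockRange,
): string[] {
  return Object.keys(retentionDaysByRule).filter((name) => {
    const days = retentionDaysByRule[name];
    return days < range.minRetentionDays || days > range.maxRetentionDays;
  });
}

const vaultLockRange: LockRange = { minRetentionDays: 7, maxRetentionDays: 365 };
const ruleRetention: Record<string, number> = {
  DailyBackup: 30,
  WeeklyBackup: 120,
  MonthlyBackup: 365,
  HourlyBackup: 7,
};

console.log(rulesOutsideLock(ruleRetention, vaultLockRange)); // []
```

The hourly and monthly rules sit exactly on the boundaries, so any later change to either retention value or to the lock range should be re-checked.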

Restore Procedures

Database Restoration

To restore the Aurora database from a backup:

# List available recovery points
aws backup list-recovery-points-by-backup-vault \
  --backup-vault-name ghost-backup-vault \
  --by-resource-type RDS

# Initiate restore job
aws backup start-restore-job \
  --recovery-point-arn "arn:aws:backup:..." \
  --iam-role-arn "arn:aws:iam::..." \
  --metadata "DBClusterIdentifier=ghost-restored"

EFS Restoration

To restore the EFS file system:

# List EFS recovery points
aws backup list-recovery-points-by-backup-vault \
  --backup-vault-name ghost-backup-vault \
  --by-resource-type EFS

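# Create restore job
# Note: AWS Backup's EFS restore metadata also expects further keys such as
# newFileSystem, and KmsKeyId when Encrypted=true; supply values for your environment
aws backup start-restore-job \
  --recovery-point-arn "arn:aws:backup:..." \
  --iam-role-arn "arn:aws:iam::..." \
  --metadata "file-system-id=fs-restored,Encrypted=true"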
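
Both listing commands return a JSON document with a `RecoveryPoints` array, and a restore usually starts from the most recent completed entry. The sketch below selects it. `latestCompleted` is a hypothetical helper, the ARNs are placeholders, and the code assumes `CreationDate` is an ISO 8601 string, which depends on the CLI version and its timestamp settings.

```typescript
// Hypothetical helper for choosing a recovery point from the CLI's JSON output.
// Field names follow the list-recovery-points-by-backup-vault response shape.
interface RecoveryPoint {
  RecoveryPointArn: string;
  CreationDate: string; // assumed to be ISO 8601 in this sketch
  Status: string;       // for example "COMPLETED"
}

function latestCompleted(points: RecoveryPoint[]): RecoveryPoint | undefined {
  return points
    .filter((p) => p.Status === "COMPLETED") // filter copies, so the input is not mutated
    .sort((a, b) => Date.parse(b.CreationDate) - Date.parse(a.CreationDate))[0];
}

// Placeholder data; real ARNs come from the CLI output.
const samplePoints: RecoveryPoint[] = [
  { RecoveryPointArn: "placeholder-arn-a", CreationDate: "2024-09-24T03:05:00.000Z", Status: "COMPLETED" },
  { RecoveryPointArn: "placeholder-arn-b", CreationDate: "2024-09-25T03:05:00.000Z", Status: "COMPLETED" },
  { RecoveryPointArn: "placeholder-arn-c", CreationDate: "2024-09-26T03:05:00.000Z", Status: "PARTIAL" },
];

console.log(latestCompleted(samplePoints)?.RecoveryPointArn); // placeholder-arn-b
```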

Point-in-Time Recovery

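For continuous backups, restore to a specific moment. This command restores only the Aurora cluster itself, so a DB instance still has to be added to the new cluster (for example with aws rds create-db-instance) before Ghost can connect to it: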

# Restore to specific time (within 7-day window)
aws rds restore-db-cluster-to-point-in-time \
  --source-db-cluster-identifier ghost-database \
  --db-cluster-identifier ghost-database-pitr \
  --restore-to-time 2024-09-25T10:30:00.000Z

Restore Test Procedure (Quarterly Recommended)

1. Select Test Recovery Point
   - Choose a recent backup from each tier
   - Document selection for audit trail
2. Restore to Test Environment
   - Create isolated VPC for testing
   - Restore both database and EFS
   - Measure restoration time
3. Validate Data Integrity
   - Verify database consistency
   - Check file system contents
   - Test application functionality
4. Document Results
   - Record actual RTO achieved
   - Note any issues encountered
   - Update procedures as needed
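
Recording the measured restoration time in a consistent way makes quarterly results comparable. The sketch below computes the achieved RTO for one test and compares it with the 4-hour target. The `RestoreTest` type, the helper names, and the sample timestamps are hypothetical.

```typescript
// Hypothetical record of a restore test, used to compare measured RTO with the target.
const RTO_TARGET_HOURS = 4;
const MS_PER_HOUR = 60 * 60 * 1000;

interface RestoreTest {
  startedAt: Date;         // when the restore was initiated
  serviceRestoredAt: Date; // when Ghost passed validation on the restored data
}

function measuredRtoHours(t: RestoreTest): number {
  return (t.serviceRestoredAt.getTime() - t.startedAt.getTime()) / MS_PER_HOUR;
}

function meetsRtoTarget(t: RestoreTest, targetHours: number = RTO_TARGET_HOURS): boolean {
  return measuredRtoHours(t) <= targetHours;
}

const restoreDrill: RestoreTest = {
  startedAt: new Date("2024-09-25T09:00:00.000Z"),
  serviceRestoredAt: new Date("2024-09-25T11:30:00.000Z"),
};

console.log(measuredRtoHours(restoreDrill)); // 2.5
console.log(meetsRtoTarget(restoreDrill));   // true
```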

Deployment

Deploy the backup configuration:

# Set environment variables
export ALERT_EMAIL="your-email@example.com"
export ENABLE_CONTINUOUS_BACKUP=true
export ENABLE_VAULT_LOCK=false  # Set true for production

# Deploy the stack
npm run cdk deploy GhostStack

# Verify backup plans
aws backup list-backup-plans
# Output:
# - ghost-backup-plan (ID: 6b1545ad-fcce-4705-9f29-7c82fa6a8c95)
# - ghost-continuous-backup (ID: 79123ed6-efd3-434d-8acf-92217fa33802)

# Check backup jobs
aws backup list-backup-jobs \
  --by-backup-vault-name ghost-backup-vault
# Shows 72 existing recovery points from production usage
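
The post does not show how the stack reads these environment variables, so the following is only one plausible mapping onto the construct's options. `optionsFromEnv`, `parseFlag`, and `BackupOptions` are hypothetical names. The defaults follow the construct code shown earlier: continuous backup is on unless explicitly disabled, and vault lock is off unless enabled.

```typescript
// Hypothetical mapping from the environment variables above to construct options.
interface BackupOptions {
  alertEmail?: string;
  continuousBackupEnabled: boolean;
  enableVaultLock: boolean;
}

function parseFlag(value: string | undefined, fallback: boolean): boolean {
  // Unset or empty variables fall back to the default.
  if (value === undefined || value.trim() === "") return fallback;
  return value.trim().toLowerCase() === "true";
}

function optionsFromEnv(env: Record<string, string | undefined>): BackupOptions {
  return {
    alertEmail: env["ALERT_EMAIL"] || undefined,
    continuousBackupEnabled: parseFlag(env["ENABLE_CONTINUOUS_BACKUP"], true),
    enableVaultLock: parseFlag(env["ENABLE_VAULT_LOCK"], false),
  };
}

console.log(
  optionsFromEnv({
    ALERT_EMAIL: "your-email@example.com",
    ENABLE_CONTINUOUS_BACKUP: "true",
    ENABLE_VAULT_LOCK: "false",
  }),
);
```

In a CDK app the same function could be called with `process.env`, keeping the parsing rules in one place and easy to test.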

Best Practices

- Regular Testing: Perform quarterly restore tests to validate RTO (manual process)
- Monitor Actively: Set up SNS notifications for all backup events
- Document Procedures: Maintain runbooks for various recovery scenarios
- Audit Compliance: Review backup logs and metrics monthly
- Update Retention: Adjust policies based on actual recovery needs
- Consider DR Region: Enable cross-region backup for critical data (requires manual configuration in target region)

Conclusion

This production-ready backup implementation provides comprehensive disaster recovery capabilities for Ghost on AWS. With automated backups, intelligent lifecycle management, proactive monitoring, and documented restore procedures, you can confidently maintain business continuity while optimizing costs.

The system achieves a 24-hour RPO (1-hour with continuous backup) and 4-hour RTO target, suitable for most Ghost deployments. Regular testing and monitoring ensure these targets remain achievable as your deployment grows.

Remember: backups are only valuable if you can restore from them. Test regularly, monitor actively, and document thoroughly to ensure rapid recovery when disasters strike.