Running Ghost in production requires more than just high availability - you need a comprehensive backup strategy that ensures business continuity when disasters strike. This post details a production-ready backup implementation using AWS Backup that provides automated backups, monitoring, and rapid recovery capabilities for both your database and content.
Understanding RPO and RTO
Before diving into implementation, it's crucial to understand two key disaster recovery metrics:
- Recovery Point Objective (RPO): The maximum acceptable amount of data loss, measured in time. Our implementation achieves:
  - 24-hour RPO with daily backups (standard)
  - 1-hour RPO with continuous backups (enhanced)
- Recovery Time Objective (RTO): The maximum acceptable time to restore service after a disaster. Our target:
  - 4-hour RTO for full restoration
These targets balance cost with business requirements for a typical Ghost deployment.
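To make the two metrics concrete, here is a minimal sketch that treats worst-case RPO as the interval between successive recovery points and compares a measured restore duration against the RTO target. The type and function names are illustrative and not part of the Ghost stack, and the model deliberately ignores how long a backup job itself takes to complete.

```typescript
// Hypothetical helper names; the target figures come from the list above.
interface RecoveryTargets {
  rpoHours: number; // maximum tolerable data loss, in hours
  rtoHours: number; // maximum tolerable downtime, in hours
}

// Worst case, a failure happens just before the next backup would have run,
// so the data at risk equals the full interval between recovery points.
function worstCaseRpoHours(backupIntervalHours: number): number {
  return backupIntervalHours;
}

// True only when both the backup interval and the measured restore time
// stay within the stated targets.
function meetsTargets(
  backupIntervalHours: number,
  measuredRestoreHours: number,
  targets: RecoveryTargets,
): boolean {
  return (
    worstCaseRpoHours(backupIntervalHours) <= targets.rpoHours &&
    measuredRestoreHours <= targets.rtoHours
  );
}

const standardTargets: RecoveryTargets = { rpoHours: 24, rtoHours: 4 };
const enhancedTargets: RecoveryTargets = { rpoHours: 1, rtoHours: 4 };

// Daily backups with a 3.5-hour restore satisfy the standard targets...
const dailyMeetsStandard = meetsTargets(24, 3.5, standardTargets);
// ...but not the enhanced 1-hour RPO, which needs hourly recovery points.
const dailyMeetsEnhanced = meetsTargets(24, 3.5, enhancedTargets);
```

The restore duration passed in should come from an actual restore test rather than an estimate, since the RTO is only as credible as the last measurement.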
The Three-Tier Backup Strategy
The backup architecture implements a grandfather-father-son rotation (daily, weekly, and monthly tiers) with lifecycle management, supplemented by a continuous tier for the database:
graph TD
subgraph "Backup Tiers"
Daily[Daily Backups<br/>30 days retention<br/>Warm storage]
Weekly[Weekly Backups<br/>120 days retention<br/>Cold after 30 days]
Monthly[Monthly Backups<br/>365 days retention<br/>Cold after 90 days]
Continuous[Continuous Backups<br/>7 days retention<br/>Hourly snapshots]
end
subgraph "Protected Resources"
Aurora[(Aurora Database<br/>Ghost & ActivityPub)]
EFS[EFS File System<br/>Content & Images]
end
subgraph "Recovery Options"
PITR[Point-in-Time Recovery<br/>Any moment within 7 days]
Snapshot[Snapshot Recovery<br/>Specific backup points]
end
Aurora --> Daily
Aurora --> Weekly
Aurora --> Monthly
Aurora --> Continuous
EFS --> Daily
EFS --> Weekly
EFS --> Monthly
Continuous --> PITR
Daily --> Snapshot
Weekly --> Snapshot
Monthly --> Snapshot
Implementation with AWS CDK
The backup construct implements AWS best practices with comprehensive monitoring and automation:
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as backup from 'aws-cdk-lib/aws-backup';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import * as efs from 'aws-cdk-lib/aws-efs';
import * as events from 'aws-cdk-lib/aws-events';
import * as rds from 'aws-cdk-lib/aws-rds';
import * as sns from 'aws-cdk-lib/aws-sns';
import * as snsSubscriptions from 'aws-cdk-lib/aws-sns-subscriptions';

export interface GhostBackupProps {
  databaseCluster: rds.IDatabaseCluster;
  fileSystem: efs.IFileSystem;
  alertEmail?: string;
  enableVaultLock?: boolean;
  enableCrossRegionBackup?: boolean;
  crossRegionBackupDestination?: string;
  enableRestoreTesting?: boolean;
  continuousBackupEnabled?: boolean;
}

export class GhostBackup extends Construct {
  public readonly backupVault: backup.BackupVault;
  // Only created when alertEmail is provided
  public readonly notificationTopic?: sns.Topic;
  // Only created when continuous backup is enabled
  public readonly continuousBackupPlan?: backup.BackupPlan;

  constructor(scope: Construct, id: string, props: GhostBackupProps) {
    super(scope, id);

    // Create SNS topic for backup notifications
    if (props.alertEmail) {
      this.notificationTopic = new sns.Topic(this, 'NotificationTopic', {
        displayName: 'Ghost Backup Notifications',
      });
      this.notificationTopic.addSubscription(
        new snsSubscriptions.EmailSubscription(props.alertEmail),
      );
    }

    // Create backup vault with notifications
    this.backupVault = new backup.BackupVault(this, 'Vault', {
      backupVaultName: 'ghost-backup-vault',
      notificationTopic: this.notificationTopic,
      notificationEvents: [
        backup.BackupVaultEvents.BACKUP_JOB_FAILED,
        backup.BackupVaultEvents.RESTORE_JOB_FAILED,
      ],
    });
  }
}
Backup Scheduling and Retention
Each backup tier serves a specific recovery scenario:
Daily Backups
const dailyRule = new backup.BackupPlanRule({
  ruleName: 'DailyBackup',
  scheduleExpression: events.Schedule.cron({
    hour: '3',
    minute: '0',
  }),
  deleteAfter: cdk.Duration.days(30),
});
Daily backups provide operational recovery for recent issues like accidental deletions or corrupted data. The 30-day retention in warm storage ensures fast recovery without cold storage retrieval delays.
Weekly Backups
const weeklyRule = new backup.BackupPlanRule({
  ruleName: 'WeeklyBackup',
  scheduleExpression: events.Schedule.cron({
    weekDay: 'MON', // Monday (numeric values start at 1 = Sunday in AWS cron)
    hour: '4',
    minute: '0',
  }),
  deleteAfter: cdk.Duration.days(120),
  moveToColdStorageAfter: cdk.Duration.days(30),
});
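Weekly backups balance retention with cost, moving to cold storage after 30 days to reduce storage costs by up to 90%. Two constraints apply. AWS Backup keeps a recovery point in cold storage for at least 90 days, which is why deleteAfter is 120 days here. Cold-storage transitions also only take effect for resource types that support them: in this setup that means the EFS recovery points, while for Aurora snapshots the setting is ignored and they stay in warm storage.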
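The relationship between the two lifecycle values can be checked before deployment. The sketch below is a standalone helper, not part of the construct; the `Lifecycle` shape and `validateLifecycle` name are hypothetical and simply mirror the `deleteAfter` and `moveToColdStorageAfter` values used in the rules.

```typescript
// Hypothetical shape mirroring the deleteAfter / moveToColdStorageAfter values.
interface Lifecycle {
  deleteAfterDays: number;
  moveToColdStorageAfterDays?: number;
}

// AWS Backup keeps recovery points in cold storage for at least 90 days.
const MIN_DAYS_IN_COLD_STORAGE = 90;

// Returns a list of problems; an empty list means the lifecycle is consistent.
function validateLifecycle(ruleName: string, lifecycle: Lifecycle): string[] {
  const problems: string[] = [];
  if (lifecycle.moveToColdStorageAfterDays !== undefined) {
    const earliestDelete =
      lifecycle.moveToColdStorageAfterDays + MIN_DAYS_IN_COLD_STORAGE;
    if (lifecycle.deleteAfterDays < earliestDelete) {
      problems.push(
        `${ruleName}: deleteAfter must be at least ${earliestDelete} days`,
      );
    }
  }
  return problems;
}

// Warm-only rule: nothing to check.
const dailyProblems = validateLifecycle('DailyBackup', { deleteAfterDays: 30 });

// 30 days warm + 90 days cold = 120 days, exactly the minimum.
const weeklyProblems = validateLifecycle('WeeklyBackup', {
  deleteAfterDays: 120,
  moveToColdStorageAfterDays: 30,
});

// A 100-day retention would be too short for a 30-day cold transition.
const tooShortProblems = validateLifecycle('TooShort', {
  deleteAfterDays: 100,
  moveToColdStorageAfterDays: 30,
});
```

Running a check like this in a unit test catches an invalid combination earlier than a failed deployment would.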
Monthly Backups
const monthlyRule = new backup.BackupPlanRule({
  ruleName: 'MonthlyBackup',
  scheduleExpression: events.Schedule.cron({
    day: '1',
    hour: '5',
    minute: '0',
  }),
  deleteAfter: cdk.Duration.days(365),
  moveToColdStorageAfter: cdk.Duration.days(90),
});
Monthly backups provide long-term retention for compliance and historical recovery needs.
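It can help to see what the three schedules look like as AWS cron strings. The following is a rough sketch of how cron fields map onto the six-field AWS format (minute, hour, day of month, month, day of week, year); `toAwsCron` and `CronFields` are hypothetical names written for illustration, not the CDK implementation. Numeric days of the week start at 1 = Sunday in this format, so named days such as MON are less error-prone, and times are interpreted in UTC unless a time zone is configured.

```typescript
// Hypothetical subset of the cron fields used by the three rules.
interface CronFields {
  minute?: string;
  hour?: string;
  day?: string; // day of month
  weekDay?: string; // day of week, e.g. 'MON'
}

// Renders: cron(minute hour day-of-month month day-of-week year).
// Day-of-month and day-of-week cannot both be set; the unused one becomes '?'.
function toAwsCron(fields: CronFields): string {
  if (fields.day !== undefined && fields.weekDay !== undefined) {
    throw new Error('Specify either day or weekDay, not both');
  }
  const minute = fields.minute ?? '*';
  const hour = fields.hour ?? '*';
  const day = fields.day ?? (fields.weekDay !== undefined ? '?' : '*');
  const weekDay = fields.weekDay ?? '?';
  return `cron(${minute} ${hour} ${day} * ${weekDay} *)`;
}

const dailyCron = toAwsCron({ hour: '3', minute: '0' });
// dailyCron === 'cron(0 3 * * ? *)'
const weeklyCron = toAwsCron({ weekDay: 'MON', hour: '4', minute: '0' });
// weeklyCron === 'cron(0 4 ? * MON *)'
const monthlyCron = toAwsCron({ day: '1', hour: '5', minute: '0' });
// monthlyCron === 'cron(0 5 1 * ? *)'
```

Staggering the three schedules by an hour, as the rules do, keeps the tiers from starting at the same moment on days when more than one applies.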
Continuous Backup for Minimal RPO
For production environments requiring minimal data loss, the continuous plan runs hourly and enables continuous backup, which is what makes point-in-time recovery possible for the database:
if (props.continuousBackupEnabled !== false) {
  this.continuousBackupPlan = new backup.BackupPlan(this, 'ContinuousPlan', {
    backupPlanName: 'ghost-continuous-backup',
    // Reuse the monitored vault; without this, CDK creates a separate vault for the plan
    backupVault: this.backupVault,
  });
  this.continuousBackupPlan.addRule(
    new backup.BackupPlanRule({
      ruleName: 'HourlyBackup',
      scheduleExpression: events.Schedule.cron({
        minute: '0',
        hour: '*', // Every hour
      }),
      deleteAfter: cdk.Duration.days(7),
      enableContinuousBackup: true,
    }),
  );
  // Only database needs continuous backup
  this.continuousBackupPlan.addSelection('DatabaseContinuous', {
    resources: [
      backup.BackupResource.fromRdsDatabaseCluster(props.databaseCluster),
    ],
  });
}
This provides 1-hour RPO with the ability to restore to any point within the last 7 days.
Monitoring and Alerting
Proactive monitoring ensures backup health and rapid issue detection:
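// Alarm for backup job failures
const backupFailureAlarm = new cloudwatch.Alarm(this, 'BackupFailureAlarm', {
  metric: new cloudwatch.Metric({
    namespace: 'AWS/Backup',
    metricName: 'NumberOfBackupJobsFailed',
    dimensionsMap: {
      BackupVaultName: this.backupVault.backupVaultName,
    },
    statistic: 'Sum', // count failures per period instead of averaging them
  }),
  threshold: 1,
  evaluationPeriods: 1,
  // No datapoints means no failures were reported
  treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
  alarmDescription: 'Alert when backup jobs fail',
});
// Alarm for missing backups (no success in 25 hours)
const missingBackupAlarm = new cloudwatch.Alarm(this, 'MissingBackupAlarm', {
  metric: new cloudwatch.Metric({
    namespace: 'AWS/Backup',
    metricName: 'NumberOfBackupJobsCompleted',
    dimensionsMap: {
      BackupVaultName: this.backupVault.backupVaultName,
    },
    statistic: 'Sum',
    period: cdk.Duration.hours(25),
  }),
  threshold: 1,
  evaluationPeriods: 1, // required by AlarmProps
  comparisonOperator: cloudwatch.ComparisonOperator.LESS_THAN_THRESHOLD,
  // If no job completes, the metric has no datapoints; treat that as a miss
  treatMissingData: cloudwatch.TreatMissingData.BREACHING,
  alarmDescription: 'Alert when daily backups are missed',
});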
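The logic behind the missing-backup alarm is easy to get subtly wrong, because the absence of any completed job has to count as a failure rather than as "no data". As a plain TypeScript illustration of that rule (not CloudWatch code; the `BackupJobSummary` shape and `hasRecentSuccess` name are made up for this sketch):

```typescript
// Hypothetical job record; only the fields this check needs.
interface BackupJobSummary {
  state: 'COMPLETED' | 'FAILED' | 'RUNNING';
  completedAt?: Date;
}

// One daily cycle plus an hour of slack for a slow or late job.
const MAX_AGE_HOURS = 25;

// True when at least one job completed successfully within the last maxAgeHours.
// An empty job list returns false: no evidence of a backup is treated as a miss.
function hasRecentSuccess(
  jobs: BackupJobSummary[],
  now: Date,
  maxAgeHours: number = MAX_AGE_HOURS,
): boolean {
  const cutoff = now.getTime() - maxAgeHours * 60 * 60 * 1000;
  return jobs.some(
    (job) =>
      job.state === 'COMPLETED' &&
      job.completedAt !== undefined &&
      job.completedAt.getTime() >= cutoff,
  );
}

const checkedAt = new Date('2024-09-26T06:00:00.000Z');

// Completed about three hours before the check: healthy.
const healthy = hasRecentSuccess(
  [{ state: 'COMPLETED', completedAt: new Date('2024-09-26T03:10:00.000Z') }],
  checkedAt,
);

// Last success was more than two days ago: a backup was missed.
const stale = hasRecentSuccess(
  [{ state: 'COMPLETED', completedAt: new Date('2024-09-24T03:10:00.000Z') }],
  checkedAt,
);

// No jobs at all must also count as a miss.
const noJobs = hasRecentSuccess([], checkedAt);
```

The 25-hour window gives a daily job an hour of leeway, so a backup that simply runs a little long does not page anyone.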
Vault Lock for Compliance
For production environments requiring regulatory compliance, vault lock provides Write-Once-Read-Many (WORM) protection:
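if (props.enableVaultLock) {
  const cfnVault = this.backupVault.node.defaultChild as backup.CfnBackupVault;
  // Every rule's deleteAfter must fall inside this range, or its jobs will fail
  cfnVault.lockConfiguration = {
    minRetentionDays: 7,
    maxRetentionDays: 365,
  };
}
Once enabled, vault lock prevents deletion of recovery points and changes to their retention, protecting against accidental or malicious data loss. Note that a lock configured without changeableForDays, as above, runs in governance mode, which users with sufficient IAM permissions can still remove. Setting changeableForDays switches to compliance mode, where the lock becomes immutable once the grace period ends.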
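Because a locked vault rejects backup jobs whose retention falls outside the configured range, it is worth checking the rule retentions against the lock before enabling it. A minimal sketch, assuming the retention values from the four rules in this post; the types and the `rulesOutsideLock` helper are hypothetical:

```typescript
// Hypothetical shapes for the lock range and a rule's retention.
interface VaultLockRange {
  minRetentionDays: number;
  maxRetentionDays: number;
}

interface RuleRetention {
  ruleName: string;
  deleteAfterDays: number;
}

// Returns the names of rules whose retention the locked vault would reject.
function rulesOutsideLock(
  rules: RuleRetention[],
  lock: VaultLockRange,
): string[] {
  return rules
    .filter(
      (rule) =>
        rule.deleteAfterDays < lock.minRetentionDays ||
        rule.deleteAfterDays > lock.maxRetentionDays,
    )
    .map((rule) => rule.ruleName);
}

const lockRange: VaultLockRange = { minRetentionDays: 7, maxRetentionDays: 365 };

const planRules: RuleRetention[] = [
  { ruleName: 'HourlyBackup', deleteAfterDays: 7 },
  { ruleName: 'DailyBackup', deleteAfterDays: 30 },
  { ruleName: 'WeeklyBackup', deleteAfterDays: 120 },
  { ruleName: 'MonthlyBackup', deleteAfterDays: 365 },
];

// All four rules sit inside the 7..365 day range, so nothing is flagged.
const rejected = rulesOutsideLock(planRules, lockRange);

// A hypothetical 400-day archive rule would be rejected by this lock.
const rejectedWithArchive = rulesOutsideLock(
  [...planRules, { ruleName: 'ArchiveBackup', deleteAfterDays: 400 }],
  lockRange,
);
```

The hourly and monthly rules sit exactly on the boundaries, so tightening either end of the lock range later would start failing those jobs.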
Restore Procedures
Database Restoration
To restore the Aurora database from a backup:
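# List available recovery points
# (AWS Backup reports Aurora clusters under the "Aurora" resource type, not "RDS")
aws backup list-recovery-points-by-backup-vault \
  --backup-vault-name ghost-backup-vault \
  --by-resource-type Aurora
# Initiate restore job
# (aws backup get-recovery-point-restore-metadata shows the metadata keys a recovery point expects)
aws backup start-restore-job \
  --recovery-point-arn "arn:aws:backup:..." \
  --iam-role-arn "arn:aws:iam::..." \
  --metadata "DBClusterIdentifier=ghost-restored"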
EFS Restoration
To restore the EFS file system:
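# List EFS recovery points
aws backup list-recovery-points-by-backup-vault \
  --backup-vault-name ghost-backup-vault \
  --by-resource-type EFS
# Create restore job
# (newFileSystem=true restores into a new file system instead of the existing one;
# check get-recovery-point-restore-metadata for the other keys your recovery point expects)
aws backup start-restore-job \
  --recovery-point-arn "arn:aws:backup:..." \
  --iam-role-arn "arn:aws:iam::..." \
  --metadata "file-system-id=fs-restored,newFileSystem=true,Encrypted=true"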
Point-in-Time Recovery
For continuous backups, restore to a specific moment:
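# Restore to specific time (within 7-day window)
# Note: this restores the cluster only; create a DB instance in it afterwards
# (aws rds create-db-instance) before pointing Ghost at the restored cluster
aws rds restore-db-cluster-to-point-in-time \
  --source-db-cluster-identifier ghost-database \
  --db-cluster-identifier ghost-database-pitr \
  --restore-to-time 2024-09-25T10:30:00.000Z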
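Before issuing a point-in-time restore, it is cheap to confirm that the requested timestamp actually falls inside the retention window. A small sketch under the assumption of the 7-day retention used by the hourly rule; `isWithinPitrWindow` is a hypothetical helper, and the real latest restorable time usually trails the current time by a few minutes, which this check does not model.

```typescript
// Matches deleteAfter on the hourly continuous-backup rule.
const PITR_RETENTION_DAYS = 7;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: true when restoreTo lies in the past and is no older
// than the retention window measured back from now.
function isWithinPitrWindow(
  restoreTo: Date,
  now: Date,
  retentionDays: number = PITR_RETENTION_DAYS,
): boolean {
  const earliest = now.getTime() - retentionDays * MS_PER_DAY;
  const target = restoreTo.getTime();
  return target >= earliest && target <= now.getTime();
}

const currentTime = new Date('2024-09-28T12:00:00.000Z');

// Three days back: inside the 7-day window.
const inWindow = isWithinPitrWindow(
  new Date('2024-09-25T10:30:00.000Z'),
  currentTime,
);

// Eight days back: older than the window allows.
const tooOld = isWithinPitrWindow(
  new Date('2024-09-20T10:30:00.000Z'),
  currentTime,
);

// A timestamp in the future can never be restored to.
const inFuture = isWithinPitrWindow(
  new Date('2024-09-29T00:00:00.000Z'),
  currentTime,
);
```

Using explicit UTC timestamps, as the CLI example does, avoids ambiguity about which local time zone the restore point refers to.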
Restore Test Procedure (Quarterly Recommended)
- Select Test Recovery Point
  - Choose a recent backup from each tier
  - Document selection for audit trail
- Restore to Test Environment
  - Create isolated VPC for testing
  - Restore both database and EFS
  - Measure restoration time
- Validate Data Integrity
  - Verify database consistency
  - Check file system contents
  - Test application functionality
- Document Results
  - Record actual RTO achieved
  - Note any issues encountered
  - Update procedures as needed
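The "record actual RTO achieved" step is easier to keep consistent if each test produces the same small record. One possible shape, offered as a sketch: the `RestoreTestResult` type and helper names are hypothetical, the ARN is left elided as in the commands above, and the timestamps are example values rather than measurements.

```typescript
// Hypothetical record for one restore test, kept for the audit trail.
interface RestoreTestResult {
  tier: 'daily' | 'weekly' | 'monthly' | 'continuous';
  recoveryPointArn: string;
  startedAt: Date; // when the restore job was started
  serviceRestoredAt: Date; // when the application passed validation
}

const RTO_TARGET_HOURS = 4;

// Elapsed time from starting the restore to a validated, working service.
function achievedRtoHours(result: RestoreTestResult): number {
  const elapsedMs =
    result.serviceRestoredAt.getTime() - result.startedAt.getTime();
  return elapsedMs / (60 * 60 * 1000);
}

// One-line summary suitable for a runbook or test log.
function summarizeRestoreTest(result: RestoreTestResult): string {
  const hours = achievedRtoHours(result);
  const verdict =
    hours <= RTO_TARGET_HOURS ? 'within target' : 'MISSED target';
  return `${result.tier}: restored in ${hours.toFixed(2)}h (${verdict} of ${RTO_TARGET_HOURS}h)`;
}

// Example values only: a restore that took 2 hours 45 minutes end to end.
const exampleTest: RestoreTestResult = {
  tier: 'daily',
  recoveryPointArn: 'arn:aws:backup:...',
  startedAt: new Date('2024-09-25T09:00:00.000Z'),
  serviceRestoredAt: new Date('2024-09-25T11:45:00.000Z'),
};

const exampleSummary = summarizeRestoreTest(exampleTest);
// exampleSummary === 'daily: restored in 2.75h (within target of 4h)'
```

Measuring to the point where the application passes validation, rather than to the end of the restore job, keeps the recorded figure comparable to the RTO definition.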
Deployment
Deploy the backup configuration:
# Set environment variables
export ALERT_EMAIL="your-email@example.com"
export ENABLE_CONTINUOUS_BACKUP=true
export ENABLE_VAULT_LOCK=false # Set true for production
# Deploy the stack
npm run cdk deploy GhostStack
# Verify backup plans
aws backup list-backup-plans
# Output:
# - ghost-backup-plan (ID: 6b1545ad-fcce-4705-9f29-7c82fa6a8c95)
# - ghost-continuous-backup (ID: 79123ed6-efd3-434d-8acf-92217fa33802)
# Check backup jobs
aws backup list-backup-jobs \
--by-backup-vault-name ghost-backup-vault
# Shows 72 existing recovery points from production usage
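How the stack reads these environment variables is not shown in this post. One plausible approach, sketched here with hypothetical names (`BackupOptions`, `parseFlag`, `backupOptionsFromEnv`), is to translate them into construct options with explicit defaults that match the construct's behavior: continuous backup on unless disabled, vault lock off unless enabled.

```typescript
// Hypothetical subset of the construct options driven by environment variables.
interface BackupOptions {
  alertEmail: string | undefined;
  continuousBackupEnabled: boolean;
  enableVaultLock: boolean;
}

type Env = Record<string, string | undefined>;

// Only the literal string "true" (any case) enables a flag; unset or empty
// values fall back to the given default.
function parseFlag(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined || value.trim() === '') {
    return fallback;
  }
  return value.trim().toLowerCase() === 'true';
}

function backupOptionsFromEnv(env: Env): BackupOptions {
  const email = env['ALERT_EMAIL']?.trim();
  return {
    alertEmail: email ? email : undefined,
    continuousBackupEnabled: parseFlag(env['ENABLE_CONTINUOUS_BACKUP'], true),
    enableVaultLock: parseFlag(env['ENABLE_VAULT_LOCK'], false),
  };
}

// Mirrors the exports in the deployment commands.
const fromExample = backupOptionsFromEnv({
  ALERT_EMAIL: 'your-email@example.com',
  ENABLE_CONTINUOUS_BACKUP: 'true',
  ENABLE_VAULT_LOCK: 'false',
});

// With nothing set, the defaults apply and no alert topic would be created.
const fromEmpty = backupOptionsFromEnv({});
```

In a CDK app this would typically be called with `process.env`. Keeping vault lock off by default is deliberate, since enabling it is the one option here that cannot always be undone.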
Best Practices
- Regular Testing: Perform quarterly restore tests to validate RTO (manual process)
- Monitor Actively: Set up SNS notifications for all backup events
- Document Procedures: Maintain runbooks for various recovery scenarios
- Audit Compliance: Review backup logs and metrics monthly
- Update Retention: Adjust policies based on actual recovery needs
- Consider DR Region: Enable cross-region backup for critical data (requires manual configuration in target region)
Conclusion
This production-ready backup implementation provides comprehensive disaster recovery capabilities for Ghost on AWS. With automated backups, intelligent lifecycle management, proactive monitoring, and documented restore procedures, you can confidently maintain business continuity while optimizing costs.
The system achieves a 24-hour RPO (1-hour with continuous backup) and 4-hour RTO target, suitable for most Ghost deployments. Regular testing and monitoring ensure these targets remain achievable as your deployment grows.
Remember: backups are only valuable if you can restore from them. Test regularly, monitor actively, and document thoroughly to ensure rapid recovery when disasters strike.