Running Ghost on AWS requires visibility into application performance, system health, and user experience. Without proper monitoring, you're flying blind - unable to detect issues before users complain or understand what went wrong when problems occur. This post details a comprehensive monitoring setup that provides complete operational visibility at a reasonable cost.
The monitoring infrastructure we've built provides comprehensive visibility with seven operational alarms plus intelligent deployment suppression, dashboard widgets showing real-time metrics, and automated Log Insights queries for rapid troubleshooting. The system is designed to detect issues like database connection spikes, memory leaks, and potential DDoS attempts before they impact users.
The Monitoring Architecture
The monitoring system consists of CloudWatch dashboards displaying real-time metrics, SNS topics delivering email alerts, and Log Insights queries for troubleshooting. Every component generates metrics that flow into a unified dashboard providing a single pane of glass for operations.
graph TB
subgraph Ghost Infrastructure
ECS[ECS Fargate<br/>Ghost + Nginx]
ALB[Application<br/>Load Balancer]
RDS[(Aurora<br/>Serverless)]
WAF[AWS WAF]
end
subgraph Monitoring Layer
CW[CloudWatch<br/>Metrics]
Logs[CloudWatch<br/>Logs]
Insights[Container<br/>Insights]
end
subgraph Alerting
Alarms[CloudWatch<br/>Alarms]
SNS[SNS Topics]
Email[Email<br/>Notifications]
end
subgraph Visualization
Dashboard[CloudWatch<br/>Dashboard]
Queries[Log Insights<br/>Queries]
end
ECS --> CW
ECS --> Logs
ECS --> Insights
ALB --> CW
RDS --> CW
WAF --> CW
CW --> Dashboard
Logs --> Queries
CW --> Alarms
Alarms --> SNS
SNS --> Email
Dashboard --> Ops[Operations Team]
Email --> Ops
Each layer serves a specific purpose. The infrastructure layer generates metrics and logs. CloudWatch aggregates these into actionable insights. Alarms detect anomalies and trigger notifications. Dashboards provide visual confirmation and historical context.
CloudWatch Dashboard Setup
The CDK creates a comprehensive dashboard with six widget groups monitoring different aspects of the system. The dashboard defaults to a one-hour time window; during an incident you can narrow the range and increase the refresh rate from the CloudWatch console.
import * as cdk from 'aws-cdk-lib';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import { Construct } from 'constructs';
export class GhostMonitoring extends Construct {
constructor(scope: Construct, id: string, props: GhostMonitoringProps) {
super(scope, id);
// Create unified monitoring dashboard
this.dashboard = new cloudwatch.Dashboard(this, 'Dashboard', {
// Replace every dot (a plain string replaces only the first occurrence)
dashboardName: `ghost-cms-${props.domainName.replace(/\./g, '-')}`,
defaultInterval: cdk.Duration.hours(1),
});
// ECS Service Metrics - CPU, Memory, Task Count
const ecsMetricsWidget = new cloudwatch.GraphWidget({
title: 'ECS Service Metrics',
left: [
new cloudwatch.Metric({
namespace: 'AWS/ECS',
metricName: 'CPUUtilization',
dimensionsMap: {
ClusterName: props.cluster.clusterName,
ServiceName: props.service.serviceName,
},
statistic: 'Average',
label: 'CPU Utilization',
}),
new cloudwatch.Metric({
namespace: 'AWS/ECS',
metricName: 'MemoryUtilization',
dimensionsMap: {
ClusterName: props.cluster.clusterName,
ServiceName: props.service.serviceName,
},
statistic: 'Average',
label: 'Memory Utilization',
}),
],
right: [
new cloudwatch.Metric({
namespace: 'AWS/ECS',
metricName: 'RunningTaskCount',
dimensionsMap: {
ClusterName: props.cluster.clusterName,
ServiceName: props.service.serviceName,
},
statistic: 'Average',
label: 'Running Tasks',
}),
],
});
The dashboard displays ECS metrics showing container resource utilization, ALB metrics tracking request patterns and response times, target health indicating container availability, HTTP status code distribution revealing error rates, database metrics monitoring connections and capacity, and performance metrics tracking query latency.
Alerting Strategy
The alerting system uses seven operational alarms covering the most critical failure modes, plus a composite alarm that intelligently suppresses false positives during deployments.
// Create SNS topic for all alarms
this.alarmTopic = new sns.Topic(this, 'AlarmTopic', {
displayName: 'Ghost CMS Alarms',
});
// Add email subscription for immediate notification
if (props.alertEmail) {
this.alarmTopic.addSubscription(
new snsSubscriptions.EmailSubscription(props.alertEmail),
);
}
// High CPU Alarm - indicates scaling need or runaway process
const cpuAlarm = new cloudwatch.Alarm(this, 'HighCpuAlarm', {
metric: new cloudwatch.Metric({
namespace: 'AWS/ECS',
metricName: 'CPUUtilization',
dimensionsMap: {
ClusterName: props.cluster.clusterName,
ServiceName: props.service.serviceName,
},
statistic: 'Average',
}),
threshold: 80,
evaluationPeriods: 2,
datapointsToAlarm: 2,
treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
alarmDescription: 'Alarm when CPU exceeds 80%',
});
cpuAlarm.addAlarmAction(new cloudwatchActions.SnsAction(this.alarmTopic));
The operational alarms deployed include:
- CPU utilization over 80% - indicates scaling needs or runaway processes
- Memory utilization over 80% - suggests memory leaks in themes or plugins
- Unhealthy targets (composite) - alerts only for sustained issues, not deployments
- 5xx errors exceeding 10 in 5 minutes - reveals application issues
- Response times over 2 seconds - indicates performance degradation
- Database connections over 40 - prevents connection exhaustion
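As a second example alongside the CPU alarm above, the 5xx-error alarm can be sketched in the same style. This is an illustrative sketch, not the exact production code: `elbv2` refers to `aws-cdk-lib/aws-elasticloadbalancingv2`, and `props.loadBalancer` is assumed to be the ALB construct.

```typescript
// Sketch: alert when targets return more than 10 5xx responses within 5 minutes
const http5xxAlarm = new cloudwatch.Alarm(this, 'Http5xxAlarm', {
  metric: props.loadBalancer.metrics.httpCodeTarget(
    elbv2.HttpCodeTarget.TARGET_5XX_COUNT,
    { period: cdk.Duration.minutes(5), statistic: 'Sum' },
  ),
  threshold: 10,
  evaluationPeriods: 1,
  comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
  treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
  alarmDescription: 'More than 10 target 5xx responses in 5 minutes',
});
http5xxAlarm.addAlarmAction(new cloudwatchActions.SnsAction(this.alarmTopic));
```

Using `Sum` over a five-minute period matches the "10 in 5 minutes" threshold from the list above.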
Intelligent Deployment Suppression
A key innovation in our monitoring is the composite alarm pattern that prevents false alerts during deployments:
// Extract correct dimensions from ARNs for CloudWatch metrics
const targetGroupFullName = cdk.Fn.select(
5,
cdk.Fn.split(':', props.targetGroup.targetGroupArn),
);
const arnParts = cdk.Fn.split('/', props.loadBalancer.loadBalancerArn);
const loadBalancerFullName = cdk.Fn.join('/', [
cdk.Fn.select(1, arnParts),
cdk.Fn.select(2, arnParts),
cdk.Fn.select(3, arnParts),
]);
// Unhealthy-host metric, built from the dimension names extracted above
const unhealthyMetric = new cloudwatch.Metric({
namespace: 'AWS/ApplicationELB',
metricName: 'UnHealthyHostCount',
dimensionsMap: {
TargetGroup: targetGroupFullName,
LoadBalancer: loadBalancerFullName,
},
statistic: 'Maximum',
});
// Base unhealthy alarm (no direct SNS action)
const unhealthyAlarm = new cloudwatch.Alarm(this, 'UnhealthyHostsAlarm', {
metric: unhealthyMetric,
threshold: 1,
evaluationPeriods: 3,
datapointsToAlarm: 3,
treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
});
// Deployment detector alarm
const deploymentDetector = new cloudwatch.Alarm(this, 'DeploymentDetector', {
metric: new cloudwatch.Metric({
namespace: 'AWS/ECS',
metricName: 'RunningTaskCount',
dimensionsMap: {
ClusterName: props.cluster.clusterName,
ServiceName: props.service.serviceName,
},
}),
threshold: 2,
comparisonOperator:
cloudwatch.ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
});
// Composite alarm: only alert when unhealthy AND not deploying
const unhealthyNotDeployingAlarm = new cloudwatch.CompositeAlarm(
this,
'UnhealthyNotDeploying',
{
compositeAlarmName: `ghost-unhealthy-not-deploying-${props.domainName.replace(
/\./g,
'-',
)}`,
alarmRule: cloudwatch.AlarmRule.allOf(
cloudwatch.AlarmRule.fromAlarm(
unhealthyAlarm,
cloudwatch.AlarmState.ALARM,
),
cloudwatch.AlarmRule.not(
cloudwatch.AlarmRule.fromAlarm(
deploymentDetector,
cloudwatch.AlarmState.ALARM,
),
),
),
},
);
unhealthyNotDeployingAlarm.addAlarmAction(
new cloudwatchActions.SnsAction(this.alarmTopic),
);
This pattern ensures you're only alerted for real issues, not normal deployment transitions.
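The Fn.select/Fn.split gymnastics above mirror plain string slicing. A local sketch with hypothetical ARNs (the real values are CDK tokens resolved at deploy time) shows the dimension formats CloudWatch expects:

```typescript
// Hypothetical ARNs for illustration -- real ones come from the CDK tokens above
const lbArn =
  'arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/ghost-alb/50dc6c495c0c9188';
const tgArn =
  'arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/ghost-tg/73e2d6bc24d8a067';

// LoadBalancer dimension: everything after "loadbalancer/" -> "app/<name>/<id>"
const lbParts = lbArn.split('/');
const loadBalancerFullName = lbParts.slice(1).join('/');

// TargetGroup dimension: the final colon-separated segment -> "targetgroup/<name>/<id>"
const targetGroupFullName = tgArn.split(':')[5];

console.log(loadBalancerFullName); // app/ghost-alb/50dc6c495c0c9188
console.log(targetGroupFullName); // targetgroup/ghost-tg/73e2d6bc24d8a067
```

These are exactly the `LoadBalancer` and `TargetGroup` dimension values the `AWS/ApplicationELB` namespace requires.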
Multi-Layered Health Check Architecture
The Ghost on AWS setup employs a sophisticated multi-layered health check system that ensures high availability and enables zero-downtime deployments. Understanding these different layers is crucial for maintaining a resilient production environment.
Three Levels of Health Monitoring
The architecture implements health checks at three distinct levels, each serving a specific purpose in maintaining service reliability:
graph TD
subgraph "External Layer"
ALB[Application Load Balancer]
TG[Target Group Health Check]
end
subgraph "Container Layer"
Nginx[Nginx Container<br/>Port 80]
Ghost[Ghost Container<br/>Port 2368]
ActivityPub[ActivityPub Container<br/>Port 8080]
end
subgraph "Application Layer"
HealthEndpoint[/health endpoint<br/>Returns 200 OK]
GhostAPI[Ghost Admin API<br/>/ghost/api/admin/site/]
end
ALB --> TG
TG --> Nginx
Nginx --> HealthEndpoint
Ghost --> GhostAPI
1. ALB Target Group Health Checks
The Application Load Balancer continuously monitors container health through target group health checks. These checks determine whether traffic should be routed to a container:
// Configure health check for Nginx
this.service.targetGroup.configureHealthCheck({
path: '/health',
healthyHttpCodes: '200',
interval: cdk.Duration.seconds(30),
timeout: cdk.Duration.seconds(5),
healthyThresholdCount: 2,
unhealthyThresholdCount: 3,
});
Key configuration details:
- Health check path: /health - a simple endpoint in Nginx that returns 200 OK
- Check frequency: every 30 seconds
- Timeout: 5 seconds per check
- Healthy threshold: 2 consecutive successful checks to mark healthy
- Unhealthy threshold: 3 consecutive failures to mark unhealthy
- Deregistration delay: 30 seconds before removing unhealthy targets
2. Container-Level Configuration
While Docker supports HEALTHCHECK instructions, the current implementation relies on ALB health checks rather than container-level health checks. This design choice simplifies the architecture while maintaining reliability:
// Nginx container serves as the health check responder
const nginxContainer = taskDefinition.addContainer('NginxContainer', {
image: ecs.ContainerImage.fromAsset('containers/nginx-proxy'),
portMappings: [
{
containerPort: 80,
protocol: ecs.Protocol.TCP,
},
],
essential: true, // Container must be running for task to be healthy
});
The Nginx configuration includes a dedicated health endpoint:
# Health check endpoint for ALB
location /health {
access_log off;
return 200 "healthy\n";
add_header Content-Type text/plain;
}
This approach provides several benefits:
- Fast response times: Simple 200 response without backend calls
- Reduced load: No database queries or application logic
- Clear separation: Health checks don't impact application performance
3. Deployment Health Management
The ECS service configuration includes sophisticated deployment controls that work with health checks to ensure safe rollouts:
// Create Fargate service with health check grace period
this.service = new ecsPatterns.ApplicationLoadBalancedFargateService(
this,
'Service',
{
cluster: this.cluster,
taskDefinition: taskDefinition,
desiredCount: 1,
healthCheckGracePeriod: cdk.Duration.minutes(5),
enableExecuteCommand: true,
},
);
// Configure deployment circuit breaker with automatic rollback
const cfnService = this.service.service.node.defaultChild as ecs.CfnService;
cfnService.deploymentConfiguration = {
minimumHealthyPercent: 100, // Never go below desired count
maximumPercent: 200, // Can temporarily double containers
deploymentCircuitBreaker: {
enable: true,
rollback: true, // Auto-rollback on failure
},
};
Health Check Flow During Deployments
Understanding how health checks interact during deployments is crucial for zero-downtime updates:
- New task starts: ECS launches new container with updated image
- Grace period: 5-minute window where health checks are ignored
- Initial checks: After grace period, ALB begins health checks
- Healthy threshold: Container must pass 2 consecutive checks (1 minute)
- Traffic routing: ALB begins routing traffic to new container
- Old task draining: Existing connections complete gracefully
- Circuit breaker: If new tasks fail repeatedly, automatic rollback occurs
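Putting numbers on the flow above (using the 30-second interval, the 2/3 thresholds from the target group configuration, and the 5-minute grace period), a quick sanity check of the timings:

```typescript
// Health check parameters from the target group and service configuration above
const intervalSeconds = 30;
const healthyThreshold = 2;
const unhealthyThreshold = 3;
const gracePeriodSeconds = 5 * 60;

// Fastest a new task can pass health checks once the grace period ends
const timeToHealthySeconds = intervalSeconds * healthyThreshold;

// Worst case for the ALB to stop routing to a failing task
const timeToUnhealthySeconds = intervalSeconds * unhealthyThreshold;

// Upper bound before the first possible traffic shift during a deployment
const worstCaseFirstTraffic = gracePeriodSeconds + timeToHealthySeconds;

console.log({ timeToHealthySeconds, timeToUnhealthySeconds, worstCaseFirstTraffic });
// -> 60s to healthy, 90s to unhealthy, 360s worst case to first traffic
```

In other words, a failing container stops receiving traffic within about 90 seconds, and a deployment can take up to six minutes before the new task serves its first request.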
Why Nginx as the Health Check Target?
The architecture uses Nginx as the primary health check target rather than Ghost directly for several reasons:
- Proxy readiness: Ensures the entire request path is functional
- Fast response: No application processing required
- Isolation: Health checks don't impact Ghost performance
- Multiple backends: Can verify connectivity to Ghost, ActivityPub, and analytics
Container Dependencies and Startup Order
For complex deployments with multiple containers, proper dependency management ensures healthy startup:
// ActivityPub depends on database migration completion
activityPubContainer.addContainerDependencies({
container: initContainer,
condition: ecs.ContainerDependencyCondition.SUCCESS,
});
// Ghost likewise waits for the init container (database migrations) to succeed
ghostContainer.addContainerDependencies({
container: initContainer,
condition: ecs.ContainerDependencyCondition.SUCCESS,
});
This ensures services start in the correct order and are fully initialized before receiving traffic.
Best Practices for Production Health Checks
Based on this implementation, here are key recommendations:
- Use simple health endpoints: Avoid complex logic that could fail
- Set appropriate grace periods: Allow time for initialization
- Configure circuit breakers: Enable automatic rollback for safety
- Monitor all layers: Track both ALB and application metrics
- Test deployment scenarios: Verify health checks during updates
The multi-layered health check architecture provides robust monitoring while minimizing false positives and ensuring smooth deployments. This design has proven effective for maintaining high availability in production Ghost deployments.
Container Logging Architecture
Each container type has its own log group with specific retention periods and structured logging formats. This separation allows targeted troubleshooting and cost optimization.
// Ghost container logs with structured JSON format
const ghostLogGroup = new logs.LogGroup(this, 'GhostLogs', {
logGroupName: '/ecs/ghost',
retention: logs.RetentionDays.ONE_WEEK,
removalPolicy: cdk.RemovalPolicy.DESTROY,
});
// Nginx sidecar logs for request analysis
const nginxLogGroup = new logs.LogGroup(this, 'NginxLogs', {
logGroupName: '/ecs/ghost/nginx',
retention: logs.RetentionDays.THREE_DAYS,
removalPolicy: cdk.RemovalPolicy.DESTROY,
});
// Container configuration with CloudWatch logging
const ghostContainer = taskDefinition.addContainer('ghost', {
image: ecs.ContainerImage.fromRegistry('ghost:5-alpine'),
logging: ecs.LogDrivers.awsLogs({
streamPrefix: 'ghost',
logGroup: ghostLogGroup,
}),
environment: {
NODE_ENV: 'production',
logging__level: 'info',
logging__transports: '["stdout"]',
},
});
The logging configuration includes Ghost application logs with one-week retention for debugging, Nginx access logs with three-day retention for traffic analysis, ActivityPub federation logs for troubleshooting federation issues, and init container logs capturing startup and migration output.
Log Insights Queries
Pre-configured queries enable rapid troubleshooting of common issues. These queries search across all container logs to identify patterns and correlate events.
// Pre-configured query for finding errors
const errorLogQuery = new logs.QueryDefinition(this, 'ErrorLogQuery', {
queryDefinitionName: 'Ghost-Errors',
queryString: new logs.QueryString({
fields: ['@timestamp', '@message'],
filter: '@message like /ERROR/',
sort: '@timestamp desc',
limit: 100,
}),
logGroups: [ghostLogGroup],
});
// Query for slow database queries
const slowQueryLog = new logs.QueryDefinition(this, 'SlowQueries', {
queryDefinitionName: 'Ghost-Slow-DB-Queries',
queryString: new logs.QueryString({
fields: ['@timestamp', 'query', 'duration'],
filter: 'duration > 1000',
sort: 'duration desc',
limit: 50,
}),
logGroups: [ghostLogGroup],
});
Additional queries identify failed login attempts for security monitoring, track newsletter send progress, monitor image upload failures, and analyze traffic patterns from Nginx logs.
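The QueryDefinition constructs compile to ordinary Log Insights syntax, so the same searches can be run ad hoc in the console. The error query above, for instance, corresponds to:

```
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 100
```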
Auto-scaling Configuration
The monitoring metrics drive auto-scaling decisions, ensuring the application scales smoothly under load while controlling costs during quiet periods.
const scaling = service.autoScaleTaskCount({
minCapacity: 1,
maxCapacity: 3,
});
// Scale on CPU utilization
scaling.scaleOnCpuUtilization('CpuScaling', {
targetUtilizationPercent: 70,
scaleInCooldown: cdk.Duration.minutes(5),
scaleOutCooldown: cdk.Duration.minutes(1),
});
// Scale on memory utilization
scaling.scaleOnMemoryUtilization('MemoryScaling', {
targetUtilizationPercent: 70,
scaleInCooldown: cdk.Duration.minutes(5),
scaleOutCooldown: cdk.Duration.minutes(1),
});
The scaling configuration maintains 70% target utilization for both CPU and memory, scales out quickly in one minute to handle traffic spikes, scales in slowly after five minutes to avoid flapping, and supports one to three container instances based on load.
Note: Auto-scaling creates its own control alarms (separate from monitoring alarms) that will show as ALARM when utilization is low - this is normal behavior as they signal the auto-scaler not to add more capacity.
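Target tracking roughly sizes capacity in proportion to the metric-to-target ratio (a simplification of the actual CloudWatch algorithm, shown here only for intuition). This sketch illustrates why a single task at 90% CPU triggers a scale-out against a 70% target:

```typescript
// Approximate target-tracking math: capacity scales with metric/target ratio
const currentTasks = 1;
const observedCpuPercent = 90;
const targetPercent = 70;

const desiredTasks = Math.ceil(currentTasks * (observedCpuPercent / targetPercent));
console.log(desiredTasks); // 2 -> scale out by one task
```

The short one-minute scale-out cooldown lets this correction happen quickly, while the five-minute scale-in cooldown prevents the service from immediately dropping back to one task and flapping.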
Health Check Configuration
Health checks ensure only healthy containers receive traffic, with circuit breaker protection for safe deployments. This variant probes the Ghost admin API directly instead of the lightweight Nginx /health endpoint shown earlier, trading a heavier check for end-to-end verification of the application.
const targetGroup = new elbv2.ApplicationTargetGroup(this, 'TargetGroup', {
vpc: props.vpc,
port: 80,
protocol: elbv2.ApplicationProtocol.HTTP,
targetType: elbv2.TargetType.IP,
healthCheck: {
enabled: true,
path: '/ghost/api/admin/site/',
protocol: elbv2.Protocol.HTTP,
healthyHttpCodes: '200',
interval: cdk.Duration.seconds(30),
timeout: cdk.Duration.seconds(5),
healthyThresholdCount: 2,
unhealthyThresholdCount: 3,
},
deregistrationDelay: cdk.Duration.seconds(30),
});
// ECS service with circuit breaker
const service = new ecs.FargateService(this, 'GhostService', {
cluster: props.cluster,
taskDefinition,
desiredCount: 1,
assignPublicIp: false,
circuitBreaker: { rollback: true },
healthCheckGracePeriod: cdk.Duration.minutes(5),
});
Health checks verify the Ghost admin API every 30 seconds, require two consecutive successful checks for healthy status, mark containers unhealthy after three failures, and automatically roll back failed deployments.
WAF Monitoring
The WAF integration provides security metrics and blocks malicious traffic before it reaches the application.
const webAcl = new wafv2.CfnWebACL(this, 'WebACL', {
defaultAction: { allow: {} },
rules: [
{
name: 'RateLimitRule',
priority: 1,
statement: {
rateBasedStatement: {
limit: 2000,
aggregateKeyType: 'IP',
},
},
action: { block: {} },
visibilityConfig: {
sampledRequestsEnabled: true,
cloudWatchMetricsEnabled: true,
metricName: 'RateLimitRule',
},
},
{
name: 'AWSManagedRulesCommonRuleSet',
priority: 2,
statement: {
managedRuleGroupStatement: {
vendorName: 'AWS',
name: 'AWSManagedRulesCommonRuleSet',
},
},
overrideAction: { none: {} },
visibilityConfig: {
sampledRequestsEnabled: true,
cloudWatchMetricsEnabled: true,
metricName: 'CommonRuleSet',
},
},
],
visibilityConfig: {
sampledRequestsEnabled: true,
cloudWatchMetricsEnabled: true,
metricName: 'ghost-waf',
},
});
WAF monitoring tracks rate limit violations identifying potential DDoS attacks, blocked requests by rule showing attack patterns, sampled requests for security analysis, and geographical distribution of blocked traffic.
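WAF rate-based rules count requests per source IP over a rolling window (five minutes by default), so the limit of 2,000 configured above translates to a sustained per-IP rate of roughly 6.7 requests per second:

```typescript
// WAF rate-based rules evaluate requests per source IP over a rolling window
const rateLimit = 2000; // requests per window, from the WebACL above
const windowSeconds = 5 * 60; // WAF's default 5-minute evaluation window

const sustainedRps = rateLimit / windowSeconds;
console.log(sustainedRps.toFixed(2)); // requests/second per IP before blocking
```

Legitimate readers rarely approach that rate, but a naive scraper or DDoS bot will, which is why this rule sits at priority 1.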
Cost Optimization
The monitoring setup uses several strategies to minimize costs while maintaining visibility.
Log Retention Policies
Different retention periods for different log types optimize storage costs:
// Critical application logs - 1 week
const appLogs = new logs.LogGroup(this, 'AppLogs', {
retention: logs.RetentionDays.ONE_WEEK,
});
// Access logs - 3 days
const accessLogs = new logs.LogGroup(this, 'AccessLogs', {
retention: logs.RetentionDays.THREE_DAYS,
});
// Debug logs - 1 day
const debugLogs = new logs.LogGroup(this, 'DebugLogs', {
retention: logs.RetentionDays.ONE_DAY,
});
Metric Filters Instead of Lambda
Using CloudWatch metric filters avoids Lambda costs for simple metric extraction:
new logs.MetricFilter(this, 'ErrorCountMetric', {
logGroup: ghostLogGroup,
filterPattern: logs.FilterPattern.anyTerm('ERROR'),
metricNamespace: 'Ghost/Application',
metricName: 'ErrorCount',
metricValue: '1',
});
Container Insights Optimization
Container Insights provides deep visibility but can be expensive. We enable it selectively:
const cluster = new ecs.Cluster(this, 'Cluster', {
vpc,
containerInsights: true, // Enable for production only
enableFargateCapacityProviders: true,
});
Cost Considerations
The monitoring setup is designed to be cost-effective while providing comprehensive visibility. Based on typical Ghost deployment patterns, the estimated monthly costs are approximately $10-20 depending on traffic volume and log retention settings. The investment provides complete operational visibility and peace of mind for production deployments.
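As a back-of-the-envelope check on that estimate (the prices below are assumed us-east-1 list prices at the time of writing; verify against current AWS pricing, and note log volume is a guess):

```typescript
// Assumed us-east-1 list prices -- check current AWS pricing before relying on these
const alarmPrice = 0.1; // USD per standard alarm per month
const compositeAlarmPrice = 0.5; // USD per composite alarm per month
const dashboardPrice = 3.0; // USD per dashboard per month beyond the free tier
const logIngestPerGb = 0.5; // USD per GB of logs ingested

const alarms = 7 * alarmPrice + 1 * compositeAlarmPrice;
const logs = 10 * logIngestPerGb; // assume roughly 10 GB/month of logs
const total = alarms + dashboardPrice + logs;

console.log(total.toFixed(2)); // rough monthly estimate in USD
```

That lands near the bottom of the $10-20 range; heavier traffic, Container Insights, and longer retention push it toward the top.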
Subscribing to Notifications
After deployment:
1. Subscribe to SNS alerts: the email subscription requires confirmation before notifications are delivered.
aws sns subscribe \
--topic-arn arn:aws:sns:region:account:GhostStack-MonitoringAlarmTopic* \
--protocol email \
--notification-endpoint your-email@example.com
2. Verify alarm configuration: check that all alarms start in the OK state.
3. Customize thresholds: adjust based on your traffic patterns and requirements.
4. Monitor auto-scaling alarms separately: these will show as ALARM when load is low (this is normal).
The monitoring system provides the operational excellence needed for production Ghost deployments. With comprehensive metrics, intelligent alerting, and cost-effective logging, you can confidently run Ghost on AWS knowing issues will be detected and resolved quickly.