Deep Dive: Building 'alphabetizi.ng'

Exploring the architecture and interesting features of a serverless survey application built with AWS CDK, React, DynamoDB, and Lambda@Edge

Introduction

Recently, I built a small web application called alphabetizi.ng. It's a fun survey designed to explore how different people alphabetize their music collections, ultimately revealing their underlying organizational style. This post dives into the technical details and interesting architectural choices made during its development. While the frontend is a standard React application built with Vite and TypeScript, the main focus here is the serverless backend, constructed entirely using the AWS Cloud Development Kit (CDK) with TypeScript. We'll explore how services like API Gateway, Lambda (including Lambda@Edge), DynamoDB, S3, and CloudFront were combined to create this application.

Core Serverless Architecture with AWS CDK

Defining the infrastructure as code with the AWS CDK allows for repeatable, version-controlled deployments. The backend architecture is fully serverless and relies on several AWS services working in concert. API Gateway provides the RESTful endpoints (/survey, /analysis) that the frontend calls. These endpoints trigger AWS Lambda functions that handle the core application logic: recording survey responses, fetching results, and running the AI analysis. Security checks run separately in a Lambda@Edge function that CloudFront invokes before a request reaches API Gateway.

Survey responses are stored in DynamoDB, while the static frontend assets (the React build output) are hosted in an S3 bucket. Amazon CloudFront acts as the content delivery network: it serves the frontend from S3, routes API requests to API Gateway, handles caching, applies security headers (via Response Headers Policies), integrates with AWS WAF for rate limiting, and invokes the Lambda@Edge function for request verification. DNS is managed through Route 53, with SSL/TLS certificates provided by AWS Certificate Manager (ACM). Finally, IAM roles and policies define the necessary permissions, including an OIDC provider that enables secure, passwordless deployments from GitHub Actions.

To keep the CDK code manageable and reusable, it's organized into logical Stacks like AlphabetizeStack (for the main application) and OidcStack (for GitHub Actions integration), and further broken down into modular Constructs such as ApiConstruct, WebsiteConstruct, and DynamoDbConstruct.

Storing & Querying Responses in DynamoDB

Choosing the right database is key for a survey application that needs to handle potentially many concurrent responses. We selected DynamoDB for its scalability and serverless nature. The core of the data model involves efficiently tallying votes for each answer option per question.

To achieve this, the table uses a composite primary key:

  • Partition Key: questionId (string) - Groups all responses for a single question.
  • Sort Key: response (string) - Represents the specific answer option chosen.

Each item in the table represents one answer option for a specific question and stores a count attribute (number). The real power comes from DynamoDB atomic counters. When a user submits their choice, the backend Lambda function doesn't read the current count, increment it, and write it back, which could lose votes to race conditions. Instead, it performs a single UpdateItem operation whose SET update expression adds 1 to the count attribute for the given questionId and response, using if_not_exists to start from 0 when the item is new. DynamoDB applies the update atomically, so counts are incremented accurately even with many simultaneous submissions.

// Simplified Lambda logic for recording a response
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, UpdateCommand } from "@aws-sdk/lib-dynamodb";

// Document client that marshals plain JS values to DynamoDB attribute types
const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

const updateParams = {
  TableName: TABLE_NAME,
  Key: {
    questionId: response.questionId,
    response: response.selectedOption,
  },
  // Atomically increment the count, initializing to 0 if it doesn't exist
  UpdateExpression: "SET #count = if_not_exists(#count, :zero) + :inc",
  // "count" is a DynamoDB reserved word, so it is referenced through an alias
  ExpressionAttributeNames: { "#count": "count" },
  ExpressionAttributeValues: { ":zero": 0, ":inc": 1 },
};
await docClient.send(new UpdateCommand(updateParams));

// From infra/src/constructs/dynamodb-construct.ts
import { AttributeType, BillingMode, ProjectionType, Table } from "aws-cdk-lib/aws-dynamodb";

this.table = new Table(this, "RecordOrganizationSurvey", {
  partitionKey: { name: "questionId", type: AttributeType.STRING },
  sortKey: { name: "response", type: AttributeType.STRING },
  billingMode: BillingMode.PAY_PER_REQUEST,
  // ...
});

// GSI for querying all responses for a single question
this.table.addGlobalSecondaryIndex({
  indexName: "QuestionIndex",
  partitionKey: { name: "questionId", type: AttributeType.STRING },
  projectionType: ProjectionType.ALL,
});
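To make the race condition concrete, here is a small in-memory sketch of the two approaches described above. It does not talk to DynamoDB: a Map stands in for the table, and the function names (interleavedReadModifyWrite, atomicAdd) are hypothetical, chosen only for this illustration.

```typescript
// A Map stands in for the table: key = "questionId#response", value = count.
type CounterStore = Map<string, number>;

// Read-modify-write with the worst-case interleaving:
// every client reads the count before any client writes it back.
function interleavedReadModifyWrite(store: CounterStore, key: string, clients: number): void {
  const snapshots: number[] = [];
  for (let i = 0; i < clients; i++) {
    snapshots.push(store.get(key) ?? 0); // each client sees the same stale value
  }
  for (const seen of snapshots) {
    store.set(key, seen + 1); // each write overwrites the previous one
  }
}

// Atomic add: the increment is applied to the current value in one step,
// which is the guarantee a single UpdateItem expression provides.
function atomicAdd(store: CounterStore, key: string, delta: number): void {
  store.set(key, (store.get(key) ?? 0) + delta);
}

const naiveStore: CounterStore = new Map();
interleavedReadModifyWrite(naiveStore, "q1#artist", 5);

const atomicStore: CounterStore = new Map();
for (let i = 0; i < 5; i++) {
  atomicAdd(atomicStore, "q1#artist", 1);
}

console.log(naiveStore.get("q1#artist"));  // 1 (four votes lost)
console.log(atomicStore.get("q1#artist")); // 5
```

Real traffic rarely interleaves this badly, but any overlap between one client's read and another's write loses a vote, which is why the increment is pushed into the update expression.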

While storing responses is efficient, displaying the aggregated results means fetching all answer counts for a given question. For this read pattern, a Global Secondary Index (GSI) named QuestionIndex was added. It uses questionId as its partition key, so the results Lambda function can fetch every item (answer options and their counts) for a question with a single Query operation instead of a table scan. Note that because questionId is also the base table's partition key, a Query against the table itself returns the same items; the index is therefore a convenience rather than a requirement, and it comes with extra write cost and eventually consistent reads.
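As a rough sketch of the read side, the snippet below builds the query input and turns the returned items into chart-ready data. The helper names (buildResultsQuery, toChartData) and the TallyItem and ChartSlice types are assumptions for illustration, not the project's actual code; the object returned by buildResultsQuery is shaped like the input a QueryCommand from @aws-sdk/lib-dynamodb accepts.

```typescript
// Shape of one stored item: one answer option for one question.
interface TallyItem {
  questionId: string;
  response: string;
  count: number;
}

// Shape handed to the charting layer (hypothetical).
interface ChartSlice {
  response: string;
  count: number;
  percent: number;
}

// Builds the input for a Query against the QuestionIndex GSI.
function buildResultsQuery(tableName: string, questionId: string) {
  return {
    TableName: tableName,
    IndexName: "QuestionIndex",
    KeyConditionExpression: "questionId = :q",
    ExpressionAttributeValues: { ":q": questionId },
  };
}

// Converts raw counts into slices sorted by popularity,
// with percentages rounded to one decimal place.
function toChartData(items: TallyItem[]): ChartSlice[] {
  const total = items.reduce((sum, item) => sum + item.count, 0);
  return items
    .map((item) => ({
      response: item.response,
      count: item.count,
      percent: total === 0 ? 0 : Math.round((item.count / total) * 1000) / 10,
    }))
    .sort((a, b) => b.count - a.count);
}

const slices = toChartData([
  { questionId: "q1", response: "By genre", count: 1 },
  { questionId: "q1", response: "By artist", count: 3 },
]);
console.log(slices[0]); // { response: "By artist", count: 3, percent: 75 }
```

Computing percentages in the Lambda keeps the frontend simple, though doing it client-side from the raw counts would work just as well.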

Securing the API with Lambda@Edge and API Gateway Authorizer

To protect the backend API endpoints without exposing secrets client-side, we implemented the two-layered security pattern using Lambda@Edge and an API Gateway Request Authorizer, as detailed in our post on Securing API Gateway with Lambda@Edge. The Lambda@Edge function checks the Referer header and injects the API key, so only requests that pass through CloudFront with an allowed Referer reach the backend Lambdas.

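// Simplified logic from infra/src/lambdas/edge-verify/index.ts
import type { CloudFrontRequestHandler } from "aws-lambda";

export const handler: CloudFrontRequestHandler = async (event) => {
  const request = event.Records[0].cf.request;
  const referer = request.headers.referer?.[0]?.value;
  // Note: Lambda@Edge does not support user-defined environment variables,
  // so this assumes the values are inlined at build time (e.g. via a bundler define).
  const domainName = process.env.DOMAIN_NAME;
  const apiKey = process.env.API_KEY;

  // Check if referer matches allowed domains
  const isValidReferer = /* ... check logic ... */;

  if (!isValidReferer) {
    // Reject the request at the edge; it never reaches API Gateway
    return { status: '403', /* ... */ };
  }

  // Inject API key so the API Gateway authorizer can verify the request
  request.headers['x-api-key'] = [{ key: 'x-api-key', value: apiKey }];
  return request;
};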
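The referer check itself is elided in the snippet above. One way such a check could look is sketched below; isAllowedReferer is a hypothetical helper, not the project's actual logic, and the allowed host list is an assumption. It parses the header as a URL and compares the hostname exactly, since a substring match would also accept a look-alike host.

```typescript
// Hypothetical referer check: exact hostname match over HTTPS only.
function isAllowedReferer(referer: string | undefined, allowedHosts: readonly string[]): boolean {
  if (!referer) {
    return false; // missing header is treated as invalid
  }
  try {
    const { protocol, hostname } = new URL(referer);
    return protocol === "https:" && allowedHosts.includes(hostname);
  } catch {
    return false; // header value was not a parseable URL
  }
}

const allowed = ["alphabetizi.ng"]; // assumed allow-list for illustration

console.log(isAllowedReferer("https://alphabetizi.ng/survey", allowed));        // true
console.log(isAllowedReferer("https://alphabetizi.ng.evil.example/", allowed)); // false
console.log(isAllowedReferer(undefined, allowed));                              // false
```

Keep in mind that the Referer header is set by the client, so a non-browser caller can forge it. The check filters casual misuse; the injected API key and the API Gateway authorizer are what stop requests that bypass CloudFront.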

Advanced CloudFront and Logging Setup

The CloudFront distribution (WebsiteConstruct) is also configured with AWS WAF for rate limiting and a ResponseHeadersPolicy for enhanced browser security. Furthermore, detailed access logs are captured and made queryable using AWS Glue and Athena via the LoggingConstruct, a setup explained in our post on Analyzing Blog Traffic with Athena.

Frontend and Deployment

While this post focuses on the backend, the frontend provides the user interface for the survey. It's a modern React application built using Vite, chosen for its fast development server and optimized builds, with TypeScript ensuring type safety throughout the codebase. Components are styled using Tailwind CSS, leveraging shadcn/ui for accessible and composable UI primitives like buttons, cards, and progress bars. This approach involves copying component code directly into the project rather than relying on a traditional UI library dependency.

To enhance the user experience and prevent potential bias in responses, the application implements randomization in two key areas: the order of question groups is shuffled each time the survey loads, and within each question, the answer options are also randomized (with the exception of any "Other" option, which remains last). After submitting an answer, users can see aggregated results visualized dynamically using the recharts library, with the ability to toggle between pie and bar chart views.
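A minimal sketch of this kind of randomization, assuming a Fisher-Yates shuffle and an option labelled exactly "Other"; the function names and sample options are hypothetical, and the actual implementation may differ.

```typescript
// Fisher-Yates shuffle that returns a new array and leaves the input untouched.
// The random source is injectable so the behavior can be tested deterministically.
function shuffle<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const result = [...items];
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

// Shuffles answer options but keeps the pinned option (if present) in last place.
function shuffleOptions(
  options: readonly string[],
  pinned: string = "Other",
  random: () => number = Math.random,
): string[] {
  const movable = options.filter((option) => option !== pinned);
  const shuffled = shuffle(movable, random);
  return options.includes(pinned) ? [...shuffled, pinned] : shuffled;
}

// Sample options for illustration only.
const sampleOptions = ["By artist", "Other", "By genre", "By mood"];
const randomized = shuffleOptions(sampleOptions);
console.log(randomized[randomized.length - 1]); // "Other"
```

The same shuffle helper can be applied to the list of question groups when the survey loads.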

For local development, Vite's proxy feature (server.proxy in vite.config.ts) is configured to seamlessly route API calls (like /survey and /analysis) from the local development server (localhost:5173) to the deployed backend URL. This simplifies testing by mimicking the production routing handled by CloudFront and avoiding CORS issues.
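A proxy setup along these lines could look like the following vite.config.ts fragment. The target URL is a placeholder, not the real backend address, and the project's actual config (which would also register the React plugin) is not shown in this post.

```typescript
// vite.config.ts (sketch; plugins omitted for brevity)
import { defineConfig } from "vite";

// Placeholder: replace with the URL of the deployed backend.
const BACKEND_URL = "https://your-deployed-site.example";

export default defineConfig({
  server: {
    port: 5173,
    proxy: {
      // Forward API calls from the dev server to the deployed backend.
      // changeOrigin rewrites the Host header to match the target.
      "/survey": { target: BACKEND_URL, changeOrigin: true },
      "/analysis": { target: BACKEND_URL, changeOrigin: true },
    },
  },
});
```

Because the browser only ever talks to localhost:5173, requests are same-origin from its point of view, which is why no CORS configuration is needed during development.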

Deployment itself is automated via GitHub Actions. Leveraging the OIDC stack (OidcStack) created by the CDK, the GitHub Actions workflow securely authenticates to AWS without needing long-lived credentials. The workflow builds the React/Vite application, deploys the static assets to the S3 bucket using the WebsiteDeploymentConstruct, and likely triggers a CloudFront invalidation to ensure users receive the latest version.

Conclusion

Building alphabetizi.ng was a great exercise in combining various AWS serverless services using the CDK. The specific implementation of DynamoDB atomic counters and a GSI for efficient survey data handling, coupled with a modern React/Vite frontend featuring dynamic elements like randomization and charting, showcases how to build a scalable and engaging web application. While leveraging established patterns for security and logging (detailed in other posts), this project integrates them into a cohesive serverless architecture.