Refactoring Blog Search from Pinecone to Claude
How we replaced vector search with AI-powered semantic search using Claude via AWS Bedrock

When I initially implemented search for this blog, I chose Pinecone's vector database for its promise of semantic search capabilities. After running it for several months, I decided to refactor the entire search implementation to use Claude via AWS Bedrock instead. This post walks through the technical journey of that migration.
The Original Architecture
The Pinecone implementation followed a standard vector search pattern. During deployment, a GitHub Actions workflow would read all blog posts, generate embeddings using Pinecone's multilingual-e5-large model, and store them in a Pinecone index. When users searched, the Lambda function would generate an embedding for their query, search the vector database, and use Pinecone's reranking model to improve results.
This worked well enough, but came with some challenges. The deployment process required managing embeddings and keeping them synchronized with content changes. Most importantly, while vector search found semantically similar content, it sometimes missed obvious matches that a more context-aware system would catch.
The New Approach
The refactored architecture simplifies the search flow while leveraging Claude's natural language understanding.
Instead of maintaining vector embeddings, the system now generates a single JSON file containing all blog content during the deployment process. The Next.js build process (via yarn build) generates the static site, RSS feeds, and llms.txt files for each post. Then, during CDK deployment, the bundling process generates the search corpus and deploys everything to S3. When a search request comes in, the Lambda function fetches this corpus (with 5-minute caching), constructs a prompt for Claude, and returns the most relevant results.
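To make that flow concrete, here is a minimal sketch of what the search Lambda's entry point could look like under this design, assuming an API Gateway-style proxy event. The helper names are placeholders: getCachedCorpus is sketched later in this post alongside the caching discussion, and searchWithClaude is the function shown in the Search Lambda section.

import type { APIGatewayProxyEvent, APIGatewayProxyResult } from "aws-lambda";

export async function handler(
  event: APIGatewayProxyEvent,
): Promise<APIGatewayProxyResult> {
  const query = event.queryStringParameters?.q?.trim();
  if (!query) {
    return { statusCode: 400, body: JSON.stringify({ error: "Missing query" }) };
  }

  // Fetch the corpus from S3, reusing an in-memory copy for up to 5 minutes
  const corpus = await getCachedCorpus();

  // Ask Claude (via Bedrock) to rank the posts against the query
  const results = await searchWithClaude(query, corpus);

  return {
    statusCode: 200,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ results }),
  };
}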
Corpus Generation
The blog corpus generation happens during the CDK deployment's bundling process. When CDK bundles the website assets, it runs a TypeScript script that reads the Velite-generated blog data and creates a structured JSON file:
import { existsSync, readFileSync, writeFileSync } from "fs";
import { join } from "path";

interface BlogCorpusEntry {
  id: string;
  title: string;
  description: string;
  content: string;
  date: string;
  tags: string[];
  categories: string[];
  author: string;
}

// Shape of the Velite-generated blog entries (only the fields used below)
interface VeliteBlogPost {
  slug: string;
  title: string;
  description: string;
  body: string;
  date: string;
  tags?: string[];
  categories?: string[];
  author: string;
  published: boolean;
}

export function generateBlogCorpus(outputPath: string): void {
  try {
    // Read the Velite-generated blogs.json
    const blogsPath = join(process.cwd(), "../web/.velite/blogs.json");
    if (!existsSync(blogsPath)) {
      console.error(`Blog data not found at ${blogsPath}`);
      console.log("Make sure you've built the web app first with 'yarn build'");
      return;
    }

    const blogsJson = readFileSync(blogsPath, "utf-8");
    const blogs: VeliteBlogPost[] = JSON.parse(blogsJson);

    // Filter published posts and transform to corpus format
    const corpus: BlogCorpusEntry[] = blogs
      .filter((post) => post.published)
      .map((post) => ({
        id: post.slug,
        title: post.title,
        description: post.description,
        content: post.body, // Full content for best search results
        date: post.date,
        tags: post.tags || [],
        categories: post.categories || [],
        author: post.author,
      }));

    // Write corpus to output file
    writeFileSync(outputPath, JSON.stringify(corpus, null, 2));
    console.log(
      `Blog corpus generated with ${corpus.length} posts at ${outputPath}`,
    );

    // Log size for monitoring
    const sizeInMB = (
      Buffer.byteLength(JSON.stringify(corpus)) /
      1024 /
      1024
    ).toFixed(2);
    console.log(`Corpus size: ${sizeInMB} MB`);
  } catch (error) {
    console.error("Error generating blog corpus:", error);
    throw error;
  }
}
Then, during CDK deployment, generateBlogCorpus runs as part of the websiteSource bundling. Instead of running separate scripts to create indexes and generate embeddings, the corpus generation happens during local bundling:
const websiteSource = Source.asset(webOutPath, {
  bundling: {
    // A Docker image is required by the bundling API, but it is only used if local bundling fails
    image: DockerImage.fromRegistry("node:22"),
    local: {
      tryBundle(outputDir: string) {
        // Copy the Next.js build output
        execSync(`cp -r ${webOutPath}/* ${outputDir}/`, {
          stdio: "inherit",
        });
        // Generate the blog corpus
        const corpusPath = path.join(outputDir, "blog-corpus.json");
        generateBlogCorpus(corpusPath);
        return true;
      },
    },
  },
});
This approach ensures that the blog corpus is always synchronized with the deployed content, as it's generated fresh during each deployment.
Search Lambda
The search Lambda underwent a complete rewrite. Instead of managing Pinecone clients and generating embeddings, it now focuses on prompt engineering and Claude integration:
import {
  BedrockRuntimeClient,
  InvokeModelCommand,
} from "@aws-sdk/client-bedrock-runtime";

const bedrockClient = new BedrockRuntimeClient({});

interface SearchResult {
  id: string;
  text: string;
  score: number;
}

async function searchWithClaude(
  query: string,
  corpus: BlogCorpusEntry[],
): Promise<SearchResult[]> {
  const prompt = `You are a search engine for a technical blog. Given the following blog posts and a search query, return the 5 most relevant blog posts.

Blog posts:
${JSON.stringify(
    corpus.map((post) => ({
      id: post.id,
      title: post.title,
      description: post.description,
      tags: post.tags,
      categories: post.categories,
      content: post.content.substring(0, 500),
    })),
    null,
    2,
  )}

Search query: "${query}"

Return ONLY a JSON array with the 5 most relevant blog posts in this exact format, with no additional text:
[
  {
    "id": "blog-post-slug",
    "text": "Title - Description",
    "score": 0.95
  }
]`;

  const command = new InvokeModelCommand({
    modelId: "us.anthropic.claude-3-5-haiku-20241022-v1:0",
    contentType: "application/json",
    accept: "application/json",
    body: JSON.stringify({
      anthropic_version: "bedrock-2023-05-31",
      max_tokens: 1000,
      messages: [
        {
          role: "user",
          content: [{ type: "text", text: prompt }],
        },
      ],
      temperature: 0,
      top_p: 0.999,
    }),
  });

  const response = await bedrockClient.send(command);
  // Parse and return results...
}
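The parsing step is elided above. A minimal sketch of how it might look, assuming Claude follows the prompt and returns a bare JSON array (the helper name is illustrative, and error handling for malformed responses is omitted):

// Sketch of the elided parsing step
function parseClaudeResults(responseBody: Uint8Array): SearchResult[] {
  const decoded = JSON.parse(new TextDecoder().decode(responseBody));
  // The Bedrock response wraps the model output in a content array of text blocks
  const text: string = decoded.content[0].text;
  return JSON.parse(text) as SearchResult[];
}

In the Lambda this would be called as parseClaudeResults(response.body) right after bedrockClient.send(command).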
The Lambda caches the blog corpus in memory for 5 minutes to avoid repeated S3 fetches. This provides a good balance between performance and ensuring fresh content after deployments.
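A minimal sketch of that cache, assuming the corpus is deployed to the bucket root as blog-corpus.json (as in the bundling step above) and that the bucket name arrives via the WEBSITE_BUCKET_NAME environment variable; the helper name mirrors the handler sketch earlier:

import { GetObjectCommand, S3Client } from "@aws-sdk/client-s3";

const s3Client = new S3Client({});
const CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes

let cachedCorpus: BlogCorpusEntry[] | null = null;
let cachedAt = 0;

async function getCachedCorpus(): Promise<BlogCorpusEntry[]> {
  // Reuse the in-memory copy while the Lambda container is warm and the cache is fresh
  if (cachedCorpus && Date.now() - cachedAt < CACHE_TTL_MS) {
    return cachedCorpus;
  }

  const response = await s3Client.send(
    new GetObjectCommand({
      Bucket: process.env.WEBSITE_BUCKET_NAME,
      Key: "blog-corpus.json",
    }),
  );

  cachedCorpus = JSON.parse(await response.Body!.transformToString()) as BlogCorpusEntry[];
  cachedAt = Date.now();
  return cachedCorpus;
}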
Frontend Enhancements
The refactoring provided an opportunity to improve the search user experience. The original implementation simply displayed a loading spinner, which felt inadequate for searches that now take 2-3 seconds instead of being nearly instant.
The new search component implements several enhancements:
import { useState } from "react";

type SearchState = "idle" | "typing" | "searching" | "complete" | "error";

export function Search() {
  const [searchState, setSearchState] = useState<SearchState>("idle");
  const [cachedResults, setCachedResults] = useState<
    Record<string, SearchResult[]>
  >({});
  const { recentSearches, addRecentSearch } = useRecentSearches();

  // Show "Searching with AI..." during search
  // Display recent searches in dropdown
  // Cache results for instant repeated searches
  // Progressive result display
}
Recent searches are stored in localStorage and displayed in a dropdown, making it easy to revisit previous queries. Results are cached in memory during the session, so clicking on a recent search displays results instantly without another API call.
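A minimal sketch of what a useRecentSearches hook like that could look like (the hook name comes from the component above; the storage key and the cap of ten entries are assumptions):

import { useEffect, useState } from "react";

const STORAGE_KEY = "blog-recent-searches"; // illustrative key
const MAX_RECENT = 10; // illustrative cap

export function useRecentSearches() {
  const [recentSearches, setRecentSearches] = useState<string[]>([]);

  // Load persisted searches once on mount
  useEffect(() => {
    try {
      const stored = window.localStorage.getItem(STORAGE_KEY);
      if (stored) setRecentSearches(JSON.parse(stored));
    } catch {
      // Ignore storage errors (private mode, disabled storage, etc.)
    }
  }, []);

  const addRecentSearch = (query: string) => {
    const next = [query, ...recentSearches.filter((q) => q !== query)].slice(0, MAX_RECENT);
    setRecentSearches(next);
    try {
      window.localStorage.setItem(STORAGE_KEY, JSON.stringify(next));
    } catch {
      // Ignore storage errors
    }
  };

  return { recentSearches, addRecentSearch };
}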
The search state management provides clear feedback throughout the process. Users see when they're typing, when the search is processing, and when results are ready. If searches take longer than expected, the interface communicates this rather than leaving users wondering if something is broken.
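Inside the component, those transitions and the session cache might be wired together by something like the following (the /api/search path and the response shape are assumptions):

// Inside the Search component: drives the state transitions described above
async function runSearch(query: string) {
  // Serve repeated queries instantly from the session cache
  if (cachedResults[query]) {
    setSearchState("complete");
    return cachedResults[query];
  }

  setSearchState("searching"); // surfaces the "Searching with AI..." message
  try {
    // Endpoint path and response shape are illustrative
    const res = await fetch(`/api/search?q=${encodeURIComponent(query)}`);
    const { results } = await res.json();
    setCachedResults((prev) => ({ ...prev, [query]: results }));
    addRecentSearch(query);
    setSearchState("complete");
    return results;
  } catch {
    setSearchState("error");
    return [];
  }
}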
Infrastructure Changes
The CDK infrastructure required several modifications. The search construct no longer needs Pinecone secrets or API keys, but it does require read access to the S3 bucket and permission to InvokeModel on Amazon Bedrock:
import { Duration, Stack } from "aws-cdk-lib";
import { PolicyStatement } from "aws-cdk-lib/aws-iam";
import { Runtime } from "aws-cdk-lib/aws-lambda";
import { NodejsFunction } from "aws-cdk-lib/aws-lambda-nodejs";
import { IBucket } from "aws-cdk-lib/aws-s3";
import { Construct } from "constructs";
import { join } from "path";

export interface SearchConstructProps {
  websiteBucket: IBucket;
}

export class SearchConstruct extends Construct {
  constructor(scope: Construct, id: string, props: SearchConstructProps) {
    super(scope, id);

    const searchFunction = new NodejsFunction(this, "SearchFunction", {
      runtime: Runtime.NODEJS_22_X,
      handler: "handler",
      timeout: Duration.seconds(30),
      entry: join(__dirname, "../lambda/search/index.ts"),
      environment: {
        WEBSITE_BUCKET_NAME: props.websiteBucket.bucketName,
      },
    });

    // Grant S3 read permissions
    props.websiteBucket.grantRead(searchFunction);

    // Add Bedrock permissions
    searchFunction.addToRolePolicy(
      new PolicyStatement({
        actions: ["bedrock:InvokeModel"],
        resources: [
          `arn:aws:bedrock:${Stack.of(this).region}::foundation-model/*`,
        ],
      }),
    );
  }
}
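Wiring it into the stack is then just a matter of passing the website bucket in (the construct ID here is illustrative):

// In the stack that owns the website bucket
new SearchConstruct(this, "Search", {
  websiteBucket,
});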
Performance and Cost Analysis
The shift from Pinecone to Claude brought interesting tradeoffs. Search latency increased from near-instant to 2-3 seconds, but this feels acceptable given the improved search quality. The loading states and recent searches help mitigate the perception of slowness.
The search quality improved noticeably. Claude understands context and intent better than pure vector similarity. Searches for "CDK deployment" now surface posts about CDK pipelines, CDK constructs, and deployment strategies in a more intuitive order. The AI can infer relationships between concepts that vector embeddings might miss.
Conclusion
This refactoring taught me several valuable lessons about choosing the right tool for the job. Vector databases excel at scale and when millisecond latency matters, but for smaller applications, the complexity and cost might not justify their use.
The importance of user experience during longer operations became clear. The original implementation assumed fast responses, while the new one embraces the slower speed with better feedback and caching strategies.
The simplified deployment process reduces operational overhead significantly. No more managing embeddings, synchronizing vector databases, or handling index migrations. The entire search system rebuilds automatically with each deployment.
While the search is significantly slower, the results are dramatically better. Combined with no longer having to manage embeddings, this makes the switch from a vector database to LLM-powered search a big win.