RAG Pipeline on AWS
A full-stack Retrieval-Augmented Generation system deployed entirely on AWS. A React SPA behind CloudFront authenticates users via Cognito, then streams LLM answers token-by-token from a FastAPI service on ECS Fargate. Documents uploaded to S3 trigger an ingest Lambda that chunks text, embeds it with Bedrock Titan, and stores vectors in pgvector on RDS PostgreSQL. Conversation history persists in DynamoDB with automatic 7-day TTL expiration. The entire stack — 71 resources across 15 services — is provisioned, deployed, and torn down with a single command.
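The 7-day expiration of conversation history can be illustrated with a small sketch. DynamoDB TTL works from a Number attribute holding an expiry time in Unix epoch seconds, so each history item only needs that timestamp set when it is written. The attribute names (`session_id`, `created_at`, `expires_at`) and the helper `build_history_item` are hypothetical and not taken from the project; the sketch builds the item as a plain dictionary and leaves the actual write call out.

```python
import time

# 7 days expressed in seconds; DynamoDB TTL compares against epoch seconds.
SEVEN_DAYS_SECONDS = 7 * 24 * 60 * 60


def build_history_item(session_id, role, content, now=None):
    """Return a conversation-history item with a TTL timestamp 7 days out.

    All attribute names here are illustrative assumptions. `now` can be
    injected for testing; it defaults to the current time.
    """
    now = int(time.time()) if now is None else int(now)
    return {
        "session_id": session_id,   # hypothetical partition key
        "created_at": now,          # hypothetical sort key
        "role": role,               # e.g. "user" or "assistant"
        "content": content,
        # The table's TTL setting would point at this attribute name.
        "expires_at": now + SEVEN_DAYS_SECONDS,
    }


item = build_history_item("session-1", "user", "What is pgvector?", now=1_700_000_000)
print(item["expires_at"])
```

DynamoDB removes expired items in the background rather than at the exact timestamp, so a reader that needs strict expiry may also want to filter on the same attribute when querying.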
Key Achievements
- Streaming RAG responses: Built a FastAPI service on ECS Fargate that embeds queries with Bedrock Titan, retrieves top-5 chunks via pgvector cosine similarity (HNSW index), and streams LLM answers token-by-token as Server-Sent Events through an ALB and CloudFront.
- Event-driven document ingestion: Designed an S3 event-triggered Lambda pipeline that extracts text from PDFs and plaintext, chunks documents into 512-word segments with 50-word overlap, embeds each chunk via Bedrock Titan V2, and inserts vectors into pgvector.
- Cost-optimized vector search: Chose pgvector on a db.t4g.micro RDS instance over OpenSearch Serverless (the default vector store for Bedrock Knowledge Bases), achieving equivalent vector search at a fraction of the cost while gaining full control over chunking, similarity thresholds, and prompt construction.
- Full-stack authentication: Integrated Cognito User Pool authentication with SRP-based sign-in (no server-side secrets), JWT validation on the FastAPI service via JWKS, and secure API routing through CloudFront.
- Single-command infrastructure: Automated provisioning of 71 AWS resources across 15 services (including VPC, RDS, ECS, ECR, Lambda, S3, DynamoDB, CloudFront, Cognito, Bedrock, ALB, NAT Gateway, and API Gateway) with Terraform. The same command also builds the Docker images, deploys the frontend, and invalidates the CloudFront cache.
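The chunking step in the ingestion pipeline (512-word segments with a 50-word overlap) can be sketched as a sliding window over whitespace-separated words. This is a minimal illustration, not the project's actual code: the function name `chunk_words` and the choice to split on whitespace are assumptions, and the real Lambda may define word boundaries differently.

```python
def chunk_words(text, size=512, overlap=50):
    """Split text into word windows of `size` words, each sharing
    `overlap` words with the previous window.

    Hypothetical helper: whitespace tokenization is an assumption.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")

    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once this window reaches the end of the document, so no
        # trailing chunk is emitted that only repeats overlap words.
        if start + size >= len(words):
            break
        start += step
    return chunks


# Usage: a 1000-word document with the settings described above.
doc = " ".join(f"w{i}" for i in range(1000))
chunks = chunk_words(doc)
print(len(chunks), [len(c.split()) for c in chunks])
```

With these settings the window advances 462 words at a time, so consecutive chunks share exactly 50 words. The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of embedding some words twice.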
Technologies
- Terraform
- Amazon Bedrock
- Amazon ECS Fargate
- FastAPI
- React
- TypeScript
- Amazon RDS (pgvector)
- Amazon CloudFront
- Amazon Cognito
- AWS Lambda (Python 3.12)
- Amazon S3
- Amazon DynamoDB
- Application Load Balancer (ALB)
- Docker
- Python
Year
2026
Links