
08 May 2026

Building a Lightweight RAG System with AWS S3 Vector Buckets
Large language models (LLMs) can help engineers analyze design documents and verify compliance with standards such as the Australian Government Information Security Manual (ISM). In practice, when attempting to add these documents to an LLM’s context, three issues often appear:
- Token limits: ISM controls and design docs can run to hundreds of pages—too large for a single prompt.
- Noise vs signal: Much of the text is irrelevant to the question at hand.
- Semantic gap: Abstract controls don’t map cleanly to very specific design language.
A naive “paste the whole PDF” approach is slow, expensive, and often less accurate. A common alternative is Retrieval Augmented Generation (RAG), which supplies the LLM with just the data it needs for the task. Typically RAG requires creating a Bedrock Knowledge Base or standing up a specialised vector database. However, AWS recently released S3 Vector Buckets, which greatly simplify this approach.
Retrieval with S3 Vector Buckets

S3 Vector Buckets let you store and query vector embeddings directly in Amazon S3. You create a vector bucket, define one or more indexes with a dimension and distance metric, write vectors with compact metadata, and perform similarity search. This removes the need to stand up a separate vector database. Compared with a Bedrock Knowledge Base, this approach provides more control over how embeddings are generated, stored, and queried, and allows the vector store to be reused by multiple applications beyond Bedrock. It is particularly useful when you want a lightweight, flexible vector storage layer integrated with existing S3-based data architectures.
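For example, once a vector bucket exists, an index can be defined with a fixed dimension and distance metric. A minimal CLI sketch follows; the index name is illustrative, and 1024 dimensions with cosine distance match the Titan V2 setup used later in this post:

```bash
aws s3vectors create-index \
  --vector-bucket-name my-vector-bucket \
  --index-name design-doc-chunks \
  --data-type float32 \
  --dimension 1024 \
  --distance-metric cosine
```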
Our Implementation: Serverless RAG on S3 Vectors
High-Level Architecture
- A Job Engine, with a task queue, orchestrates work. Artifacts such as SIR, HTML, PDF, DOCX, and XLSX files are uploaded to an artifacts S3 bucket.
- EventBridge triggers a doc-to-vector Lambda that downloads the file, chunks the text into roughly 1,000-character chunks with roughly 200 characters of overlap, generates Bedrock Titan V2 embeddings, and writes vectors to an S3 Vector Bucket, auto-creating the index if needed (a sample trigger pattern is sketched after this list).
- For a GenAI task such as genai-report, the engine embeds the user/query text, then runs QueryVectors to fetch the top-K most relevant chunks, and injects those chunks into the LLM prompt to generate the final answer.
- Other Lambdas, such as sir-to-xlsx, run in parallel for exports or side effects.
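For reference, an EventBridge rule pattern along these lines could route artifact uploads to the doc-to-vector Lambda. This is a sketch: it assumes EventBridge notifications are enabled on the artifacts bucket, and the bucket name is a placeholder.

```json
{
  "source": ["aws.s3"],
  "detail-type": ["Object Created"],
  "detail": {
    "bucket": { "name": ["artifacts-bucket"] }
  }
}
```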
How It Works
Stage 1 — Vectorisation: Document to Vectors
This stage prepares documents so they can be searched semantically. Raw document text is split into smaller chunks, converted into numerical embeddings using a Bedrock embedding model, and stored as vectors in an S3 Vector Bucket with compact metadata. This transforms unstructured text into a vector index that can later be queried by meaning rather than keyword matching.
1. Create the vector bucket and index
Set up the S3 Vector storage location and index configuration used to hold and search embeddings.
```bash
aws s3vectors create-vector-bucket \
  --vector-bucket-name "$bucket_name" \
  --encryption-configuration '{"sseType":"AES256"}'
```
2. Chunk document text
Split the document into overlapping, sentence-aware sections so meaning is preserved while keeping chunk sizes suitable for embedding and retrieval.
```javascript
const chunks = [];
const lines = text.split('\n');
let currentChunk = "", chunkId = 0;
for (const line of lines) {
  currentChunk += line + '\n';
  if (currentChunk.length >= chunkSize) {
    // Prefer a sentence boundary for the break point; fall back to the raw size limit
    const lastStop = currentChunk.lastIndexOf('. ');
    const breakPoint = lastStop > 0 ? lastStop : chunkSize - 1;
    const chunkText = currentChunk.substring(0, breakPoint + 1).trim();
    chunks.push({
      id: `${prefix}_chunk_${String(chunkId).padStart(4, '0')}`,
      text: chunkText,
      metadata: { chunk_index: chunkId }
    });
    chunkId++;
    // Carry the tail forward as overlap so context spans chunk boundaries
    currentChunk = currentChunk.substring(Math.max(0, breakPoint + 1 - overlap));
  }
}
// (flushing any final partial chunk is omitted for brevity)
```
3. Embed each chunk
Convert each text chunk into a numerical embedding vector using Amazon Bedrock Titan Text Embeddings V2. These embeddings capture semantic meaning and allow similarity search. Titan V2 can output embeddings of 256, 512, or 1024 dimensions, and the S3 Vector index must be created with a matching dimension.
```javascript
import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";
const bedrockClient = new BedrockRuntimeClient({});

const modelId = "amazon.titan-embed-text-v2:0";
const resp = await bedrockClient.send(new InvokeModelCommand({
  modelId,
  contentType: "application/json",
  // Titan V2 also accepts optional "dimensions" (256 | 512 | 1024) and "normalize" fields
  body: JSON.stringify({ inputText: text })
}));
return JSON.parse(new TextDecoder().decode(resp.body)).embedding;
```
4. Store vectors with compact metadata
Each embedding is stored in the S3 Vector index together with lightweight metadata, around 2 KB or less, that helps identify the original document location.
```javascript
import { S3VectorsClient, PutVectorsCommand } from "@aws-sdk/client-s3vectors";
const s3VectorsClient = new S3VectorsClient({});

// One vector per chunk, carrying that chunk's compact metadata
const vectors = chunks.map((c, i) => ({
  key: c.id,
  data: { float32: embeddings[i] },
  metadata: c.metadata
}));
await s3VectorsClient.send(new PutVectorsCommand({
  vectorBucketName: vectorBucket,
  indexName,
  vectors
}));
```
Stage 2 — Retrieval: Query to Ranked Chunks
This stage retrieves the most relevant document content for a user query. The query text is embedded using the same model used for documents, then compared against stored vectors in the index. The most similar chunks are returned, ranked by similarity score, and filtered by a threshold to ensure relevance.
1. Embed the query text
Convert the incoming search query into an embedding vector in the same semantic space as the stored document vectors.
```javascript
const queryEmbedding = await this.embeddingService.generateEmbedding(query);
```
2. Query top-K vectors from the index
Perform a similarity search against the vector index and return the most relevant chunks.
```javascript
const queryCommand = new QueryVectorsCommand({
  vectorBucketName: vectorBucket,
  indexName: indexName,
  queryVector: { float32: queryEmbedding },
  topK: maxChunks,
  returnMetadata: true,
  returnDistance: true
});
const queryResult = await this.s3VectorsClient.send(queryCommand);
```
Returned results are then converted from distance to similarity score, filtered by threshold, and surfaced with their associated metadata.
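A minimal sketch of that post-processing, assuming a cosine-distance index (so similarity is 1 - distance) and a hypothetical similarityThreshold parameter:

```javascript
// Convert distance to similarity, drop weak matches, and rank best-first
const rankedChunks = (queryResult.vectors ?? [])
  .map(v => ({
    key: v.key,
    similarity: 1 - (v.distance ?? 1),
    metadata: v.metadata
  }))
  .filter(v => v.similarity >= similarityThreshold)
  .sort((a, b) => b.similarity - a.similarity);
```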
Security and Operations
Permission Strategy
We adopted least-privilege permission control with function-level separation. The doc-to-vector Lambda needs s3vectors:CreateIndex and s3vectors:PutVectors permissions for writing. The GenAI report task has read-only access to the vectors via s3vectors:QueryVectors and s3vectors:GetIndex, which limits the infrastructure-level blast radius.
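As an illustrative sketch, the read-only side might carry an IAM statement like the following; the region, account ID, and resource names are placeholders:

```json
{
  "Effect": "Allow",
  "Action": ["s3vectors:QueryVectors", "s3vectors:GetIndex"],
  "Resource": "arn:aws:s3vectors:ap-southeast-2:123456789012:bucket/my-vector-bucket/index/*"
}
```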
For production use cases, this should be paired with metadata-based document permissions so the retrieval layer only returns chunks the current user is authorised to access. Ethan Hollins covers this broader defence-in-depth problem in his article on moving Bedrock solutions to production beyond standard guardrails.
Results
Two configurations were evaluated against the same ISM conformance task: a full-text prompt approach with no RAG, and RAG using S3-backed vector search. Results are reported across two dimensions: cost efficiency and detection accuracy.
Cost Comparison
Cost was estimated from token counts logged during each run, multiplied by Bedrock on-demand pricing as of April 2026: $3.00 per million input tokens and $15.00 per million output tokens for both Claude 3.5 Sonnet and Claude Sonnet 4.
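As a rough worked example: 31,044 input tokens at $3.00 per million comes to about $0.09, so the ~$0.13 full-text figure implies roughly another 2,500 output tokens at $15.00 per million.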
| Aspect | Full-Text Prompt, No RAG | RAG with S3 Vector Buckets | Bedrock Knowledge Base, OpenSearch |
|---|---|---|---|
| LLM token usage | 31,044 tokens | ~5,602 tokens | Depends on retrieved context |
| LLM cost per run | ~$0.13 | ~$0.02–$0.03 estimated | ~$0.01 |
| Vector infrastructure cost | $0 | S3 storage only, approximately cents per month | ~$350/month |
| Net cost profile | Token-heavy | Cost-efficient at scale | Higher baseline infrastructure cost |
Performance Comparison
We ran the ISM conformance task 6 times across two configurations: one using RAG with S3-backed vector search, and one passing the full AWS Config conformance pack results directly as plain text in the prompt. Each run was scored against a defined ground truth of 6 known non-compliant findings derived from live AWS Config data.
| Metric | RAG with S3 Vectors | Full Text, No RAG |
|---|---|---|
| Recall | 4/6 = 67% | 3/6 = 50% |
| Precision | 4/4 = 100% | 3/3 = 100% |
| F1 Score | 0.80 | 0.67 |
| False positives | 0 | 0 |
| False negatives, missed real findings | 2 | 3 |
| Duration | 2m 20s | ~2m |
Full-text prompting reached 50% recall. It identified the obvious Config findings, such as EBS, CloudTrail, and S3 logging, because they appeared prominently in the raw data. However, it missed subtler findings related to the backup vault, ElastiCache, and IAM.
RAG reached 67% recall and 100% precision. It added the backup vault finding that full-text prompting missed and produced zero false positives. The vector index retrieval surfaced more specific findings.
The remaining gap, including ElastiCache and IAM unused credentials, is a semantic gap. Those Config rule names do not embed closely to ISM control language, so RAG does not reliably retrieve them. This is a known limitation, discussed in more detail, along with some potential solutions, in the Lessons Learned section below.
Lessons Learned
The Semantic Gap Challenge
When querying ISM controls against technical design documents, we saw 60-70% accuracy with pure vector similarity. Abstract compliance language such as “implement encryption at rest” doesn’t embed closely to specific technical terms such as “AWS KMS key rotation policy”.
Potential Solution: Two-Stage Retrieval with Domain Tags
First, tag chunks during vectorization with domain categories such as security, networking, compute, and data-storage. At query time, filter by relevant tags to narrow the corpus, then vector-search within that subset and re-rank with a cross-encoder model. Take the top 5-10 chunks as final context. This approach could push the F1 score above 0.8 by bridging the abstract-to-concrete gap with explicit categorization before semantic matching.
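A sketch of stage one using metadata filtering, which S3 Vectors now supports (see the update note under Infrastructure Constraints below); the filter syntax follows the S3 Vectors metadata-filtering docs, and the domain tag and over-fetch size are illustrative:

```javascript
// Stage 1: narrow by domain tag, over-fetching so stage 2 has candidates to re-rank
const candidates = await s3VectorsClient.send(new QueryVectorsCommand({
  vectorBucketName: vectorBucket,
  indexName,
  queryVector: { float32: queryEmbedding },
  topK: 50,
  filter: { domain: "security" },
  returnMetadata: true,
  returnDistance: true
}));
// Stage 2: re-rank candidates.vectors with a cross-encoder and keep the top 5-10 as context
```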
Infrastructure Constraints
- Permission control: Our index-per-tenant approach requires fan-out queries across indexes because S3 Vectors queries one index at a time. A shared index with tenant_id metadata would be cleaner, but S3 Vectors doesn’t support metadata filtering in queries yet. (Update: This is now supported – https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors-metadata-filtering.html)
- IaC lag: S3 Vectors lacks full CloudFormation and Terraform support, so we fell back to SDK and CLI operations for bucket and index creation.
- Index constraints: Dimension and distance metric are immutable per index. Changing your embedding model means creating a new index and re-vectorizing everything.
- Metadata budget: There is a 2 KB hard limit on filterable metadata per vector. We spent hours optimizing metadata packing to squeeze in text snippets for debugging; a simple guard is sketched after this list.
- Metadata key limit: Each vector can carry at most 10 metadata fields. Plan your taxonomy carefully, because adding fields later may require re-indexing.
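As a sketch of the metadata-budget point, a guard like this could be applied when building each vector's metadata. packMetadata is a hypothetical helper, and the byte accounting assumes mostly single-byte text:

```javascript
// Truncate the debug snippet so filterable metadata stays within the 2 KB budget
function packMetadata(docId, chunk, budgetBytes = 2048) {
  const base = { doc_id: docId, chunk_index: chunk.metadata.chunk_index };
  const overhead = Buffer.byteLength(JSON.stringify({ ...base, snippet: "" }), "utf8");
  const snippet = chunk.text.slice(0, Math.max(0, budgetBytes - overhead));
  return { ...base, snippet };
}
```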
Final Thoughts
S3 Vector Buckets are a good fit when you want a lightweight, AWS-native RAG setup without running a separate vector database. They work especially well for internal tools, compliance checks, document search, and serverless GenAI workflows where your source files already live in S3.
I would choose S3 Vectors when I want control over chunking, embeddings, metadata, and retrieval logic, but do not need advanced vector database features like rich metadata filtering, hybrid search, or complex ranking.
The key lesson is that S3 Vectors simplify the storage layer, but they do not solve retrieval quality by themselves. Good RAG still depends on careful chunking, metadata design, query strategy, and evaluation against real ground truth.


