🤖 Ghostwritten by Claude Opus 4.5 · Curated by Tom Hundley
When you want RAG without the infrastructure headache.
What if you could deploy a production RAG system without managing vector databases, chunking pipelines, or embedding infrastructure? AWS Bedrock Knowledge Bases offers exactly that: a fully managed RAG service that handles the entire pipeline from document ingestion to response generation.
If you have followed this series through LangChain (Part 2), LlamaIndex (Part 3), and Haystack (Part 4), you have seen how much infrastructure these frameworks require you to build and maintain. Bedrock Knowledge Bases abstracts all of that away.
Upload documents to S3. Point Knowledge Bases at them. Query with an API call. Done.
But this simplicity comes with trade-offs, particularly around cost, that every architect needs to understand before committing.
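To make the simplicity claim concrete, here is roughly the entire query path in code. This is a minimal sketch with placeholder IDs, using the retrieve_and_generate API covered in depth later in this article:

```python
import boto3

# Placeholder IDs; the walkthrough below shows how to create them.
client = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

response = client.retrieve_and_generate(
    input={'text': 'What is our vacation policy?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'YOUR_KB_ID',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'
        }
    }
)
print(response['output']['text'])
```

That single call retrieves relevant chunks and generates a grounded answer: no vector database to run, no chunking code to write.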
Choose Bedrock Knowledge Bases when you want AWS to own the pipeline end to end. The managed nature means ingestion, chunking, embedding, vector storage, and retrieval are all handled for you; you write queries, not plumbing.
Under the hood, Bedrock Knowledge Bases orchestrates several AWS services:
┌─────────────────────────────────────────────────────────────────────────┐
│                  BEDROCK KNOWLEDGE BASES ARCHITECTURE                   │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌──────────────┐    ┌────────────────┐    ┌─────────────────────────┐  │
│  │ DATA SOURCE  │───▶│   INGESTION    │───▶│      VECTOR STORE       │  │
│  │  (S3, etc.)  │    │   (Chunking,   │    │ (OpenSearch, Pinecone,  │  │
│  │              │    │    Embedding)  │    │      Aurora, etc.)      │  │
│  └──────────────┘    └────────────────┘    └─────────────────────────┘  │
│                                                           │             │
│                                                           ▼             │
│  ┌──────────────┐    ┌────────────────┐    ┌─────────────────────────┐  │
│  │   RESPONSE   │◀───│   GENERATION   │◀───│        RETRIEVAL        │  │
│  │              │    │ (Claude, etc.) │    │  (Semantic + Optional   │  │
│  │              │    │                │    │      Hybrid Search)     │  │
│  └──────────────┘    └────────────────┘    └─────────────────────────┘  │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

The key components are visible in the diagram: a data source, a managed ingestion pipeline (chunking and embedding), a vector store, and the retrieval and generation layers.
Before creating a Knowledge Base, you need AWS credentials configured, an IAM role that Bedrock can assume, and access to the foundation models you plan to use. The steps below cover each in turn.
# Install AWS CLI
brew install awscli # macOS
# or
pip install awscli
# Configure credentials
aws configure
# Enter: AWS Access Key ID, Secret Access Key, Region (us-east-1, us-west-2, etc.)

Create an IAM role for Bedrock with these permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:*",
"s3:GetObject",
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"aoss:*"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"iam:PassRole"
],
"Resource": "arn:aws:iam::*:role/AmazonBedrockExecutionRoleForKnowledgeBase*"
}
]
}

In the AWS Console, navigate to Bedrock and request access to:
Model access requests are typically approved instantly for most models.
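You can confirm which models are actually available in your account and region before wiring anything up. A quick sketch using boto3's Bedrock control-plane client:

```python
import boto3

bedrock = boto3.client('bedrock', region_name='us-east-1')
# List the embedding and generation models this article relies on
for model in bedrock.list_foundation_models()['modelSummaries']:
    if 'titan-embed' in model['modelId'] or 'claude' in model['modelId']:
        print(model['modelId'])
```

With access confirmed, create the Knowledge Base from the CLI: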
# Create the knowledge base
aws bedrock-agent create-knowledge-base \
--name "company-docs-kb" \
--description "Internal documentation knowledge base" \
--role-arn "arn:aws:iam::123456789012:role/BedrockKBRole" \
--knowledge-base-configuration '{
"type": "VECTOR",
"vectorKnowledgeBaseConfiguration": {
"embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
}
}' \
--storage-configuration '{
"type": "OPENSEARCH_SERVERLESS",
"opensearchServerlessConfiguration": {
"collectionArn": "arn:aws:aoss:us-east-1:123456789012:collection/abc123",
"fieldMapping": {
"metadataField": "metadata",
"textField": "text",
"vectorField": "vector"
},
"vectorIndexName": "company-docs-index"
}
}'
# Note the knowledgeBaseId from the response
KNOWLEDGE_BASE_ID="ABC123XYZ"
# Add S3 data source
aws bedrock-agent create-data-source \
--knowledge-base-id $KNOWLEDGE_BASE_ID \
--name "s3-documents" \
--data-source-configuration '{
"type": "S3",
"s3Configuration": {
"bucketArn": "arn:aws:s3:::my-company-docs"
}
}' \
--vector-ingestion-configuration '{
"chunkingConfiguration": {
"chunkingStrategy": "FIXED_SIZE",
"fixedSizeChunkingConfiguration": {
"maxTokens": 300,
"overlapPercentage": 10
}
}
}'
# Start ingestion
aws bedrock-agent start-ingestion-job \
--knowledge-base-id $KNOWLEDGE_BASE_ID \
--data-source-id "data-source-id-from-above"
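Ingestion runs asynchronously, so poll the job until it finishes before querying. A sketch with placeholder IDs, using the bedrock-agent API's get_ingestion_job call:

```python
import time
import boto3

agent = boto3.client('bedrock-agent', region_name='us-east-1')

# Poll the ingestion job started above until it completes or fails
status = 'IN_PROGRESS'
while status in ('STARTING', 'IN_PROGRESS'):
    time.sleep(10)
    job = agent.get_ingestion_job(
        knowledgeBaseId='YOUR_KB_ID',
        dataSourceId='YOUR_DATA_SOURCE_ID',
        ingestionJobId='JOB_ID_FROM_START_RESPONSE',
    )['ingestionJob']
    status = job['status']

print(f"Ingestion finished with status: {status}")
```

If you prefer infrastructure as code, here is the same setup in AWS CDK (TypeScript):

// lib/knowledge-base-stack.ts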
import * as cdk from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as iam from 'aws-cdk-lib/aws-iam';
import * as bedrock from 'aws-cdk-lib/aws-bedrock';
import { Construct } from 'constructs';
export class KnowledgeBaseStack extends cdk.Stack {
constructor(scope: Construct, id: string, props?: cdk.StackProps) {
super(scope, id, props);
// S3 bucket for documents
const docsBucket = new s3.Bucket(this, 'DocumentsBucket', {
bucketName: 'company-docs-knowledge-base',
removalPolicy: cdk.RemovalPolicy.RETAIN,
versioned: true,
});
// IAM role for Bedrock
const kbRole = new iam.Role(this, 'KnowledgeBaseRole', {
assumedBy: new iam.ServicePrincipal('bedrock.amazonaws.com'),
managedPolicies: [
iam.ManagedPolicy.fromAwsManagedPolicyName('AmazonBedrockFullAccess'),
],
});
// Grant S3 read access
docsBucket.grantRead(kbRole);
// Knowledge Base (using L1 construct as L2 may not be available)
const knowledgeBase = new bedrock.CfnKnowledgeBase(this, 'KnowledgeBase', {
name: 'company-docs-kb',
description: 'Company documentation knowledge base',
roleArn: kbRole.roleArn,
knowledgeBaseConfiguration: {
type: 'VECTOR',
vectorKnowledgeBaseConfiguration: {
embeddingModelArn: `arn:aws:bedrock:${this.region}::foundation-model/amazon.titan-embed-text-v2:0`,
},
},
storageConfiguration: {
type: 'OPENSEARCH_SERVERLESS',
opensearchServerlessConfiguration: {
collectionArn: 'YOUR_COLLECTION_ARN', // Create separately or use Pinecone
fieldMapping: {
metadataField: 'metadata',
textField: 'text',
vectorField: 'vector',
},
vectorIndexName: 'docs-index',
},
},
});
// Data source
const dataSource = new bedrock.CfnDataSource(this, 'S3DataSource', {
knowledgeBaseId: knowledgeBase.attrKnowledgeBaseId,
name: 's3-documents',
dataSourceConfiguration: {
type: 'S3',
s3Configuration: {
bucketArn: docsBucket.bucketArn,
},
},
vectorIngestionConfiguration: {
chunkingConfiguration: {
chunkingStrategy: 'HIERARCHICAL',
hierarchicalChunkingConfiguration: {
levelConfigurations: [
{ maxTokens: 1500 }, // Parent chunks
{ maxTokens: 300 }, // Child chunks
],
overlapTokens: 60,
},
},
},
});
// Outputs
new cdk.CfnOutput(this, 'KnowledgeBaseId', {
value: knowledgeBase.attrKnowledgeBaseId,
});
}
}

The equivalent setup in Terraform:

# main.tf
provider "aws" {
region = "us-east-1"
}
# S3 bucket for documents
resource "aws_s3_bucket" "docs" {
bucket = "company-docs-knowledge-base-${random_id.suffix.hex}"
}
resource "random_id" "suffix" {
byte_length = 4
}
# IAM role for Bedrock
resource "aws_iam_role" "bedrock_kb" {
name = "AmazonBedrockExecutionRoleForKnowledgeBase"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "bedrock.amazonaws.com"
}
}
]
})
}
resource "aws_iam_role_policy" "bedrock_kb_policy" {
name = "bedrock-kb-policy"
role = aws_iam_role.bedrock_kb.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"s3:GetObject",
"s3:ListBucket"
]
Resource = [
aws_s3_bucket.docs.arn,
"${aws_s3_bucket.docs.arn}/*"
]
},
{
Effect = "Allow"
Action = [
"bedrock:InvokeModel"
]
Resource = "*"
},
{
Effect = "Allow"
Action = [
"aoss:APIAccessAll"
]
Resource = "*"
}
]
})
}
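# NOTE (assumption): the knowledge base below references an OpenSearch
# Serverless collection that must exist first. A minimal sketch of those
# resources; production use also needs network and data-access policies.
resource "aws_opensearchserverless_security_policy" "kb_encryption" {
  name = "kb-encryption-policy"
  type = "encryption"
  policy = jsonencode({
    Rules = [
      {
        Resource     = ["collection/company-docs-kb"]
        ResourceType = "collection"
      }
    ]
    AWSOwnedKey = true
  })
}

resource "aws_opensearchserverless_collection" "kb" {
  name       = "company-docs-kb"
  type       = "VECTORSEARCH"
  depends_on = [aws_opensearchserverless_security_policy.kb_encryption]
}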
# Knowledge Base
resource "aws_bedrockagent_knowledge_base" "main" {
name = "company-docs-kb"
role_arn = aws_iam_role.bedrock_kb.arn
knowledge_base_configuration {
type = "VECTOR"
vector_knowledge_base_configuration {
embedding_model_arn = "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
}
}
storage_configuration {
type = "OPENSEARCH_SERVERLESS"
opensearch_serverless_configuration {
collection_arn = aws_opensearchserverless_collection.kb.arn
vector_index_name = "docs-index"
field_mapping {
metadata_field = "metadata"
text_field = "text"
vector_field = "vector"
}
}
}
}
# Data source
resource "aws_bedrockagent_data_source" "s3" {
knowledge_base_id = aws_bedrockagent_knowledge_base.main.id
name = "s3-documents"
data_source_configuration {
type = "S3"
s3_configuration {
bucket_arn = aws_s3_bucket.docs.arn
}
}
vector_ingestion_configuration {
chunking_configuration {
chunking_strategy = "FIXED_SIZE"
fixed_size_chunking_configuration {
max_tokens = 300
overlap_percentage = 10
}
}
}
}
output "knowledge_base_id" {
value = aws_bedrockagent_knowledge_base.main.id
}

Bedrock Knowledge Bases supports multiple data source types.
The most common and flexible option:
import boto3
s3 = boto3.client('s3')
# Upload documents to S3
s3.upload_file(
'company_handbook.pdf',
'my-kb-bucket',
'documents/company_handbook.pdf'
)
# Supported formats:
# - PDF (.pdf)
# - Plain text (.txt)
# - Markdown (.md)
# - HTML (.html)
# - Microsoft Word (.doc, .docx)
# - CSV (.csv)
# - Excel (.xls, .xlsx)

Folder structure recommendations:
my-kb-bucket/
├── documents/
│   ├── policies/
│   │   ├── hr_handbook.pdf
│   │   └── security_policy.docx
│   ├── technical/
│   │   ├── api_reference.md
│   │   └── architecture.pdf
│   └── training/
│       └── onboarding_guide.pdf
└── metadata/
    └── custom_metadata.json

To index Confluence content, add a Confluence data source:

# Using AWS CLI
aws bedrock-agent create-data-source \
--knowledge-base-id $KB_ID \
--name "confluence-docs" \
--data-source-configuration '{
"type": "CONFLUENCE",
"confluenceConfiguration": {
"sourceConfiguration": {
"hostUrl": "https://your-domain.atlassian.net",
"hostType": "CLOUD"
},
"authType": "OAUTH2_CLIENT_CREDENTIALS",
"credentialsSecretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:confluence-creds"
}
}'

For SharePoint:

aws bedrock-agent create-data-source \
--knowledge-base-id $KB_ID \
--name "sharepoint-docs" \
--data-source-configuration '{
"type": "SHAREPOINT",
"sharePointConfiguration": {
"sourceConfiguration": {
"tenantId": "your-tenant-id",
"domain": "your-domain",
"siteUrls": ["https://your-domain.sharepoint.com/sites/docs"]
},
"authType": "OAUTH2_CLIENT_CREDENTIALS",
"credentialsSecretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:sharepoint-creds"
}
}'

For Salesforce:

aws bedrock-agent create-data-source \
--knowledge-base-id $KB_ID \
--name "salesforce-knowledge" \
--data-source-configuration '{
"type": "SALESFORCE",
"salesforceConfiguration": {
"sourceConfiguration": {
"hostUrl": "https://your-org.salesforce.com"
},
"authType": "OAUTH2_CLIENT_CREDENTIALS",
"credentialsSecretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:salesforce-creds"
}
}'

For public websites, use the web crawler:

aws bedrock-agent create-data-source \
--knowledge-base-id $KB_ID \
--name "web-docs" \
--data-source-configuration '{
"type": "WEB",
"webConfiguration": {
"sourceConfiguration": {
"urlConfiguration": {
"seedUrls": [
{"url": "https://docs.example.com"}
]
}
},
"crawlerConfiguration": {
"crawlerLimits": {
"rateLimit": 10
},
"inclusionFilters": [".*\\.html$", ".*\\.md$"],
"exclusionFilters": [".*login.*", ".*admin.*"],
"scope": "HOST_ONLY"
}
}
}'

Chunking configuration significantly impacts retrieval quality. Bedrock offers several strategies:
Simple and predictable. Good for uniform content.
{
"chunkingStrategy": "FIXED_SIZE",
"fixedSizeChunkingConfiguration": {
"maxTokens": 300,
"overlapPercentage": 10
}
}

Creates parent and child chunks. Parents provide context, children provide precision.
{
"chunkingStrategy": "HIERARCHICAL",
"hierarchicalChunkingConfiguration": {
"levelConfigurations": [
{ "maxTokens": 1500 },
{ "maxTokens": 300 }
],
"overlapTokens": 60
}
}

Best for: Long documents where context matters, technical documentation, legal contracts.
Uses the embedding model to find natural boundaries. Creates chunks based on semantic similarity.
{
"chunkingStrategy": "SEMANTIC",
"semanticChunkingConfiguration": {
"maxTokens": 300,
"bufferSize": 0,
"breakpointPercentileThreshold": 95
}
}

Best for: Varied content types, documents without clear structure.
Treats each document as a single chunk. Use only for short documents.
{
"chunkingStrategy": "NONE"
}

For maximum control, preprocess documents yourself and use custom metadata:
# Preprocess documents with custom chunking logic
import boto3

s3 = boto3.client('s3')

def custom_chunk(document_text: str, chunk_size: int = 500) -> list[str]:
    """Naive word-count chunker; swap in your own boundary logic."""
    words = document_text.split()
    return [
        ' '.join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]

# Upload preprocessed chunks with metadata
# (boto3 adds the x-amz-meta- prefix automatically; do not include it)
with open('original_document.txt') as f:
    chunks = custom_chunk(f.read())

for i, chunk in enumerate(chunks):
    s3.put_object(
        Bucket='my-kb-bucket',
        Key=f'preprocessed/doc1_chunk_{i}.txt',
        Body=chunk,
        Metadata={
            'source': 'original_document.pdf',
            'chunk-index': str(i),
            'section': 'chapter-3'
        }
    )
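One caveat worth flagging: to the best of my knowledge, the metadata filters shown below do not read S3 object metadata. Filterable attributes come from sidecar files named <document>.metadata.json uploaded next to each source document. A minimal sketch with hypothetical attribute names:

```python
import json
import boto3

s3 = boto3.client('s3')

# Sidecar metadata file for documents/technical/api_reference.md;
# it must sit in the same location as the document it describes.
s3.put_object(
    Bucket='my-kb-bucket',
    Key='documents/technical/api_reference.md.metadata.json',
    Body=json.dumps({
        'metadataAttributes': {
            'department': 'engineering',
            'year': 2024
        }
    })
)
```

Combine semantic (vector) and keyword (lexical) search: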
import boto3
bedrock_agent = boto3.client('bedrock-agent-runtime')
response = bedrock_agent.retrieve(
knowledgeBaseId='YOUR_KB_ID',
retrievalQuery={
'text': 'What is our vacation policy?'
},
retrievalConfiguration={
'vectorSearchConfiguration': {
'numberOfResults': 10,
'overrideSearchType': 'HYBRID' # SEMANTIC or HYBRID
}
}
)

Filter retrieval based on document metadata:
response = bedrock_agent.retrieve(
knowledgeBaseId='YOUR_KB_ID',
retrievalQuery={
'text': 'security requirements'
},
retrievalConfiguration={
'vectorSearchConfiguration': {
'numberOfResults': 10,
'filter': {
'andAll': [
{
'equals': {
'key': 'department',
'value': 'engineering'
}
},
{
'greaterThan': {
'key': 'year',
'value': 2023
}
}
]
}
}
}
)

Supported filter operators:
- equals, notEquals
- greaterThan, greaterThanOrEquals
- lessThan, lessThanOrEquals
- in, notIn
- startsWith
- stringContains (partial string matching)
- listContains (check if value exists in list field)
- andAll, orAll (for combining conditions; see the nested example below)
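For illustration, here is a nested filter combining several operators (the metadata keys are hypothetical):

```python
# Engineering docs from 2024 onward, or anything whose doc_type
# starts with "public-policy"
complex_filter = {
    'orAll': [
        {
            'andAll': [
                {'equals': {'key': 'department', 'value': 'engineering'}},
                {'greaterThanOrEquals': {'key': 'year', 'value': 2024}}
            ]
        },
        {'startsWith': {'key': 'doc_type', 'value': 'public-policy'}}
    ]
}
```

Bedrock can automatically rewrite queries for better retrieval: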
response = bedrock_agent.retrieve_and_generate(
input={
'text': 'pto days' # Informal query
},
retrieveAndGenerateConfiguration={
'type': 'KNOWLEDGE_BASE',
'knowledgeBaseConfiguration': {
'knowledgeBaseId': 'YOUR_KB_ID',
'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0',
'orchestrationConfiguration': {
'queryTransformationConfiguration': {
'type': 'QUERY_DECOMPOSITION' # Breaks complex queries into sub-queries
}
}
}
}
)

Improve retrieval precision with a reranking model:
response = bedrock_agent.retrieve(
knowledgeBaseId='YOUR_KB_ID',
retrievalQuery={
'text': 'How do I configure SSO?'
},
retrievalConfiguration={
'vectorSearchConfiguration': {
'numberOfResults': 20, # Retrieve more candidates
'rerankingConfiguration': {
'type': 'BEDROCK_RERANKING_MODEL',
'bedrockRerankingConfiguration': {
'modelConfiguration': {
'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/cohere.rerank-v3-5:0'
},
'numberOfRerankedResults': 5 # Return top 5 after reranking
}
}
}
}
)

Bedrock provides two primary APIs for querying Knowledge Bases.
The Retrieve API returns relevant chunks without generating a response. Use it when you want to control generation yourself.
import boto3
bedrock_agent = boto3.client('bedrock-agent-runtime')
def retrieve_documents(query: str, kb_id: str, top_k: int = 5):
"""Retrieve relevant documents from Knowledge Base."""
response = bedrock_agent.retrieve(
knowledgeBaseId=kb_id,
retrievalQuery={'text': query},
retrievalConfiguration={
'vectorSearchConfiguration': {
'numberOfResults': top_k
}
}
)
results = []
for result in response['retrievalResults']:
results.append({
'content': result['content']['text'],
'score': result['score'],
'location': result.get('location', {}),
'metadata': result.get('metadata', {})
})
return results
# Usage
docs = retrieve_documents(
query="What is our remote work policy?",
kb_id="YOUR_KB_ID"
)
for doc in docs:
print(f"Score: {doc['score']:.3f}")
print(f"Content: {doc['content'][:200]}...")
print("---")End-to-end RAG: retrieves documents and generates a response.
def query_knowledge_base(
query: str,
kb_id: str,
model_id: str = "anthropic.claude-3-sonnet-20240229-v1:0"
) -> dict:
"""Query Knowledge Base with retrieval and generation."""
response = bedrock_agent.retrieve_and_generate(
input={'text': query},
retrieveAndGenerateConfiguration={
'type': 'KNOWLEDGE_BASE',
'knowledgeBaseConfiguration': {
'knowledgeBaseId': kb_id,
'modelArn': f'arn:aws:bedrock:us-east-1::foundation-model/{model_id}',
'retrievalConfiguration': {
'vectorSearchConfiguration': {
'numberOfResults': 5
}
},
'generationConfiguration': {
'promptTemplate': {
'textPromptTemplate': '''
You are a helpful assistant. Answer the question based on the provided context.
If you cannot find the answer in the context, say "I don't have information about that."
Context:
$search_results$
Question: $query$
Answer:'''
},
'inferenceConfig': {
'textInferenceConfig': {
'temperature': 0.0,
'topP': 0.9,
'maxTokens': 1000
}
}
}
}
}
)
return {
'answer': response['output']['text'],
'citations': response.get('citations', []),
'session_id': response.get('sessionId')
}
# Usage
result = query_knowledge_base(
query="What are the vacation day policies?",
kb_id="YOUR_KB_ID"
)
print(f"Answer: {result['answer']}")
print(f"\nCitations: {len(result['citations'])} sources used")For true streaming responses, first retrieve context using the Retrieve API, then stream generation using the Bedrock runtime:
import json
bedrock_runtime = boto3.client('bedrock-runtime')
def stream_with_context(query: str, context: list[str], model_id: str = "anthropic.claude-3-sonnet-20240229-v1:0"):
"""Stream generation with retrieved context."""
context_text = "\n\n".join(context)
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 1000,
"messages": [
{
"role": "user",
"content": f"Answer based on this context:\n\n{context_text}\n\nQuestion: {query}"
}
]
}
response = bedrock_runtime.invoke_model_with_response_stream(
modelId=model_id,
body=json.dumps(body)
)
for event in response['body']:
chunk = json.loads(event['chunk']['bytes'])
if chunk['type'] == 'content_block_delta':
yield chunk['delta'].get('text', '')
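Wiring this to the retrieve_documents helper from earlier gives a complete streaming RAG loop:

```python
# Stream an answer over freshly retrieved context
docs = retrieve_documents("What is our remote work policy?", "YOUR_KB_ID")
for token in stream_with_context(
    "What is our remote work policy?",
    [doc['content'] for doc in docs]
):
    print(token, end="", flush=True)
```

This is the section that can save you thousands of dollars. Bedrock Knowledge Bases has hidden costs that catch nearly every team.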
When you create a Knowledge Base using the default OpenSearch Serverless vector store, AWS creates an OpenSearch Serverless collection. Here is what they do not tell you upfront:
OpenSearch Serverless charges by OpenSearch Compute Units (OCUs).
Each OCU costs $0.24/hour.
Since June 2024, AWS supports fractional 0.5 OCU deployments, cutting minimum costs in half:
Production (with redundancy):
2 OCUs x $0.24/hour x 24 hours x 30 days = $345.60/month MINIMUM

Dev/Test (without redundancy):
1 OCU x $0.24/hour x 24 hours x 30 days = $172.80/month MINIMUM

That is still $175-350/month just for the vector store, before any queries, before any LLM calls, before any data transfer. And it never scales to zero. Even if you have zero queries, you pay the minimum.
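The arithmetic is easy to sanity-check. A throwaway helper, assuming the $0.24/OCU-hour list price and a 30-day month:

```python
def aoss_monthly_cost(ocus: float, price_per_ocu_hour: float = 0.24) -> float:
    """Monthly OpenSearch Serverless floor cost for a given OCU count."""
    return ocus * price_per_ocu_hour * 24 * 30

print(round(aoss_monthly_cost(1), 2))  # 172.8  -- dev/test minimum
print(round(aoss_monthly_cost(2), 2))  # 345.6  -- production minimum
```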
Here is what a typical Knowledge Base actually costs:
| Component | Monthly Cost | Notes |
|---|---|---|
| OpenSearch Serverless (min) | $173-346 | 1-2 OCUs, never scales to zero |
| Titan Embeddings | $5-50 | Depends on document volume |
| Claude 3 Sonnet (generation) | $50-500 | Depends on query volume |
| S3 Storage | $1-10 | Usually negligible |
| Data Transfer | $5-20 | Depends on document size |
| Total (dev/test) | ~$300/month | With 1 OCU minimum |
| Total (production) | ~$500-800/month | With 2 OCU minimum |
Compare this to self-managed alternatives:
Option 1: Use Pinecone Instead
Bedrock Knowledge Bases supports Pinecone as an alternative vector store:
# When creating Knowledge Base, specify Pinecone
storage_configuration = {
"type": "PINECONE",
"pineconeConfiguration": {
"connectionString": "https://your-index-name.svc.environment.pinecone.io",
"credentialsSecretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:pinecone-api-key",
"fieldMapping": {
"metadataField": "metadata",
"textField": "text"
},
"namespace": "bedrock-kb"
}
}

Option 2: Use Aurora PostgreSQL with pgvector
If you already have Aurora PostgreSQL, add pgvector:
CREATE EXTENSION IF NOT EXISTS vector;
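You also need a table whose columns line up with the fieldMapping below. A sketch of a plausible schema, assuming Titan Embed Text v2's default 1024 dimensions (adjust to your setup):

```sql
CREATE TABLE bedrock_kb_vectors (
    id uuid PRIMARY KEY,
    embedding vector(1024),
    text text,
    metadata json
);
CREATE INDEX ON bedrock_kb_vectors USING hnsw (embedding vector_cosine_ops);
```

Then configure Knowledge Base to use it: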
storage_configuration = {
"type": "RDS",
"rdsConfiguration": {
"credentialsSecretArn": "arn:aws:secretsmanager:...:secret:aurora-creds",
"databaseName": "knowledge_base",
"resourceArn": "arn:aws:rds:us-east-1:123456789012:cluster:my-aurora-cluster",
"tableName": "bedrock_kb_vectors",
"fieldMapping": {
"metadataField": "metadata",
"primaryKeyField": "id",
"textField": "text",
"vectorField": "embedding"
}
}
}

Option 3: Use OpenSearch Managed Clusters
If you need OpenSearch specifically, a small managed cluster can cost far less than the serverless OCU floor, in exchange for managing the cluster yourself.
If you enable reranking (recommended for quality), add these costs:
| Model | Cost per 1000 queries |
|---|---|
| Cohere Rerank | $1.00 |
| Amazon Rerank (preview) | TBD |
For moderate usage (10K queries/month), add ~$10/month.
For SaaS applications, you need tenant isolation. Bedrock Knowledge Bases offers several approaches.
Single Knowledge Base, filter by tenant metadata:
def query_for_tenant(query: str, tenant_id: str, kb_id: str):
"""Query with tenant isolation via metadata filter."""
response = bedrock_agent.retrieve_and_generate(
input={'text': query},
retrieveAndGenerateConfiguration={
'type': 'KNOWLEDGE_BASE',
'knowledgeBaseConfiguration': {
'knowledgeBaseId': kb_id,
'modelArn': MODEL_ARN,
'retrievalConfiguration': {
'vectorSearchConfiguration': {
'numberOfResults': 5,
'filter': {
'equals': {
'key': 'tenant_id',
'value': tenant_id
}
}
}
}
}
}
)
return response['output']['text']

Pros: Simple, cost-effective (single KB)
Cons: Metadata leakage risk if filter fails, all tenants share capacity
If using Pinecone, leverage namespaces:
# Different namespace per tenant
storage_configuration = {
"type": "PINECONE",
"pineconeConfiguration": {
"namespace": f"tenant-{tenant_id}",
# ... other config
}
}

For maximum isolation, create separate KBs per tenant:
def get_tenant_kb(tenant_id: str) -> str:
"""Get or create Knowledge Base for tenant."""
kb_mapping = {
"tenant-a": "KB_ID_A",
"tenant-b": "KB_ID_B",
}
return kb_mapping.get(tenant_id)

Pros: Complete isolation, per-tenant scaling
Cons: Management overhead, higher cost (especially with OpenSearch)
Here is a production-ready implementation bringing together everything covered:
"""
Production Bedrock Knowledge Base RAG Implementation
Complete example with error handling, streaming, and cost awareness.
"""
import boto3
import json
import logging
from typing import Generator, Optional
from dataclasses import dataclass
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@dataclass
class RAGConfig:
"""Configuration for Bedrock RAG."""
knowledge_base_id: str
region: str = "us-east-1"
embedding_model: str = "amazon.titan-embed-text-v2:0"
generation_model: str = "anthropic.claude-3-sonnet-20240229-v1:0"
max_results: int = 5
temperature: float = 0.0
class BedrockRAG:
"""Production-ready Bedrock Knowledge Base RAG client."""
def __init__(self, config: RAGConfig):
self.config = config
self.bedrock_agent = boto3.client(
'bedrock-agent-runtime',
region_name=config.region
)
self.bedrock_runtime = boto3.client(
'bedrock-runtime',
region_name=config.region
)
def retrieve(
self,
query: str,
filters: Optional[dict] = None,
top_k: Optional[int] = None
) -> list[dict]:
"""
Retrieve relevant documents without generation.
Args:
query: Search query
filters: Optional metadata filters
top_k: Number of results (defaults to config)
Returns:
List of retrieved documents with scores
"""
retrieval_config = {
'vectorSearchConfiguration': {
'numberOfResults': top_k or self.config.max_results
}
}
if filters:
retrieval_config['vectorSearchConfiguration']['filter'] = filters
try:
response = self.bedrock_agent.retrieve(
knowledgeBaseId=self.config.knowledge_base_id,
retrievalQuery={'text': query},
retrievalConfiguration=retrieval_config
)
results = []
for result in response.get('retrievalResults', []):
results.append({
'content': result['content']['text'],
'score': result.get('score', 0),
'source': result.get('location', {}).get('s3Location', {}).get('uri', 'Unknown'),
'metadata': result.get('metadata', {})
})
logger.info(f"Retrieved {len(results)} documents for query: {query[:50]}...")
return results
except Exception as e:
logger.error(f"Retrieval failed: {e}")
raise
def query(
self,
question: str,
filters: Optional[dict] = None,
system_prompt: Optional[str] = None
) -> dict:
"""
Full RAG: retrieve and generate response.
Args:
question: User question
filters: Optional metadata filters
system_prompt: Optional custom system prompt
Returns:
Dict with answer, citations, and metadata
"""
model_arn = f"arn:aws:bedrock:{self.config.region}::foundation-model/{self.config.generation_model}"
prompt_template = system_prompt or """
You are a helpful assistant. Answer the question based only on the provided context.
If the context does not contain enough information, say "I don't have enough information to answer that."
Context:
$search_results$
Question: $query$
Answer:"""
generation_config = {
'type': 'KNOWLEDGE_BASE',
'knowledgeBaseConfiguration': {
'knowledgeBaseId': self.config.knowledge_base_id,
'modelArn': model_arn,
'retrievalConfiguration': {
'vectorSearchConfiguration': {
'numberOfResults': self.config.max_results
}
},
'generationConfiguration': {
'promptTemplate': {
'textPromptTemplate': prompt_template
},
'inferenceConfig': {
'textInferenceConfig': {
'temperature': self.config.temperature,
'topP': 0.9,
'maxTokens': 1000
}
}
}
}
}
if filters:
generation_config['knowledgeBaseConfiguration']['retrievalConfiguration']['vectorSearchConfiguration']['filter'] = filters
try:
response = self.bedrock_agent.retrieve_and_generate(
input={'text': question},
retrieveAndGenerateConfiguration=generation_config
)
# Extract citations
citations = []
for citation in response.get('citations', []):
for ref in citation.get('retrievedReferences', []):
citations.append({
'content': ref.get('content', {}).get('text', ''),
'source': ref.get('location', {}).get('s3Location', {}).get('uri', 'Unknown')
})
return {
'answer': response['output']['text'],
'citations': citations,
'session_id': response.get('sessionId')
}
except Exception as e:
logger.error(f"Query failed: {e}")
raise
def stream_response(
self,
question: str,
context: Optional[list[str]] = None
) -> Generator[str, None, None]:
"""
Stream a response with optional pre-retrieved context.
Args:
question: User question
context: Optional list of context strings (if None, retrieves automatically)
Yields:
Response tokens as they are generated
"""
# Get context if not provided
if context is None:
docs = self.retrieve(question)
context = [doc['content'] for doc in docs]
context_text = "\n\n---\n\n".join(context)
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 1000,
"temperature": self.config.temperature,
"messages": [
{
"role": "user",
"content": f"""Answer based on this context:
{context_text}
Question: {question}
Provide a clear, concise answer based only on the context above."""
}
]
}
try:
response = self.bedrock_runtime.invoke_model_with_response_stream(
modelId=self.config.generation_model,
body=json.dumps(body)
)
for event in response['body']:
chunk = json.loads(event['chunk']['bytes'])
if chunk['type'] == 'content_block_delta':
text = chunk['delta'].get('text', '')
if text:
yield text
except Exception as e:
logger.error(f"Streaming failed: {e}")
raise
def main():
"""Example usage."""
# Configure
config = RAGConfig(
knowledge_base_id="YOUR_KNOWLEDGE_BASE_ID",
region="us-east-1",
generation_model="anthropic.claude-3-haiku-20240307-v1:0", # Cheaper for demos
max_results=5,
temperature=0.0
)
rag = BedrockRAG(config)
# Example 1: Simple query
print("=" * 60)
print("SIMPLE QUERY")
print("=" * 60)
result = rag.query("What is our vacation policy?")
print(f"Answer: {result['answer']}")
print(f"\nSources: {len(result['citations'])} citations")
# Example 2: Filtered query (multi-tenant)
print("\n" + "=" * 60)
print("FILTERED QUERY (TENANT ISOLATION)")
print("=" * 60)
result = rag.query(
"What are the security requirements?",
filters={
'equals': {
'key': 'department',
'value': 'engineering'
}
}
)
print(f"Answer: {result['answer']}")
# Example 3: Streaming
print("\n" + "=" * 60)
print("STREAMING RESPONSE")
print("=" * 60)
print("Answer: ", end="", flush=True)
for token in rag.stream_response("How do I request time off?"):
print(token, end="", flush=True)
print("\n")
if __name__ == "__main__":
main()

# 1. Install dependencies
pip install boto3
# 2. Configure AWS credentials
aws configure
# 3. Update YOUR_KNOWLEDGE_BASE_ID in the script
# 4. Run
python bedrock_rag.py

AWS Bedrock Knowledge Bases offers a compelling value proposition: production RAG without infrastructure management. But the trade-offs are significant.
Strengths: a complete managed pipeline (ingestion, chunking, embedding, retrieval, generation) behind two API calls, native AWS integration, and minimal operational burden.
Critical Considerations: the OpenSearch Serverless cost floor of roughly $173-346/month before a single query, and less control over the pipeline than self-managed frameworks offer.
When to Choose Bedrock Knowledge Bases: you are already on AWS, you value speed to production over infrastructure control, and the monthly cost floor is acceptable at your scale.
When to Choose Alternatives: cost-sensitive or low-traffic workloads, fine-grained control over chunking and retrieval, or portability beyond AWS; in those cases, revisit the frameworks from Parts 2-4.
This article covered AWS's managed RAG offering. Continue with the series:
For production deployments, also explore:
This is Part 6 of the "Building RAG Systems: A Platform-by-Platform Guide" series. Next up: Azure AI Search RAG.