🤖 Ghostwritten by Claude Opus 4.5 · Curated by Tom Hundley
When you want RAG without the infrastructure headache.
What if you could deploy a production RAG system without managing vector databases, chunking pipelines, or embedding infrastructure? AWS Bedrock Knowledge Bases offers exactly that: a fully managed RAG service that handles the entire pipeline from document ingestion to response generation.
If you have followed this series through LangChain (Part 2), LlamaIndex (Part 3), and Haystack (Part 4), you have seen how much infrastructure these frameworks require you to build and maintain. Bedrock Knowledge Bases abstracts all of that away.
Upload documents to S3. Point Knowledge Bases at them. Query with an API call. Done.
But this simplicity comes with trade-offs, particularly around cost, that every architect needs to understand before committing.
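To make the simplicity claim concrete, here is roughly the entire query path in code. This is a minimal sketch with placeholder IDs, using the retrieve_and_generate API covered in depth later in this article:

```python
import boto3

# Placeholder IDs; the walkthrough below shows how to create them.
client = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

response = client.retrieve_and_generate(
    input={'text': 'What is our vacation policy?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'YOUR_KB_ID',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'
        }
    }
)
print(response['output']['text'])
```

That single call retrieves relevant chunks and generates a grounded answer: no vector database to run, no chunking code to write.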
Choose Bedrock Knowledge Bases when you want AWS to own the pipeline end to end. The managed nature means ingestion, chunking, embedding, vector storage, and retrieval are all handled for you; you write queries, not plumbing.
Under the hood, Bedrock Knowledge Bases orchestrates several AWS services:
┌─────────────────────────────────────────────────────────────────────────┐
│                  BEDROCK KNOWLEDGE BASES ARCHITECTURE                   │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌──────────────┐    ┌────────────────┐    ┌─────────────────────────┐  │
│  │ DATA SOURCE  │───▶│   INGESTION    │───▶│      VECTOR STORE       │  │
│  │  (S3, etc.)  │    │   (Chunking,   │    │ (OpenSearch, Pinecone,  │  │
│  │              │    │    Embedding)  │    │      Aurora, etc.)      │  │
│  └──────────────┘    └────────────────┘    └─────────────────────────┘  │
│                                                           │             │
│                                                           ▼             │
│  ┌──────────────┐    ┌────────────────┐    ┌─────────────────────────┐  │
│  │   RESPONSE   │◀───│   GENERATION   │◀───│        RETRIEVAL        │  │
│  │              │    │ (Claude, etc.) │    │  (Semantic + Optional   │  │
│  │              │    │                │    │      Hybrid Search)     │  │
│  └──────────────┘    └────────────────┘    └─────────────────────────┘  │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

The key components are visible in the diagram: a data source, a managed ingestion pipeline (chunking and embedding), a vector store, and the retrieval and generation layers.
Before creating a Knowledge Base, you need AWS credentials configured, an IAM role that Bedrock can assume, and access to the foundation models you plan to use. The steps below cover each in turn.
# Install AWS CLI
brew install awscli # macOS
# or
pip install awscli
# Configure credentials
aws configure
# Enter: AWS Access Key ID, Secret Access Key, Region (us-east-1, us-west-2, etc.)

Create an IAM role for Bedrock with these permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:*",
"s3:GetObject",
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"aoss:*"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"iam:PassRole"
],
"Resource": "arn:aws:iam::*:role/AmazonBedrockExecutionRoleForKnowledgeBase*"
}
]
}

In the AWS Console, navigate to Bedrock and request access to:
Model access requests are typically approved instantly for most models.
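You can confirm which models are actually available in your account and region before wiring anything up. A quick sketch using boto3's Bedrock control-plane client:

```python
import boto3

bedrock = boto3.client('bedrock', region_name='us-east-1')
# List the embedding and generation models this article relies on
for model in bedrock.list_foundation_models()['modelSummaries']:
    if 'titan-embed' in model['modelId'] or 'claude' in model['modelId']:
        print(model['modelId'])
```

With access confirmed, create the Knowledge Base from the CLI: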
# Create the knowledge base
aws bedrock-agent create-knowledge-base \
--name "company-docs-kb" \
--description "Internal documentation knowledge base" \
--role-arn "arn:aws:iam::123456789012:role/BedrockKBRole" \
--knowledge-base-configuration '{
"type": "VECTOR",
"vectorKnowledgeBaseConfiguration": {
"embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
}
}' \
--storage-configuration '{
"type": "OPENSEARCH_SERVERLESS",
"opensearchServerlessConfiguration": {
"collectionArn": "arn:aws:aoss:us-east-1:123456789012:collection/abc123",
"fieldMapping": {
"metadataField": "metadata",
"textField": "text",
"vectorField": "vector"
},
"vectorIndexName": "company-docs-index"
}
}'
# Note the knowledgeBaseId from the response
KNOWLEDGE_BASE_ID="ABC123XYZ"
# Add S3 data source
aws bedrock-agent create-data-source \
--knowledge-base-id $KNOWLEDGE_BASE_ID \
--name "s3-documents" \
--data-source-configuration '{
"type": "S3",
"s3Configuration": {
"bucketArn": "arn:aws:s3:::my-company-docs"
}
}' \
--vector-ingestion-configuration '{
"chunkingConfiguration": {
"chunkingStrategy": "FIXED_SIZE",
"fixedSizeChunkingConfiguration": {
"maxTokens": 300,
"overlapPercentage": 10
}
}
}'
# Start ingestion
aws bedrock-agent start-ingestion-job \
--knowledge-base-id $KNOWLEDGE_BASE_ID \
--data-source-id "data-source-id-from-above"
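Ingestion runs asynchronously, so poll the job until it finishes before querying. A sketch with placeholder IDs, using the bedrock-agent API's get_ingestion_job call:

```python
import time
import boto3

agent = boto3.client('bedrock-agent', region_name='us-east-1')

# Poll the ingestion job started above until it completes or fails
status = 'IN_PROGRESS'
while status in ('STARTING', 'IN_PROGRESS'):
    time.sleep(10)
    job = agent.get_ingestion_job(
        knowledgeBaseId='YOUR_KB_ID',
        dataSourceId='YOUR_DATA_SOURCE_ID',
        ingestionJobId='JOB_ID_FROM_START_RESPONSE',
    )['ingestionJob']
    status = job['status']

print(f"Ingestion finished with status: {status}")
```

If you prefer infrastructure as code, here is the same setup in AWS CDK (TypeScript):

// lib/knowledge-base-stack.ts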
import * as cdk from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as iam from 'aws-cdk-lib/aws-iam';
import * as bedrock from 'aws-cdk-lib/aws-bedrock';
import { Construct } from 'constructs';
export class KnowledgeBaseStack extends cdk.Stack {
constructor(scope: Construct, id: string, props?: cdk.StackProps) {
super(scope, id, props);
// S3 bucket for documents
const docsBucket = new s3.Bucket(this, 'DocumentsBucket', {
bucketName: 'company-docs-knowledge-base',
removalPolicy: cdk.RemovalPolicy.RETAIN,
versioned: true,
});
// IAM role for Bedrock
const kbRole = new iam.Role(this, 'KnowledgeBaseRole', {
assumedBy: new iam.ServicePrincipal('bedrock.amazonaws.com'),
managedPolicies: [
iam.ManagedPolicy.fromAwsManagedPolicyName('AmazonBedrockFullAccess'),
],
});
// Grant S3 read access
docsBucket.grantRead(kbRole);
// Knowledge Base (using L1 construct as L2 may not be available)
const knowledgeBase = new bedrock.CfnKnowledgeBase(this, 'KnowledgeBase', {
name: 'company-docs-kb',
description: 'Company documentation knowledge base',
roleArn: kbRole.roleArn,
knowledgeBaseConfiguration: {
type: 'VECTOR',
vectorKnowledgeBaseConfiguration: {
embeddingModelArn: `arn:aws:bedrock:${this.region}::foundation-model/amazon.titan-embed-text-v2:0`,
},
},
storageConfiguration: {
type: 'OPENSEARCH_SERVERLESS',
opensearchServerlessConfiguration: {
collectionArn: 'YOUR_COLLECTION_ARN', // Create separately or use Pinecone
fieldMapping: {
metadataField: 'metadata',
textField: 'text',
vectorField: 'vector',
},
vectorIndexName: 'docs-index',
},
},
});
// Data source
const dataSource = new bedrock.CfnDataSource(this, 'S3DataSource', {
knowledgeBaseId: knowledgeBase.attrKnowledgeBaseId,
name: 's3-documents',
dataSourceConfiguration: {
type: 'S3',
s3Configuration: {
bucketArn: docsBucket.bucketArn,
},
},
vectorIngestionConfiguration: {
chunkingConfiguration: {
chunkingStrategy: 'HIERARCHICAL',
hierarchicalChunkingConfiguration: {
levelConfigurations: [
{ maxTokens: 1500 }, // Parent chunks
{ maxTokens: 300 }, // Child chunks
],
overlapTokens: 60,
},
},
},
});
// Outputs
new cdk.CfnOutput(this, 'KnowledgeBaseId', {
value: knowledgeBase.attrKnowledgeBaseId,
});
}
}

The equivalent setup in Terraform:

# main.tf
provider "aws" {
region = "us-east-1"
}
# S3 bucket for documents
resource "aws_s3_bucket" "docs" {
bucket = "company-docs-knowledge-base-${random_id.suffix.hex}"
}
resource "random_id" "suffix" {
byte_length = 4
}
# IAM role for Bedrock
resource "aws_iam_role" "bedrock_kb" {
name = "AmazonBedrockExecutionRoleForKnowledgeBase"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "bedrock.amazonaws.com"
}
}
]
})
}
resource "aws_iam_role_policy" "bedrock_kb_policy" {
name = "bedrock-kb-policy"
role = aws_iam_role.bedrock_kb.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"s3:GetObject",
"s3:ListBucket"
]
Resource = [
aws_s3_bucket.docs.arn,
"${aws_s3_bucket.docs.arn}/*"
]
},
{
Effect = "Allow"
Action = [
"bedrock:InvokeModel"
]
Resource = "*"
},
{
Effect = "Allow"
Action = [
"aoss:APIAccessAll"
]
Resource = "*"
}
]
})
}
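# NOTE (assumption): the knowledge base below references an OpenSearch
# Serverless collection that must exist first. A minimal sketch of those
# resources; production use also needs network and data-access policies.
resource "aws_opensearchserverless_security_policy" "kb_encryption" {
  name = "kb-encryption-policy"
  type = "encryption"
  policy = jsonencode({
    Rules = [
      {
        Resource     = ["collection/company-docs-kb"]
        ResourceType = "collection"
      }
    ]
    AWSOwnedKey = true
  })
}

resource "aws_opensearchserverless_collection" "kb" {
  name       = "company-docs-kb"
  type       = "VECTORSEARCH"
  depends_on = [aws_opensearchserverless_security_policy.kb_encryption]
}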
# Knowledge Base
resource "aws_bedrockagent_knowledge_base" "main" {
name = "company-docs-kb"
role_arn = aws_iam_role.bedrock_kb.arn
knowledge_base_configuration {
type = "VECTOR"
vector_knowledge_base_configuration {
embedding_model_arn = "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
}
}
storage_configuration {
type = "OPENSEARCH_SERVERLESS"
opensearch_serverless_configuration {
collection_arn = aws_opensearchserverless_collection.kb.arn
vector_index_name = "docs-index"
field_mapping {
metadata_field = "metadata"
text_field = "text"
vector_field = "vector"
}
}
}
}
# Data source
resource "aws_bedrockagent_data_source" "s3" {
knowledge_base_id = aws_bedrockagent_knowledge_base.main.id
name = "s3-documents"
data_source_configuration {
type = "S3"
s3_configuration {
bucket_arn = aws_s3_bucket.docs.arn
}
}
vector_ingestion_configuration {
chunking_configuration {
chunking_strategy = "FIXED_SIZE"
fixed_size_chunking_configuration {
max_tokens = 300
overlap_percentage = 10
}
}
}
}
output "knowledge_base_id" {
value = aws_bedrockagent_knowledge_base.main.id
}

Bedrock Knowledge Bases supports multiple data source types.
The most common and flexible option:
import boto3
s3 = boto3.client('s3')
# Upload documents to S3
s3.upload_file(
'company_handbook.pdf',
'my-kb-bucket',
'documents/company_handbook.pdf'
)
# Supported formats:
# - PDF (.pdf)
# - Plain text (.txt)
# - Markdown (.md)
# - HTML (.html)
# - Microsoft Word (.doc, .docx)
# - CSV (.csv)
# - Excel (.xls, .xlsx)

Folder structure recommendations:
my-kb-bucket/
├── documents/
│   ├── policies/
│   │   ├── hr_handbook.pdf
│   │   └── security_policy.docx
│   ├── technical/
│   │   ├── api_reference.md
│   │   └── architecture.pdf
│   └── training/
│       └── onboarding_guide.pdf
└── metadata/
    └── custom_metadata.json

To index Confluence content, add a Confluence data source:

# Using AWS CLI
aws bedrock-agent create-data-source \
--knowledge-base-id $KB_ID \
--name "confluence-docs" \
--data-source-configuration '{
"type": "CONFLUENCE",
"confluenceConfiguration": {
"sourceConfiguration": {
"hostUrl": "https://your-domain.atlassian.net",
"hostType": "CLOUD"
},
"authType": "OAUTH2_CLIENT_CREDENTIALS",
"credentialsSecretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:confluence-creds"
}
}'

For SharePoint:

aws bedrock-agent create-data-source \
--knowledge-base-id $KB_ID \
--name "sharepoint-docs" \
--data-source-configuration '{
"type": "SHAREPOINT",
"sharePointConfiguration": {
"sourceConfiguration": {
"tenantId": "your-tenant-id",
"domain": "your-domain",
"siteUrls": ["https://your-domain.sharepoint.com/sites/docs"]
},
"authType": "OAUTH2_CLIENT_CREDENTIALS",
"credentialsSecretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:sharepoint-creds"
}
}'

For Salesforce:

aws bedrock-agent create-data-source \
--knowledge-base-id $KB_ID \
--name "salesforce-knowledge" \
--data-source-configuration '{
"type": "SALESFORCE",
"salesforceConfiguration": {
"sourceConfiguration": {
"hostUrl": "https://your-org.salesforce.com"
},
"authType": "OAUTH2_CLIENT_CREDENTIALS",
"credentialsSecretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:salesforce-creds"
}
}'

For public websites, use the web crawler:

aws bedrock-agent create-data-source \
--knowledge-base-id $KB_ID \
--name "web-docs" \
--data-source-configuration '{
"type": "WEB",
"webConfiguration": {
"sourceConfiguration": {
"urlConfiguration": {
"seedUrls": [
{"url": "https://docs.example.com"}
]
}
},
"crawlerConfiguration": {
"crawlerLimits": {
"rateLimit": 10
},
"inclusionFilters": [".*\\.html$", ".*\\.md$"],
"exclusionFilters": [".*login.*", ".*admin.*"],
"scope": "HOST_ONLY"
}
}
}'

Chunking configuration significantly impacts retrieval quality. Bedrock offers several strategies:
Simple and predictable. Good for uniform content.
{
"chunkingStrategy": "FIXED_SIZE",
"fixedSizeChunkingConfiguration": {
"maxTokens": 300,
"overlapPercentage": 10
}
}

Creates parent and child chunks. Parents provide context, children provide precision.
{
"chunkingStrategy": "HIERARCHICAL",
"hierarchicalChunkingConfiguration": {
"levelConfigurations": [
{ "maxTokens": 1500 },
{ "maxTokens": 300 }
],
"overlapTokens": 60
}
}

Best for: Long documents where context matters, technical documentation, legal contracts.
Uses the embedding model to find natural boundaries. Creates chunks based on semantic similarity.
{
"chunkingStrategy": "SEMANTIC",
"semanticChunkingConfiguration": {
"maxTokens": 300,
"bufferSize": 0,
"breakpointPercentileThreshold": 95
}
}

Best for: Varied content types, documents without clear structure.
Treats each document as a single chunk. Use only for short documents.
{
"chunkingStrategy": "NONE"
}

For maximum control, preprocess documents yourself and use custom metadata:
# Preprocess documents with custom chunking logic
import boto3

s3 = boto3.client('s3')

def custom_chunk(document_text: str, chunk_size: int = 500) -> list[str]:
    """Naive word-count chunker; swap in your own boundary logic."""
    words = document_text.split()
    return [
        ' '.join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]

# Upload preprocessed chunks with metadata
# (boto3 adds the x-amz-meta- prefix automatically; do not include it)
with open('original_document.txt') as f:
    chunks = custom_chunk(f.read())

for i, chunk in enumerate(chunks):
    s3.put_object(
        Bucket='my-kb-bucket',
        Key=f'preprocessed/doc1_chunk_{i}.txt',
        Body=chunk,
        Metadata={
            'source': 'original_document.pdf',
            'chunk-index': str(i),
            'section': 'chapter-3'
        }
    )
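One caveat worth flagging: to the best of my knowledge, the metadata filters shown below do not read S3 object metadata. Filterable attributes come from sidecar files named <document>.metadata.json uploaded next to each source document. A minimal sketch with hypothetical attribute names:

```python
import json
import boto3

s3 = boto3.client('s3')

# Sidecar metadata file for documents/technical/api_reference.md;
# it must sit in the same location as the document it describes.
s3.put_object(
    Bucket='my-kb-bucket',
    Key='documents/technical/api_reference.md.metadata.json',
    Body=json.dumps({
        'metadataAttributes': {
            'department': 'engineering',
            'year': 2024
        }
    })
)
```

Combine semantic (vector) and keyword (lexical) search: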
import boto3
bedrock_agent = boto3.client('bedrock-agent-runtime')
response = bedrock_agent.retrieve(
knowledgeBaseId='YOUR_KB_ID',
retrievalQuery={
'text': 'What is our vacation policy?'
},
retrievalConfiguration={
'vectorSearchConfiguration': {
'numberOfResults': 10,
'overrideSearchType': 'HYBRID' # SEMANTIC or HYBRID
}
}
)

Filter retrieval based on document metadata:
response = bedrock_agent.retrieve(
knowledgeBaseId='YOUR_KB_ID',
retrievalQuery={
'text': 'security requirements'
},
retrievalConfiguration={
'vectorSearchConfiguration': {
'numberOfResults': 10,
'filter': {
'andAll': [
{
'equals': {
'key': 'department',
'value': 'engineering'
}
},
{
'greaterThan': {
'key': 'year',
'value': 2023
}
}
]
}
}
}
)

Supported filter operators:
- equals, notEquals
- greaterThan, greaterThanOrEquals
- lessThan, lessThanOrEquals
- in, notIn
- startsWith
- stringContains (partial string matching)
- listContains (check if value exists in list field)
- andAll, orAll (for combining conditions; see the nested example below)
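For illustration, here is a nested filter combining several operators (the metadata keys are hypothetical):

```python
# Engineering docs from 2024 onward, or anything whose doc_type
# starts with "public-policy"
complex_filter = {
    'orAll': [
        {
            'andAll': [
                {'equals': {'key': 'department', 'value': 'engineering'}},
                {'greaterThanOrEquals': {'key': 'year', 'value': 2024}}
            ]
        },
        {'startsWith': {'key': 'doc_type', 'value': 'public-policy'}}
    ]
}
```

Bedrock can automatically rewrite queries for better retrieval: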
response = bedrock_agent.retrieve_and_generate(
input={
'text': 'pto days' # Informal query
},
retrieveAndGenerateConfiguration={
'type': 'KNOWLEDGE_BASE',
'knowledgeBaseConfiguration': {
'knowledgeBaseId': 'YOUR_KB_ID',
'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0',
'orchestrationConfiguration': {
'queryTransformationConfiguration': {
'type': 'QUERY_DECOMPOSITION' # Breaks complex queries into sub-queries
}
}
}
}
)

Improve retrieval precision with a reranking model:
response = bedrock_agent.retrieve(
knowledgeBaseId='YOUR_KB_ID',
retrievalQuery={
'text': 'How do I configure SSO?'
},
retrievalConfiguration={
'vectorSearchConfiguration': {
'numberOfResults': 20, # Retrieve more candidates
'rerankingConfiguration': {
'type': 'BEDROCK_RERANKING_MODEL',
'bedrockRerankingConfiguration': {
'modelConfiguration': {
'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/cohere.rerank-v3-5:0'
},
'numberOfRerankedResults': 5 # Return top 5 after reranking
}
}
}
}
)

Bedrock provides two primary APIs for querying Knowledge Bases.
The Retrieve API returns relevant chunks without generating a response. Use it when you want to control generation yourself.
import boto3
bedrock_agent = boto3.client('bedrock-agent-runtime')
def retrieve_documents(query: str, kb_id: str, top_k: int = 5):
"""Retrieve relevant documents from Knowledge Base."""
response = bedrock_agent.retrieve(
knowledgeBaseId=kb_id,
retrievalQuery={'text': query},
retrievalConfiguration={
'vectorSearchConfiguration': {
'numberOfResults': top_k
}
}
)
results = []
for result in response['retrievalResults']:
results.append({
'content': result['content']['text'],
'score': result['score'],
'location': result.get('location', {}),
'metadata': result.get('metadata', {})
})
return results
# Usage
docs = retrieve_documents(
query="What is our remote work policy?",
kb_id="YOUR_KB_ID"
)
for doc in docs:
print(f"Score: {doc['score']:.3f}")
print(f"Content: {doc['content'][:200]}...")
print("---")End-to-end RAG: retrieves documents and generates a response.
def query_knowledge_base(
query: str,
kb_id: str,
model_id: str = "anthropic.claude-3-sonnet-20240229-v1:0"
) -> dict:
"""Query Knowledge Base with retrieval and generation."""
response = bedrock_agent.retrieve_and_generate(
input={'text': query},
retrieveAndGenerateConfiguration={
'type': 'KNOWLEDGE_BASE',
'knowledgeBaseConfiguration': {
'knowledgeBaseId': kb_id,
'modelArn': f'arn:aws:bedrock:us-east-1::foundation-model/{model_id}',
'retrievalConfiguration': {
'vectorSearchConfiguration': {
'numberOfResults': 5
}
},
'generationConfiguration': {
'promptTemplate': {
'textPromptTemplate': '''
You are a helpful assistant. Answer the question based on the provided context.
If you cannot find the answer in the context, say "I don't have information about that."
Context:
$search_results$
Question: $query$
Answer:'''
},
'inferenceConfig': {
'textInferenceConfig': {
'temperature': 0.0,
'topP': 0.9,
'maxTokens': 1000
}
}
}
}
}
)
return {
'answer': response['output']['text'],
'citations': response.get('citations', []),
'session_id': response.get('sessionId')
}
# Usage
result = query_knowledge_base(
query="What are the vacation day policies?",
kb_id="YOUR_KB_ID"
)
print(f"Answer: {result['answer']}")
print(f"\nCitations: {len(result['citations'])} sources used")For true streaming responses, first retrieve context using the Retrieve API, then stream generation using the Bedrock runtime:
import json
bedrock_runtime = boto3.client('bedrock-runtime')
def stream_with_context(query: str, context: list[str], model_id: str = "anthropic.claude-3-sonnet-20240229-v1:0"):
"""Stream generation with retrieved context."""
context_text = "\n\n".join(context)
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 1000,
"messages": [
{
"role": "user",
"content": f"Answer based on this context:\n\n{context_text}\n\nQuestion: {query}"
}
]
}
response = bedrock_runtime.invoke_model_with_response_stream(
modelId=model_id,
body=json.dumps(body)
)
for event in response['body']:
chunk = json.loads(event['chunk']['bytes'])
if chunk['type'] == 'content_block_delta':
yield chunk['delta'].get('text', '')
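Wiring this to the retrieve_documents helper from earlier gives a complete streaming RAG loop:

```python
# Stream an answer over freshly retrieved context
docs = retrieve_documents("What is our remote work policy?", "YOUR_KB_ID")
for token in stream_with_context(
    "What is our remote work policy?",
    [doc['content'] for doc in docs]
):
    print(token, end="", flush=True)
```

This is the section that can save you thousands of dollars. Bedrock Knowledge Bases has hidden costs that catch nearly every team.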
When you create a Knowledge Base using the default OpenSearch Serverless vector store, AWS creates an OpenSearch Serverless collection. Here is what they do not tell you upfront:
OpenSearch Serverless charges by OpenSearch Compute Units (OCUs).
Each OCU costs $0.24/hour.
Since June 2024, AWS supports fractional 0.5 OCU deployments, cutting minimum costs in half:
Production (with redundancy):
2 OCUs x $0.24/hour x 24 hours x 30 days = $345.60/month MINIMUM

Dev/Test (without redundancy):
1 OCU x $0.24/hour x 24 hours x 30 days = $172.80/month MINIMUM

That is still $175-350/month just for the vector store, before any queries, before any LLM calls, before any data transfer. And it never scales to zero. Even if you have zero queries, you pay the minimum.
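The arithmetic is easy to sanity-check. A throwaway helper, assuming the $0.24/OCU-hour list price and a 30-day month:

```python
def aoss_monthly_cost(ocus: float, price_per_ocu_hour: float = 0.24) -> float:
    """Monthly OpenSearch Serverless floor cost for a given OCU count."""
    return ocus * price_per_ocu_hour * 24 * 30

print(round(aoss_monthly_cost(1), 2))  # 172.8  -- dev/test minimum
print(round(aoss_monthly_cost(2), 2))  # 345.6  -- production minimum
```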
Here is what a typical Knowledge Base actually costs:
| Component | Monthly Cost | Notes |
|---|---|---|
| OpenSearch Serverless (min) | $173-346 | 1-2 OCUs, never scales to zero |
| Titan Embeddings | $5-50 | Depends on document volume |
| Claude 3 Sonnet (generation) | $50-500 | Depends on query volume |
| S3 Storage | $1-10 | Usually negligible |
| Data Transfer | $5-20 | Depends on document size |
| Total (dev/test) | ~$300/month | With 1 OCU minimum |
| Total (production) | ~$500-800/month | With 2 OCU minimum |
Compare this to self-managed alternatives:
Option 1: Use Pinecone Instead
Bedrock Knowledge Bases supports Pinecone as an alternative vector store:
# When creating Knowledge Base, specify Pinecone
storage_configuration = {
"type": "PINECONE",
"pineconeConfiguration": {
"connectionString": "https://your-index-name.svc.environment.pinecone.io",
"credentialsSecretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:pinecone-api-key",
"fieldMapping": {
"metadataField": "metadata",
"textField": "text"
},
"namespace": "bedrock-kb"
}
}

Option 2: Use Aurora PostgreSQL with pgvector
If you already have Aurora PostgreSQL, add pgvector:
CREATE EXTENSION IF NOT EXISTS vector;
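You also need a table whose columns line up with the fieldMapping below. A sketch of a plausible schema, assuming Titan Embed Text v2's default 1024 dimensions (adjust to your setup):

```sql
CREATE TABLE bedrock_kb_vectors (
    id uuid PRIMARY KEY,
    embedding vector(1024),
    text text,
    metadata json
);
CREATE INDEX ON bedrock_kb_vectors USING hnsw (embedding vector_cosine_ops);
```

Then configure Knowledge Base to use it: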
storage_configuration = {
"type": "RDS",
"rdsConfiguration": {
"credentialsSecretArn": "arn:aws:secretsmanager:...:secret:aurora-creds",
"databaseName": "knowledge_base",
"resourceArn": "arn:aws:rds:us-east-1:123456789012:cluster:my-aurora-cluster",
"tableName": "bedrock_kb_vectors",
"fieldMapping": {
"metadataField": "metadata",
"primaryKeyField": "id",
"textField": "text",
"vectorField": "embedding"
}
}
}

Option 3: Use OpenSearch Managed Clusters
If you need OpenSearch specifically, a small managed cluster can cost far less than the serverless OCU floor, in exchange for managing the cluster yourself.
If you enable reranking (recommended for quality), add these costs:
| Model | Cost per 1000 queries |
|---|---|
| Cohere Rerank | $1.00 |
| Amazon Rerank (preview) | TBD |
For moderate usage (10K queries/month), add ~$10/month.
For SaaS applications, you need tenant isolation. Bedrock Knowledge Bases offers several approaches.
Single Knowledge Base, filter by tenant metadata:
def query_for_tenant(query: str, tenant_id: str, kb_id: str):
"""Query with tenant isolation via metadata filter."""
response = bedrock_agent.retrieve_and_generate(
input={'text': query},
retrieveAndGenerateConfiguration={
'type': 'KNOWLEDGE_BASE',
'knowledgeBaseConfiguration': {
'knowledgeBaseId': kb_id,
'modelArn': MODEL_ARN,
'retrievalConfiguration': {
'vectorSearchConfiguration': {
'numberOfResults': 5,
'filter': {
'equals': {
'key': 'tenant_id',
'value': tenant_id
}
}
}
}
}
}
)
return response['output']['text']

Pros: Simple, cost-effective (single KB)
Cons: Metadata leakage risk if filter fails, all tenants share capacity
If using Pinecone, leverage namespaces:
# Different namespace per tenant
storage_configuration = {
"type": "PINECONE",
"pineconeConfiguration": {
"namespace": f"tenant-{tenant_id}",
# ... other config
}
}

For maximum isolation, create separate KBs per tenant:
def get_tenant_kb(tenant_id: str) -> str:
"""Get or create Knowledge Base for tenant."""
kb_mapping = {
"tenant-a": "KB_ID_A",
"tenant-b": "KB_ID_B",
}
return kb_mapping.get(tenant_id)

Pros: Complete isolation, per-tenant scaling
Cons: Management overhead, higher cost (especially with OpenSearch)
Here is a production-ready implementation bringing together everything covered:
"""
Production Bedrock Knowledge Base RAG Implementation
Complete example with error handling, streaming, and cost awareness.
"""
import boto3
import json
import logging
from typing import Generator, Optional
from dataclasses import dataclass
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@dataclass
class RAGConfig:
"""Configuration for Bedrock RAG."""
knowledge_base_id: str
region: str = "us-east-1"
embedding_model: str = "amazon.titan-embed-text-v2:0"
generation_model: str = "anthropic.claude-3-sonnet-20240229-v1:0"
max_results: int = 5
temperature: float = 0.0
class BedrockRAG:
"""Production-ready Bedrock Knowledge Base RAG client."""
def __init__(self, config: RAGConfig):
self.config = config
self.bedrock_agent = boto3.client(
'bedrock-agent-runtime',
region_name=config.region
)
self.bedrock_runtime = boto3.client(
'bedrock-runtime',
region_name=config.region
)
def retrieve(
self,
query: str,
filters: Optional[dict] = None,
top_k: Optional[int] = None
) -> list[dict]:
"""
Retrieve relevant documents without generation.
Args:
query: Search query
filters: Optional metadata filters
top_k: Number of results (defaults to config)
Returns:
List of retrieved documents with scores
"""
retrieval_config = {
'vectorSearchConfiguration': {
'numberOfResults': top_k or self.config.max_results
}
}
if filters:
retrieval_config['vectorSearchConfiguration']['filter'] = filters
try:
response = self.bedrock_agent.retrieve(
knowledgeBaseId=self.config.knowledge_base_id,
retrievalQuery={'text': query},
retrievalConfiguration=retrieval_config
)
results = []
for result in response.get('retrievalResults', []):
results.append({
'content': result['content']['text'],
'score': result.get('score', 0),
'source': result.get('location', {}).get('s3Location', {}).get('uri', 'Unknown'),
'metadata': result.get('metadata', {})
})
logger.info(f"Retrieved {len(results)} documents for query: {query[:50]}...")
return results
except Exception as e:
logger.error(f"Retrieval failed: {e}")
raise
def query(
self,
question: str,
filters: Optional[dict] = None,
system_prompt: Optional[str] = None
) -> dict:
"""
Full RAG: retrieve and generate response.
Args:
question: User question
filters: Optional metadata filters
system_prompt: Optional custom system prompt
Returns:
Dict with answer, citations, and metadata
"""
model_arn = f"arn:aws:bedrock:{self.config.region}::foundation-model/{self.config.generation_model}"
prompt_template = system_prompt or """
You are a helpful assistant. Answer the question based only on the provided context.
If the context does not contain enough information, say "I don't have enough information to answer that."
Context:
$search_results$
Question: $query$
Answer:"""
generation_config = {
'type': 'KNOWLEDGE_BASE',
'knowledgeBaseConfiguration': {
'knowledgeBaseId': self.config.knowledge_base_id,
'modelArn': model_arn,
'retrievalConfiguration': {
'vectorSearchConfiguration': {
'numberOfResults': self.config.max_results
}
},
'generationConfiguration': {
'promptTemplate': {
'textPromptTemplate': prompt_template
},
'inferenceConfig': {
'textInferenceConfig': {
'temperature': self.config.temperature,
'topP': 0.9,
'maxTokens': 1000
}
}
}
}
}
if filters:
generation_config['knowledgeBaseConfiguration']['retrievalConfiguration']['vectorSearchConfiguration']['filter'] = filters
try:
response = self.bedrock_agent.retrieve_and_generate(
input={'text': question},
retrieveAndGenerateConfiguration=generation_config
)
# Extract citations
citations = []
for citation in response.get('citations', []):
for ref in citation.get('retrievedReferences', []):
citations.append({
'content': ref.get('content', {}).get('text', ''),
'source': ref.get('location', {}).get('s3Location', {}).get('uri', 'Unknown')
})
return {
'answer': response['output']['text'],
'citations': citations,
'session_id': response.get('sessionId')
}
except Exception as e:
logger.error(f"Query failed: {e}")
raise
def stream_response(
self,
question: str,
context: Optional[list[str]] = None
) -> Generator[str, None, None]:
"""
Stream a response with optional pre-retrieved context.
Args:
question: User question
context: Optional list of context strings (if None, retrieves automatically)
Yields:
Response tokens as they are generated
"""
# Get context if not provided
if context is None:
docs = self.retrieve(question)
context = [doc['content'] for doc in docs]
context_text = "\n\n---\n\n".join(context)
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 1000,
"temperature": self.config.temperature,
"messages": [
{
"role": "user",
"content": f"""Answer based on this context:
{context_text}
Question: {question}
Provide a clear, concise answer based only on the context above."""
}
]
}
try:
response = self.bedrock_runtime.invoke_model_with_response_stream(
modelId=self.config.generation_model,
body=json.dumps(body)
)
for event in response['body']:
chunk = json.loads(event['chunk']['bytes'])
if chunk['type'] == 'content_block_delta':
text = chunk['delta'].get('text', '')
if text:
yield text
except Exception as e:
logger.error(f"Streaming failed: {e}")
raise
def main():
"""Example usage."""
# Configure
config = RAGConfig(
knowledge_base_id="YOUR_KNOWLEDGE_BASE_ID",
region="us-east-1",
generation_model="anthropic.claude-3-haiku-20240307-v1:0", # Cheaper for demos
max_results=5,
temperature=0.0
)
rag = BedrockRAG(config)
# Example 1: Simple query
print("=" * 60)
print("SIMPLE QUERY")
print("=" * 60)
result = rag.query("What is our vacation policy?")
print(f"Answer: {result['answer']}")
print(f"\nSources: {len(result['citations'])} citations")
# Example 2: Filtered query (multi-tenant)
print("\n" + "=" * 60)
print("FILTERED QUERY (TENANT ISOLATION)")
print("=" * 60)
result = rag.query(
"What are the security requirements?",
filters={
'equals': {
'key': 'department',
'value': 'engineering'
}
}
)
print(f"Answer: {result['answer']}")
# Example 3: Streaming
print("\n" + "=" * 60)
print("STREAMING RESPONSE")
print("=" * 60)
print("Answer: ", end="", flush=True)
for token in rag.stream_response("How do I request time off?"):
print(token, end="", flush=True)
print("\n")
if __name__ == "__main__":
main()

# 1. Install dependencies
pip install boto3
# 2. Configure AWS credentials
aws configure
# 3. Update YOUR_KNOWLEDGE_BASE_ID in the script
# 4. Run
python bedrock_rag.py

AWS Bedrock Knowledge Bases offers a compelling value proposition: production RAG without infrastructure management. But the trade-offs are significant.
Strengths: a complete managed pipeline (ingestion, chunking, embedding, retrieval, generation) behind two API calls, native AWS integration, and minimal operational burden.
Critical Considerations: the OpenSearch Serverless cost floor of roughly $173-346/month before a single query, and less control over the pipeline than self-managed frameworks offer.
When to Choose Bedrock Knowledge Bases: you are already on AWS, you value speed to production over infrastructure control, and the monthly cost floor is acceptable at your scale.
When to Choose Alternatives: cost-sensitive or low-traffic workloads, fine-grained control over chunking and retrieval, or portability beyond AWS; in those cases, revisit the frameworks from Parts 2-4.
This article covered AWS's managed RAG offering. Continue with the series:
For production deployments, also explore:
This is Part 6 of the "Building RAG Systems: A Platform-by-Platform Guide" series. Next up: Azure AI Search RAG.