An opinionated 2026 decision guide to AWS vector stores. OpenSearch is the default for production RAG, Bedrock Knowledge Bases for zero-ops teams, Neptune Analytics for graph plus vector, with pgvector, MemoryDB, and S3 Vectors as situational picks.
There is no single product called "the AWS vector database." AWS ships at least eight services that can store and query embeddings, and none of them was born as a pure vector engine. Each one bolted vector indexing onto an existing storage heritage: a search cluster, a relational engine, an in-memory cache, a graph engine, object storage. That heritage is exactly what determines whether a service is right for your workload, and picking the wrong one can cost five to ten times more or cap how far you can scale.
This is a decision guide, not a feature tour. The short version: Amazon OpenSearch Service is the default for most production RAG and semantic search. Bedrock Knowledge Bases is the right call when zero operational burden matters more than tuning. Neptune Analytics earns its place when you need graph traversal and vector similarity in the same query. pgvector, MemoryDB, and the newly GA Amazon S3 Vectors are situational. Start from the shape of your workload, not the service brochure.
Decide From Workload Shape, Not the Service List
Before comparing services, pin down six dimensions. They map almost directly onto the AWS option that fits.
- Latency SLO. Sub-millisecond, sub-100 ms, or seconds-tolerant? This single number eliminates most options.
- Recall and distance metrics. What recall do you need at your corpus size, and do you need cosine, Euclidean, dot product, or all three?
- Corpus size. A corpus of millions of vectors behaves very differently from one of billions. RAM-resident indexes hit a wall that disk-backed ones do not.
- Filter cardinality and hybrid retrieval. Do you need BM25 keyword relevance plus vector similarity plus structured metadata filters in one query? Pure vector search misses exact-match terms.
- Write freshness. Real-time change data capture, or batch reindexing on a schedule?
- Ops model and cost shape. Can your team run and tune a cluster, or do you need fully managed? Pay-per-query, provisioned capacity, or reserved?
AWS publishes its own vector database comparison in the Prescriptive Guidance library, and it is a useful sanity check. The reason AWS offers so many overlapping options is that it extended services it already had rather than building one monolithic vector database. A data engineer running OpenSearch, an app developer on Aurora, and an ML engineer living inside Bedrock each get a vector path that fits their existing stack. That is convenient for AWS and confusing for buyers.
Amazon OpenSearch Service: the Default for Production RAG
For high-QPS, hybrid, filtered retrieval at scale, Amazon OpenSearch Service is the service to beat. It runs approximate k-NN with HNSW (in-memory, the default) and IVF (a smaller memory footprint for very large corpora, at the cost of a training step), supports vectors up to 16,000 dimensions, and combines keyword relevance, vector similarity, and metadata filters in a single compound query. That hybrid capability is the main reason it wins production RAG: dense retrieval alone routinely misses exact identifiers, SKUs, and rare terms that BM25 catches, and you can fuse the two result sets with min-max score normalization or reciprocal rank fusion.
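To make the two fusion strategies concrete, here is a minimal pure-Python sketch. It is not the OpenSearch implementation: the function names, the equal weighting, the RRF constant `k=60`, and the toy scores are all illustrative assumptions.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale a {doc_id: score} map into [0, 1]; constant scores map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_min_max_fusion(bm25, vector, vector_weight=0.5):
    """Normalize each leg, then combine with a weighted arithmetic mean."""
    nb, nv = min_max_normalize(bm25), min_max_normalize(vector)
    fused = {}
    for doc in set(nb) | set(nv):
        # A document missing from one leg contributes 0 for that leg.
        fused[doc] = (1 - vector_weight) * nb.get(doc, 0.0) + vector_weight * nv.get(doc, 0.0)
    return sorted(fused, key=lambda d: (-fused[d], d))


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Toy data (hypothetical): the keyword leg nails the exact SKU,
# while the dense leg ranks it last.
bm25_scores = {"sku-8841": 12.0, "doc-a": 7.5, "doc-b": 3.0}
vector_scores = {"doc-a": 0.91, "doc-c": 0.88, "sku-8841": 0.52}

bm25_ranking = sorted(bm25_scores, key=bm25_scores.get, reverse=True)
vector_ranking = sorted(vector_scores, key=vector_scores.get, reverse=True)

rrf_order = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
mm_order = weighted_min_max_fusion(bm25_scores, vector_scores)
```

In this toy case both methods lift the exact-match SKU into the top two results even though the vector leg ranked it last. Min-max works on raw scores and is sensitive to outliers; RRF only looks at rank positions, which makes it easier to apply when the two score scales are not comparable.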
On the engine side, FAISS is the recommended default for large-scale workloads and Lucene is a solid choice for smaller deployments with automatic filtering strategy selection. NMSLIB is on its way out: it has been deprecated since OpenSearch 2.19 and blocked for new indexes starting in 3.0, so new builds should standardize on FAISS or Lucene. Quantization is where you control the bill. Byte (scalar) quantization cuts the memory footprint roughly 4x with limited recall loss; FP16, product quantization, and binary quantization trade further recall for RAM. We cover the parameter tuning in depth in Scaling Vector Search with OpenSearch, and the broader cluster design in AWS OpenSearch Service Architecture, Setup, and Best Practices.
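For a rough sense of what quantization buys, the sketch below applies the HNSW sizing rule of thumb from the OpenSearch k-NN documentation, about `1.1 * (bytes_per_component * dimension + 8 * m)` bytes per vector. The corpus size, dimension, and `m` value are hypothetical, and the result is an estimate to sanity-check instance sizing, not a guarantee.

```python
def hnsw_memory_gib(num_vectors, dimension, m=16, bytes_per_component=4):
    """Estimate HNSW native memory in GiB.

    Per vector: 1.1 * (bytes_per_component * dimension + 8 * m) bytes,
    where the 8 * m term approximates the graph links.
    """
    per_vector_bytes = 1.1 * (bytes_per_component * dimension + 8 * m)
    return num_vectors * per_vector_bytes / 1024**3


corpus = 50_000_000   # hypothetical corpus size
dim = 1024            # hypothetical embedding dimension

float32_gib = hnsw_memory_gib(corpus, dim, bytes_per_component=4)
byte_gib = hnsw_memory_gib(corpus, dim, bytes_per_component=1)

print(f"float32: {float32_gib:.0f} GiB")
print(f"byte:    {byte_gib:.0f} GiB")
print(f"ratio:   {float32_gib / byte_gib:.2f}x")
```

Under these assumptions the ratio lands a little below 4x, because the graph links are not quantized along with the vector components. The gap narrows as dimension grows relative to `m`.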
The managed-versus-serverless split is a real fork. Managed domains give you instance-type control (r7g, r8g), explicit shard and replica counts, reserved-instance discounts, and the best unit economics at steady-state scale. OpenSearch Serverless vector collections bill in OCUs at $0.24 per OCU-hour and auto-scale, but the redundancy default keeps a minimum of four half-OCUs (two for indexing, two for search), or 2 OCUs in total, running continuously. At roughly 730 hours per month, that works out to about $350 per month even at zero traffic; disabling redundancy for dev/test halves the floor to about $175. The idle-bill surprise on Serverless is real, and it is the most common cost complaint we see. The other gotcha is sizing: HNSW indexes live in RAM, so instance memory must cover the vector footprint plus JVM overhead, or the index no longer fits in memory and latency degrades.
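The idle-floor arithmetic is simple enough to keep in a scratch file and rerun when pricing changes. This sketch uses the $0.24 per OCU-hour rate quoted above and assumes a 730-hour month; treat both constants as inputs to verify rather than facts.

```python
OCU_HOURLY_USD = 0.24    # rate quoted above; verify against current pricing
HOURS_PER_MONTH = 730    # common billing approximation (8,760 hours / 12)


def serverless_idle_floor(min_ocus):
    """Monthly cost of the always-on OCU minimum, before any traffic."""
    return min_ocus * OCU_HOURLY_USD * HOURS_PER_MONTH


# Redundancy on: 2 indexing + 2 search half-OCUs = 2 OCUs.
with_redundancy = serverless_idle_floor(4 * 0.5)
# Redundancy off (dev/test): 1 indexing + 1 search half-OCU = 1 OCU.
without_redundancy = serverless_idle_floor(2 * 0.5)

print(f"redundancy on:  ${with_redundancy:,.2f}/month")
print(f"redundancy off: ${without_redundancy:,.2f}/month")
```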
Pick OpenSearch when sustained QPS is high; when you need hybrid BM25-plus-vector-plus-filter retrieval, multi-tenant index isolation, or fine-grained access control; or when your team already runs OpenSearch.
Amazon Bedrock Knowledge Bases: Zero-Ops RAG in a Box
Amazon Bedrock Knowledge Bases is a managed RAG pipeline that handles ingestion, chunking, embedding, vector storage, and retrieval behind a single API, so a team can stand up a knowledge base without operating any vector infrastructure. It exposes two retrieval modes, Retrieve (return chunks) and RetrieveAndGenerate (chunks plus a generated answer), and supports fixed, semantic, and hierarchical chunking.
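As a sketch of what the two retrieval modes look like from code, the helpers below build the keyword arguments for the boto3 `bedrock-agent-runtime` client's `retrieve` and `retrieve_and_generate` calls. The helper names, the knowledge base ID, the model ARN placeholder, and the default region are hypothetical; the actual AWS call is isolated in `run_retrieve` so the request shapes can be inspected without credentials.

```python
def build_retrieve_request(kb_id, question, top_k=5):
    """Keyword arguments for Retrieve: returns matching chunks only."""
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": question},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    }


def build_retrieve_and_generate_request(kb_id, model_arn, question):
    """Keyword arguments for RetrieveAndGenerate: chunks plus an answer."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }


def run_retrieve(kb_id, question, region="us-east-1"):
    """Call Retrieve and return chunk texts (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builders work without AWS access

    client = boto3.client("bedrock-agent-runtime", region_name=region)
    response = client.retrieve(**build_retrieve_request(kb_id, question))
    return [result["content"]["text"] for result in response["retrievalResults"]]


# Hypothetical identifiers, for illustration only.
retrieve_kwargs = build_retrieve_request("KB_ID_PLACEHOLDER", "What is our refund policy?", top_k=3)
rag_kwargs = build_retrieve_and_generate_request(
    "KB_ID_PLACEHOLDER", "MODEL_ARN_PLACEHOLDER", "What is our refund policy?"
)
```

Retrieve is the mode to reach for when you want to run your own reranking or prompt assembly; RetrieveAndGenerate adds the generation-model token bill to every call.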
The trap is thinking "managed" means "one price." You pay across three layers that compound: embedding tokens at ingestion and query time, the underlying vector store bill (OCUs, Aurora capacity, S3 Vectors charges, and so on), and generation-model tokens when you use RetrieveAndGenerate. Knowledge Bases abstracts the retrieval API but not the backend bill. Embedding pricing on Titan Text Embeddings v2 and Cohere Embed v3 is low per thousand tokens but adds up across a full-corpus re-embed. Check the live numbers on the Bedrock pricing page rather than trusting a figure from a blog; our AWS Bedrock Pricing Guide walks through how the layers stack. The backend list in 2026 is broad: Amazon Bedrock Knowledge Bases supports Aurora PostgreSQL, OpenSearch Serverless and managed clusters, Neptune Analytics, S3 Vectors, Pinecone, MongoDB Atlas, and Redis Enterprise Cloud. For background on how Knowledge Bases fits alongside Bedrock Agents and foundation models, see Amazon Bedrock Explained.
Use Bedrock KB when time-to-demo is under a week, the team lacks OpenSearch ops skills, you want built-in CloudTrail logging and Bedrock Guardrails, or you are wiring a knowledge base into Bedrock Agents. Outgrow it when you need custom chunking, an external reranker, hybrid BM25 tuning the API does not expose, or per-query cost optimization at high volume, where querying OpenSearch directly is cheaper.
Neptune Analytics: Vectors and Graph in One Query
Neptune Analytics is the right pick when relationships matter as much as similarity. It stores vectors as node properties and runs openCypher queries that combine graph pattern matching with vector similarity through functions like topKByNode and topKByEmbedding, supporting cosine and Euclidean distance. The pattern it unlocks is GraphRAG: find semantically similar nodes, then walk relationships for context. When you select Neptune Analytics as a Bedrock Knowledge Bases backend, Bedrock automatically extracts entities and relationships and builds the graph for you, then uses those links to improve retrieval accuracy.
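The GraphRAG pattern is easier to see in miniature. The sketch below is a toy in-memory stand-in, not Neptune Analytics or openCypher: `top_k_by_embedding` is a local helper that only borrows the idea of the similarly named Neptune function, and the graph, node names, and embeddings are invented for illustration.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_by_embedding(embeddings, query, k):
    """Step 1: seed nodes ranked by similarity to the query embedding."""
    ranked = sorted(embeddings, key=lambda n: (-cosine(embeddings[n], query), n))
    return ranked[:k]


def expand(graph, seeds, max_hops):
    """Step 2: breadth-first walk from the seeds, up to max_hops edges away."""
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen


# Hypothetical toy data: component -> assembly -> supplier.
embeddings = {
    "component:valve-x": [0.9, 0.1, 0.0],
    "component:pump-y": [0.1, 0.9, 0.0],
    "supplier:acme": [0.0, 0.2, 0.9],
}
graph = {
    "component:valve-x": ["assembly:a1"],
    "assembly:a1": ["supplier:acme", "supplier:globex"],
    "component:pump-y": ["assembly:a2"],
    "assembly:a2": ["supplier:initech"],
}

seeds = top_k_by_embedding(embeddings, query=[1.0, 0.0, 0.0], k=1)
context = expand(graph, seeds, max_hops=2)
```

The similarity step finds the entry point; the traversal pulls in suppliers that share no vocabulary with the query at all. A flat vector index has no way to return that second group.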
This shines for multi-hop reasoning. "Find every supplier connected to this recalled component" is a graph traversal seeded by semantic search, and no flat vector index answers it cleanly. Medical ontologies, legal citation networks, supply-chain lineage, and fraud or identity-resolution workloads are the sweet spot. Pricing is capacity-based: you provision m-NCUs (each m-NCU is roughly 1 GB of memory plus compute), and graphs can now launch at 32 or 64 m-NCU, down from a former 128 m-NCU floor. Running graphs bill at the full hourly rate, paused graphs at 10%, and snapshots per GB-month; confirm current rates on the Neptune pricing page. There is no scale-to-zero, so a graph is an always-on cost.
The anti-patterns are clear. Flat-document RAG with no entity relationships is over-engineered on Neptune Analytics and cheaper on OpenSearch. High-QPS semantic search over millions of unstructured chunks belongs on OpenSearch too. And if you need sub-10 ms latency, look elsewhere: Neptune Analytics operates in the sub-second range, not the single-digit-millisecond range.
The Situational Options: pgvector, MemoryDB, S3 Vectors
Three more services are worth keeping in the toolbox, each for a narrow reason.
Aurora PostgreSQL with pgvector keeps embeddings next to relational data, with HNSW and IVFFlat indexes and 10-100 ms typical query latency. The deciding factor is whether you already run Aurora and want SQL joins between vector results and relational rows. It holds up well below roughly 10-50 million rows; past that, the lack of horizontal sharding for vector indexes becomes the ceiling and a move to OpenSearch is the usual exit. Aurora I/O-Optimized pricing removes per-I/O charges, which matters for write-heavy vector workloads.
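The "SQL joins between vector results and relational rows" point is the whole case for pgvector, so here is a minimal sketch of what such a query might look like. The `chunks` and `documents` tables, their columns, and the tenant filter are hypothetical; the SQL uses pgvector's `hnsw` index type, `vector_cosine_ops` operator class, and `<=>` cosine-distance operator, with `%s` placeholders in the style of psycopg.

```python
def to_vector_literal(embedding):
    """Render a Python sequence in pgvector's text format, e.g. '[0.1,2.0]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# One-time setup: an HNSW index for cosine distance (hypothetical table).
CREATE_INDEX_SQL = (
    "CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw "
    "ON chunks USING hnsw (embedding vector_cosine_ops)"
)

# Nearest chunks for one tenant, joined to relational document metadata.
SEARCH_SQL = """
SELECT c.id, c.body, d.title, c.embedding <=> %s::vector AS distance
FROM chunks AS c
JOIN documents AS d ON d.id = c.document_id
WHERE d.tenant_id = %s
ORDER BY c.embedding <=> %s::vector
LIMIT %s
"""


def search_params(embedding, tenant_id, limit=10):
    """Parameters in the order the placeholders appear in SEARCH_SQL."""
    literal = to_vector_literal(embedding)
    return (literal, tenant_id, literal, limit)


params = search_params([0.12, 0.5, 1], tenant_id="tenant-42", limit=5)
# With a driver such as psycopg: cursor.execute(SEARCH_SQL, params)
```

Note that a selective `WHERE` clause combined with an approximate index can return fewer rows than the limit, so it is worth testing recall with realistic filters before committing to this layout.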
MemoryDB is the latency play. It is a Valkey- and Redis OSS-compatible in-memory database with HNSW indexing and dimensions up to 32,768. AWS reports single-digit millisecond vector search at over 99% recall, which is faster than the disk-backed options but not literally sub-millisecond for vector queries. Because the entire index lives in RAM, cost per GB is steep, so it fits small corpora, online feature stores, session memory, and real-time personalization rather than large archives.
Amazon S3 Vectors changed the cost equation when it went GA on December 2, 2025. It is serverless object-storage-native vector search billed on storage, per-query API calls, and data processed, with no provisioned floor. At GA it scales to two billion vectors per index and 10,000 indexes per bucket, returns infrequent queries in under a second and warmer queries around 100 ms or less, and supports up to 4,096 dimensions. AWS claims up to 90% lower cost than provisioned vector indexes for upload, storage, and query. The trade-off is latency: this is a cold and warm tier, not a hot-path engine. The strongest pattern is a cold tier behind a hot OpenSearch index, which we walk through in OpenSearch with S3 Vectors: Cost-Efficient Hybrid Search. It also slots in as a low-cost Bedrock Knowledge Bases backend for archival RAG.
The Decision Matrix
Map your dominant constraint to a service. The table is the TL;DR; the ordered questions below resolve ties.
| Workload archetype | Recommended AWS service |
|---|---|
| Production RAG, high QPS, hybrid search and filters | OpenSearch Service (managed domain) |
| Zero-ops prototype or Bedrock-native agents | Bedrock Knowledge Bases (on OpenSearch Serverless or S3 Vectors) |
| Graph plus vector (entity-linked RAG, fraud, lineage) | Neptune Analytics |
| Vectors alongside relational data, under ~50M rows | Aurora PostgreSQL with pgvector |
| Single-digit-ms latency, small corpus, real-time features | MemoryDB |
| Archival or cold-tier, batch queries, cost-first | S3 Vectors |
Work the questions in order and stop at the first match:
- Do you need graph traversal combined with vector similarity? Use Neptune Analytics.
- Is operational burden your top constraint and the corpus under ~10M chunks? Use Bedrock Knowledge Bases.
- Do you need hybrid BM25 plus vector plus complex filters at scale? Use OpenSearch managed.
- Is cost the primary driver and your latency tolerance above ~100 ms? Use S3 Vectors.
- Are vectors just a feature of an existing PostgreSQL app? Use pgvector.
- Do you need single-digit-millisecond reads on a small corpus? Use MemoryDB.
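The ordered questions above can be sketched as a first-match function. The `Workload` fields and their names are illustrative assumptions; the thresholds (roughly 10M chunks, roughly 100 ms) are the approximate ones from the list, not hard limits.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    needs_graph_traversal: bool = False
    ops_is_top_constraint: bool = False
    corpus_chunks: int = 0
    needs_hybrid_filters_at_scale: bool = False
    cost_is_primary: bool = False
    latency_budget_ms: float = 100.0
    vectors_in_existing_postgres: bool = False
    needs_single_digit_ms: bool = False


def pick_service(w):
    """Walk the questions in order and stop at the first match."""
    if w.needs_graph_traversal:
        return "Neptune Analytics"
    if w.ops_is_top_constraint and w.corpus_chunks < 10_000_000:
        return "Bedrock Knowledge Bases"
    if w.needs_hybrid_filters_at_scale:
        return "OpenSearch Service (managed domain)"
    if w.cost_is_primary and w.latency_budget_ms > 100:
        return "S3 Vectors"
    if w.vectors_in_existing_postgres:
        return "Aurora PostgreSQL with pgvector"
    if w.needs_single_digit_ms:
        return "MemoryDB"
    return None  # no rule matched; fall back to the table above


archive = Workload(cost_is_primary=True, latency_budget_ms=800)
prototype = Workload(ops_is_top_constraint=True, corpus_chunks=2_000_000)
print(pick_service(archive))
print(pick_service(prototype))
```

Order matters: a workload that needs both graph traversal and hybrid filters resolves to Neptune Analytics here, because the graph question comes first.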
What we would build starting today: production at scale on an OpenSearch managed domain (r7g/r8g, HNSW with byte quantization, hybrid search on); prototypes on Bedrock Knowledge Bases backed by S3 Vectors for the lowest floor or OpenSearch Serverless for faster queries. S3 Vectors is the one to re-evaluate every couple of quarters now that it is GA, because if its latency and feature set keep maturing it becomes the default cold tier behind almost every hot index.
Key takeaways
- There is no single AWS vector database. At least eight services can store embeddings; the right one follows from latency SLO, corpus size, filter and hybrid needs, freshness, and ops tolerance.
- OpenSearch Service is the production default for high-QPS hybrid RAG, with FAISS or Lucene plus byte quantization. Watch the Serverless idle floor (~$350/month with redundancy on).
- Bedrock Knowledge Bases trades tuning for speed-to-launch and bills across three layers (embedding tokens, backend store, generation tokens). It supports Aurora, OpenSearch, Neptune Analytics, S3 Vectors, Pinecone, MongoDB Atlas, and Redis Enterprise Cloud.
- Neptune Analytics is for graph plus vector, not flat-document RAG. Always-on capacity, sub-second (not sub-10 ms) latency.
- S3 Vectors went GA in December 2025 with up to 90% claimed cost savings and ~100 ms warm latency, making it the natural cold tier behind OpenSearch.
If you are weighing these options for a specific RAG or semantic-search workload, BigData Boutique designs and operates vector search on OpenSearch and the wider AWS stack, and we are happy to pressure-test your architecture and cost model.