A practitioner's guide to Amazon OpenSearch Service covering architecture, provisioned vs serverless deployment, instance selection, security, monitoring, and cost optimization.
Amazon OpenSearch Service is a fully managed AWS service for deploying, operating, and scaling OpenSearch and legacy Elasticsearch workloads in the cloud without managing the underlying infrastructure. Originally launched as Amazon Elasticsearch Service, it was rebranded in 2021 when AWS forked OpenSearch from Elasticsearch 7.10. Today it remains the largest managed OpenSearch provider, powering search, log analytics, observability, and vector search workloads across thousands of organizations.
At BigData Boutique, we hold the AWS Service Delivery designation for Amazon OpenSearch Service. This guide distills what we've learned managing OpenSearch clusters at scale on AWS - architecture decisions, deployment models, and the operational practices that actually matter in production.
If you're coming from the Elasticsearch side, our What is AWS Elasticsearch page covers the history and transition. For a broader look at the OpenSearch project itself, see What is OpenSearch.
Architecture Overview
An Amazon OpenSearch Service domain is the AWS term for a managed OpenSearch cluster. Each domain consists of data nodes (which hold your indices and handle queries), optional dedicated master nodes (for cluster state management), and optional UltraWarm and cold storage nodes for tiered data.
The data plane runs inside your AWS account. You choose between two network configurations:
- VPC access - domain endpoints live within a VPC. Traffic never crosses the public internet. This is the recommended configuration for production workloads.
- Public access - domain endpoints are publicly reachable, secured through IAM policies and fine-grained access control. Simpler to get started with, but requires more careful policy management.
Each domain exposes a single cluster endpoint for both the OpenSearch API and OpenSearch Dashboards. Under the hood, AWS manages the EC2 instances, EBS volumes, automated snapshots, and software patching. You control the cluster topology, instance types, storage configuration, access policies, and OpenSearch version.
Provisioned vs Serverless - When to Use Each
Amazon OpenSearch Service offers two distinct deployment models, and choosing the wrong one is one of the most common (and expensive) mistakes we see.
| | Provisioned Domains | OpenSearch Serverless |
|---|---|---|
| Capacity management | Manual - you select instance types, node counts, storage | Automatic - scales OCUs based on demand |
| Pricing model | EC2 instance-hours + EBS storage | OCU-hours (indexing + search separately) + S3 storage |
| Minimum cost | Depends on instance selection (can start under $50/mo) | ~$350/month floor (minimum OCUs even at zero traffic) |
| Shard/index control | Full control over shard count, allocation, ILM | No shard-level configuration |
| Plugin support | Alerting, anomaly detection, k-NN, SQL, all included | Limited - no alerting, no anomaly detection, restricted API surface |
| Version upgrades | Manual, you control timing | Automatic, AWS-managed |
| Encryption at rest | Optional | Required |
| Network access | VPC or public, single endpoint | VPC endpoints + network policies, decoupled Dashboards access |
Use provisioned when you need full control over cluster topology, run steady-state workloads, require advanced plugins (alerting, anomaly detection, cross-cluster search), or need predictable performance on large datasets. Most production search and analytics workloads fall here.
Use serverless for intermittent or unpredictable workloads where you'd rather not manage capacity - prototyping, development environments, or bursty log analytics pipelines that can tolerate the limitations. Be aware of the minimum cost floor - serverless is not "pay nothing when idle."
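The cost floor mentioned above is easy to verify with back-of-envelope arithmetic. The numbers below are illustrative assumptions, not current pricing - OCU rates vary by region and AWS has changed the minimum OCU count over time, so check the pricing page before relying on this:

```python
# Back-of-envelope estimate of the OpenSearch Serverless cost floor.
# Both constants are assumptions for illustration only.
OCU_HOURLY_RATE = 0.24   # USD per OCU-hour (hypothetical; varies by region)
MIN_OCUS = 2             # combined indexing + search minimum (assumption)
HOURS_PER_MONTH = 730

def monthly_floor(ocus: float = MIN_OCUS, rate: float = OCU_HOURLY_RATE) -> float:
    """Minimum monthly compute cost, before S3 storage charges."""
    return ocus * rate * HOURS_PER_MONTH

print(f"${monthly_floor():.0f}/month compute floor, even at zero traffic")
```

With these assumed inputs the floor lands at roughly $350/month, which is why serverless rarely beats a small provisioned domain for steady low-traffic workloads.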
For a head-to-head comparison of managed providers beyond AWS, see our managed OpenSearch services comparison.
Instance Selection and Cluster Sizing
Picking the right instance type has an outsized impact on both performance and cost. Here's how we approach it.
Graviton instances first. AWS Graviton-based instances (ARM) deliver better price-performance than x86 equivalents. As of late 2025, Amazon OpenSearch Service supports Graviton4 (C8g, M8g, R8g, R8gd) with up to 30% better performance than Graviton3. Unless you have a specific x86 dependency, always default to Graviton.
Match instance family to workload:
- R-series (memory-optimized) - the default choice for most OpenSearch workloads. Search-heavy and aggregation-heavy use cases benefit from high memory-to-CPU ratios. R7g or R8g for hot data.
- C-series (compute-optimized) - ingest-heavy workloads where indexing throughput matters more than heap size. Good for log pipelines with heavy parsing.
- I-series / OR1 (storage-optimized) - large datasets where local NVMe storage gives you better I/O than EBS. OR1 instances are OpenSearch-optimized and support a new writeable warm tier.
- UltraWarm - S3-backed warm storage for read-heavy, infrequently accessed data. Roughly 80% cheaper per GB than hot storage.
- Cold storage - S3-based archive at $0.024/GB/month. Data is detached from compute and must be explicitly attached to UltraWarm for querying.
Shard strategy. Target 10-50 GB per shard for hot data. Avoid thousands of small shards - they create cluster state overhead and slow down recovery. For time-series data, use Index State Management (ISM) policies to roll over indices at a fixed size rather than fixed time intervals. Size your cluster so each data node holds no more than 25 shards per GB of JVM heap.
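The two sizing rules above - 10-50 GB per shard and at most 25 shards per GB of heap - translate directly into arithmetic you can run before provisioning. A minimal sketch, with a 30 GB target shard size as the midpoint assumption:

```python
import math

def primary_shard_count(index_size_gb: float, target_shard_gb: float = 30) -> int:
    """Primary shards needed to keep each shard in the 10-50 GB sweet spot."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))

def max_shards_per_node(heap_gb: float, shards_per_gb_heap: int = 25) -> int:
    """Upper bound on total shards (primary + replica) a data node should hold."""
    return int(heap_gb * shards_per_gb_heap)

# Example: a 600 GB index needs 20 primaries of ~30 GB each.
print(primary_shard_count(600))
# A node with ~16 GB of JVM heap should stay under 400 shards total.
print(max_shards_per_node(16))
```

Run this against your expected index sizes and retention to sanity-check node counts before committing to a topology.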
Dedicated master nodes. Always use them in production. Three dedicated master nodes (e.g., C7g.large.search or M7g.large.search) prevent data node instability from cascading into cluster state failures. This is non-negotiable for any domain with more than a handful of data nodes.
Security Best Practices
Amazon OpenSearch Service provides layered security, but the defaults don't go far enough for production.
Fine-grained access control (FGAC) is the most important feature to enable. It gives you role-based permissions at the cluster, index, document, and field level. Enable it at domain creation - retrofitting it later requires re-creating the domain or a complex migration. FGAC requires HTTPS, encryption at rest (via KMS), and node-to-node encryption. Enable all three.
Authentication options:
- IAM-based signing (SigV4) for programmatic access
- SAML 2.0 integration for Dashboards SSO - works with Okta, Azure AD, ADFS, AWS IAM Identity Center
- Internal user database as a fallback (avoid for production - use IAM or SAML)
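For the IAM/SigV4 path, access is governed by a resource-based domain access policy. A minimal sketch - the account ID, role name, domain name, and region here are placeholders to substitute with your own:

```python
import json

# Hypothetical account ID, role, and domain name -- replace with your own.
ACCOUNT_ID = "123456789012"
ROLE_ARN = f"arn:aws:iam::{ACCOUNT_ID}:role/search-app"
DOMAIN_ARN = f"arn:aws:es:us-east-1:{ACCOUNT_ID}:domain/logs-prod/*"

# Resource-based access policy: only the named IAM role may call the
# OpenSearch HTTP APIs, and only via SigV4-signed requests.
access_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": ROLE_ARN},
            "Action": "es:ESHttp*",
            "Resource": DOMAIN_ARN,
        }
    ],
}

print(json.dumps(access_policy, indent=2))
```

With FGAC enabled, this policy is the outer gate; role mappings inside OpenSearch then control index- and document-level permissions.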
VPC deployment is strongly recommended. Public endpoints with IP-based access policies are brittle and hard to audit. VPC access combined with security groups and IAM policies gives you proper network-level isolation.
Encryption everywhere. Enable encryption at rest with a customer-managed KMS key (not the AWS-managed default) for audit trail and rotation control. Enable node-to-node encryption. Enforce TLS 1.2 minimum on all client connections.
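These three settings map to specific fields in the domain configuration. A sketch of the relevant boto3 create_domain parameters - the KMS key ARN and domain name are placeholders, and required fields like cluster config and EBS options are omitted for brevity:

```python
# Encryption-related fields for a boto3 opensearch create_domain call.
# Placeholders throughout; other required parameters omitted.
encryption_settings = {
    "DomainName": "logs-prod",
    "EncryptionAtRestOptions": {
        "Enabled": True,
        # Customer-managed KMS key, not the AWS-managed default.
        "KmsKeyId": "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE",
    },
    "NodeToNodeEncryptionOptions": {"Enabled": True},
    "DomainEndpointOptions": {
        "EnforceHTTPS": True,
        # Enforces TLS 1.2 as the minimum protocol version.
        "TLSSecurityPolicy": "Policy-Min-TLS-1-2-2019-07",
    },
}
```

Remember that encryption at rest and node-to-node encryption cannot be disabled once enabled, and FGAC requires all three to be on.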
Monitoring and Alerting
CloudWatch is the native monitoring path. Amazon OpenSearch Service publishes metrics at 60-second intervals covering cluster health, CPU/memory/disk utilization, JVM pressure, indexing rate, search latency, and more. Set CloudWatch alarms for at minimum:
- ClusterStatus.red - immediate alert, something is broken
- FreeStorageSpace dropping below 25% of per-node storage (the metric reports megabytes, so set the threshold as an absolute value) - disk pressure causes cascading problems
- JVMMemoryPressure above 80% - circuit breakers will start rejecting requests
- CPUUtilization sustained above 80% - time to scale out or optimize queries
- MasterReachableFromNode - master node connectivity loss needs immediate attention
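The checklist above can be expressed as put_metric_alarm keyword arguments and applied with boto3. The domain name and account ID are placeholders, the evaluation windows are starting-point assumptions, and the free-storage threshold assumes a hypothetical 1 TB node:

```python
# CloudWatch alarm definitions for the five checks above.
# Placeholders and assumed thresholds throughout -- tune for your domain.
DIMENSIONS = [
    {"Name": "DomainName", "Value": "logs-prod"},
    {"Name": "ClientId", "Value": "123456789012"},
]

def alarm(name, metric, threshold, comparison, statistic="Maximum"):
    return {
        "AlarmName": name,
        "Namespace": "AWS/ES",  # OpenSearch domains publish metrics here
        "MetricName": metric,
        "Dimensions": DIMENSIONS,
        "Statistic": statistic,
        "Period": 60,
        "EvaluationPeriods": 5,
        "Threshold": threshold,
        "ComparisonOperator": comparison,
    }

alarms = [
    alarm("cluster-red", "ClusterStatus.red", 0, "GreaterThanThreshold"),
    # FreeStorageSpace is in MB: 256,000 MB = 25% of a hypothetical 1 TB node.
    alarm("disk-pressure", "FreeStorageSpace", 256_000, "LessThanThreshold", "Minimum"),
    alarm("jvm-pressure", "JVMMemoryPressure", 80, "GreaterThanThreshold"),
    alarm("cpu-sustained", "CPUUtilization", 80, "GreaterThanThreshold", "Average"),
    alarm("master-unreachable", "MasterReachableFromNode", 1, "LessThanThreshold", "Minimum"),
]
# Apply each with: boto3.client("cloudwatch").put_metric_alarm(**alarms[i])
```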
OpenSearch Dashboards gives you index-level visibility, and the built-in alerting plugin lets you create monitors that trigger on query conditions (e.g., error rate spikes). For deeper monitoring, consider tools like Pulse, which provide query-level analytics, slow log analysis, and proactive recommendations that CloudWatch alone cannot offer.
Cost Optimization
OpenSearch AWS costs add up fast. These are the levers that make the biggest difference.
Reserved Instances / Savings Plans. For steady-state provisioned clusters, Reserved Instances offer 31-52% savings depending on term length and payment option. Database Savings Plans provide similar discounts with more flexibility across instance families. If your cluster has been running for more than a few months and isn't going away, you're leaving money on the table without a commitment.
Data tiering. Hot storage is the most expensive tier. Move older, less-frequently-accessed data to UltraWarm (roughly 80% cheaper) and archive to cold storage ($0.024/GB/month). Use ISM policies to automate transitions - for example, move indices older than 30 days to warm, archive after 90 days, delete after 365. The new writeable warm tier on OR1 instances (released late 2025) adds flexibility for workloads that need occasional updates to warm data.
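The 30/90/365-day example above maps onto an ISM policy document. A sketch under the assumption that UltraWarm and cold storage are enabled on the domain - state and action names follow the Index State Management plugin's schema, and the rollover size and timestamp field are illustrative:

```python
import json

# ISM policy: roll over hot indices by size, then warm at 30d,
# cold at 90d, delete at 365d. Thresholds are illustrative.
ism_policy = {
    "policy": {
        "description": "hot -> warm at 30d, cold at 90d, delete at 365d",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [{"rollover": {"min_primary_shard_size": "30gb"}}],
                "transitions": [{"state_name": "warm",
                                 "conditions": {"min_index_age": "30d"}}],
            },
            {
                "name": "warm",
                "actions": [{"warm_migration": {}}],
                "transitions": [{"state_name": "cold",
                                 "conditions": {"min_index_age": "90d"}}],
            },
            {
                "name": "cold",
                "actions": [{"cold_migration": {"timestamp_field": "@timestamp"}}],
                "transitions": [{"state_name": "delete",
                                 "conditions": {"min_index_age": "365d"}}],
            },
            {"name": "delete", "actions": [{"cold_delete": {}}], "transitions": []},
        ],
    }
}

print(json.dumps(ism_policy, indent=2))
```

Attach the policy to an index template so every rolled-over index inherits the lifecycle automatically.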
Right-size aggressively. Over-provisioned clusters are the norm, not the exception. Monitor actual CPU, memory, and storage utilization. A cluster running at 20% CPU is wasting at least half its compute spend. Use Graviton instances for the price-performance advantage. Drop from 3 AZs to 2 if your availability requirements allow it (most non-critical workloads don't need 3-AZ deployments).
Index design matters. Fewer, larger shards cost less than many small ones. Disable indexing on fields you never search. Use appropriate field types - a keyword field is cheaper to store and query than a text field with multiple analyzers. Review your mappings quarterly.
For more cost optimization strategies, see our OpenSearch cost optimization webinar recap covering real-world patterns from managing thousands of clusters.
Migrating from Elasticsearch Service
If you're running an older Amazon Elasticsearch Service domain, AWS provides an in-place upgrade path to OpenSearch. The critical points: test your client libraries for compatibility (the OpenSearch client is a drop-in replacement for Elasticsearch 7.x clients), review any breaking API changes in the OpenSearch version you're targeting, and verify that your custom plugins or scripts still work.
For a detailed walkthrough of the modernization process, see our Modernizing Amazon Elasticsearch/OpenSearch Service solution page.
If you're migrating from self-managed Elasticsearch to Amazon OpenSearch Service, snapshot/restore is the most reliable approach for the data layer. Plan for differences in security configuration (FGAC vs X-Pack Security), plugin availability, and operational tooling.
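The snapshot/restore workflow boils down to three API calls: register an S3 repository, take a snapshot on the source, and restore on the target. A sketch of the two request bodies involved - bucket name, role ARN, and index patterns are placeholders:

```python
# Request bodies for the manual snapshot workflow. Placeholders throughout.
register_repo = {
    "type": "s3",
    "settings": {
        "bucket": "my-snapshot-bucket",
        "region": "us-east-1",
        # On Amazon OpenSearch Service the role must grant S3 access and
        # the caller needs iam:PassRole on it.
        "role_arn": "arn:aws:iam::123456789012:role/SnapshotRole",
    },
}

restore_body = {
    "indices": "logs-*",
    # Restore under new names so existing indices are not clobbered.
    "rename_pattern": "logs-(.+)",
    "rename_replacement": "restored-logs-$1",
    # Cluster settings rarely transfer cleanly between distributions.
    "include_global_state": False,
}

# Typical call sequence against the domain endpoint:
#   PUT  _snapshot/migration-repo                      (register_repo)
#   PUT  _snapshot/migration-repo/snap-1               (take snapshot)
#   POST _snapshot/migration-repo/snap-1/_restore      (restore_body)
```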
We've documented the broader migration considerations in our OpenSearch data migration from Elasticsearch guide.
What Comes Next
Amazon OpenSearch Service covers a lot of ground, but it's not a "set and forget" service. Clusters drift out of optimal configuration as data volumes grow, query patterns change, and new instance types become available. Regular capacity reviews, security audits, and cost optimization passes are part of running OpenSearch well on AWS.
We work with teams running Amazon OpenSearch Service at every scale - from single-domain setups to multi-region deployments handling billions of documents. Whether you need help with initial architecture, a migration from Elasticsearch, or ongoing OpenSearch support, our team has the depth of experience that comes from our work maintaining thousands of OpenSearch clusters in production.