A practitioner's guide to Amazon OpenSearch Service covering architecture, provisioned vs serverless deployment, instance selection, security, monitoring, and cost optimization.
Amazon OpenSearch Service is a fully managed AWS service for deploying, operating, and scaling OpenSearch and legacy Elasticsearch workloads in the cloud without managing the underlying infrastructure. Originally launched as Amazon Elasticsearch Service, it was rebranded in 2021 when AWS forked OpenSearch from Elasticsearch 7.10. Today it remains the largest managed OpenSearch provider, powering search, log analytics, observability, and vector search workloads across thousands of organizations.
At BigData Boutique, we hold the AWS Service Delivery designation for Amazon OpenSearch Service. This guide distills what we've learned managing OpenSearch clusters at scale on AWS - architecture decisions, deployment models, and the operational practices that actually matter in production.
If you're coming from the Elasticsearch side, our What is AWS Elasticsearch page covers the history and transition. For a broader look at the OpenSearch project itself, see What is OpenSearch.
Architecture Overview
An Amazon OpenSearch Service domain is the AWS term for a managed OpenSearch cluster. Each domain consists of data nodes (which hold your indices and handle queries), optional dedicated master nodes (for cluster state management), and optional UltraWarm and cold storage nodes for tiered data.
The data plane runs inside your AWS account. You choose between two network configurations:
- VPC access - domain endpoints live within a VPC. Traffic never crosses the public internet. This is the recommended configuration for production workloads.
- Public access - domain endpoints are publicly reachable, secured through IAM policies and fine-grained access control. Simpler to get started with, but requires more careful policy management.
Each domain exposes a single cluster endpoint for both the OpenSearch API and OpenSearch Dashboards. Under the hood, AWS manages the EC2 instances, EBS volumes, automated snapshots, and software patching. You control the cluster topology, instance types, storage configuration, access policies, and OpenSearch version.
Provisioned vs Serverless - When to Use Each
Amazon OpenSearch Service offers two distinct deployment models, and choosing the wrong one is one of the most common (and expensive) mistakes we see.
| | Provisioned Domains | OpenSearch Serverless |
|---|---|---|
| Capacity management | Manual - you select instance types, node counts, storage | Automatic - scales OCUs based on demand |
| Pricing model | EC2 instance-hours + EBS storage | OCU-hours (indexing + search separately) + S3 storage |
| Minimum cost | Depends on instance selection (can start under $50/mo) | ~$350/month floor (minimum OCUs even at zero traffic) |
| Shard/index control | Full control over shard count, allocation, ILM | No shard-level configuration |
| Plugin support | Alerting, anomaly detection, k-NN, SQL, all included | Limited - no alerting, no anomaly detection, restricted API surface |
| Version upgrades | Manual, you control timing | Automatic, AWS-managed |
| Encryption at rest | Optional | Required |
| Network access | VPC or public, single endpoint | VPC endpoints + network policies, decoupled Dashboards access |
Use provisioned when you need full control over cluster topology, run steady-state workloads, require advanced plugins (alerting, anomaly detection, cross-cluster search), or need predictable performance on large datasets. Most production search and analytics workloads fall here.
Use serverless for intermittent or unpredictable workloads where you'd rather not manage capacity - prototyping, development environments, or bursty log analytics pipelines that can tolerate the limitations. Be aware of the minimum cost floor - serverless is not "pay nothing when idle."
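The cost floor mentioned above is easy to verify with back-of-envelope arithmetic. The numbers below are illustrative assumptions, not current pricing - OCU rates vary by region and AWS has changed the minimum OCU count over time, so check the pricing page before relying on this:

```python
# Back-of-envelope estimate of the OpenSearch Serverless cost floor.
# Both constants are assumptions for illustration only.
OCU_HOURLY_RATE = 0.24   # USD per OCU-hour (hypothetical; varies by region)
MIN_OCUS = 2             # combined indexing + search minimum (assumption)
HOURS_PER_MONTH = 730

def monthly_floor(ocus: float = MIN_OCUS, rate: float = OCU_HOURLY_RATE) -> float:
    """Minimum monthly compute cost, before S3 storage charges."""
    return ocus * rate * HOURS_PER_MONTH

print(f"${monthly_floor():.0f}/month compute floor, even at zero traffic")
```

With these assumed inputs the floor lands at roughly $350/month, which is why serverless rarely beats a small provisioned domain for steady low-traffic workloads.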
For a head-to-head comparison of managed providers beyond AWS, see our managed OpenSearch services comparison.
Instance Selection and Cluster Sizing
Picking the right instance type has an outsized impact on both performance and cost. Here's how we approach it.
Graviton instances first. AWS Graviton-based instances (ARM) deliver better price-performance than x86 equivalents. As of late 2025, Amazon OpenSearch Service supports Graviton4 (C8g, M8g, R8g, R8gd) with up to 30% better performance than Graviton3. Unless you have a specific x86 dependency, always default to Graviton.
Match instance family to workload:
- R-series (memory-optimized) - the default choice for most OpenSearch workloads. Search-heavy and aggregation-heavy use cases benefit from high memory-to-CPU ratios. R7g or R8g for hot data.
- C-series (compute-optimized) - ingest-heavy workloads where indexing throughput matters more than heap size. Good for log pipelines with heavy parsing.
- I-series / OR1 (storage-optimized) - large datasets where local NVMe storage gives you better I/O than EBS. OR1 instances are OpenSearch-optimized and support a new writeable warm tier.
- UltraWarm - S3-backed warm storage for read-heavy, infrequently accessed data. Roughly 80% cheaper per GB than hot storage.
- Cold storage - S3-based archive at $0.024/GB/month. Data is detached from compute and must be explicitly attached to UltraWarm for querying.
Shard strategy. Target 10-50 GB per shard for hot data. Avoid thousands of small shards - they create cluster state overhead and slow down recovery. For time-series data, use Index State Management (ISM) policies to roll over indices at a fixed size rather than fixed time intervals. Size your cluster so each data node holds no more than 25 shards per GB of JVM heap.
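The two sizing rules above - 10-50 GB per shard and at most 25 shards per GB of heap - translate directly into arithmetic you can run before provisioning. A minimal sketch, with a 30 GB target shard size as the midpoint assumption:

```python
import math

def primary_shard_count(index_size_gb: float, target_shard_gb: float = 30) -> int:
    """Primary shards needed to keep each shard in the 10-50 GB sweet spot."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))

def max_shards_per_node(heap_gb: float, shards_per_gb_heap: int = 25) -> int:
    """Upper bound on total shards (primary + replica) a data node should hold."""
    return int(heap_gb * shards_per_gb_heap)

# Example: a 600 GB index needs 20 primaries of ~30 GB each.
print(primary_shard_count(600))
# A node with ~16 GB of JVM heap should stay under 400 shards total.
print(max_shards_per_node(16))
```

Run this against your expected index sizes and retention to sanity-check node counts before committing to a topology.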
Dedicated master nodes. Always use them in production. Three dedicated master nodes (e.g., C7g.large.search or M7g.large.search) prevent data node instability from cascading into cluster state failures. This is non-negotiable for any domain with more than a handful of data nodes.
Security Best Practices
Amazon OpenSearch Service provides layered security, but the defaults don't go far enough for production.
Fine-grained access control (FGAC) is the most important feature to enable. It gives you role-based permissions at the cluster, index, document, and field level. Enable it at domain creation - retrofitting it later requires re-creating the domain or a complex migration. FGAC requires HTTPS, encryption at rest (via KMS), and node-to-node encryption. Enable all three.
Authentication options:
- IAM-based signing (SigV4) for programmatic access
- SAML 2.0 integration for Dashboards SSO - works with Okta, Azure AD, ADFS, AWS IAM Identity Center
- Internal user database as a fallback (avoid for production - use IAM or SAML)
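For the IAM/SigV4 path, access is governed by a resource-based domain access policy. A minimal sketch - the account ID, role name, domain name, and region here are placeholders to substitute with your own:

```python
import json

# Hypothetical account ID, role, and domain name -- replace with your own.
ACCOUNT_ID = "123456789012"
ROLE_ARN = f"arn:aws:iam::{ACCOUNT_ID}:role/search-app"
DOMAIN_ARN = f"arn:aws:es:us-east-1:{ACCOUNT_ID}:domain/logs-prod/*"

# Resource-based access policy: only the named IAM role may call the
# OpenSearch HTTP APIs, and only via SigV4-signed requests.
access_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": ROLE_ARN},
            "Action": "es:ESHttp*",
            "Resource": DOMAIN_ARN,
        }
    ],
}

print(json.dumps(access_policy, indent=2))
```

With FGAC enabled, this policy is the outer gate; role mappings inside OpenSearch then control index- and document-level permissions.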
VPC deployment is strongly recommended. Public endpoints with IP-based access policies are brittle and hard to audit. VPC access combined with security groups and IAM policies gives you proper network-level isolation.
Encryption everywhere. Enable encryption at rest with a customer-managed KMS key (not the AWS-managed default) for audit trail and rotation control. Enable node-to-node encryption. Enforce TLS 1.2 minimum on all client connections.
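These three settings map to specific fields in the domain configuration. A sketch of the relevant boto3 create_domain parameters - the KMS key ARN and domain name are placeholders, and required fields like cluster config and EBS options are omitted for brevity:

```python
# Encryption-related fields for a boto3 opensearch create_domain call.
# Placeholders throughout; other required parameters omitted.
encryption_settings = {
    "DomainName": "logs-prod",
    "EncryptionAtRestOptions": {
        "Enabled": True,
        # Customer-managed KMS key, not the AWS-managed default.
        "KmsKeyId": "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE",
    },
    "NodeToNodeEncryptionOptions": {"Enabled": True},
    "DomainEndpointOptions": {
        "EnforceHTTPS": True,
        # Enforces TLS 1.2 as the minimum protocol version.
        "TLSSecurityPolicy": "Policy-Min-TLS-1-2-2019-07",
    },
}
```

Remember that encryption at rest and node-to-node encryption cannot be disabled once enabled, and FGAC requires all three to be on.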
Monitoring and Alerting
CloudWatch is the native monitoring path. Amazon OpenSearch Service publishes metrics at 60-second intervals covering cluster health, CPU/memory/disk utilization, JVM pressure, indexing rate, search latency, and more. Set CloudWatch alarms for at minimum:
- ClusterStatus.red - immediate alert, something is broken
- FreeStorageSpace dropping below 25% of per-node storage (the metric reports megabytes, so set the threshold as an absolute value) - disk pressure causes cascading problems
- JVMMemoryPressure above 80% - circuit breakers will start rejecting requests
- CPUUtilization sustained above 80% - time to scale out or optimize queries
- MasterReachableFromNode - master node connectivity loss needs immediate attention
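The checklist above can be expressed as put_metric_alarm keyword arguments and applied with boto3. The domain name and account ID are placeholders, the evaluation windows are starting-point assumptions, and the free-storage threshold assumes a hypothetical 1 TB node:

```python
# CloudWatch alarm definitions for the five checks above.
# Placeholders and assumed thresholds throughout -- tune for your domain.
DIMENSIONS = [
    {"Name": "DomainName", "Value": "logs-prod"},
    {"Name": "ClientId", "Value": "123456789012"},
]

def alarm(name, metric, threshold, comparison, statistic="Maximum"):
    return {
        "AlarmName": name,
        "Namespace": "AWS/ES",  # OpenSearch domains publish metrics here
        "MetricName": metric,
        "Dimensions": DIMENSIONS,
        "Statistic": statistic,
        "Period": 60,
        "EvaluationPeriods": 5,
        "Threshold": threshold,
        "ComparisonOperator": comparison,
    }

alarms = [
    alarm("cluster-red", "ClusterStatus.red", 0, "GreaterThanThreshold"),
    # FreeStorageSpace is in MB: 256,000 MB = 25% of a hypothetical 1 TB node.
    alarm("disk-pressure", "FreeStorageSpace", 256_000, "LessThanThreshold", "Minimum"),
    alarm("jvm-pressure", "JVMMemoryPressure", 80, "GreaterThanThreshold"),
    alarm("cpu-sustained", "CPUUtilization", 80, "GreaterThanThreshold", "Average"),
    alarm("master-unreachable", "MasterReachableFromNode", 1, "LessThanThreshold", "Minimum"),
]
# Apply each with: boto3.client("cloudwatch").put_metric_alarm(**alarms[i])
```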
OpenSearch Dashboards gives you index-level visibility, and the built-in alerting plugin lets you create monitors that trigger on query conditions (e.g., error rate spikes). For deeper monitoring, consider tools like Pulse, which provide query-level analytics, slow log analysis, and proactive recommendations that CloudWatch alone cannot offer.
Cost Optimization
OpenSearch AWS costs add up fast. These are the levers that make the biggest difference.
Reserved Instances / Savings Plans. For steady-state provisioned clusters, Reserved Instances offer 31-52% savings depending on term length and payment option. Database Savings Plans provide similar discounts with more flexibility across instance families. If your cluster has been running for more than a few months and isn't going away, you're leaving money on the table without a commitment.
Data tiering. Hot storage is the most expensive tier. Move older, less-frequently-accessed data to UltraWarm (roughly 80% cheaper) and archive to cold storage ($0.024/GB/month). Use ISM policies to automate transitions - for example, move indices older than 30 days to warm, archive after 90 days, delete after 365. The new writeable warm tier on OR1 instances (released late 2025) adds flexibility for workloads that need occasional updates to warm data.
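The 30/90/365-day example above maps onto an ISM policy document. A sketch under the assumption that UltraWarm and cold storage are enabled on the domain - state and action names follow the Index State Management plugin's schema, and the rollover size and timestamp field are illustrative:

```python
import json

# ISM policy: roll over hot indices by size, then warm at 30d,
# cold at 90d, delete at 365d. Thresholds are illustrative.
ism_policy = {
    "policy": {
        "description": "hot -> warm at 30d, cold at 90d, delete at 365d",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [{"rollover": {"min_primary_shard_size": "30gb"}}],
                "transitions": [{"state_name": "warm",
                                 "conditions": {"min_index_age": "30d"}}],
            },
            {
                "name": "warm",
                "actions": [{"warm_migration": {}}],
                "transitions": [{"state_name": "cold",
                                 "conditions": {"min_index_age": "90d"}}],
            },
            {
                "name": "cold",
                "actions": [{"cold_migration": {"timestamp_field": "@timestamp"}}],
                "transitions": [{"state_name": "delete",
                                 "conditions": {"min_index_age": "365d"}}],
            },
            {"name": "delete", "actions": [{"cold_delete": {}}], "transitions": []},
        ],
    }
}

print(json.dumps(ism_policy, indent=2))
```

Attach the policy to an index template so every rolled-over index inherits the lifecycle automatically.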
Right-size aggressively. Over-provisioned clusters are the norm, not the exception. Monitor actual CPU, memory, and storage utilization. A cluster running at 20% CPU is wasting at least half its compute spend. Use Graviton instances for the price-performance advantage. Drop from 3 AZs to 2 if your availability requirements allow it (most non-critical workloads don't need 3-AZ deployments).
Index design matters. Fewer, larger shards cost less than many small ones. Disable indexing on fields you never search. Use appropriate field types - a keyword field is cheaper to store and query than a text field with multiple analyzers. Review your mappings quarterly.
For more cost optimization strategies, see our OpenSearch cost optimization webinar recap covering real-world patterns from managing thousands of clusters.
Migrating from Elasticsearch Service
If you're running an older Amazon Elasticsearch Service domain, AWS provides an in-place upgrade path to OpenSearch. The critical points: test your client libraries for compatibility (the OpenSearch client is a drop-in replacement for Elasticsearch 7.x clients), review any breaking API changes in the OpenSearch version you're targeting, and verify that your custom plugins or scripts still work.
For a detailed walkthrough of the modernization process, see our Modernizing Amazon Elasticsearch/OpenSearch Service solution page.
If you're migrating from self-managed Elasticsearch to Amazon OpenSearch Service, snapshot/restore is the most reliable approach for the data layer. Plan for differences in security configuration (FGAC vs X-Pack Security), plugin availability, and operational tooling.
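The snapshot/restore workflow boils down to three API calls: register an S3 repository, take a snapshot on the source, and restore on the target. A sketch of the two request bodies involved - bucket name, role ARN, and index patterns are placeholders:

```python
# Request bodies for the manual snapshot workflow. Placeholders throughout.
register_repo = {
    "type": "s3",
    "settings": {
        "bucket": "my-snapshot-bucket",
        "region": "us-east-1",
        # On Amazon OpenSearch Service the role must grant S3 access and
        # the caller needs iam:PassRole on it.
        "role_arn": "arn:aws:iam::123456789012:role/SnapshotRole",
    },
}

restore_body = {
    "indices": "logs-*",
    # Restore under new names so existing indices are not clobbered.
    "rename_pattern": "logs-(.+)",
    "rename_replacement": "restored-logs-$1",
    # Cluster settings rarely transfer cleanly between distributions.
    "include_global_state": False,
}

# Typical call sequence against the domain endpoint:
#   PUT  _snapshot/migration-repo                      (register_repo)
#   PUT  _snapshot/migration-repo/snap-1               (take snapshot)
#   POST _snapshot/migration-repo/snap-1/_restore      (restore_body)
```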
We've documented the broader migration considerations in our OpenSearch data migration from Elasticsearch guide.
What Comes Next
Amazon OpenSearch Service covers a lot of ground, but it's not a "set and forget" service. Clusters drift out of optimal configuration as data volumes grow, query patterns change, and new instance types become available. Regular capacity reviews, security audits, and cost optimization passes are part of running OpenSearch well on AWS.
We work with teams running Amazon OpenSearch Service at every scale - from single-domain setups to multi-region deployments handling billions of documents. Whether you need help with initial architecture, a migration from Elasticsearch, or ongoing OpenSearch support, our team has the depth of experience that comes from our work maintaining thousands of OpenSearch clusters in production.