Solr to OpenSearch Migration Deep Dive: Architecture, Queries, and Data Strategies

A technical deep dive into the architectural differences between Apache Solr and OpenSearch, with practical query translation examples and migration strategies for teams planning the switch.

Both Apache Solr and OpenSearch build on Apache Lucene, but once you look past that shared foundation, the differences run deep. Understanding these architectural gaps is what separates a smooth migration from one that stalls on unexpected behavior in production.

This post digs into the technical details that matter when moving from Solr to OpenSearch - internal architecture differences, the growing vector search gap, practical query translation, and data migration strategies that minimize risk. For migration planning and execution steps, see our Guide to Migrating from Apache Solr to OpenSearch. For a feature-level comparison, see Apache Solr vs OpenSearch: Comparison and Key Differences.

How Solr and OpenSearch Differ Under the Hood

Both engines are JVM-based and use Lucene. Both divide data into shards - physical Lucene indices. The similarities thin out quickly after that.

Data Organization. Solr uses collections divided into shards. OpenSearch uses indices divided into shards. Functionally similar - both are virtual groupings that share configuration and document structure. The real differences show up in how those shards are replicated and recovered.

Replica Types. SolrCloud offers three replica types, and you can mix them within a single collection:

NRT (Near Real-Time): Maintains a transaction log and refreshes continuously. Highest resource cost, lowest indexing-to-search latency.
TLOG: Writes updates to its transaction log but does not index them locally; it periodically pulls segments from the leader instead. Lower resource usage, higher latency.
PULL: Physically copies Lucene segments from the leader. Cheapest option but with the most delay - essentially a leader-follower pattern on top of SolrCloud.

OpenSearch takes a different approach with document replication (default) and segment replication. It also supports S3-backed remote storage for cold data. An open shard consumes heap memory regardless of query activity, so offloading cold indices to remote storage can significantly reduce cluster costs while keeping that data searchable on demand.
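As a rough illustration of the replication choice, the sketch below builds a create-index body that opts into segment replication through the index.replication.type setting. The shard and replica counts and the index name in the comment are placeholders, not recommendations.

```python
import json


def index_settings(shards: int, replicas: int, segment_replication: bool) -> dict:
    """Build a create-index body; the sizes passed in are illustrative."""
    settings = {
        "number_of_shards": shards,
        "number_of_replicas": replicas,
    }
    if segment_replication:
        # Replicas copy Lucene segments from the primary instead of
        # indexing every document again themselves.
        settings["replication.type"] = "SEGMENT"
    return {"settings": {"index": settings}}


if __name__ == "__main__":
    # Body for: PUT /products  (hypothetical index name)
    print(json.dumps(index_settings(3, 1, segment_replication=True), indent=2))
```

Leaving the setting out keeps the default document replication, so the same helper can produce both variants for a side-by-side test.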

Leader Election. Solr requires Apache ZooKeeper - a separate JVM-based service where you need at least three instances for fault tolerance. That's additional infrastructure to provision, monitor, and maintain through upgrades. OpenSearch handles cluster coordination internally with built-in discovery. No external dependencies.

Recovery. Recovery behavior in Solr depends on replica type. NRT replicas first attempt to catch up from the transaction log (roughly the last 100 entries), then fall back to a full segment copy. TLOG and PULL replicas each use their own mechanisms. In OpenSearch, recovery is a binary copy of Lucene segments between nodes - you'll see this in cluster logs during shard allocation and rebalancing.

Aspect	Apache Solr	OpenSearch
Run modes	SolrCloud + legacy leader-follower	Distributed by default
Replica types	NRT, TLOG, PULL (mixable per collection)	Document, Segment, Remote (S3)
Leader election	External (ZooKeeper, 3+ instances)	Built-in cluster discovery
Recovery	Transaction log or segment copy (varies by replica)	Binary segment copy between nodes
Cold storage	Limited	S3-backed remote storage

The Vector Search Gap

If your migration is partly motivated by AI and semantic search capabilities, the gap between Solr and OpenSearch is substantial as of early 2026.

OpenSearch supports FAISS and other ANN algorithms for approximate nearest neighbor search, and added Maximum Marginal Relevance (MMR) in version 3.3. MMR diversifies result sets while maintaining relevance - OpenSearch documentation reports up to 90% relevance retention with 100x search performance improvement over brute-force approaches. Quantization support for reducing vector storage costs is built in, which matters when working with high-dimensional embeddings.
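To make the ANN setup concrete, here is a minimal sketch of the request bodies for a k-NN index backed by the FAISS engine and a matching query. The field name, dimension, and the choice of HNSW with L2 distance are assumptions for illustration; pick values that match your embedding model.

```python
from __future__ import annotations


def knn_index_body(field: str, dimension: int) -> dict:
    """Create-index body with one knn_vector field (HNSW on FAISS, assumed)."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",
                        "space_type": "l2",
                    },
                }
            }
        },
    }


def knn_query_body(field: str, vector: list[float], k: int) -> dict:
    """Search body asking for the k approximate nearest neighbours."""
    return {"size": k, "query": {"knn": {field: {"vector": vector, "k": k}}}}


if __name__ == "__main__":
    print(knn_index_body("embedding", 384))
    print(knn_query_body("embedding", [0.1] * 384, k=5))
```

The query vector still has to come from somewhere; this sketch assumes you already have embeddings and says nothing about how they are produced.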

Solr has basic neural and sparse search support, but vectorization during indexing requires an external embedding service. There's no built-in equivalent to OpenSearch's integrated semantic search pipeline.

This gap matters because search in 2026 is hybrid. BM25 and keyword matching are still strong for exact-match use cases, part numbers, and specific terminology. But users and AI agents increasingly expect semantic understanding. Agents query search engines using natural language, not keyword strings. If your roadmap includes RAG, agentic search, or any form of semantic retrieval, OpenSearch provides those building blocks natively. For a broader look at the search engine landscape, including vector databases and analytics alternatives, see our Elasticsearch alternatives guide.
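The idea behind hybrid ranking can be shown without any search engine at all. The sketch below min-max normalizes a keyword score list and a vector score list, then blends them with a weight. It illustrates the concept only; it is not OpenSearch's implementation, and the scores and document IDs are made up.

```python
from __future__ import annotations


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all values are equal, each maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(bm25: dict[str, float], vector: dict[str, float],
                keyword_weight: float = 0.5) -> list[str]:
    """Blend normalized keyword and vector scores; missing scores count as 0."""
    kw, vec = min_max(bm25), min_max(vector)
    combined = {
        doc: keyword_weight * kw.get(doc, 0.0)
        + (1 - keyword_weight) * vec.get(doc, 0.0)
        for doc in set(kw) | set(vec)
    }
    # Highest combined score first; ties broken by document ID for stability.
    return sorted(combined, key=lambda doc: (-combined[doc], doc))


if __name__ == "__main__":
    bm25 = {"sku-123": 12.0, "doc-a": 4.0, "doc-b": 2.0}
    vector = {"doc-a": 0.92, "doc-c": 0.90, "doc-b": 0.50}
    print(hybrid_rank(bm25, vector))
```

With equal weights, a document that does reasonably well on both signals (doc-a here) can outrank an exact keyword hit that the vector side never returned, which is the trade-off hybrid search is designed to expose through the weight.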

For a deeper look at search modernization, see Beyond Keyword Search: Search Modernization With OpenSearch.

Translating Queries from Solr to OpenSearch

Query migration is where teams spend the most time. Solr offers a URL-based API with query parsers (dismax, edismax) that handle scoring logic behind the scenes. OpenSearch uses a JSON Query DSL that's more verbose but gives explicit control over every aspect of query execution. For additional query translation examples, see the query migration section in our migration guide.

DisMax to Multi-Match. Solr's dismax parser is the workhorse for most search applications - pass a query and field list, it handles disjunction max scoring:

q=laptop&defType=dismax&qf=title^2 description

The closest OpenSearch equivalent is multi_match with best_fields:

{
  "query": {
    "multi_match": {
      "query": "laptop",
      "type": "best_fields",
      "fields": ["title^2", "description"]
    }
  }
}
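If you have many dismax requests to convert, the mapping above can be scripted. The following is a minimal sketch: it takes already-parsed Solr parameters and emits the multi_match body. Mapping tie to tie_breaker and mm to minimum_should_match is an approximation added here for illustration, and scores will not be numerically identical between the two engines.

```python
from __future__ import annotations


def dismax_to_multi_match(params: dict[str, str]) -> dict:
    """Translate basic dismax parameters (q, qf, tie, mm) to a multi_match query."""
    multi_match: dict = {
        "query": params["q"],
        "type": "best_fields",
        # qf is whitespace-separated; the field^boost syntax carries over as-is.
        "fields": params["qf"].split(),
    }
    if "tie" in params:
        multi_match["tie_breaker"] = float(params["tie"])
    if "mm" in params:
        multi_match["minimum_should_match"] = params["mm"]
    return {"query": {"multi_match": multi_match}}


if __name__ == "__main__":
    print(dismax_to_multi_match({"q": "laptop", "qf": "title^2 description"}))
```

Parameters such as pf, bq, and bf have no one-line equivalent and are deliberately left out; they need case-by-case translation.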

Faceting to Aggregations. One of the bigger conceptual shifts. Solr faceting gives you document counts per field value as a side-channel on search results:

facet=true&facet.field=brand&facet.limit=10

OpenSearch aggregations are structurally different. Each aggregation is named, can be nested virtually without limit, and supports domain changes - computing aggregations on a different document set than the query results:

{
  "aggs": {
    "brands": {
      "terms": {
        "field": "brand.keyword",
        "size": 10
      }
    }
  }
}

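Note the .keyword sub-field. OpenSearch commonly uses multi-fields where the main field is analyzed for full-text search and a .keyword sub-field stores the raw value for aggregations and sorting. Dynamic mapping creates this sub-field for string values automatically; with an explicit mapping you have to declare it yourself, or map the field as keyword and aggregate on it directly.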
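The facet-to-aggregation translation can be sketched as a small helper that reads Solr facet parameters and emits one terms aggregation per field. Naming each aggregation after its field and appending a .keyword suffix are assumptions; adjust both to your mapping.

```python
from urllib.parse import parse_qs


def facets_to_aggs(solr_query: str, keyword_suffix: str = ".keyword") -> dict:
    """Turn facet=true&facet.field=... into named terms aggregations."""
    params = parse_qs(solr_query)
    if params.get("facet", ["false"])[0] != "true":
        return {}
    # Solr's facet.limit defaults to 100, while a terms aggregation
    # defaults to 10, so the size is always set explicitly.
    limit = int(params.get("facet.limit", ["100"])[0])
    return {
        "aggs": {
            field: {"terms": {"field": field + keyword_suffix, "size": limit}}
            for field in params.get("facet.field", [])
        }
    }


if __name__ == "__main__":
    print(facets_to_aggs("facet=true&facet.field=brand&facet.limit=10"))
```

Carrying the limit over explicitly avoids a quiet change in how many buckets come back after the migration.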

Filter Queries. Solr's fq parameter applies cached filters outside the scoring context:

fq=timestamp:[NOW-7DAYS TO *]

In OpenSearch, filters go inside a bool query's filter clause:

{
  "query": {
    "bool": {
      "filter": [
        { "range": { "timestamp": { "gte": "now-7d" } } }
      ]
    }
  }
}
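Range filters with date math are common enough that a helper pays off. This sketch converts an inclusive Solr range fq into a bool filter, rewriting NOW-based date math into the lowercase now form. It handles only square-bracket ranges and the basic units from YEAR to SECOND; exclusive ranges and anything else are out of scope here.

```python
import re

# Solr date-math units mapped to their short equivalents.
_UNITS = {"YEAR": "y", "MONTH": "M", "DAY": "d",
          "HOUR": "h", "MINUTE": "m", "SECOND": "s"}
_RANGE = re.compile(r"^(\w+):\[(\S+) TO (\S+)\]$")
_MATH = re.compile(r"([+-]\d+|/)(YEAR|MONTH|DAY|HOUR|MINUTE|SECOND)S?")


def solr_date_to_opensearch(expr: str):
    """Convert one range endpoint; '*' (unbounded) becomes None."""
    if expr == "*":
        return None
    if not expr.startswith("NOW"):
        return expr  # absolute values pass through unchanged
    return "now" + _MATH.sub(lambda m: m.group(1) + _UNITS[m.group(2)], expr[3:])


def fq_range_to_filter(fq: str) -> dict:
    """Translate field:[lower TO upper] into a bool filter with a range clause."""
    match = _RANGE.match(fq)
    if match is None:
        raise ValueError(f"unsupported fq: {fq!r}")
    field, lower, upper = match.groups()
    bounds = {}
    for key, value in (("gte", lower), ("lte", upper)):
        converted = solr_date_to_opensearch(value)
        if converted is not None:
            bounds[key] = converted
    return {"query": {"bool": {"filter": [{"range": {field: bounds}}]}}}


if __name__ == "__main__":
    print(fq_range_to_filter("timestamp:[NOW-7DAYS TO *]"))
```

An open bound is simply omitted from the range clause, which keeps the translated filter as unbounded as the Solr original.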

Common Parameter Mappings:

Solr Parameter	OpenSearch Equivalent
`rows`	`size`
`start`	`from`
`fl` (field list)	`_source`
`fq` (filter query)	`bool.filter` clause
`sort`	`sort`
`defType=dismax`	`multi_match` (best_fields)
`facet.field`	`terms` aggregation
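The simpler rows in this table lend themselves to a mechanical translation. Below is a minimal sketch covering rows, start, fl, and sort; it assumes every sort clause carries an explicit direction and maps Solr's score pseudo-field to _score. Pseudo-fields inside fl are not handled.

```python
import re
from urllib.parse import parse_qs


def translate_common_params(solr_query: str) -> dict:
    """Map rows/start/fl/sort onto an OpenSearch request body."""
    params = {key: values[0] for key, values in parse_qs(solr_query).items()}
    body: dict = {}
    if "rows" in params:
        body["size"] = int(params["rows"])
    if "start" in params:
        body["from"] = int(params["start"])
    if "fl" in params:
        # Solr accepts comma- or space-separated field lists.
        body["_source"] = [f for f in re.split(r"[,\s]+", params["fl"]) if f]
    if "sort" in params:
        body["sort"] = []
        for clause in params["sort"].split(","):
            field, order = clause.split()
            name = "_score" if field == "score" else field
            body["sort"].append({name: order})
    return body


if __name__ == "__main__":
    print(translate_common_params(
        "rows=20&start=40&fl=id,title&sort=price+desc,score+desc"
    ))
```

Running a helper like this over logged production queries is a cheap way to find the requests that use parameters outside the table and need manual attention.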

OpenSearch queries are more verbose. Teams coming from Solr sometimes find that frustrating, but people new to search generally find the explicit JSON DSL easier to read and debug.

Migration Strategies and Bulk Indexing Tips

Two patterns handle most migration scenarios. For the full planning checklist - infrastructure sizing, PoC validation, team preparation - see the migration guide.

Dual-Write with Historical Catchup. The preferred approach. Your application writes to both Solr and OpenSearch simultaneously while you reindex historical data in the background. Once caught up, run A/B tests, compare results, and gradually shift traffic - 1%, 10%, 50%, then full cutover. This may require adding a message queue (Kafka, SQS) before the indexing layer to prevent data loss during the transition. More complex to set up, but lets you validate everything without downtime and roll back instantly if something looks wrong.
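The dual-write and traffic-shifting mechanics can be sketched in a few lines. In this illustration the two writers are plain callables you supply, an in-memory deque stands in for a durable queue such as Kafka or SQS, and routing hashes a user ID into one of 100 buckets. All class and function names are hypothetical.

```python
import hashlib
from collections import deque
from typing import Callable

Writer = Callable[[dict], None]


class DualWriter:
    """Write to Solr first, then OpenSearch; park OpenSearch failures for replay."""

    def __init__(self, write_solr: Writer, write_opensearch: Writer):
        self._write_solr = write_solr
        self._write_opensearch = write_opensearch
        self.replay_queue = deque()  # stand-in for a durable message queue

    def index(self, doc: dict) -> None:
        # Solr remains the system of record until cutover, so its errors propagate.
        self._write_solr(doc)
        try:
            self._write_opensearch(doc)
        except Exception:
            self.replay_queue.append(doc)

    def replay(self) -> int:
        """Retry parked documents once each; return how many succeeded."""
        replayed = 0
        for _ in range(len(self.replay_queue)):
            doc = self.replay_queue.popleft()
            try:
                self._write_opensearch(doc)
                replayed += 1
            except Exception:
                self.replay_queue.append(doc)
        return replayed


def routes_to_opensearch(user_id: str, percent: int) -> bool:
    """Stable bucketing: a user in the 1% cohort stays in at 10% and 50%."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < percent
```

Hashing rather than random sampling keeps each user on the same engine across requests, which makes result comparisons and rollbacks easier to reason about.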

Planned Downtime with Bulk Reindex. Export from your source of truth, bulk index into OpenSearch, switch over. Simpler to execute but requires a maintenance window. Whether the business accepts downtime depends on your indexing cadence - if you index continuously, this approach may not be viable.

Bulk Indexing Performance Tips

When loading data into OpenSearch during migration, a few settings make a large difference:

Batch size matters. Below 100 documents per bulk request, network overhead dominates (socket handshake, ACK). Above 5,000, memory pressure causes rejected requests. Start around 500-1,000 and tune based on document size.
Set refresh_interval to -1 during bulk loading. This stops OpenSearch from refreshing the searchable view on its default one-second schedule, dramatically improving throughput. Reset it after loading completes.
Use multiple indexing threads. A single thread won't saturate your cluster. Parallelize across available cores and data sources.
Monitor JVM heap. Bulk requests consume heap memory. Watch _nodes/stats/jvm and reduce concurrency if usage climbs above 75%.
Disable dynamic mapping. OpenSearch guesses field types for unknown fields by default. In production, this leads to mapping bloat: once an index hits the default 1,000-field limit, documents that introduce new fields are rejected. Use index templates with "dynamic": "strict" to enforce your schema.
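Several of these tips can be combined in a small loader. The sketch below batches documents, serializes each batch in the newline-delimited format the _bulk endpoint expects, and defines the settings bodies for pausing refresh and enforcing a strict mapping. The default batch size, the id field name, and the example properties are assumptions; sending the requests is left to whichever HTTP client you use.

```python
from __future__ import annotations

import json
from itertools import islice
from typing import Iterable, Iterator

# Bodies for PUT /<index>/_settings before and after the load.
PAUSE_REFRESH = {"index": {"refresh_interval": "-1"}}
RESTORE_REFRESH = {"index": {"refresh_interval": "1s"}}  # 1s is the default

# Mapping fragment that rejects documents containing undeclared fields.
STRICT_MAPPING = {
    "mappings": {
        "dynamic": "strict",
        "properties": {"id": {"type": "keyword"}, "title": {"type": "text"}},
    }
}


def batches(docs: Iterable[dict], size: int = 500) -> Iterator[list[dict]]:
    """Yield lists of at most `size` documents without loading everything."""
    iterator = iter(docs)
    while chunk := list(islice(iterator, size)):
        yield chunk


def bulk_body(index: str, docs: list[dict], id_field: str = "id") -> str:
    """Serialize one batch as NDJSON: an action line, then the document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc[id_field]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


if __name__ == "__main__":
    sample = ({"id": str(i), "title": f"doc {i}"} for i in range(3))
    for batch in batches(sample, size=2):
        print(bulk_body("products", batch), end="")
```

Because batches is a generator, several worker threads can each pull from their own data source and post bodies in parallel, while a separate check on heap usage decides when to lower the concurrency.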

Key Takeaways

Solr and OpenSearch share a Lucene core but diverge in replication models, cluster management, recovery mechanisms, and vector search capabilities.
OpenSearch's built-in cluster coordination eliminates the ZooKeeper dependency - less infrastructure to manage and fewer failure modes.
The vector search gap is real: OpenSearch offers FAISS, MMR, and integrated semantic search pipelines. Solr requires external services.
Query translation is mechanical but time-consuming. The biggest conceptual shifts are dismax to multi_match and faceting to named, nestable aggregations.
For production migrations, dual-write with gradual traffic shifting is the safest path. Tune bulk indexing settings to avoid bottlenecks during data loading.

For schema conversion tooling, see Schema Migration from Solr to Elasticsearch/OpenSearch.

This post is based on a webinar presented by Rafał Kuć covering migration benefits and a hands-on tutorial for moving from Apache Solr to OpenSearch.