Two years after the fork from Elasticsearch and the initial release of OpenSearch, here’s our detailed and updated OpenSearch vs Elasticsearch comparison.
Over the last 13 years, the BigData Boutique team and I have been working closely with Elasticsearch teams at both Elastic and Amazon. I also closely followed the events that took place in 2020 and 2021. I didn't voice an opinion about the dispute back then because we work closely with both teams and have great appreciation for both. To be honest, this competition is actually good for our customers, and for Search and Big Data consumers in general. Now, two years after the events that led to the birth of OpenSearch, it’s a good time for a retrospective. Let’s compare both technologies and assess their trajectory.
Today, both Elasticsearch and OpenSearch are widely used for indexing and searching large volumes of data in real time. Together, they are the industry standard for search engines, log analytics, and real-time BI. However, they have some significant differences that set them apart.
In this blog, we'll take a closer look at OpenSearch vs. Elasticsearch and help you decide which one is right for your needs.
Elasticsearch is a popular search engine based on the Apache Lucene project (which Apache Solr is also built on), and it has been used by many since 2010 for search and log analytics. It is designed to be highly scalable and can be used for a wide range of applications, from simple search functionality to complex data analysis. Elasticsearch and Kibana, as part of the Elastic Stack, both have a robust set of features, including support for full-text search, enterprise search, real-time analytics, and geospatial queries. It used to be fully open source under the Apache License until that changed in early 2021, when competitor Amazon embarked on creating its own project. Elasticsearch is often deployed self-managed or on Elastic Cloud.
The OpenSearch search engine is a fork of Elasticsearch, maintained by Amazon since January 2021. The two share essentially the same codebase up to the fork, which is when the projects started to diverge slightly. One of the key features of OpenSearch is its focus on transparency and community-driven development. Unlike Elasticsearch, which is owned by Elastic NV, OpenSearch is governed by a community-driven foundation. This means that anyone can contribute to the development of OpenSearch. While the codebase of both products is open for anyone to inspect, it is easier to contribute code to and influence the direction of OpenSearch than Elasticsearch. It is often used as part of the Amazon OpenSearch Service (previously known as the Amazon Elasticsearch Service).
To compare the two, we will review the codebase, as a signal of traction and development effort, and the features, so you can choose which one is a better fit for your needs.
The OpenSearch project forked the Elasticsearch codebase when version 7.10.2 was the latest release, and then significant work went into the OpenSearch codebase to rename the project and remove all non-Apache-licensed code (namely, X-Pack functionality). To fairly compare the work done on both, we counted commits on the master/main branch made since April 22nd 2021, which marked the first release candidate of OpenSearch following the fork a few months earlier.
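To reproduce these counts against your own clones, a small Python helper can wrap the same `git log` invocation used in the transcripts below. This is a minimal sketch, assuming `git` is on your `PATH`; the function names are hypothetical. Note that the transcripts pass `--all`, which counts commits reachable from every ref (all branches and tags), not only master/main.

```python
import subprocess
from typing import Optional

# Cut-off date used in the comparison.
SINCE = "Apr 22 2021"


def git_log_cmd(since: str = SINCE, path: Optional[str] = None) -> list[str]:
    """Build the same `git log` command line used in the transcripts.

    `--all` walks every ref (branches and tags), not only master/main.
    """
    cmd = ["git", "log", "--oneline", "--all", f"--since={since}"]
    if path is not None:
        # Only count commits that touch this path, e.g. "server/".
        cmd += ["--", path]
    return cmd


def count_commits(repo_dir: str, since: str = SINCE, path: Optional[str] = None) -> int:
    """Python equivalent of `git log ... | wc -l`: one output line per commit."""
    result = subprocess.run(
        git_log_cmd(since, path),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return len(result.stdout.splitlines())
```

For example, `count_commits("elasticsearch", path="server/")`, run next to a local clone named `elasticsearch` (a hypothetical path), performs the same count as the `server/` command, although the number will have grown since April 2023 because `--since` only sets a lower bound.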
The Elasticsearch repository had nearly 20k commits, over 6k of them to core Elasticsearch (the "server" folder) and about 1.4k more to satellite modules:
```shell
# total commits in repo since fork
➜ elasticsearch git:(master) git log --oneline --all --since='Apr 22 2021' | wc -l
19527
# total commits to the main codebase (server folder) since fork
➜ elasticsearch git:(master) git log --oneline --all --since='Apr 22 2021' -- server/ | wc -l
6130
# total commits to main modules (various surrounding functionality not under x-pack) since fork
# https://github.com/elastic/elasticsearch/tree/main/modules
➜ elasticsearch git:(master) git log --oneline --all --since='Apr 22 2021' -- modules/ | wc -l
1437
# just as a means of comparison, the amount of work done on x-pack features is not negligible
➜ elasticsearch git:(master) git log --oneline --all --since='Apr 22 2021' -- x-pack/ | wc -l
7294
```
OpenSearch saw about 3 times fewer commits on core, and about 3 times fewer on the important modules, which include, for example, the scripting languages, reindexing features, ingestion pipeline processors, and more:
```shell
➜ OpenSearch git:(main) git log --oneline --all --since='Apr 22 2021' | wc -l
3727
➜ OpenSearch git:(main) git log --oneline --all --since='Apr 22 2021' -- server/ | wc -l
1966
# total commits to main modules (surrounding functionality not under x-pack) since fork
# https://github.com/opensearch-project/OpenSearch/tree/main/modules
➜ OpenSearch git:(main) git log --oneline --all --since='Apr 22 2021' -- modules/ | wc -l
470
```
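The ratios between the two sets of counts are simple arithmetic. The snippet below only recomputes them from the figures in the two transcripts; it introduces no new data.

```python
# Commit counts copied from the Elasticsearch and OpenSearch transcripts (April 2023).
elasticsearch = {"repo": 19527, "server/": 6130, "modules/": 1437}
opensearch = {"repo": 3727, "server/": 1966, "modules/": 470}


def ratios(a: dict[str, int], b: dict[str, int]) -> dict[str, float]:
    """How many times more commits `a` has than `b`, for each path both share."""
    return {key: round(a[key] / b[key], 2) for key in a if key in b}


print(ratios(elasticsearch, opensearch))
# {'repo': 5.24, 'server/': 3.12, 'modules/': 3.06}
```

By these figures, Elasticsearch has roughly 3x the commits of OpenSearch in both `server/` and `modules/`, and roughly 5x across the whole repository.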
Consequently, OpenSearch has shipped fewer releases (major and minor) than Elasticsearch.
While the number of commits is not direct evidence of code quality or software performance, it is evident that the Elasticsearch project is seeing more work on core. That in turn very likely translates to better performance, more features, keeping up with the latest versions of dependencies and Lucene features, and so on, especially when the difference is on that scale.
The aforementioned stats were collected in April 2023.
All basic functionality of search, analytics and dashboards is exactly the same between the two technologies. After all, OpenSearch was forked from a very mature version of Elasticsearch. For standard use cases, it doesn't matter from a features perspective which search engine you pick.
The difference in features between the projects will be for anything that was under Elastic’s X-Pack (free or paid), and all features that have been added after the fork.
Important features that go a little beyond the basics either already exist in both or eventually will. Prime examples include the following:
- Data Streams API is implemented by both (although Elasticsearch just released time-series data streams that are not in OpenSearch - yet)
- Index Lifecycle Management (ILM) is called Index State Management (ISM) in OpenSearch
- Both have some support for alerting (although we actually recommend going with ElastAlert2 and not any built-in alerting solution).
- Cross-cluster replication is supported by both; in Elasticsearch it's a premium-tier (paid) feature.
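To make the ILM/ISM naming difference concrete, the sketch below builds a comparable "roll over after 7 days, delete after 30 days" policy for each project. It is an illustration under stated assumptions, not a drop-in policy: the ages are arbitrary, rollover in both still needs a rollover alias or data stream configured, and the age semantics differ slightly, so check each project's documentation before use. ILM bodies go to `PUT _ilm/policy/<name>` and ISM bodies to `PUT _plugins/_ism/policies/<name>`.

```python
import json


def ilm_policy(rollover_age: str, delete_after: str) -> dict:
    """Elasticsearch ILM: phases, each with a dict of actions."""
    return {
        "policy": {
            "phases": {
                "hot": {"actions": {"rollover": {"max_age": rollover_age}}},
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


def ism_policy(rollover_age: str, delete_after: str) -> dict:
    """OpenSearch ISM: a state machine of states, actions and transitions."""
    return {
        "policy": {
            "description": "Roll over hot indices, then delete them",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [{"rollover": {"min_index_age": rollover_age}}],
                    "transitions": [
                        {"state_name": "delete", "conditions": {"min_index_age": delete_after}}
                    ],
                },
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
        }
    }


print(json.dumps(ilm_policy("7d", "30d"), indent=2))
print(json.dumps(ism_policy("7d", "30d"), indent=2))
```

The structural difference is the main takeaway: ILM describes fixed phases, while ISM models the lifecycle as explicit states with transition conditions.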
At the time of writing this piece, some niche core features are still unique to Elasticsearch, such as geoshape and geohex grid aggregations.
Some OpenSearch features are only available on the managed Amazon OpenSearch Service. On the other hand, unlike Elastic Cloud, which is always up to date with the latest Elasticsearch version, Amazon's managed OpenSearch offering is usually 2-3 versions behind.
Most of the major differences lie in the stack of vertical solutions available for various use cases (e.g. APM, SIEM, and more). Following are the highlights of the major differences between Elasticsearch and OpenSearch.
Security in Elasticsearch and OpenSearch is a broad category involving several features and concerns: authentication (letting users in), authorization and RBAC (role-based access control), user impersonation, audit logging, encryption at rest and in transit, and various multi-tenancy concerns.
Elasticsearch’s free built-in security features are part of the X-Pack Basic license, and they are limited to an Elasticsearch-based user directory. Since versions 6.8 and 7.1 they have been available free of charge to all users. Authenticating with LDAP, OpenID, SAML and more requires a paid license. The same goes for other security features such as IP filtering, document- and field-level security, and more.
OpenSearch offers the same security features and controls, but completely for free. OpenSearch’s Security module is developed entirely in the open and has all the necessary features: Active Directory and LDAP, SAML, OpenID, Access Control features including masking and field-level security, audit logs, encryption support and more.
As far as security goes, Elasticsearch and OpenSearch are on par, with OpenSearch having the edge by offering all of it for free as a built-in, open source module.
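As an illustration of the field-level controls discussed above, here is a hedged sketch of role definitions that hide a field. The OpenSearch body targets the Security plugin's `PUT _plugins/_security/api/roles/<role>` endpoint and also masks a second field; the Elasticsearch body targets `PUT _security/role/<role>`, where `field_security` requires a paid license, and it only hides fields. Index patterns and field names are hypothetical.

```python
def opensearch_role(index_pattern: str, hidden: list[str], masked: list[str]) -> dict:
    """Read-only OpenSearch role with field-level security and field masking."""
    return {
        "cluster_permissions": ["cluster_composite_ops_ro"],
        "index_permissions": [
            {
                "index_patterns": [index_pattern],
                # A leading "~" excludes a field from search results.
                "fls": [f"~{field}" for field in hidden],
                # Masked fields are returned as hashes instead of clear text.
                "masked_fields": masked,
                "allowed_actions": ["read"],
            }
        ],
    }


def elasticsearch_role(index_pattern: str, hidden: list[str]) -> dict:
    """Read-only Elasticsearch role that grants all fields except `hidden`."""
    return {
        "indices": [
            {
                "names": [index_pattern],
                "privileges": ["read"],
                "field_security": {"grant": ["*"], "except": hidden},
            }
        ]
    }
```

Either dict can be serialized with `json.dumps` and sent with any HTTP client; the shape, not the transport, is what differs between the two projects.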
The ability to create an “offline” search experience, searching older, less frequently accessed data directly from snapshots and thus significantly reducing the amount of hardware required to run Elasticsearch clusters, is a true game changer for many use cases.
Elasticsearch has had this feature implemented and in very wide use for a while now, while OpenSearch has only recently released it, and it is still marked as experimental. However, and very importantly, Elasticsearch requires a high-tier paid license (Enterprise) to use this feature, while in OpenSearch Searchable Snapshots is completely free.
This feature is also provided by the managed services, where it is known as “Searchable Snapshots” or “search on frozen tier” in Elastic Cloud, and “UltraWarm” on Amazon OpenSearch Service.
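The two projects expose searchable snapshots through different calls. The sketch below builds the requests as plain Python tuples so they can be sent with any HTTP client. Elasticsearch mounts an index via the `_mount` API (`storage=shared_cache` corresponds to the frozen tier, `full_copy` to the cold tier), while OpenSearch restores it with `"storage_type": "remote_snapshot"`, which requires nodes with the `search` role. Repository, snapshot and index names are hypothetical, and the OpenSearch restore may need `rename_pattern`/`rename_replacement` if the original index still exists.

```python
def es_mount_request(repo: str, snapshot: str, index: str) -> tuple[str, str, dict]:
    """Elasticsearch: mount a snapshotted index as a partially cached searchable snapshot."""
    path = f"/_snapshot/{repo}/{snapshot}/_mount?storage=shared_cache"
    return "POST", path, {"index": index}


def opensearch_restore_request(repo: str, snapshot: str, index: str) -> tuple[str, str, dict]:
    """OpenSearch: restore an index so it is searched directly from the snapshot repository."""
    path = f"/_snapshot/{repo}/{snapshot}/_restore"
    return "POST", path, {"indices": index, "storage_type": "remote_snapshot"}
```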
Our advice is not to run Machine Learning and AI workloads on Elasticsearch or OpenSearch, simply because they are not purpose-built for it. Having them there is sometimes handy, sure, but it doesn't come without a price tag.
Elasticsearch and OpenSearch should be thought of as serving layer engines. You should prepare the data structure so that data can be served easily from them with or without ML involved. As an example, you can use the vector fields (dense or sparse vectors) and use kNN / ANN algorithms to find similar documents via Vector Search.
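For Vector Search, the query syntax differs between the two. The sketch below builds request bodies for a top-k similarity search: recent Elasticsearch 8.x releases use the top-level `knn` option of `_search` on a `dense_vector` field, while the OpenSearch k-NN plugin uses a `knn` query clause on a `knn_vector` field. It also includes a brute-force cosine version to show what the ANN indexes approximate. Field names and the `num_candidates` default are illustrative assumptions.

```python
import math


def es_knn_search(field: str, query_vector: list[float], k: int = 10,
                  num_candidates: int = 100) -> dict:
    """Elasticsearch: approximate kNN via the top-level `knn` section of _search."""
    return {"knn": {"field": field, "query_vector": query_vector,
                    "k": k, "num_candidates": num_candidates}}


def opensearch_knn_search(field: str, query_vector: list[float], k: int = 10) -> dict:
    """OpenSearch k-NN plugin: a `knn` query clause keyed by the vector field name."""
    return {"size": k, "query": {"knn": {field: {"vector": query_vector, "k": k}}}}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def exact_knn(docs: dict[str, list[float]], query: list[float], k: int) -> list[str]:
    """Exact top-k by cosine similarity: what ANN indexes approximate at scale."""
    return sorted(docs, key=lambda d: cosine(docs[d], query), reverse=True)[:k]


docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(exact_knn(docs, [1.0, 0.1], k=2))  # ['a', 'c']
```

The brute-force version scans every document, which is exactly why both engines rely on ANN indexes for large collections.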
Another option is rescoring, as the LTR (Learning to Rank) plugin does, to improve relevance scoring.
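A rescoring setup typically runs a cheap first-pass query and rescores only the top hits. The sketch below builds such a request body using the standard `rescore` section with the LTR plugin's `sltr` query; it assumes the plugin is installed and a model has already been uploaded. The model name, the `title` field and the `keywords` parameter (which must match your feature templates) are hypothetical.

```python
def ltr_rescore_search(user_query: str, model: str, window_size: int = 100) -> dict:
    """First pass: a plain match query. Second pass: rescore the top hits with an LTR model."""
    return {
        "query": {"match": {"title": user_query}},
        "rescore": {
            # Only the top `window_size` hits per shard are rescored.
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "sltr": {"params": {"keywords": user_query}, "model": model}
                }
            },
        },
    }
```

Keeping `window_size` small limits the cost of the model to a bounded number of candidates, which fits the "serving layer" role described above.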
Both Elasticsearch and OpenSearch offer built-in solutions (or “apps”) for Machine Learning workloads and use cases. These might come in handy in some cases (e.g. the built-in SIEM in the Elastic Stack), but in our opinion they are not for general, widespread use.
For getting data into the cluster, the OpenSearch project offers Data Prepper, a server-side data collector built for that purpose.
Alternatively, there are dedicated connectors ready for various data streaming technologies, such as Kafka Connect for Kafka, Flink sink to be used with various sources, and so on.
One significant difference between the two is ease of use from various coding languages and platforms, and the maturity of client libraries.
Since the fork, most Elasticsearch client libraries throw errors when connecting to OpenSearch clusters, and the technologies will naturally diverge over time, so even the core APIs they currently share will evolve differently. OpenSearch therefore needs its own client libraries, developed and maintained separately.
Unfortunately, this is one big weak spot for OpenSearch. The various client libraries we tried are minimal and incomplete, with bugs and gaps in documentation. They are not completely unusable, but they are often close to that. It’s sometimes easier to use a plain HTTP client library directly than OpenSearch’s client libraries.
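Since a plain HTTP client is sometimes the path of least resistance, here is a minimal sketch using only Python's standard library (`urllib.request`), so it depends on neither vendor's client. The base URL, index name and credentials are hypothetical; clusters with self-signed TLS certificates would additionally need an `ssl.SSLContext` passed to `urlopen`.

```python
import base64
import json
import urllib.request
from typing import Optional


def build_search_request(base_url: str, index: str, query: dict,
                         user: Optional[str] = None,
                         password: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST <index>/_search request with a JSON body."""
    req = urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_search",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if user is not None and password is not None:
        # HTTP Basic auth, as used by both projects' built-in user directories.
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        req.add_header("Authorization", f"Basic {token}")
    return req


def search(req: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

The same request works against either engine for the shared search API, which is precisely the appeal when a dedicated client library falls short.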
Of course, we cannot have a post comparing the two without discussing the elephant in the room: the licensing model. Elasticsearch was previously released under the Apache license, which is a very permissive license. This is also the current license of OpenSearch, but Elasticsearch is now released under a different, less permissive license that many do not consider an open source license.
Neither I nor the team are lawyers; we prefer to remain highly technical, and that is the real value we are able to provide. But more often than not we are asked whether doing X is legitimate or would violate Elastic’s license.
The gist is that the new license forbids offering Elasticsearch APIs as a managed service. If you just use Elasticsearch as the backend for your application, you are good to go. But there is a lot of gray area, such as embedding Elasticsearch in a larger solution that is sold as one piece, exposing APIs that could be seen as Elasticsearch APIs (e.g. search via API), and so on. Many of our customers want zero risk, and especially if they don’t need anything special from Elasticsearch, they opt for OpenSearch and use its basic features and then some.
OpenSearch is an open source project, which means that there is no official support on offer. Managed services for OpenSearch such as Amazon OpenSearch Service, Aiven, and others - will take responsibility for running the hardware and software for you, but not how you make use of it.
Elastic Co, the company behind Elasticsearch, does offer support via its standard subscription licenses, or via the managed offering on Elastic Cloud. But again, that support will be limited and won’t always provide the best tailored advice on how to use the technology to best suit your needs.
For cases when the documentation is not enough, and when you need a true expert who can serve as your trusted advisor - we have established our name as world-leaders in providing support for Elasticsearch and OpenSearch alike. Aside from consulting and migration services, we also provide 24/7 production support to help with urgent matters and keep your clusters healthy at all times.
Also check out Pulse - our automated consultant solution for proactive monitoring and support.
To easily summarize this OpenSearch vs Elasticsearch comparison - as long as you don't serve Elasticsearch directly to customers or fall in the legal gray area of doing so, you are safe to use both Elasticsearch and OpenSearch.
For all basic and mainstream use cases there is really no difference between Elasticsearch and OpenSearch. Those use cases include text search, log analytics, dashboards, and so on. Both technologies will serve exactly the same purpose.
Elasticsearch will most likely be easier to integrate from anywhere thanks to its extensive client library support, and it will also fix bugs and issues faster thanks to its very active development team.
OpenSearch on the other hand will most likely be cheaper to operate, most definitely so if you are looking for something that is beyond just basic functionality, such as a full-fledged SIEM. The Elastic Stack implementation of those solutions will most likely be much more mature, but they will also come at a significant price tag.
For self-managed deployments, those would probably be your deciding factors. If you’re looking for a managed solution, there are many more options out there for OpenSearch, for obvious reasons.
Regardless of your choice of search engine, our team is available to provide 24/7 support, consulting services, and hands-on development and migration services. Reach out to us to discuss further.