How to migrate your data from Elasticsearch to OpenSearch with minimal or zero downtime, ensuring a smooth transition for your projects.
Organizations regularly approach us about migrating from Elasticsearch, whether it's Elasticsearch OSS or Elastic Cloud, to OpenSearch. During these engagements, we advise them on how to approach such a migration and provide hands-on support as needed.
This guide covers every step of the process - from upgrading your Elasticsearch version to setting up your new OpenSearch cluster and ensuring all data is safely migrated from Elasticsearch. Follow our guidelines to ensure a seamless transition with minimal downtime and no data loss.
Preparing for Migration from Elasticsearch cluster
The first step is preparing for the move. This step involves upgrading to a compatible Elasticsearch version if you are running an older Elasticsearch cluster, backing up data, and planning the transition period to minimize downtime and data inconsistencies.
Upgrade Elasticsearch to a Compatible Version
OpenSearch supports direct upgrades from Elasticsearch versions 6.8 through 7.10.2, so if you’re running an older version you need to upgrade into that range first. This alignment minimizes potential issues during migration.
The easiest path is when you are currently running Elasticsearch 7.x (7.10 or earlier), in which case you should start by upgrading to Elasticsearch 7.10. If you are running an even older version, you should first clearly map the differences between your version and the latest OpenSearch version. If they are minor, follow the same path as before: upgrade to 7.10 (which in this case should be relatively easy to do), and then proceed with the migration to OpenSearch.
If you are running an older version and depend on old or deprecated features or old syntax, you should first rebuild those parts so they are compatible with the latest OpenSearch. The Elasticsearch and OpenSearch Upgrade Tool might come in handy here to identify those gaps. In all likelihood, the best path in this case is a side-by-side migration with full data reindexing.
This is also the case for clusters running Elasticsearch 7.11 and later. OpenSearch is not compatible with data from those newer versions, so a direct upgrade isn't possible and you'd need to run a side-by-side migration following our advice below.
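The version rules above can be summarized in a few lines of code. This is a minimal sketch, assuming plain numeric version strings such as "7.10.2"; the function names and the returned labels are made up for illustration and are not part of any Elasticsearch or OpenSearch tooling.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '7.10.2' into (7, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def migration_path(es_version: str) -> str:
    """Map an Elasticsearch version to the path described in this guide."""
    v = parse_version(es_version)
    if v >= (7, 11):
        # 7.11 and later: no direct upgrade, run both clusters and reindex.
        return "side-by-side"
    if v >= (6, 8):
        # 6.8 through 7.10.2: a direct upgrade is supported.
        return "direct"
    # Older than 6.8: upgrade Elasticsearch first, or reindex side by side.
    return "upgrade-first"


for version in ("6.4.3", "6.8.23", "7.10.2", "7.17.9", "8.11.0"):
    print(version, "->", migration_path(version))
```

For anything in the "upgrade-first" bucket, the choice between upgrading and a full side-by-side reindex still depends on how many deprecated features you rely on.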
The two primary methods to perform a cluster upgrade are a rolling upgrade of the Elasticsearch nodes and a full cluster restart upgrade. Rolling upgrades are generally preferred because nodes are upgraded one at a time without a full cluster restart. If you are already using a managed service such as Amazon Elasticsearch Service, this should be handled for you by the service provider. Regardless of the method used, remember to disable shard allocation before taking nodes down to keep your Elasticsearch data safe.
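Disabling shard allocation is done through the cluster settings API. The sketch below only builds the request and does not send it; the helper name and the localhost address are assumptions for illustration, and you would add authentication and TLS as your cluster requires.

```python
import json
import urllib.request


def allocation_request(base_url: str, enable: bool) -> urllib.request.Request:
    """Build (but do not send) a PUT _cluster/settings request.

    enable=False restricts allocation to primaries before a node goes down;
    enable=True sets the value to null, which restores the default.
    """
    value = None if enable else "primaries"
    body = {"persistent": {"cluster.routing.allocation.enable": value}}
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical address; send with urllib.request.urlopen(req) when ready.
req = allocation_request("http://localhost:9200", enable=False)
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Remember to build and send the `enable=True` variant once the upgraded node has rejoined the cluster.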
Check Plugins
If you are using any plugins that are not core plugins, you should first confirm that they are available for OpenSearch as well, and then that they can be installed on the OpenSearch nodes of any managed service you might be running on.
Commonly used plugins include a security plugin and analyzers. Confirm which plugins are installed and used on your existing cluster before proceeding.
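One way to run this check is to compare the output of `GET _cat/plugins?format=json` on the existing cluster with the plugins you know are available on the target. The following is a small sketch with made-up sample data; the function name is hypothetical, and the list of plugins available on the target is something you would compile yourself.

```python
from typing import Iterable, List, Mapping


def plugins_missing_on_target(
    cat_plugins_rows: Iterable[Mapping[str, str]],
    available_on_target: Iterable[str],
) -> List[str]:
    """Return plugins installed on the source that the target lacks.

    cat_plugins_rows is the parsed JSON from GET _cat/plugins?format=json,
    which has one row per node and plugin with a "component" field.
    """
    installed = {row["component"] for row in cat_plugins_rows}
    return sorted(installed - set(available_on_target))


# Sample data for illustration only.
rows = [
    {"name": "node-1", "component": "analysis-icu", "version": "7.10.2"},
    {"name": "node-1", "component": "repository-s3", "version": "7.10.2"},
    {"name": "node-2", "component": "analysis-icu", "version": "7.10.2"},
]
print(plugins_missing_on_target(rows, ["analysis-icu"]))
```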
Backup Current Data
Backing up your current data is vital. Before you start migrating any data, take a snapshot of all indexes to prevent data loss and allow for a rollback option.
The Snapshot API in Elasticsearch takes backups of your indexes and stores them in a snapshot repository, such as a shared filesystem or S3, from which they can be restored if needed. This step maintains data continuity and reliability during migration.
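As a rough illustration, taking a snapshot comes down to two REST calls: registering a repository and creating the snapshot. The sketch below only describes those calls as data; the repository name, snapshot name and path are placeholders, and a shared filesystem (`fs`) repository is assumed, which requires the location to be listed under `path.repo` on every node.

```python
from typing import Any, Dict, List, Tuple

Call = Tuple[str, str, Dict[str, Any]]  # (HTTP method, path, JSON body)


def snapshot_calls(repo: str, snapshot: str, location: str) -> List[Call]:
    """Return the two REST calls: register a repository, then snapshot."""
    register = (
        "PUT",
        f"/_snapshot/{repo}",
        {"type": "fs", "settings": {"location": location}},
    )
    create = (
        "PUT",
        f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
        {"indices": "*", "include_global_state": True},
    )
    return [register, create]


# Placeholder names; replace with your own repository and path.
for method, path, body in snapshot_calls(
    "migration_repo", "pre_migration", "/mnt/snapshots"
):
    print(method, path, body)
```

For large clusters you may prefer to drop `wait_for_completion=true` and poll the snapshot status instead, since the call otherwise blocks until the snapshot finishes.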
Backup Kibana Dashboards
If you have dashboards in Kibana, make sure to back them up as well so you can convert them later for use in OpenSearch Dashboards.
Plan the Transition Period
After setting up the environment, you can start the process to migrate data. There are two main concerns: the mass data copy itself, and keeping the system operational during the migration. Careful planning can significantly reduce data inconsistencies and downtime, and it boils down to how you plan the transition period.
There will be a transition period, from the point where OpenSearch is deployed and migration into it has started until the data copy from Elasticsearch has completed. There are multiple approaches to handling the live system during the transition period, depending on the criticality of your system and other business requirements:
One option is to start writing immediately to OpenSearch, in which case data reads from OpenSearch will be partial until the data copy is complete.
Another option is to continue writing to Elasticsearch and initiate the data migration, and only switch to the new cluster when the data copy is done. In this case you will need to clearly define the cutoff point and when to make the switch, and account for how to copy the deltas written after the data copy process started.
Some systems allow writing to both clusters in parallel, for example when queues like Kafka topics or RabbitMQ are used for ingestion. You will need to handle data overwrite concerns in cases where updates or deletes are supported. This is the preferred approach when data is append-only, since then you only need to clearly define when the dual write started so you know the cutoff for the data copy.
Of course, there are more complicated scenarios where more involved solutions will need to be created.
Setting Up OpenSearch Cluster
OpenSearch can be deployed as a single-node or multi-node cluster with similar configuration steps. Various deployment methods, including Docker, Helm, and RPM, provide flexibility based on your infrastructure and preferences. There are also managed OpenSearch options, such as Amazon OpenSearch Service and Aiven, that you can use.
A lift-and-shift approach, replicating the same cluster topology, sizing and configurations used in your Elasticsearch cluster, is recommended. This method ensures a smooth transition and meets performance expectations without adding any risk to the process. Begin by configuring the OpenSearch config file (opensearch.yml) to match your requirements, setting up the necessary nodes, and ensuring that your cluster is green and operational.
Once the basic setup is complete, start OpenSearch and verify that all OpenSearch nodes are running correctly. OpenSearch Dashboards provides visual confirmation of your cluster’s status and helps in configuring and managing your new OpenSearch environment effectively.
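The same verification can be scripted against the response of `GET _cluster/health`. Below is a minimal sketch that inspects an already-parsed health response; the function name and the sample values are illustrative, and the expected node count is whatever you provisioned.

```python
from typing import Any, Mapping


def cluster_is_ready(health: Mapping[str, Any], expected_nodes: int) -> bool:
    """Check a parsed GET _cluster/health response before migrating."""
    return (
        health.get("status") == "green"
        and health.get("number_of_nodes") == expected_nodes
        and health.get("unassigned_shards", 0) == 0
    )


# Sample response fragment for illustration only.
sample = {"status": "green", "number_of_nodes": 3, "unassigned_shards": 0}
print(cluster_is_ready(sample, expected_nodes=3))
```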
The Migration Process
This stage can be streamlined using tools like the Migration Assistant for Amazon OpenSearch Service where applicable, but in most cases snapshot and reindexing techniques will be necessary to ensure data integrity and accessibility in the new environment.
Option 1: Migration Assistant for Amazon OpenSearch Service
The Migration Assistant for Amazon OpenSearch Service simplifies and streamlines the migration process. It helps users transfer data from an Amazon S3 snapshot to a specified target cluster efficiently and accurately. A standout feature is its ability to capture live traffic aimed at the original cluster and archive it for playback on the new destination cluster on Amazon OpenSearch Service.
The Migration Assistant also validates traffic by recording and comparing requests and responses between the source and destination clusters. It integrates with AWS Service Catalog AppRegistry and Application Manager for centralized resource management. Leveraging the Migration Assistant ensures a smoother transition with minimal disruption to your services, but it's not applicable to every use case, and it might be costly and slow to operate.
Option 2: Migrating Data Using Snapshots
Using Elasticsearch Snapshots is an efficient and safe method for transferring data from Elasticsearch to OpenSearch, and is usually what we recommend for any non-trivial migration process.
The first step is taking a snapshot of your existing Elasticsearch cluster. You should use a shared filesystem or a storage system like S3 as the snapshot repository, so the snapshot files are stored where they can easily be restored from. The final step is to restore the snapshots with your Elasticsearch data, including the cluster state and index mappings, on the OpenSearch cluster or OpenSearch Service domain.
Restoring snapshots does not require a cluster restart. Unless you made your system read-only, you will only need to migrate the data ingested into Elasticsearch after the snapshot was taken.
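For illustration, the restore itself is a single call to the `_restore` endpoint on the target cluster, once the same snapshot repository has been registered there. This sketch only builds that call; the names are placeholders, and whether to restore the global cluster state along with the indexes is a decision for your setup (managed services may restrict it).

```python
from typing import Any, Dict, List, Tuple


def restore_call(
    repo: str,
    snapshot: str,
    indices: List[str],
    include_global_state: bool,
) -> Tuple[str, str, Dict[str, Any]]:
    """Return (method, path, body) for restoring a snapshot."""
    path = f"/_snapshot/{repo}/{snapshot}/_restore"
    body = {
        # Comma-separated index patterns to restore from the snapshot.
        "indices": ",".join(indices),
        "include_global_state": include_global_state,
    }
    return ("POST", path, body)


# Placeholder names for illustration.
print(restore_call("migration_repo", "pre_migration", ["logs-*", "orders"], False))
```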
You can follow the guide in the Amazon OpenSearch Service documentation for step-by-step guidance.
Option 3: Migrate Data via Reindexing
Migrating from Elasticsearch 7.11 or newer, including 8.x, cannot be done directly. It requires running two clusters side-by-side and copying the data by reindexing from the source indexes on the Elasticsearch cluster.
There are multiple ways to perform reindexing from Elasticsearch to OpenSearch. Remote reindexing with the Reindex API is one of them, but we found that using Logstash and similar tools to perform the ETL works better for very large data copy operations.
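To make the Reindex API option concrete, here is a sketch of the request body you would `POST` to `_reindex` on the OpenSearch cluster. The host, index names and credentials are placeholders, the helper name is made up, and the destination cluster usually has to be configured to allow reindexing from that remote host.

```python
from typing import Any, Dict, Optional


def remote_reindex_body(
    source_host: str,
    source_index: str,
    dest_index: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the JSON body for POST _reindex with a remote source."""
    remote: Dict[str, Any] = {"host": source_host}
    if username is not None:
        # Basic auth credentials for the source Elasticsearch cluster.
        remote["username"] = username
        remote["password"] = password
    return {
        "source": {"remote": remote, "index": source_index},
        "dest": {"index": dest_index},
    }


# Placeholder host and index names for illustration.
print(remote_reindex_body("https://es.example.internal:9200", "orders", "orders"))
```

Creating the destination index with the right mappings and settings before reindexing avoids relying on dynamic mapping on the new cluster.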
We helped Yotpo perform a significant migration using Logstash, and they shared their insights at OpenSearchCon 2024. The process included many dry runs until we figured out the right Logstash cluster size, and then making sure the process could run end to end without failing due to a shortage of resources.
Syncing Changes During Migration
Syncing changes during migration is crucial to ensure data consistency between Elasticsearch and OpenSearch, and you should have planned for this transition period. Selective remote reindexing can synchronize data modifications from Elasticsearch to OpenSearch, relying on an indexing timestamp to track changes accurately during migration.
After the initial migration, continuous synchronization is crucial to handle ongoing data changes. Document selection can use the timestamp or the last snapshot time to determine which documents to transfer, ensuring all changes are captured and migrated to the new system without data loss.
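Timestamp-based selection can be expressed as a range query placed under `source.query` of a reindex request. The sketch below assumes your documents carry a date field recording the last write; the field name `updated_at` and the cutoff time are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def changed_since(field: str, cutoff: datetime) -> Dict[str, Any]:
    """Range query matching documents written at or after the cutoff."""
    return {"range": {field: {"gte": cutoff.isoformat()}}}


# Hypothetical field name and snapshot time, for illustration only.
snapshot_time = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
query = changed_since("updated_at", snapshot_time)
print(query)
```

Note that a timestamp filter only finds documents that were created or updated; deletes on the source need separate handling.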
Handling writes during migration requires a strategy to avoid data loss and ensure ongoing writes are captured. Implementing real-time strategies that adapt to changes is crucial to ensure all writes are properly captured and migrated. This may involve dual writing or other methods to maintain data integrity throughout the migration process.
Post-Migration Tasks
Once migration is complete, you will need to ensure that the new OpenSearch cluster is fully operational and compatible with existing applications. This phase involves verifying data integrity, updating applications, and optimizing the new environment for performance and security.
It's important to ensure that all migrated data is accessible and accurately transferred to OpenSearch, and that the cluster is in green status. Perform checks like document counts and ID matches between the old and new systems. Running query tests and system tests, if you have them, can confirm that queries return expected results, ensuring data integrity post-migration.
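A document count check can be as simple as comparing per-index counts collected from both clusters, for example from `GET <index>/_count`. The following is a minimal sketch operating on already-collected counts; the function name and sample numbers are illustrative.

```python
from typing import Dict, Optional, Tuple


def count_mismatches(
    source_counts: Dict[str, int],
    target_counts: Dict[str, int],
) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return {index: (source_count, target_count)} where counts differ.

    An index missing on the target is reported with a count of None.
    """
    mismatches: Dict[str, Tuple[int, Optional[int]]] = {}
    for index, expected in source_counts.items():
        actual = target_counts.get(index)
        if actual != expected:
            mismatches[index] = (expected, actual)
    return mismatches


# Sample counts for illustration only.
source = {"orders": 1200, "customers": 300, "logs-2024": 50000}
target = {"orders": 1200, "customers": 299}
print(count_mismatches(source, target))
```

Matching counts do not prove the documents are identical, so spot-checking IDs and running your query tests remains worthwhile.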
Updating applications to connect with the OpenSearch Client is essential after migration. This includes modifying connection settings to align with the new OpenSearch cluster configurations. Testing applications in a staging environment before full deployment helps mitigate risks.
Also make sure any environment variables and configurations including plugins and secure settings have been copied over.