Dangling indexes in Elasticsearch and OpenSearch can cause issues, but more importantly they are often a signal of data loss. Here's how to recover dangling indices, and how to avoid this situation in the first place.

What are dangling indices?

Dangling indexes are indexes that exist on disk and are recognized by the Elasticsearch node, but are not known to the cluster metadata. This can occur when a node moves from one cluster to another, perhaps because the original cluster lost all of its master nodes (a form of split-brain situation); or when a cluster loses all of its master nodes and is restored from a backup that is too old to contain the index, which is then left in a dangling state.

Some user actions can cause dangling indices too. Users who meddle with the contents of the Elasticsearch data path, whether to recover indices, clone them, or back them up manually, risk creating dangling indices.

Lastly, mass-deleting indices (more than 500 of them, or whatever is set in cluster.indices.tombstones.size) while a node is offline will cause that node, when it rejoins the cluster, to receive only some of the index deletion records, because the "index graveyard" has overflowed. The index data remains intact on that node's disk, but the cluster state has no record of those indices.
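If you routinely delete large batches of indices while some nodes might be offline, you can raise this limit. A minimal sketch, assuming the tombstone limit is configured as a static setting in elasticsearch.yml (the value below is just an example, and a node restart is required for it to take effect):

# elasticsearch.yml
# Raise the "index graveyard" from its default of 500 entries, so a node that
# was offline during a mass deletion can catch up on more deletion records.
cluster.indices.tombstones.size: 1000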

Dangling indexes can cause a number of issues, including increased disk usage, slower search performance, and instability in the cluster. They can also make the cluster more difficult to manage and maintain, and may be evidence of lost data that you might want to recover.

How to Identify Dangling Indexes

When a node joins the cluster, if it finds any shards stored in its local data directory that do not already exist in the cluster, it will consider those shards to be "dangling". Because dangling indexes are, by definition, missing from the cluster metadata, they will not appear in the regular index listings; identifying them takes a little more care. A good starting point is the cat indices API, which shows which indexes the cluster does know about.

To use the cat indices API, you can run the following command in the Elasticsearch console:

GET /_cat/indices?v

This will return a list of all of the indexes in the cluster state, along with information about their status, size, and other properties. A dangling index is precisely one that exists on a node's disk but is missing from this list, so the cat indices API alone will not flag it; nodes that detect dangling indices will typically log a warning about them.

The most direct way to identify them, however, is the dedicated List dangling indices API, available since Elasticsearch 7.9 and in OpenSearch:

GET /_dangling

# with response:
{
  "dangling_indices": [
    {
      "index_name": "my-index-000001",
      "index_uuid": "zmM4e0JtBkeUjiHD-MihPQ",
      "creation_date_millis": 1589414451372,
      "node_ids": [
        "pL47UN3dAb2d5RCWP6lQ3e"
      ]
    }
  ]
}

How to Recover Dangling Indexes

The gateway.auto_import_dangling_indices setting controls whether dangling indices are automatically imported into the cluster state, provided no index already exists with the same name. On older Elasticsearch versions this behavior is enabled by default, meaning Elasticsearch will try (and frequently fail) to import those indices, logging an error when an import fails.

Importing dangling indices automatically is not safe and failed too often in practice, so it is disabled by default since Elasticsearch 7.9 and in all versions of OpenSearch. On all modern Elasticsearch and OpenSearch versions, importing dangling indices is a manual process.
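If you are still running a version where this setting exists and want to be explicit about the behavior, it can be set in the node configuration. A minimal sketch, assuming a 7.x node that still recognizes the (now deprecated) setting:

# elasticsearch.yml
# Explicitly disable automatic import of dangling indices. This is already the
# default from Elasticsearch 7.9 onwards, where the setting itself is deprecated.
gateway.auto_import_dangling_indices: false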

Instead, use the Dangling indices APIs to import a dangling index or delete it. Neither mechanism provides any guarantees as to whether the imported data truly represents the latest state of the data when the index was still part of the cluster.

If you have identified dangling indexes in your Elasticsearch cluster, you have two options: import them back into the cluster, or delete them.

The first step is to try to recover the index by re-adding it to the cluster. This can be done by running the following command in the Elasticsearch console:

POST /_dangling/zmM4e0JtBkeUjiHD-MihPQ?accept_data_loss=true

In this example, zmM4e0JtBkeUjiHD-MihPQ is the index_uuid to import, as provided by the List dangling indices API. Note, and acknowledge, the use of the accept_data_loss flag: the import will only be attempted if data loss is acceptable and has been explicitly approved.

This will import the dangling index back into the cluster state and attempt to allocate its shards on the available nodes. Once the import succeeds, the index will no longer be listed as dangling.

If the index cannot be recovered, or its data is not needed, you can delete it using the Delete dangling index API. This removes the dangling index data from disk and frees up the space.
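For example, to delete the dangling index from the listing above, reusing its index_uuid (note that the accept_data_loss flag is required here as well):

DELETE /_dangling/zmM4e0JtBkeUjiHD-MihPQ?accept_data_loss=true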

How to Prevent Dangling Indexes

Preventing dangling indexes in Elasticsearch requires proper management of the cluster. This includes ensuring that all nodes are properly configured and part of the cluster, and that indexes are created and managed correctly.

It is especially important to properly manage the removal of nodes from the cluster. This includes making sure all shards have been relocated off a node before it is removed from the cluster. Keeping the cluster healthy at all times, and avoiding frequent node replacements or prolonged node downtimes, will minimize the chances of ending up with dangling indices in your cluster.
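One common way to do this is to drain the node with shard allocation filtering before shutting it down. A sketch, where node-to-remove is a placeholder for the actual node name:

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._name": "node-to-remove"
  }
}

# Once GET /_cat/shards?v shows no shards left on that node, it can be stopped
# and removed safely. Clear the exclusion afterwards by setting it back to null.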

We are often asked by customers about manually tampering with the data path. Our answer is always "no!", and this is one of the reasons why. To avoid dangling indices in your cluster, always manage indices through the appropriate APIs, and never manually.

Another piece of good advice is not to delete too many indices at once. When deleting indices with a wildcard (e.g. myindex-2022*), a simple DELETE command may expand to dozens and sometimes hundreds of indices. This is a common root cause of dangling indices; use wildcards with caution, and always confirm that all cluster nodes are available before running the deletion.
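Before running such a mass delete, it helps to preview what the wildcard actually expands to, and to confirm that every node is currently part of the cluster. For example:

# Preview which indices the pattern matches before deleting anything
GET /_cat/indices/myindex-2022*?v

# Confirm that all nodes are present and the cluster is healthy
GET /_cat/nodes?v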

Conclusion

The dangling index problem can be easily avoided by correctly using and managing your Elasticsearch or OpenSearch clusters.

Are you looking for help with monitoring and reviewing cluster configurations and/or best practices? Be sure to check out our Pulse solution. Pulse can offer insights into your cluster with actionable recommendations. It also allows you to tap into world-class Elasticsearch experts to further review your cluster and help with your needs. If you're interested in learning more, reach out to us here!