Vector Search with Elasticsearch can leverage the built-in ELSER model or vector embeddings generated by external models. What is the difference, and which one is better?

You’ve set up Elasticsearch, fine-tuned your queries, and your search engine delivers great results—until a user types a query that doesn’t exactly match any keywords. Suddenly, you realise you’re in need of semantic understanding. This is where ELSER, Elasticsearch’s native embedding model, steps in. But is ELSER enough, or should you turn to external models like OpenAI or Cohere for richer semantic capabilities? In this article, we’ll dive into when to choose ELSER and when external embeddings might be the better fit for your Elasticsearch-powered search.

What are embeddings?

Embeddings are numerical vector representations of text that capture the relationships, context, and meaning of words, phrases, or documents. For example, in the embedding space, similar concepts like "cat" and "dog" are closer together, while unrelated ones like "cat" and "car" are farther apart.
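
To make this concrete, here is a toy sketch using made-up three-dimensional vectors. Real embedding models produce hundreds or thousands of dimensions, and the numbers below are purely illustrative, but the principle is the same: related concepts have a higher cosine similarity.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: 1.0 means identical direction, 0.0 means unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

cat = np.array([0.9, 0.8, 0.1])  # hypothetical embedding for "cat"
dog = np.array([0.8, 0.9, 0.2])  # hypothetical embedding for "dog"
car = np.array([0.1, 0.2, 0.9])  # hypothetical embedding for "car"

print(cosine_similarity(cat, dog))  # ~0.99 — related concepts sit close together
print(cosine_similarity(cat, car))  # ~0.30 — unrelated concepts sit far apart
```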

What is ELSER?

ELSER (Elastic Learned Sparse EncodeR) is Elasticsearch’s native sparse embedding model, designed for semantic and hybrid search.

Let’s break it down.

  • Native: ELSER is built into Elasticsearch, so there’s no need for external ML infrastructure or additional setup.
  • Sparse: Most values in the vectors it generates are zero, making it memory-efficient and seamlessly compatible with Elasticsearch’s inverted index.
  • Embedding model: It uses AI to convert sentences into numerical vectors that encode both lexical and semantic information.
  • Hybrid search: Using ELSER combined with BM25 enables you to create a hybrid search experience with ease.

So by using ELSER embeddings, you can combine the precision of keyword-based retrieval with the flexibility of semantic similarity, enabling fast, scalable, and cost-effective hybrid search—all within Elasticsearch’s native infrastructure.
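
As a concrete sketch, here is roughly what such a hybrid query can look like with the Python Elasticsearch client. This assumes the ELSER v2 model (`.elser_model_2`) is already deployed and that documents were ingested through an inference pipeline that writes the model’s token weights into a sparse field called `ml.tokens`; the index and field names are illustrative.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust to your cluster

response = es.search(
    index="my-index",  # hypothetical index name
    query={
        "bool": {
            "should": [
                # Lexical leg: classic BM25 keyword matching.
                {"match": {"content": "how to renew a passport"}},
                # Semantic leg: ELSER's token expansion of the same query.
                {
                    "text_expansion": {
                        "ml.tokens": {
                            "model_id": ".elser_model_2",
                            "model_text": "how to renew a passport",
                        }
                    }
                },
            ]
        }
    },
)
for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("content"))
```

On recent Elasticsearch versions, a dedicated sparse_vector query supersedes text_expansion, but the bool/should approach above is the simplest starting point for combining both legs.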

How does ELSER differ from external models?

External embedding models like OpenAI and Cohere use dense vectors, whereas ELSER relies on sparse vectors. But the distinction goes beyond the vector type—it has significant implications for system design and application architecture.

Using an external embedding model typically involves integrating an additional service into your workflow. Here’s how it works: your application processes documents, sends their text to an external API, receives the resulting dense vector embeddings, and then stores these embeddings alongside the original documents in Elasticsearch as a dense_vector field.
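
Sketched in Python, and assuming OpenAI’s text-embedding-3-small model (1,536 dimensions) with illustrative index and field names, the ingest side looks roughly like this:

```python
from elasticsearch import Elasticsearch
from openai import OpenAI

es = Elasticsearch("http://localhost:9200")  # adjust to your cluster
oai = OpenAI()  # reads OPENAI_API_KEY from the environment

# Create an index with a dense_vector field sized to the embedding model.
es.indices.create(
    index="articles",
    mappings={
        "properties": {
            "content": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": 1536,
                "index": True,
                "similarity": "cosine",
            },
        }
    },
)

doc_text = "Elasticsearch supports both sparse and dense vector search."

# Ask the external API for the document's embedding...
embedding = oai.embeddings.create(
    model="text-embedding-3-small", input=doc_text
).data[0].embedding

# ...and store it alongside the original text.
es.index(index="articles", document={"content": doc_text, "embedding": embedding})
```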

When performing semantic search, the query text is also embedded using the same external model, and similarity algorithms (e.g., cosine similarity) are applied to retrieve semantically relevant documents.
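
Continuing the sketch above (`es` and `oai` are the clients created there), the query side embeds the search text with the same model and runs an approximate kNN search against the dense_vector field:

```python
# Embed the query text with the same external model used at ingest time.
query_embedding = oai.embeddings.create(
    model="text-embedding-3-small", input="vector search in Elasticsearch"
).data[0].embedding

# Approximate kNN search over the indexed embeddings (cosine similarity,
# as declared in the mapping).
response = es.search(
    index="articles",
    knn={
        "field": "embedding",
        "query_vector": query_embedding,
        "k": 10,                # nearest neighbours to return
        "num_candidates": 100,  # candidates examined per shard
    },
)
for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["content"])
```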

Despite the increased application complexity, an external model offers several advantages over ELSER: richer semantic understanding, the ability to capture complex relationships in text, and multilingual support for global applications. External models are trained on diverse datasets, can often be fine-tuned for domain-specific tasks like legal or medical retrieval, and receive regular updates, giving you access to cutting-edge performance without extra effort.

So should I use ELSER or an external model?

At this point, the choice comes down to the specific needs and requirements of your application.

First of all, choosing ELSER means committing to Elasticsearch as your primary database. On the plus side, this simplifies your system by avoiding the introduction of additional components. However, if you ever decide to switch to another database like OpenSearch or Pinecone, you'll need to regenerate your embeddings using a different model, as ELSER embeddings are tightly integrated with Elasticsearch.

Another limitation of ELSER is that it’s designed primarily for English text and only considers roughly the first 512 tokens of each input. If your data fits within these parameters, ELSER might be a great fit; if not, an external embedding model is likely the better choice.

Finally, for complex queries and large datasets, ELSER can sometimes be less accurate than external embedding models like OpenAI or Cohere. That said, ELSER’s simplicity and native integration make it an excellent starting point. My recommendation is to begin with ELSER—it's easy to set up and cost-effective—and if it doesn’t meet your performance needs, you can transition to an external model later.

Comparing Costs

Compute

Using ELSER as your embedding model puts additional load on your cluster: the ML node must generate embeddings for every ingested document and embed each query at search time. A major contributor to cost is therefore the extra compute needed to handle this embedding workload.

While the ML node included by default on Elastic Cloud can handle the load of a smaller dataset (under ~10,000 documents), if the dataset is large or ingest and search requests are frequent, you will probably have to scale those nodes at an additional cost.
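
For illustration, one lever besides resizing the ML nodes themselves is the number of model allocations. A minimal sketch, assuming ELSER v2 is already deployed:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust to your cluster

# Raise the number of parallel inference allocations for ELSER. This increases
# throughput but consumes more ML node resources, so the nodes may still need
# resizing if they run out of capacity.
es.ml.update_trained_model_deployment(
    model_id=".elser_model_2",
    number_of_allocations=2,
)
```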

As for external embedding models, their cost is determined by the number of input tokens sent to the model (when consumed via an API) or by the dedicated compute power (when running your own inference server).
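
As a rough illustration of the API pricing model, here is a back-of-envelope estimate. The per-token price below is a placeholder, not a real quote; check your provider’s current pricing.

```python
docs = 1_000_000                  # corpus size (assumed)
avg_tokens_per_doc = 300          # average document length in tokens (assumed)
price_per_million_tokens = 0.02   # hypothetical USD price, NOT a real quote

total_tokens = docs * avg_tokens_per_doc
cost = total_tokens / 1_000_000 * price_per_million_tokens
print(f"~${cost:,.2f} to embed the whole corpus once")
```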

Storage

In terms of storage, sparse embedding vectors are usually more compact than dense vectors, so dense vectors tend to require more disk space as well as more RAM.
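
A quick back-of-envelope comparison: the dense side is exact arithmetic for float32 vectors, while the sparse side uses assumed figures (a typical count of non-zero ELSER tokens per document and a rough per-entry overhead).

```python
dense_dims = 1536                  # e.g. text-embedding-3-small
dense_bytes = dense_dims * 4       # float32: 4 bytes per dimension, paid for every doc

sparse_entries = 150               # assumed non-zero tokens per document
sparse_bytes = sparse_entries * 6  # assumed ~6 bytes per token/weight entry

print(f"dense: ~{dense_bytes} B/doc, sparse: ~{sparse_bytes} B/doc")
```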

How to proceed?

In the next article I will share examples of setting up an application powered by hybrid search, both using ELSER and an external model (Cohere).