A hands-on tutorial for building a semantic hybrid search application with Elasticsearch as a vector database.
In the previous article, we introduced vector search using OpenSearch and Elasticsearch. We looked at the differences between keyword and semantic search and explored how vectors work behind the scenes to power both.
In this article, we’ll take a hands-on approach, focusing on Elasticsearch as a vector database. We’ll build a hybrid search application that combines keyword and semantic search, working directly with Elasticsearch’s low-level APIs to handle mapping, embedding, indexing, and searching. This will give us a deeper understanding of how to develop vector-powered applications in Elasticsearch.
This is the first of three implementation-focused articles. In the next one, I’ll apply the same approach to OpenSearch, and in a follow-up article, I’ll explore how Elastic's ESRE can simplify the process by abstracting away much of the low-level work.
By the end of this series, you’ll have a strong understanding of different ways to implement vector search across these platforms. Let’s get started!
Follow along
All the code in this article is available on GitHub for reference.
To follow along, you'll need:
- A Kaggle account and API token
- A Cohere account and API token
- An Elasticsearch cluster. You can easily set up a trial on Elastic Cloud.
Choosing a Dataset
Your dataset might already be stored in Elasticsearch, be in another database, or you might not have created it yet. For this article, I’ll use a publicly available movie plot dataset from Kaggle.
We'll be working with the following columns:
- Release Year – Useful for filtering and sorting
- Title – A key identifier for search
- Director – Another potential search and filter field
- Cast – Important for keyword-based search
- Genre – Useful for categorization and filtering
- Plot – A rich text field, ideal for semantic search
Why This Dataset?
I chose this dataset because it has both structured and unstructured data, making it a good example for keyword and semantic search. Fields like release year, cast, and genre work well for traditional filtering and keyword-based search, while the plot summaries provide long text, which is useful for testing semantic search.
Another reason I picked this dataset is that it is already clean, so we don’t need much preprocessing. In real-world projects, your data may not be this clean and could need extra work before you can use it. How much cleaning is needed depends on your specific data and use case, so keep that in mind when preparing your dataset.
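To make this concrete, here's roughly how you might load the dataset and keep only the columns listed above using pandas. The file name is an assumption based on the Kaggle dataset; your copy may differ:

```python
import pandas as pd

# Assumption: the Kaggle CSV was downloaded locally; the file name may differ.
CSV_PATH = "wiki_movie_plots_deduped.csv"
COLUMNS = ["Release Year", "Title", "Director", "Cast", "Genre", "Plot"]

def select_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the tutorial's columns and drop rows without plot text."""
    return df[COLUMNS].dropna(subset=["Plot"])

def load_movies(path: str = CSV_PATH) -> pd.DataFrame:
    """Load the movie plots CSV and narrow it down to the fields we'll index."""
    return select_columns(pd.read_csv(path))
```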
Choosing an embedding model
Once we have our dataset, the next step is to choose an embedding model to turn our text into vectors. This choice affects search quality and system performance, so it’s important to think about pricing, supported languages, input size limits, and output vector size. These things depend on how complex and long your text is. For example, models with larger input limits are better for long documents, while higher-dimensional vectors can improve search accuracy but need more storage and processing power.
Sometimes, you may need to fine-tune an embedding model if the available ones do not understand your specific domain or terminology well. This is useful for technical, medical, or legal texts, where general models may not capture important details. However, fine-tuning requires a lot of labeled data and extra training costs, so it is best when standard models do not give good results.
For this tutorial, I chose Cohere’s embed-english-v3 model because our text is in English, and Cohere offers free tokens for testing, making it easy to try out.
Preparing the index
Although Elasticsearch can automatically infer mappings when data is ingested, it’s generally best to define the mapping in advance to avoid inconsistencies and ensure optimal performance. For our dataset, the mapping is as follows:
PUT /movies
{
  "mappings": {
    "properties": {
      "release_year": {
        "type": "integer"
      },
      "title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      …
      "plot": {
        "type": "text"
      },
      "plot_embedding": {
        "type": "dense_vector",
        "dims": 1024,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}
Most of this should look familiar, but let's take a closer look at the plot_embedding field, which is where we store vector embeddings for semantic search. Here’s what each parameter does:
- dims: 1024 – Specifies the size of the vector, which must match the output dimensions of the embedding model. For Cohere’s embed-english-v3, the output vectors are 1024-dimensional.
- index: true – An Elasticsearch-specific setting that makes the field searchable using vector similarity algorithms.
- similarity: "cosine" – Defines the similarity metric used to compare vectors. Cohere embeddings perform best with cosine similarity, but other options (e.g., L2, dot product) are available.
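The mapping above can also be created from Python. Here's a minimal sketch using only the standard library; in practice you'd likely use the official elasticsearch Python client, and the URL and missing authentication are assumptions for a local dev cluster:

```python
import json
from urllib import request

ES_URL = "http://localhost:9200"  # assumption: local dev cluster without auth

MOVIES_MAPPING = {
    "mappings": {
        "properties": {
            "release_year": {"type": "integer"},
            "title": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
            # director, cast and genre are elided here, as in the snippet above
            "plot": {"type": "text"},
            "plot_embedding": {
                "type": "dense_vector",
                "dims": 1024,  # must match the embedding model's output size
                "index": True,
                "similarity": "cosine",
            },
        }
    }
}

def create_movies_index() -> None:
    """Send PUT /movies with the mapping, equivalent to the Dev Tools request."""
    req = request.Request(
        f"{ES_URL}/movies",
        data=json.dumps(MOVIES_MAPPING).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    request.urlopen(req)
```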
Understanding Index Options
If you retrieve the index mapping using:
GET /movies/_mapping
You’ll notice that Elasticsearch has automatically added an index_options
section to the plot_embedding mapping:
"plot_embedding": {
  "type": "dense_vector",
  "dims": 1024,
  "index": true,
  "similarity": "cosine",
  "index_options": {
    "type": "int8_hnsw",
    "m": 16,
    "ef_construction": 100
  }
}
The default values work well for this example, but understanding these settings is critical in real-world scenarios. Keep in mind that changing these values requires re-indexing, so it’s best to configure them carefully upfront.
Breaking Down index_options:
- type: "int8_hnsw" – Determines whether the index uses Approximate Nearest Neighbor (ANN) search (hnsw) or exact k-Nearest Neighbors (kNN) search (flat). As covered in the previous article, ANN is faster but less precise. This setting also controls quantization: with the default int8_hnsw, vectors are compressed from floating-point to 8-bit integers, reducing memory usage at the cost of some precision.
- m: 16 – Controls the number of bi-directional links per node in the HNSW graph. Intuition tip: higher values improve recall but increase memory usage.
- ef_construction: 100 – Defines how many neighbors are considered during index construction. Intuition tip: higher values improve search accuracy but slow down indexing.
By understanding these parameters, you can fine-tune your Elasticsearch setup for optimal vector search performance. In the next section, we’ll move on to preparing our data by chunking and embedding it.
Preparing the data
Chunking
Chunking refers to the concept of splitting documents into smaller, meaningful sections that are optimal for embedding and semantic retrieval.
Choosing the chunking strategy is one of the most important and difficult decisions in a semantic search project, as the way you divide your data affects search quality, speed, cost, and accuracy. For example, if you embed an entire document into a single vector, you might lose important details within the text. However, embedding individual sentences or words may remove valuable context, making retrieval less effective.
Common chunking strategies include:
- Fixed-Length Chunking – Splitting text into equal-sized chunks, often based on a fixed number of words, sentences, or characters.
- Semantic Chunking – Using natural language processing (NLP) techniques to split text at meaningful points, such as paragraph or sentence boundaries.
- Metadata-Aware Chunking – Chunking based on document structure, such as breaking sections based on headers, timestamps, or predefined labels.
I won’t go into detail about each method, but if you're interested in a deeper dive, the RAG Techniques repository provides an excellent guide on the advantages and disadvantages of each approach. I strongly recommend experimenting with different chunking strategies, as the way you split your data can significantly impact search performance.
For this tutorial, we’ll use a fixed-length chunking with overlap strategy. While popular frameworks like LlamaIndex and LangChain provide built-in implementations, I opted for a simple custom implementation to avoid unnecessary dependencies.
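The repo's implementation differs in its details, but a fixed-length, word-based chunker with overlap can be sketched in a few lines; the chunk_size and overlap values here are illustrative, not tuned:

```python
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into chunks of `chunk_size` words, with `overlap` words
    shared between consecutive chunks to preserve context at boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk already reached the end of the text
    return chunks
```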
Embedding
Once we have our chunks ready, we can go ahead and embed them using our chosen embedding algorithm. The goal of this process is to generate and attach an embedding vector to each chunk, to make it searchable once indexed.
To achieve this, you'll need an API token from your embedding model provider, or a connection to an AI platform such as AWS Bedrock or GCP Vertex AI. Note that most providers also offer significant discounts for batch embedding (vs. real-time embedding).
In our repo you will find code that takes in a chunks.jsonl file, processes it, and outputs a new embedded.jsonl file.
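The core of that script boils down to something like the sketch below. The embed callable is injected so any provider can be plugged in; this is an illustration, not the repo's exact code. With Cohere's Python SDK, embed could wrap co.embed(texts=texts, model="embed-english-v3.0", input_type="search_document").embeddings — note that Cohere's v3 models require an input_type, and search queries are later embedded with "search_query" instead:

```python
from typing import Callable

def embed_chunks(
    chunks: list[dict],
    embed: Callable[[list[str]], list[list[float]]],
    text_field: str = "Plot",
) -> list[dict]:
    """Attach an `embedding` vector to each chunk using the given embed function."""
    vectors = embed([c[text_field] for c in chunks])
    # Return new dicts rather than mutating the inputs
    return [{**c, "embedding": v} for c, v in zip(chunks, vectors)]
```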
Let's examine an output chunk. This is the first chunk for the movie The Matrix; note that it contains a portion of the movie's plot, as well as an embedding field holding the vector that represents the semantic meaning of the text.
{
  "Release Year": 1999,
  "Title": "The Matrix",
  "Director": "Andy Wachowski, Larry Wachowski",
  "Cast": "Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss",
  "Genre": "science fiction",
  "Plot": "A woman is cornered by police in an abandoned hotel…",
  "trail": "chunk_0",
  "embedding": ["float_", [[0.03729248, 0.031402588, … ]]]
}
Ingestion
The final step in the data preparation process is ingesting the data into Elasticsearch. With our index already set up, all that remains is to use Elasticsearch’s Bulk API to insert the chunks along with their embeddings.
During ingestion, Elasticsearch will generate the HNSW graph based on the configurations defined in index_options
. Once the data is ingested, the index will be ready for vector-based queries.
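The Bulk API expects an NDJSON body: an action line followed by the document itself, one pair per document. A minimal sketch of building that body (the repo's script likely uses the official client's bulk helper instead):

```python
import json

def build_bulk_body(docs: list[dict], index: str = "movies") -> str:
    """Serialize docs into the NDJSON body expected by Elasticsearch's _bulk API."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # document line
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline
```

The resulting string is POSTed to /_bulk with the Content-Type: application/x-ndjson header.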
Execute the ingestion script to insert the documents into Elasticsearch. Once completed, you can verify the stored data by fetching a document using the following command in Dev Tools (Kibana):
POST /movies/_search
{
  "size": 1,
  "query": {
    "match_all": {}
  }
}
This will return the stored document, allowing you to inspect its values, including the vector embedding.
With the data successfully ingested, the next step is running queries to test and evaluate our hybrid search system!
Searching the index
Now that we’ve processed and ingested our data, we can move on to the most exciting part—searching the index. Elasticsearch allows us to perform different types of searches, including keyword search, semantic search, and hybrid search.
Keyword search
Keyword search is the traditional method of retrieving documents based on exact matches or full-text relevance scoring. It works well for structured fields like titles, genres, or actor names. If, say, I wanted to find all movies from the Matrix trilogy, I could search the index using a match query:
GET /movies/_search
{
  "query": {
    "match": {
      "title": "Matrix"
    }
  }
}
Note that since our index contains multiple chunks per movie, this query would retrieve all chunks where the title contains the word “Matrix”. This means that instead of returning only one document per movie, it may return multiple results, corresponding to different chunks of the same movie’s text.
Semantic search
However, if I wanted to find all movies about the world being taken over by hostile machines enslaving humanity, a simple keyword search wouldn’t be enough. Keyword-based retrieval relies on exact matches, meaning it would only return documents that explicitly contain those words—missing relevant results that describe the same concept in different terms.
To retrieve the most relevant movies based on meaning rather than exact wording, we need to use semantic search.
First of all, we need to pass our search query through the same embedding model (e.g., Cohere's embed-english-v3), so we can use the resulting vector as the query_vector in an Elasticsearch kNN query.
Once we have the query embedding, we can use it in a knn section, which in this case replaces the traditional query section. Note that although the key is knn, this query will perform either an exact kNN or an ANN search, depending on the index settings.
POST /movies/_search
{
  "knn": {
    "field": "plot_embedding",
    "k": 10,
    "num_candidates": 100,
    "query_vector": [...]
  }
}
Breaking down the parameters of the knn section:
- field: "plot_embedding" – The vector field where embeddings are stored.
- query_vector: [...] – The embedding of our search query, generated using the same model as our dataset.
- k: 10 – Retrieve the 10 most similar movies based on vector similarity.
- num_candidates: 100 – Elasticsearch first retrieves the 100 closest candidates before refining the top 10 results. This is part of the accuracy tradeoff when using ANN; it's irrelevant if the index is configured to run an exact search (kNN).
You can read more about picking the right k and num_candidates in this interesting blog post by Elastic.
This approach allows Elasticsearch to find movies that may not contain the exact words "hostile machines" but describe the same idea, such as "rogue AI", "robot uprising", or "artificial intelligence taking over". The Matrix trilogy might be among the results, but other movies will likely appear too, as it's a pretty popular topic in film.
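Putting the two steps together in Python might look like the sketch below. The embed_query callable is an assumption standing in for your provider's API; with Cohere, it would embed the text using input_type="search_query" rather than "search_document":

```python
from typing import Callable

def semantic_search_body(
    query_text: str,
    embed_query: Callable[[str], list[float]],
    k: int = 10,
    num_candidates: int = 100,
) -> dict:
    """Embed the query text and build the body of a top-level knn search."""
    return {
        "knn": {
            "field": "plot_embedding",
            "query_vector": embed_query(query_text),
            "k": k,
            "num_candidates": num_candidates,
        }
    }
```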
Hybrid search
Lastly, let’s say we wanted to find all movies about the world being taken over by hostile machines enslaving humanity, but only those directed by the Wachowski brothers (assume that we forgot the exact movie names).
This is where Elasticsearch's hybrid search truly shines, allowing us to combine both keyword filtering and semantic search into a single query.
Here’s an example of how to structure this query:
POST /movies/_search
{
  "size": 10,
  "knn": {
    "field": "plot_embedding",
    "query_vector": [...],
    "k": 5,
    "num_candidates": 100
  },
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "director": "Wachowski"
          }
        }
      ]
    }
  }
}
In this case, Elasticsearch retrieves the 5 most relevant results semantically and combines them with the best keyword matches, finally returning up to 10 results that are relevant both semantically and lexically.
In this case, those would be the Matrix movies that we were after!
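For completeness, the same hybrid request can be built programmatically. This is a sketch mirroring the request above, with the director name parameterized:

```python
def hybrid_search_body(
    query_vector: list[float],
    director: str,
    size: int = 10,
    k: int = 5,
    num_candidates: int = 100,
) -> dict:
    """Combine a knn section with a bool/match query in one search body."""
    return {
        "size": size,
        "knn": {
            "field": "plot_embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        },
        "query": {
            "bool": {
                "must": [{"match": {"director": director}}]
            }
        },
    }
```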
Types of hybrid search
There are multiple ways of combining knn with other search features. Elasticsearch's documentation includes great examples of more advanced knn uses.
Summary
In this article, we explored a hands-on tutorial on using Elasticsearch as a vector database. We started by selecting a dataset and an embedding model, then configured an index for semantic search, reviewing key configuration parameters along the way. Next, we prepared and ingested data, ensuring it was optimized for retrieval. Finally, we executed various search techniques, including keyword search, semantic search, and hybrid search, demonstrating how Elasticsearch can combine traditional and AI-driven retrieval methods.
In the next article, we’ll take a similar approach with OpenSearch, which, while similar to Elasticsearch, has some key differences. Hope you enjoyed the tutorial - stay tuned for the next one!