RAG with OpenSearch: A Crash Course for Building RAG Applications
In this workshop, you’ll learn how to build Retrieval-Augmented Generation (RAG) applications with OpenSearch from end to end.
We’ll cover the essentials: how to preprocess your data, generate and use embeddings, and store them in a vector database.
You’ll see how to combine traditional and semantic search, and how to use pre- and post-processing to improve results. By the end, you’ll know how to set up a simple but effective RAG pipeline with OpenSearch - covering everything from data ingestion to generating answers with large language models.
We'll walk through the full RAG pipeline, including:
- Data preparation: Structuring documents for chunking, metadata enrichment, and indexing
- Embeddings: Choosing the right model and generating semantic vectors at scale
- Vector storage: Storing and retrieving vectors efficiently in OpenSearch using the k-NN plugin and approximate-nearest-neighbor algorithms such as HNSW
- Hybrid search: Blending keyword-based and semantic retrieval for better relevance
- LLM integration: Passing context to your language model and shaping better responses with pre- and post-processing techniques
You'll leave with a clear understanding of how to set up and optimize a RAG system with OpenSearch - from ingestion and indexing to querying and generating accurate, grounded answers with LLMs.
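To make the indexing and querying side concrete, below is a hedged sketch of the two request bodies involved: a k-NN-enabled index mapping using HNSW, and a hybrid query combining BM25 with vector search. The field names (`content`, `embedding`), the dimension, and the engine choice are assumptions; in practice you would send these bodies with `opensearch-py`'s `client.indices.create` and `client.search`, and the `hybrid` query type (OpenSearch 2.10+) additionally needs a search pipeline with a score-normalization processor.

```python
# Illustrative request bodies for a k-NN-enabled OpenSearch index and a
# hybrid query. Field names, dimension, and parameters are assumptions.

index_body = {
    "settings": {"index": {"knn": True}},  # enable the k-NN plugin for this index
    "mappings": {
        "properties": {
            "content": {"type": "text"},  # keyword/BM25 search field
            "embedding": {
                "type": "knn_vector",
                "dimension": 384,  # must match your embedding model's output size
                "method": {
                    "name": "hnsw",            # HNSW graph for approximate k-NN
                    "space_type": "cosinesimil",
                    "engine": "lucene",
                },
            },
        }
    },
}

def hybrid_query(question: str, query_vector: list[float], k: int = 5) -> dict:
    """Build a hybrid query body blending BM25 and k-NN scores.

    Assumes a search pipeline with a normalization processor is configured,
    as required by OpenSearch's hybrid query type.
    """
    return {
        "size": k,
        "query": {
            "hybrid": {
                "queries": [
                    {"match": {"content": {"query": question}}},
                    {"knn": {"embedding": {"vector": query_vector, "k": k}}},
                ]
            }
        },
    }

# With an opensearch-py client (setup assumed), roughly:
# client.indices.create(index="docs", body=index_body)
# hits = client.search(index="docs",
#                      body=hybrid_query("what is RAG?", vec),
#                      params={"search_pipeline": "norm-pipeline"})
```

The normalization step matters because BM25 and cosine-similarity scores live on different scales; without it, one retriever's scores would dominate the blend.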
Ideal for developers, data engineers, and AI practitioners looking to move from proof-of-concept to production-ready RAG solutions.
Presenter
- Liza Katz, GenAI Lead, BigData Boutique
About BigData Boutique
We are a team of Big Data and ML/AI experts with over 15 years of experience, providing end-to-end consulting services for modern data platforms.
BigData Boutique has achieved the Amazon Web Services (AWS) Service Delivery designation for Amazon OpenSearch Service, recognizing our deep technical knowledge, experience, and proven success in delivering Amazon OpenSearch Service to customers.
We offer follow-the-sun availability with tiered SLAs for production incident response, along with a shared Slack channel, to ensure smooth operation of your data clusters.