As the OpenSearch project continues to evolve, it forges its own path by prioritizing features driven by the needs and feedback of its community and users. In this spirit of progress, we’ve highlighted the most significant improvements and advancements introduced in 2024.
1. Vector Database and Generative AI
Vector Search Performance
By providing multiple underlying engines for vector search, the C++-based Faiss library being one of them, OpenSearch is well positioned to offer high-performance vector search via both exact k-NN and approximate nearest neighbor (ANN) search. In 2024, multiple optimizations further sped up vector search in the engine, including concurrent segment search, better management of machine resources, and Lucene's SIMD optimizations.
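As a concrete example of one of these optimizations, the sketch below builds the request body for enabling concurrent segment search. It assumes the dynamic cluster setting name `search.concurrent_segment_search.enabled` (sent via `PUT /_cluster/settings`); verify the setting name against the docs for your OpenSearch version.

```python
import json

def concurrent_search_settings(enabled: bool = True) -> dict:
    """Build a cluster-settings request body toggling concurrent segment search."""
    return {
        "persistent": {
            # Dynamic cluster setting; name assumed from the 2.x docs.
            "search.concurrent_segment_search.enabled": enabled
        }
    }

# Pretty-print the body you would PUT to /_cluster/settings.
print(json.dumps(concurrent_search_settings(), indent=2))
```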
Built-in Vector Quantization
As part of the performance and storage optimizations for vector search in OpenSearch, the k-NN plugin now supports built-in quantization techniques, such as byte vectors and Faiss FP16 scalar quantization, that run during indexing.
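To illustrate, here is a sketch of an index mapping for a `knn_vector` field using Faiss FP16 scalar quantization (an HNSW method with an `sq` encoder of type `fp16`). The field name, dimension, and space type are illustrative; check the exact encoder parameters against the k-NN plugin docs for your version.

```python
import json

def fp16_knn_mapping(dim: int = 768) -> dict:
    """Build an index-creation body with an FP16-quantized knn_vector field."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "my_vector": {  # illustrative field name
                    "type": "knn_vector",
                    "dimension": dim,
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",
                        "space_type": "l2",
                        "parameters": {
                            # "sq"/"fp16" encoder enables 16-bit scalar quantization
                            "encoder": {"name": "sq", "parameters": {"type": "fp16"}}
                        },
                    },
                }
            }
        },
    }

print(json.dumps(fp16_knn_mapping(), indent=2))
```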
Neural Search
OpenSearch expanded its capabilities by integrating neural search, which leverages machine learning to provide more intelligent, context-aware search results. This advancement allows OpenSearch to handle complex queries by understanding intent and semantics rather than relying solely on keyword matching, paving the way for more precise and nuanced search experiences. Read more about it here. Also of interest is the easy connectivity to third-party models via ML connectors.
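A minimal sketch of what a neural query looks like: the `neural` clause embeds the query text with a deployed model and runs ANN search against a vector field. The field name `passage_embedding` and the model ID are placeholders for your own field and deployed model.

```python
def neural_query(query_text: str, model_id: str, k: int = 10) -> dict:
    """Build a neural query body; the engine embeds query_text at search time."""
    return {
        "query": {
            "neural": {
                "passage_embedding": {  # placeholder vector field name
                    "query_text": query_text,
                    "model_id": model_id,  # ID of a deployed embedding model
                    "k": k,
                }
            }
        }
    }

body = neural_query("how do I renew my passport", "my-model-id", k=5)
```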
Hybrid Search
Hybrid search combines traditional text-based search with vector similarity search, offering a dual approach to retrieving data. This combination enhances result relevance by blending the strengths of exact keyword matching with approximate nearest neighbor (ANN) search, which is critical for AI-driven applications. OpenSearch offers a dedicated query type and API to that end.
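As a sketch, a `hybrid` query wraps a lexical clause and a neural (vector) clause in one request; the sub-query scores are typically normalized and combined by a search pipeline with a normalization processor. Field names and the model ID below are placeholders.

```python
def hybrid_query(text: str, model_id: str, k: int = 10) -> dict:
    """Build a hybrid query combining a lexical match with a neural clause."""
    return {
        "query": {
            "hybrid": {
                "queries": [
                    # Lexical leg: exact keyword matching on a text field.
                    {"match": {"title": {"query": text}}},
                    # Vector leg: ANN search over an embedding field.
                    {
                        "neural": {
                            "title_embedding": {  # placeholder vector field
                                "query_text": text,
                                "model_id": model_id,
                                "k": k,
                            }
                        }
                    },
                ]
            }
        }
    }
```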
OpenSearch Assistant Toolkit
Aimed at simplifying user interaction with the platform, OpenSearch introduced the Assistant Toolkit. This toolkit helps users craft complex queries, generate visualizations, and gain insights without requiring deep technical expertise. The goal is to democratize access to OpenSearch’s powerful analytics capabilities by lowering the barrier to entry.
2. Search Performance Enhancements
Improved Query Latency
OpenSearch delivered notable improvements in query performance across the board. Some of the most popular query types, including term, range, and Boolean queries, now exhibit 15% to 98% faster response times (source). OpenSearch 2.17 is 6x faster than OpenSearch 1.3, and more performance improvements are planned for 2025 (source). These enhancements translate to a smoother, more responsive user experience, even at scale.
Rerank Processors
The addition of the ByField rerank processor and improvements to the whole rerank API are yet another step toward improving the retrieval and relevance performance of OpenSearch, by performing a second-level rerank of search results based on a specified target field or an ML model. Read more about it here.
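A sketch of a search pipeline using the ByField rerank response processor to re-sort hits by a numeric field inside each document. The shape assumed here is a `rerank` response processor with a `by_field` block and a `target_field`; verify the exact parameters against the rerank processor docs for your version.

```python
def by_field_rerank_pipeline(target_field: str) -> dict:
    """Build a search-pipeline body with a ByField rerank response processor."""
    return {
        "response_processors": [
            {
                "rerank": {
                    "by_field": {
                        # Documents are re-sorted by this field's value.
                        "target_field": target_field,
                        # Assumed option: keep the field in the returned hits.
                        "remove_target_field": False,
                    }
                }
            }
        ]
    }

# Body you would PUT to /_search/pipeline/<pipeline-name>.
pipeline = by_field_rerank_pipeline("reviews.stars")
```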
MatchOnlyText field
The OpenSearch Project introduced a new field type called match_only_text in version 2.12. This field type is designed for full-text search scenarios where scoring and positional information of terms within a document are not critical. If you're working with large datasets in OpenSearch and looking to optimize storage and performance, the match_only_text field could be an interesting option to explore.
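Mapping a field as match_only_text is a one-line change at index creation; the sketch below uses an illustrative log `message` field.

```python
def match_only_text_mapping() -> dict:
    """Build an index-creation body mapping a log field as match_only_text."""
    return {
        "mappings": {
            "properties": {
                # match_only_text skips scoring and positional data,
                # trading ranking features for smaller storage.
                "message": {"type": "match_only_text"}
            }
        }
    }
```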
Query Insights
Query Insights is a new feature added to OpenSearch to help identify long-running and costly queries, at scale, in real time. By identifying those queries early, you can optimize your system on the go by rewriting costly queries, optimizing hardware, or pulling the right levers in the cluster. Most notably, top N query monitoring and query grouping are at the heart of the new Query Insights feature.
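A sketch of the cluster settings that turn on top N query monitoring by latency. The setting names (`search.insights.top_queries.latency.*`) are assumed from the Query Insights documentation; confirm them for your version.

```python
def top_n_queries_settings(n: int = 10, window: str = "1m") -> dict:
    """Build a cluster-settings body enabling top N query monitoring by latency."""
    return {
        "persistent": {
            # Assumed setting names; check the Query Insights docs.
            "search.insights.top_queries.latency.enabled": True,
            "search.insights.top_queries.latency.top_n_size": n,
            "search.insights.top_queries.latency.window_size": window,
        }
    }

# Body you would PUT to /_cluster/settings.
settings = top_n_queries_settings(n=20, window="5m")
```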
3. Observability and Analytics
Advanced Observability Features
OpenSearch bolstered its observability stack, providing users with enhanced tools for log analytics, monitoring, and visualizations. This ensures that users can track system health, diagnose performance bottlenecks, and perform root cause analysis more effectively.
Security Analytics
Security is a growing focus for OpenSearch. New security analytics tools were introduced to help users monitor threats, detect anomalies, and secure their data pipelines. These features are essential for organizations that prioritize compliance and data protection.
4. Performance and Scalability
Segment Replication
OpenSearch introduced segment replication, an innovative replication strategy that boosted ingestion throughput by up to 25%. This results in faster data indexing and reduced replication lag, making it easier for users to keep their datasets updated in near real time.
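Segment replication is opted into per index via the `index.replication.type` setting; the sketch below builds an index-creation body with illustrative shard counts.

```python
def segrep_index_settings(shards: int = 1, replicas: int = 1) -> dict:
    """Build an index-creation body that uses segment replication."""
    return {
        "settings": {
            "index": {
                # Copy whole segments to replicas instead of re-indexing documents.
                "replication.type": "SEGMENT",
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }
```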
Remote-Backed Storage
Version 2.10 saw the debut of remote-backed storage, allowing OpenSearch to write segments directly to object storage solutions such as Amazon S3. This approach decouples compute from storage, offering better scalability, data durability, and cost-efficiency by leveraging cloud storage for large datasets.
Vector Search on Searchable Snapshots
Searchable snapshots allow big cost savings on OpenSearch clusters by executing searches against cold copies of the data for infrequent access. Now, searchable snapshots also support vector search.
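For context, an index is made searchable directly from a snapshot by restoring it with `"storage_type": "remote_snapshot"`; the sketch below builds that restore body (repository, snapshot, and index names are placeholders, and the body would be sent to `POST /_snapshot/<repo>/<snapshot>/_restore`).

```python
def remote_snapshot_restore(indices: str = "my-index") -> dict:
    """Build a restore body that mounts an index as a searchable snapshot."""
    return {
        "indices": indices,  # placeholder index pattern
        # "remote_snapshot" keeps data in the repository and searches it cold.
        "storage_type": "remote_snapshot",
    }
```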
5. Governance and Community
OpenSearch Software Foundation
A landmark announcement in September 2024 saw AWS transfer OpenSearch governance to the Linux Foundation, creating the OpenSearch Software Foundation (OSSF). This move establishes a vendor-neutral governance model, fostering open collaboration among contributors, businesses, and developers. By anchoring the project within the Linux Foundation, OpenSearch is positioned to grow sustainably as a key player in the search, analytics, and observability ecosystems.
6. Amazon OpenSearch Service Updates
Multi-AZ with Standby
To enhance system resilience, Amazon OpenSearch Service introduced Multi-AZ with Standby deployments. This ensures high availability by distributing data across multiple Availability Zones, safeguarding against zonal failures and ensuring minimal downtime in the event of disruptions.
Serverless Scaling
With Amazon OpenSearch Serverless, users can elastically scale their infrastructure based on demand. This is particularly beneficial for organizations with fluctuating workloads, as they can avoid over-provisioning and pay only for the resources they consume.
Custom Plugins Support
After many years of anticipation, the managed service by Amazon now supports installing custom plugins.
AI/ML Connectors
OpenSearch Service has supported both lexical and vector search for a while now; however, configuring semantic search used to require building your own framework to integrate machine learning (ML) models for ingestion and search. The neural search feature facilitates text-to-vector transformation during ingestion and search, and with the new AI/ML connectors to Amazon SageMaker and Amazon Bedrock, it's easier to integrate embeddings into both.
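As a rough sketch, a connector is registered through the ML Commons connector API (`POST /_plugins/_ml/connectors/_create`). The body below targets a hypothetical Amazon Bedrock embedding model; the region, model name, and request template are placeholders, and a production connector would also need credentials as described in the ML Commons connector blueprints.

```python
def bedrock_connector(region: str, model: str) -> dict:
    """Build a minimal ML Commons connector body for a Bedrock embedding model."""
    return {
        "name": "Amazon Bedrock embeddings connector",
        "description": "Connector for a Bedrock text-embedding model",
        "version": 1,
        "protocol": "aws_sigv4",  # SigV4-signed calls to an AWS endpoint
        "parameters": {"region": region, "service_name": "bedrock", "model": model},
        "actions": [
            {
                "action_type": "predict",
                "method": "POST",
                # Placeholder endpoint/template; see the connector blueprints.
                "url": f"https://bedrock-runtime.{region}.amazonaws.com/model/{model}/invoke",
                "request_body": '{ "inputText": "${parameters.inputText}" }',
            }
        ],
    }
```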