What is Apache Solr?

Apache Solr is an open-source search platform built on Apache Lucene. It provides full-text search, faceted navigation, near real-time indexing, and distributed search through SolrCloud. Written in Java, Solr exposes a REST-like HTTP API for indexing and querying, and supports multiple document formats including JSON, XML, CSV, and rich documents like PDF and Word through Apache Tika integration.

Solr has been running in production for roughly two decades. Organizations use it to power e-commerce product search, enterprise knowledge bases, content management systems, and geospatial applications. It remains one of the most widely deployed search platforms, though its ecosystem has shifted considerably in recent years.

History of Apache Solr

Yonik Seeley created Solr in 2004 at CNET Networks to power the site's internal search. CNET donated the code to the Apache Software Foundation in 2006, where it entered the Apache Incubator and graduated in 2007 as a subproject of Lucene.

In 2010, the Solr and Lucene projects merged under a single Apache umbrella, aligning their release cycles and development communities. Solr 4.0 (2012) introduced SolrCloud, bringing native distributed search with automatic sharding, leader election, and cluster coordination through Apache ZooKeeper. In 2021, Solr became a top-level Apache project in its own right, while continuing to build on Lucene.

Solr 9, the current major release, moved to Java 11+ as a minimum requirement, dropped several legacy APIs, and modernized the internal module system. The release cadence has slowed compared to earlier years, with the community focusing on stability and Lucene alignment rather than rapid feature additions.

Key Features

Full-Text Search

Solr's full-text search is backed by Lucene's inverted index. It supports analyzers, tokenizers, stemmers, and synonym expansion out of the box. Query parsers range from the standard Lucene syntax to the more expressive Extended DisMax (eDisMax), which searches across multiple fields with per-field boosts for relevance tuning.
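As a rough illustration of eDisMax relevance tuning, the Python sketch below builds a select URL that searches two fields with different boosts. The host, the collection name (products), and the field names (title, description) are assumptions made up for this example; defType, qf, mm, and rows are standard Solr query parameters. The snippet only constructs the URL and does not contact a server.

```python
from urllib.parse import urlencode


def edismax_url(base_url, collection, user_query, field_boosts, min_match="2<75%", rows=10):
    """Build a Solr /select URL that uses the eDisMax query parser."""
    # qf lists the fields to search, each with a ^boost weight.
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    params = {
        "defType": "edismax",  # select the eDisMax parser
        "q": user_query,       # raw user input
        "qf": qf,              # query fields with boosts
        "mm": min_match,       # minimum-should-match rule
        "rows": rows,
    }
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection, and fields.
url = edismax_url(
    "http://localhost:8983",
    "products",
    "wireless headphones",
    {"title": 3, "description": 1},
)
print(url)
```

Here a match in title counts three times as much as a match in description; the right weights depend on the data and are usually found by experiment.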

Faceted Search and Navigation

Faceting is one of Solr's strongest capabilities. It supports field facets, range facets, pivot facets, and JSON facets for building layered navigation interfaces. E-commerce sites commonly use Solr facets to let shoppers filter by category, price range, brand, and attributes without separate database queries.
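To make the layered-navigation idea concrete, here is a minimal sketch of a JSON Facet API request body with one terms facet and one range facet. The field names (category, brand, price) and the price bounds are assumptions for illustration only, not part of any particular schema.

```python
import json


def facet_request(category, price_end=500, price_gap=100):
    """Build a JSON request body with a terms facet and a range facet."""
    return {
        "query": "*:*",
        "filter": [f"category:{category}"],  # narrow to one category
        "limit": 0,  # return facet counts only, no documents
        "facet": {
            # Top brands within the filtered result set.
            "brands": {"type": "terms", "field": "brand", "limit": 10},
            # Price buckets of equal width between start and end.
            "price_ranges": {
                "type": "range",
                "field": "price",
                "start": 0,
                "end": price_end,
                "gap": price_gap,
            },
        },
    }


body = facet_request("headphones")
print(json.dumps(body, indent=2))
```

A body like this would typically be sent as an HTTP POST with Content-Type application/json to the collection's /query endpoint, and the facet counts come back alongside the (here empty) document list.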

SolrCloud - Distributed Search

SolrCloud provides horizontal scalability through automatic sharding and replication. Apache ZooKeeper handles cluster coordination, leader election, and configuration management. Collections can be split across multiple nodes, with queries automatically distributed and results merged.
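The "distribute, then merge" step can be pictured with a small conceptual sketch. This is not Solr's internal code, and real distributed requests involve additional phases; it only shows how per-shard result lists, each already sorted by score, can be combined into a single global top-N. The hit dictionaries and scores are invented for the example.

```python
import heapq
from itertools import islice


def merge_shard_results(shard_results, rows=10):
    """Merge per-shard hit lists (each sorted by descending score) into a global top-N."""
    # heapq.merge lazily interleaves the already-sorted inputs.
    merged = heapq.merge(*shard_results, key=lambda hit: hit["score"], reverse=True)
    return list(islice(merged, rows))


# Hypothetical results from two shards.
shard1 = [{"id": "a", "score": 9.1}, {"id": "d", "score": 3.2}]
shard2 = [{"id": "b", "score": 7.4}, {"id": "c", "score": 5.0}]

top = merge_shard_results([shard1, shard2], rows=3)
print([hit["id"] for hit in top])  # ['a', 'b', 'c']
```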

Rich Document Handling

Through Apache Tika integration, Solr can index content from PDFs, Word documents, spreadsheets, and other binary formats directly. This makes it a natural fit for enterprise document management and knowledge base applications.

Near Real-Time Search

Solr supports near real-time (NRT) search, making newly indexed documents available for queries within seconds of ingestion. Soft commits expose documents to searchers without the overhead of a full hard commit to disk.
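As a sketch of how a client might ask for near real-time visibility, the snippet below prepares a JSON update request that uses either the softCommit or the commitWithin update parameter. The host, collection name, and document fields are assumptions for illustration, and the request is only constructed, never sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(base_url, collection, docs, commit_within_ms=None, soft_commit=False):
    """Prepare (but do not send) a JSON update request for a Solr collection."""
    params = {}
    if commit_within_ms is not None:
        # Ask Solr to make the documents visible within this many milliseconds.
        params["commitWithin"] = commit_within_ms
    if soft_commit:
        # Open a new searcher right away without forcing a flush to disk.
        params["softCommit"] = "true"
    url = f"{base_url}/solr/{collection}/update"
    if params:
        url += "?" + urlencode(params)
    return Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, collection, and document.
req = build_update_request(
    "http://localhost:8983",
    "products",
    [{"id": "sku-1", "title": "Wireless headphones"}],
    soft_commit=True,
)
print(req.full_url)
```

In many deployments clients do not request commits at all and instead rely on auto-commit settings in the collection's configuration; which approach fits depends on indexing volume and freshness requirements.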

Extensibility

Solr's plugin architecture allows custom request handlers, search components, update processors, and query parsers. The community maintains plugins for everything from language detection to machine learning reranking.

Common Use Cases

E-Commerce Product Search

Online retailers use Solr to power product search with faceted navigation, spell correction, and relevance boosting. Its faceting capabilities make it particularly well-suited for catalog search where shoppers need to filter across dozens of product attributes.

Enterprise Search

Large organizations deploy Solr to search across internal documents, wikis, email archives, and databases. Tika integration lets it handle diverse document formats without preprocessing, and access control plugins restrict results based on user permissions.

Content Management and Publishing

Media companies and publishers use Solr to index and search articles, metadata, and multimedia content. Its support for highlighting, spell checking, and "more like this" queries powers content discovery features.

Geospatial Search

Solr includes built-in support for geospatial queries, including point-radius searches, bounding box filters, and distance-based sorting. Real estate platforms, logistics companies, and location-based services use these capabilities to find records within geographic boundaries.
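A point-radius search with distance sorting might look like the following sketch. It assumes a spatial field named location exists in the schema; the coordinates and radius are arbitrary. The geofilt filter, the sfield, pt, and d parameters, and the geodist() function are part of Solr's spatial search support.

```python
from urllib.parse import urlencode


def nearby_params(lat, lon, radius_km, field="location", rows=10):
    """Build query parameters for a point-radius filter sorted by distance."""
    return {
        "q": "*:*",
        "fq": "{!geofilt}",       # keep only documents within d km of pt
        "sfield": field,          # spatial field to test (assumed name)
        "pt": f"{lat},{lon}",     # center point as "lat,lon"
        "d": radius_km,           # radius in kilometers
        "sort": "geodist() asc",  # nearest results first
        "rows": rows,
    }


params = nearby_params(45.15, -93.85, 5)
print(urlencode(params))
```

Swapping {!geofilt} for {!bbox} would use a bounding box around the same point instead of an exact circle, which is usually cheaper but less precise.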

Research and Data Discovery

Academic institutions and research organizations use Solr to build discovery interfaces over large datasets. Blacklight and VuFind, two open-source discovery projects built on Solr, are widely used in libraries and digital repositories.

Apache Solr vs Elasticsearch

Solr and Elasticsearch are both built on Apache Lucene and serve overlapping use cases. Elasticsearch gained significant market share starting around 2014, driven by the ELK Stack's popularity for log analytics and its developer-friendly JSON API.

Solr's strengths include mature faceting, XML/CSV ingestion support, and a long track record in enterprise search. Elasticsearch tends to have faster release cycles, broader client library support, and a larger ecosystem of commercial integrations.

For teams evaluating both options, the choice often comes down to existing expertise, specific feature requirements, and ecosystem preferences.

Apache Solr vs OpenSearch

OpenSearch forked from Elasticsearch 7.10.2 in 2021 under the Apache 2.0 license. Like Elasticsearch, it uses Lucene under the hood but has diverged with its own feature set, including built-in observability tools and security plugins.

For a detailed comparison covering architecture, query capabilities, vector search, and ecosystem differences, see our in-depth post on Apache Solr vs OpenSearch.

Migrating From Apache Solr

Teams migrating away from Solr typically move to Elasticsearch or OpenSearch. Both targets share the same Lucene foundation, which simplifies the transition at the search layer, though schema design, query syntax, and operational tooling differ substantially.

Key migration considerations include schema mapping between Solr fields and Elasticsearch/OpenSearch mappings, query translation from Solr's query parser syntax to Query DSL, and infrastructure changes from ZooKeeper-based coordination to built-in cluster management.
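To give a feel for the query translation step, here is a deliberately small sketch that maps a subset of Solr eDisMax parameters onto an Elasticsearch/OpenSearch Query DSL body. It assumes qf is present, that every fq is a simple field:value pair, and that those filter fields are mapped as exact-match (keyword) fields on the target; real migrations need to handle far more syntax than this. The function name and the sample parameters are hypothetical.

```python
def solr_to_query_dsl(solr_params):
    """Translate a small subset of Solr eDisMax params into a Query DSL body."""
    # "title^3 description" -> ["title^3", "description"]; boosts carry over as-is.
    fields = solr_params["qf"].split()
    text_query = {"multi_match": {"query": solr_params["q"], "fields": fields}}

    filters = []
    for fq in solr_params.get("fq", []):
        # Only handles simple "field:value" filters (an assumption of this sketch).
        field, _, value = fq.partition(":")
        filters.append({"term": {field: value}})

    return {
        "query": {"bool": {"must": [text_query], "filter": filters}},
        "size": int(solr_params.get("rows", 10)),  # Solr rows -> Query DSL size
    }


# Hypothetical Solr request parameters.
dsl = solr_to_query_dsl({
    "q": "wireless headphones",
    "qf": "title^3 description",
    "fq": ["brand:acme"],
    "rows": "20",
})
print(dsl)
```

Even in this toy form, the structural difference is visible: Solr spreads a query across flat request parameters, while Query DSL nests the same intent inside a single JSON document.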

For practical guidance, see our guide to migrating from Apache Solr to OpenSearch and the companion post on schema migration from Solr to Elasticsearch/OpenSearch.

Need Help With Apache Solr?

Whether you are running Solr in production, planning a migration to OpenSearch or Elasticsearch, or evaluating Solr for a new project, working with experienced professionals saves time and reduces risk. BigData Boutique has deep expertise in Solr consulting, performance tuning, SolrCloud architecture, and search platform migrations. Learn more about our Solr consulting services.