Dense vectors became the default for semantic search, but production systems reveal a different story. This guide explains when sparse vectors outperform dense embeddings, where each approach fails, and why hybrid search is becoming the engineering standard.

"Dense vectors everywhere" became the default narrative after the transformer revolution. Semantic search, RAG pipelines, and neural retrieval all converged on learned embeddings as the solution. But production deployments tell a different story: many systems struggle with cost, latency, explainability, and recall degradation at scale.

Meanwhile, sparse vectors—the lexical retrieval methods dismissed as "old-school"—quietly solve problems that dense vectors make worse. The real engineering question isn't which is better, but where each one breaks and how to combine their strengths.

Vectors: Two Fundamentally Different Philosophies

Vectors are numerical representations of text (documents, queries, sentences) that enable similarity computation via distance metrics in vector space. Two documents with similar vectors are considered semantically or lexically related.

But there are two fundamentally different approaches:

  • Sparse vectors encode explicit signals: each dimension corresponds to a known feature (term, token, n-gram). Most values are zero.
  • Dense vectors encode learned semantics: dimensions are distributed representations learned by neural networks. All values are populated.

This philosophical difference drives every trade-off that follows.
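The contrast is easy to see in code. A minimal sketch, where all vocabulary terms, weights, and the 8-dimensional embedding are made-up toy values:

```python
# Sparse: a dict mapping explicit features (terms) to non-zero weights;
# every absent term is implicitly zero.
sparse = {"vector": 2.1, "database": 1.8, "search": 1.4}

# Dense: every dimension holds a learned value; no dimension maps to a word.
dense = [0.12, -0.40, 0.33, 0.07, -0.21, 0.55, -0.02, 0.18]

def sparse_dot(a, b):
    """Similarity accumulates over shared non-zero features only."""
    return sum(w * b[t] for t, w in a.items() if t in b)

def dense_dot(a, b):
    """Similarity uses every dimension; no single one is interpretable."""
    return sum(x * y for x, y in zip(a, b))

query = {"database": 1.0, "scaling": 0.9}
print(sparse_dot(sparse, query))  # 1.8 -- only "database" overlaps
```

The sparse score decomposes term by term; the dense score does not, which is the root of the explainability trade-off discussed below.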

Sparse Vectors: Explicit Features and Lexical Precision

Sparse vectors are high-dimensional representations where most values are zero. Each non-zero dimension corresponds to a specific feature—typically a word, token, or n-gram.

Examples in Practice

TF-IDF (Term Frequency-Inverse Document Frequency) weights terms by how often they appear in a document relative to the corpus. A document about "Elasticsearch cluster optimization" gets high weights for those exact terms.

BM25 improves on TF-IDF with saturation functions and document length normalization. It's still the backbone of lexical search in Elasticsearch, OpenSearch, and Lucene-based systems. BM25 remains competitive in production, particularly for exact keyword matching scenarios.
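BM25's scoring can be sketched in a few lines. This is a simplified version of the standard formula (term-frequency saturation via k1, document-length normalization via b), not Lucene's exact implementation, and the corpus is a toy:

```python
import math
from collections import Counter

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Score one tokenized doc against a query over a tokenized corpus."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)  # document frequency
        if df == 0:
            continue
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        f = tf[term]
        # k1 caps the benefit of repeating a term; b penalizes long docs
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [
    ["elasticsearch", "cluster", "optimization"],
    ["python", "performance", "tuning"],
    ["elasticsearch", "index", "settings"],
]
query = ["elasticsearch", "cluster"]
scores = [bm25_score(query, d, corpus) for d in corpus]
print(max(range(3), key=scores.__getitem__))  # 0 -- doc 0 matches both terms
```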

SPLADE (Sparse Lexical and Expansion Model) is a learned sparse encoder that uses BERT's masked language modeling head to expand terms contextually. For a brief input like "vector database for scalable search," SPLADE can expand it to 118 tokens, including synonyms and related concepts, while maintaining sparsity. Recent variants like LACONIC achieve 60.2 nDCG on MTEB Retrieval benchmarks as of January 2026.

Strengths

  • Exact term matching: BM25 excels on engineering logs, error messages, financial data, legal documents, and scientific symbols where precision matters.
  • Explainability: You can inspect which terms contributed to a match and their weights. Critical for debugging and regulated environments.
  • Inverted index efficiency: Sparse vectors map naturally to inverted indexes, enabling sub-millisecond lookups with predictable latency.
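A toy inverted index makes both of those last points concrete: lookups touch only the posting lists for the query's terms, and every score decomposes into per-term contributions. The document ids and weights below are invented for illustration:

```python
from collections import defaultdict

# Documents as term -> weight maps (e.g. BM25 or SPLADE weights).
docs = {
    0: {"elasticsearch": 4.2, "cluster": 3.8},
    1: {"python": 2.9, "performance": 3.1},
    2: {"elasticsearch": 3.5, "index": 2.7},
}

# Build the inverted index: term -> posting list of (doc_id, weight).
index = defaultdict(list)
for doc_id, terms in docs.items():
    for term, weight in terms.items():
        index[term].append((doc_id, weight))

def search(query_terms):
    """Accumulate weights only for docs sharing a query term."""
    scores = defaultdict(float)
    for term in query_terms:
        for doc_id, weight in index.get(term, []):
            scores[doc_id] += weight
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Doc 0 wins (4.2 + 3.8), and the per-term provenance is inspectable.
print(search(["elasticsearch", "cluster"]))
```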

Weaknesses

  • Vocabulary mismatch: A query for "automobile" won't match documents about "cars" unless you implement synonym expansion.
  • Limited generalization: Without learning (like SPLADE), sparse methods struggle with paraphrasing and semantic variation.

Dense Vectors: Learned Semantics and Compact Representations

Dense vectors are fully populated, typically ranging from 256 to 4096 dimensions depending on the model. Every dimension contributes to the representation, learned through neural networks.

Examples in Production

Sentence-BERT models like all-MiniLM-L6-v2 generate 384-dimensional vectors optimized for semantic similarity tasks.

OpenAI's text-embedding-ada-002 produces 1536-dimensional vectors, while newer models like Google's Gemini text-embedding-004 generate 768-dimensional vectors with high-quality semantic representations.

Matryoshka Representation Learning (MRL) enables flexible dimensions from 2048 down to 256, letting you trade accuracy for speed based on production constraints.
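Consuming an MRL embedding at a lower dimension is just slicing plus renormalization. A sketch with a made-up 8-dimensional vector (real MRL models are trained so the leading dimensions carry the most information):

```python
import math

def truncate_and_renormalize(embedding, dims):
    """Keep the first `dims` dimensions and re-normalize to unit length
    so cosine similarity remains meaningful after truncation."""
    head = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.6, 0.3, -0.2, 0.1, 0.05, -0.04, 0.02, 0.01]  # toy MRL output
short = truncate_and_renormalize(full, 4)
print(len(short), round(sum(x * x for x in short), 6))  # 4 1.0
```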

Strengths

  • Semantic similarity: Handles paraphrasing, synonyms, and conceptual relationships naturally.
  • Compact representation: A 768-dimensional float vector (~3KB) is more storage-efficient than a vocabulary-sized sparse vector (often 30,000+ dimensions) for semantic tasks.
  • Cross-lingual capabilities: Multilingual models can match queries and documents across languages.

Weaknesses

  • Opaque scoring: You know that two items matched, not why. Debugging poor results requires re-running embeddings and inspecting training data.
  • Recall degradation at scale: Approximate Nearest Neighbor (ANN) algorithms like HNSW trade recall for speed. At scale, tail latency increases with shard count.
  • Memory bottleneck: Every additional dimension increases memory, storage, and search latency. For 10M documents with 1536-dimensional embeddings, you need ~60GB just for vectors.
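That memory figure is simple arithmetic over raw float32 vectors; ANN index structures such as HNSW graphs add further overhead on top:

```python
def vector_memory_gb(num_docs, dims, bytes_per_value=4):
    """Raw storage for float32 vectors, in GiB (index overhead excluded)."""
    return num_docs * dims * bytes_per_value / 1024**3

# 10M docs x 1536 dims x 4 bytes: roughly 57 GiB (~61 GB in decimal units)
print(round(vector_memory_gb(10_000_000, 1536), 1))
```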

Explicit vs Implicit Meaning: Why This Matters

The difference between explicit and implicit representations has practical implications:

Debugging: With sparse vectors, you can see that "Elasticsearch" matched with weight 4.2 and "cluster" with weight 3.8. With dense vectors, you get a similarity score of 0.87 with no explanation.

Trust: In regulated industries (healthcare, finance, legal), auditors want to know why a document was retrieved. Dense vectors provide no such trail.

Domain adaptation: Adding domain-specific terms to BM25 requires updating a dictionary. For dense vectors, you need retraining or fine-tuning on domain data.

Precision, Recall, and Where Each Approach Fails

Dense Vector Failure Modes

Semantic drift: Dense models can retrieve conceptually related but factually wrong results. A query for "Python 3.11 performance" might return results about "Python optimization" in general.

Over-generalization: Dense embeddings can miss exact keyword matches that users expect, like model numbers, error codes, or API endpoints.

False positives that "feel right": Two documents about different topics using similar language get high similarity scores.

Sparse Vector Failure Modes

Synonym gaps: Without expansion, "automobile" doesn't match "car."

Long-tail phrasing: Queries using different terminology than documents fail to retrieve relevant results.

No semantic understanding: "The product is not bad" and "The product is good" have different term distributions despite similar meaning.

Performance and Scale: The Cost Reality

Latency

Sparse vectors: Inverted indexes enable sub-10ms retrieval. BM25 queries in Elasticsearch routinely complete in single-digit milliseconds.

Dense vectors: ANN search adds overhead, and query embedding generation typically dominates latency. In production, pure dense retrieval often runs around 50ms end-to-end; adding the sparse leg and fusion for hybrid search adds another 20-40ms.

Hybrid systems: Most production systems see 70-90ms end-to-end latency with both sparse and dense retrieval plus Reciprocal Rank Fusion (RRF).

Cost

Sparse search (BM25): Approximately $0.00001 per query with no API calls and in-memory search.

Dense search: Requires embedding generation via API or GPU inference. Adds infrastructure cost and API call overhead.

Hybrid search: ~$0.0001 per query for basic RRF (one embedding call + two database queries). Storage costs roughly double since you're maintaining both sparse and dense indexes.

Dense vectors are not cheaper at scale. Memory and compute become the bottleneck, especially when serving millions of queries.

Hybrid Search: Why the Industry Is Converging Here

Sparse and dense vectors solve different failure modes. Hybrid search combines them:

  • Sparse vectors anchor results with lexical grounding and exact term matching
  • Dense vectors augment with semantic expansion and synonym handling

Benchmarks show hybrid retrieval improves recall by 15-30% over either approach alone. On NDCG@10 metrics, hybrid RRF achieves 0.85, an 18% improvement over dense-only methods.

Reciprocal Rank Fusion (RRF)

RRF is the most common fusion method in production:

RRF_score(doc) = Σ_i 1 / (k + rank_i(doc))

Where k is typically 60, and rank_i is the document's rank (starting at 1) in retrieval method i. This fuses rankings without requiring score calibration between sparse and dense scoring systems.
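A minimal RRF implementation, with toy document ids and rank starting at 1 to match the formula:

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc ids: each list contributes 1/(k + rank)
    per document, so no score calibration between systems is needed."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

sparse_ranked = ["d3", "d1", "d7"]  # e.g. BM25 result order
dense_ranked = ["d1", "d5", "d3"]   # e.g. ANN result order
print(rrf_fuse([sparse_ranked, dense_ranked]))
# ['d1', 'd3', 'd5', 'd7'] -- d1 ranks well in both lists, so it wins
```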

Decision Framework: Choosing the Right Tool

Use Sparse When:

  • Precision matters: Legal documents, medical records, compliance search
  • Queries are structured: Product SKUs, error codes, log searches
  • Explainability is required: Audited systems, regulated industries
  • Exact matching is critical: API documentation, code search

Use Dense When:

  • Queries are conversational: "How do I optimize Elasticsearch for time-series data?"
  • Semantic recall is critical: Research, support ticket matching
  • Data is noisy: Social media, user-generated content
  • Cross-lingual search: Matching queries and documents in different languages

Use Hybrid When:

  • You care about quality and control
  • You operate at scale and can't afford recall misses or false positives
  • You need predictable behavior in production
  • Your users issue diverse query types (structured and conversational)

Key Takeaways

Dense vectors are powerful—but not magic. They excel at semantic matching but fail at exact precision and explainability.

Sparse vectors are old—but not obsolete. BM25 still dominates in scenarios requiring lexical precision, and learned sparse models like SPLADE bridge the gap to semantic understanding while maintaining interpretability.

The best production systems don't choose sides. They combine sparse and dense retrieval, using each where it's strongest and minimizing failure modes. As of 2026, hybrid search is the engineering standard for serious information retrieval systems.

