Hybrid Search
Combine BM25 and vector ranking with Reciprocal Rank Fusion.
Hybrid search combines keyword relevance (BM25) and semantic similarity (vector KNN) into a single ranked result list using Reciprocal Rank Fusion (RRF). This is useful when neither keyword matching nor vector similarity alone produces the best ranking.
When to use hybrid search
- Keyword search alone works well when the user knows the exact terminology, but misses semantically related content that uses different words.
- Vector search alone captures semantic similarity, but can rank irrelevant results highly if the embedding space is noisy or the query is ambiguous.
- Hybrid search uses both signals. A result that ranks highly on both floats to the top; a result that ranks highly on only one still appears, but lower.
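The three cases above can be seen in a toy Python model of rank fusion. This is a sketch, not Omnigraph's implementation. It assumes the standard RRF constant k = 60, uses hypothetical result IDs, and assumes a result missing from one ranking simply gets no contribution from that ranking.

```python
def rrf(rankings, k=60):
    """Fuse best-first lists of IDs with Reciprocal Rank Fusion (toy model)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank).
            # Assumption: an ID absent from a list gets nothing from that list.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical best-first results for a single query.
keyword_hits = ["keyword_only", "both_signals"]
vector_hits = ["vector_only", "both_signals"]

print(rrf([keyword_hits, vector_hits]))
# ['both_signals', 'keyword_only', 'vector_only']
```

Here `both_signals` is only #2 in each list, yet it beats the two results that are #1 in a single list, because it collects a contribution from both rankings.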
Use hybrid search when your data has both structured text fields (titles, descriptions, tags) and embedding vectors, and you want queries to benefit from both signals.
Schema setup
Hybrid search requires both a String @index field (for BM25) and a Vector(N) @index field (for nearest-neighbor search):
node Person {
  name: String @key
  bio: String @index
  embedding: Vector(1536) @index
}
rrf() syntax
rrf() appears in the order clause and takes two ranking expressions as arguments:
query hybrid_search($term: String, $vec: Vector) {
  match {
    $p: Person
  }
  order rrf(nearest($p.embedding, $vec), bm25($p.bio, $term))
  return { $p.name, $p.bio }
}
Run it from the command line:
omnigraph read ./my-graph \
  --query queries.gq \
  --name hybrid_search \
  --params '{"term": "ML researcher", "vec": [0.021, -0.003, 0.118, ...]}'
Example output:
[
  { "name": "Dr. Sarah Kim", "bio": "Machine learning researcher at..." },
  { "name": "James Chen", "bio": "Applied ML engineer focused on..." }
]
How RRF score is computed
Reciprocal Rank Fusion does not combine raw scores. It combines ranks. For each result, RRF computes:
score = 1 / (k + rank_a) + 1 / (k + rank_b)
Where:
- rank_a is the result's rank in the first ranking (e.g., nearest())
- rank_b is the result's rank in the second ranking (e.g., bm25())
- k is a constant that controls how much weight is given to top-ranked results
A result that ranks #1 on both lists gets a higher RRF score than a result that ranks #1 on one list and #100 on the other. This naturally promotes results that are relevant according to both signals.
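Plugging ranks into the formula with k = 60 (the default discussed under "Tuning the fusion constant k") makes the gap concrete:

```latex
\begin{aligned}
\text{score}(\#1, \#1)   &= \frac{1}{60 + 1} + \frac{1}{60 + 1}   = \frac{2}{61} \approx 0.0328 \\
\text{score}(\#1, \#100) &= \frac{1}{60 + 1} + \frac{1}{60 + 100} = \frac{1}{61} + \frac{1}{160} \approx 0.0226
\end{aligned}
```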
Tuning the fusion constant k
The default value of k is 60, which is the standard RRF constant from the original paper. A lower k gives more weight to the top-ranked results in each list; a higher k smooths the differences between ranks.
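The effect of k can be checked with plain arithmetic. The Python sketch below only evaluates the per-list term 1 / (k + rank) for a few illustrative k values; it does not assume or show any Omnigraph syntax for changing k.

```python
def contribution(rank, k):
    """RRF contribution from a single ranking: 1 / (k + rank)."""
    return 1.0 / (k + rank)


for k in (1, 60, 1000):
    top = contribution(1, k)
    tenth = contribution(10, k)
    # How many times more a #1 result earns than a #10 result in the same list.
    print(f"k={k:>4}: rank 1 earns {top / tenth:.2f}x the score of rank 10")

# k=   1: rank 1 earns 5.50x the score of rank 10
# k=  60: rank 1 earns 1.15x the score of rank 10
# k=1000: rank 1 earns 1.01x the score of rank 10
```

The ratio is (k + 10) / (k + 1), so as k grows, being first in one list matters less and agreement across both lists matters more.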
In most cases the default works well. If you find that one signal is dominating (e.g., vector results always win), consider whether the issue is the k value or whether one of the indexed fields needs better data quality.
Combining with filters and traversal
Hybrid ranking composes with match filters and graph traversal. The filter narrows the candidate set before ranking:
query experts_at_company($term: String, $vec: Vector, $company: String) {
  match {
    $p: Person
    search($p.bio, $term)
    $p worksAt $c: Company { name: $company }
  }
  order rrf(nearest($p.embedding, $vec), bm25($p.bio, $term))
  return { $p.name, $p.bio, $c.name }
}
This query:
- Filters to people whose bio matches the keyword (via inverted index)
- Constrains to people who work at a specific company (via graph traversal)
- Ranks the remaining candidates by hybrid RRF score
Graph-constrained reranking is one of Omnigraph's key advantages over standalone vector databases: you can narrow candidates by graph structure before applying expensive ranking.
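The order of operations in experts_at_company can be mimicked with a small in-memory Python model: filter first, then rank and fuse only the survivors. Everything here is hypothetical: the people, the precomputed similarity and BM25 scores, and the substring check standing in for search(). It also assumes ranks are recomputed within the filtered set; this page does not describe how Omnigraph assigns ranks internally.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    bio: str
    company: str
    vector_sim: float  # hypothetical precomputed similarity to $vec
    bm25: float        # hypothetical precomputed BM25 score for $term


def rank_by(people, key):
    """Return names ordered best-first by the given score."""
    return [p.name for p in sorted(people, key=key, reverse=True)]


def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, name in enumerate(ranking, start=1):
            scores[name] = scores.get(name, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda n: (-scores[n], n))


def experts_at_company(people, term, company):
    # 1. Filter: keyword match (stand-in for search()) plus the company constraint.
    candidates = [
        p for p in people
        if term.lower() in p.bio.lower() and p.company == company
    ]
    # 2. Rank only the surviving candidates on each signal, then fuse.
    return rrf([
        rank_by(candidates, key=lambda p: p.vector_sim),
        rank_by(candidates, key=lambda p: p.bm25),
    ])


people = [
    Person("Ana", "ML researcher in vision", "Acme", 0.91, 8.5),
    Person("Ben", "Backend engineer", "Acme", 0.95, 0.0),
    Person("Cho", "ML engineer, ranking", "Acme", 0.80, 8.1),
    Person("Dev", "ML researcher, NLP", "Globex", 0.99, 9.0),
]
print(experts_at_company(people, "ML", "Acme"))
# ['Ana', 'Cho']
```

Dev has the best raw scores but never reaches the ranking step because the company constraint removes him, and Ben is dropped by the keyword filter despite a high vector similarity.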
Limiting results
Use limit to cap the number of results:
query top_matches($term: String, $vec: Vector) {
  match {
    $p: Person
  }
  order rrf(nearest($p.embedding, $vec), bm25($p.bio, $term))
  limit 10
  return { $p.name, $p.bio }
}
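A minimal Python sketch of what limit 10 does to fused results, assuming the limit is applied after the RRF ordering (the usual meaning of order followed by limit, though this page does not spell out evaluation order). The rankings are hypothetical; heapq.nlargest picks the top n without fully sorting every candidate.

```python
import heapq


def rrf_scores(rankings, k=60):
    """Map each ID to its RRF score across best-first rankings."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


def top_n(scores, n):
    """Keep the n highest fused scores, best first (like `limit n` after `order`)."""
    return heapq.nlargest(n, scores, key=scores.get)


# Hypothetical best-first rankings of 50 people on each signal.
vector_ranking = [f"person_{i}" for i in range(50)]
bm25_ranking = vector_ranking[::2] + vector_ranking[1::2]  # evens first, then odds

best = top_n(rrf_scores([vector_ranking, bm25_ranking]), 10)
print(len(best), best[0])
# 10 person_0
```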