vibe.review.hybrid_search
Unified hybrid search for document and reference parts.
Combines BM25 keyword search with embedding similarity search, using Reciprocal Rank Fusion (RRF) to rank the merged results. Supports both:
- DocumentPartModel (contract parts being reviewed)
- ReferencePartModel (regulatory reference sources)
EmbeddingDimensionMismatchWarning
Warning raised when query and stored embeddings have different dimensions.
EmbeddingDimensionMismatchError
Error raised when embedding dimensions are incompatible.
__init__
__init__(stored_dim: int, provider_dim: int, part_count: int) -> None
Initialize with stored and provider dimensions and count of affected parts.
SearchResult
A single search result with scoring information.
SearchResults
PartSearchStrategy
Abstract strategy for searching a specific part model.
Subclasses define model-specific behavior for content access, filtering, and the BM25 search approach.
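To illustrate why the strategy split keeps the searcher model-agnostic, here is a minimal in-memory sketch. It assumes a "query" can be modeled as a plain list of parts, whereas the real apply_filters receives a query object; DocPart, DocPartStrategy and keyword_hits are hypothetical names, not part of this module.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


class PartSearchStrategy(ABC, Generic[T]):
    """Hypothetical in-memory stand-in: a 'query' here is just a list of parts."""

    @abstractmethod
    def get_content(self, part: T) -> str: ...

    @abstractmethod
    def apply_filters(self, query: list[T], **kwargs: object) -> list[T]: ...


@dataclass
class DocPart:
    """Hypothetical stand-in for DocumentPartModel."""

    document_id: int
    text: str


class DocPartStrategy(PartSearchStrategy[DocPart]):
    def get_content(self, part: DocPart) -> str:
        return part.text

    def apply_filters(self, query: list[DocPart], **kwargs: object) -> list[DocPart]:
        # Mirrors "filter by document_id if provided"
        document_id = kwargs.get("document_id")
        if document_id is None:
            return query
        return [p for p in query if p.document_id == document_id]


def keyword_hits(strategy: PartSearchStrategy[T], parts: list[T], word: str, **filters: object) -> list[T]:
    # The caller never touches model fields directly; everything goes through the strategy.
    candidates = strategy.apply_filters(parts, **filters)
    return [p for p in candidates if word.lower() in strategy.get_content(p).lower()]
```

Adding support for another part model then means writing one more strategy subclass, not changing the search code.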
apply_filters
apply_filters(query: Any, **kwargs: object) -> object
Apply model-specific filters to the query.
bm25_search
bm25_search(session: Session, query_text: str, base_query: Any, limit: int, language: str | None) -> list[SearchResult[T]]
Perform BM25-style keyword search.
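For reference, the textbook Okapi BM25 score of a document D for query terms q_1, ..., q_n is given below. Here f(q_i, D) is the term frequency of q_i in D, |D| the document length in tokens, avgdl the average document length, N the number of indexed documents and n(q_i) the number of documents containing q_i. Tantivy, the engine behind ParadeDB's search, uses this form with k_1 = 1.2 and b = 0.75 by default; treat the constants as an assumption to verify against the ParadeDB version in use.

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i)\,
  \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(q_i) = \ln\!\left(1 + \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}\right)
```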
DocumentPartStrategy
Search strategy for document parts (contracts being reviewed).
model_class
model_class: type[DocumentPartModel]
Return DocumentPartModel as the searchable model class.
get_content
get_content(part: DocumentPartModel) -> str
Extract text content from a document part.
get_embedding_column
get_embedding_column() -> Any
Return the embedding column for vector similarity search.
apply_filters
apply_filters(query: Any, **kwargs: object) -> object
Filter query by document_id if provided.
bm25_search
bm25_search(session: Session, query_text: str, base_query: Any, limit: int, language: str | None) -> list[SearchResult[DocumentPartModel]]
Use ParadeDB BM25 search.
ReferencePartStrategy
Search strategy for reference parts (regulatory sources).
model_class
model_class: type[ReferencePartModel]
Return ReferencePartModel as the searchable model class.
get_content
get_content(part: ReferencePartModel) -> str
Extract text content from a reference part.
get_embedding_column
get_embedding_column() -> Any
Return the embedding column for vector similarity search.
apply_filters
apply_filters(query: Any, **kwargs: object) -> object
Filter query by language and/or source_id, joining with ReferenceSourceModel.
bm25_search
bm25_search(session: Session, query_text: str, base_query: Any, limit: int, language: str | None) -> list[SearchResult[ReferencePartModel]]
Use ParadeDB BM25 search.
HybridSearcher
Hybrid searcher combining BM25 and embedding similarity search.
Uses Reciprocal Rank Fusion (RRF) to merge the ranked results of keyword and vector search, with the aim of improving retrieval quality over either method alone.
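Reciprocal Rank Fusion scores each item by summing 1/(k + rank) over the result lists it appears in, so items that rank well in both searches rise to the top without comparing raw BM25 scores against cosine similarities. The sketch below is a minimal weighted variant. It assumes that rrf_k plays the role of k and that search()'s bm25_weight and embedding_weight scale each list's contribution; this page does not spell out HybridSearcher's exact formula, and weighted_rrf is a hypothetical name.

```python
from collections import defaultdict
from collections.abc import Hashable, Sequence


def weighted_rrf(
    bm25_ranking: Sequence[Hashable],
    embedding_ranking: Sequence[Hashable],
    bm25_weight: float = 0.5,
    embedding_weight: float = 0.5,
    k: int = 60,
    limit: int = 50,
) -> list[tuple[Hashable, float]]:
    """Fuse two best-first rankings with weighted Reciprocal Rank Fusion.

    Each item gains weight / (k + rank) from every list it appears in (ranks are
    1-based); an item missing from a list gets no contribution from that list.
    """
    scores: dict[Hashable, float] = defaultdict(float)
    for weight, ranking in ((bm25_weight, bm25_ranking), (embedding_weight, embedding_ranking)):
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += weight / (k + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order.
    fused = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return fused[:limit]
```

With the defaults (k = 60, equal weights), an item ranked 2nd in both lists outscores an item ranked 1st in only one list, which is the intended effect of fusing the two searches.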
Usage:
    # For document parts
    searcher = HybridSearcher(session, DocumentPartStrategy())
    results = searcher.search("audit rights", document_id=1)

    # For reference parts
    searcher = HybridSearcher(session, ReferencePartStrategy())
    results = searcher.search("ICT risk management", language="en")
__init__
__init__(session: Session, strategy: PartSearchStrategy[T], embedding_provider: EmbeddingProvider | None = None, rrf_k: int = 60) -> None
Initialize the searcher.
search
search(query: str, limit: int = 50, bm25_weight: float = 0.5, embedding_weight: float = 0.5, bm25_limit: int = 100, embedding_limit: int = 100, language: str | None = None, **filter_kwargs: object) -> SearchResults[T]
Perform hybrid search.
sanitize_bm25_query
sanitize_bm25_query(query: str) -> str
Sanitize a query string for ParadeDB BM25 search.
ParadeDB/Tantivy interprets certain characters as query operators. This function escapes all special characters to enable literal text search.
Special characters that need escaping: + - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
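A minimal sketch of the escaping described above, assuming the query parser accepts a backslash before any of the listed characters. Escaping & and | one character at a time also neutralizes the two-character operators && and ||. The name sanitize_bm25_query_sketch is hypothetical; the module's own sanitize_bm25_query may differ in detail.

```python
# Special characters as listed above. '&' and '|' are escaped individually,
# which also covers the two-character operators && and ||.
_BM25_SPECIAL_CHARS = frozenset('+-&|!(){}[]^"~*?:\\/')


def sanitize_bm25_query_sketch(query: str) -> str:
    """Hypothetical sketch: prefix each special character with a backslash."""
    return "".join("\\" + ch if ch in _BM25_SPECIAL_CHARS else ch for ch in query)
```

For example, a regulatory citation such as "Art. 28(3)" becomes "Art. 28\(3\)", so the parentheses are searched as text instead of being parsed as grouping.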
get_stored_embedding_dimension
get_stored_embedding_dimension(session: Session, *, document_id: int | None = None, session_id: int | None = None) -> tuple[int | None, int]
Get the dimension of stored embeddings.
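This page does not say what the two tuple elements mean. One plausible reading, assumed here and not confirmed by the source, is (stored dimension, or None when nothing in scope is embedded; number of parts with embeddings in scope). The sketch works on an in-memory list; StoredPart, its fields and the scoping rules are hypothetical stand-ins for the database query.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class StoredPart:
    """Hypothetical stand-in for a part row that may carry an embedding."""

    document_id: int
    session_id: int
    embedding: Optional[list[float]]


def stored_embedding_dimension_sketch(
    parts: Sequence[StoredPart],
    *,
    document_id: Optional[int] = None,
    session_id: Optional[int] = None,
) -> tuple[Optional[int], int]:
    """Return (dimension or None, number of embedded parts) for the selected scope.

    Assumption: the dimension is read from the first embedded part in scope.
    """
    in_scope = [
        p
        for p in parts
        if p.embedding is not None
        and (document_id is None or p.document_id == document_id)
        and (session_id is None or p.session_id == session_id)
    ]
    if not in_scope:
        return None, 0
    return len(in_scope[0].embedding), len(in_scope)
```

The sketch does not detect mixed dimensions within one scope; a real implementation backed by a database might need to.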
check_embedding_dimension_compatibility
check_embedding_dimension_compatibility(session: Session, provider_dim: int, *, document_id: int | None = None, session_id: int | None = None, raise_on_mismatch: bool = True) -> bool
Check if provider dimension is compatible with stored embeddings.
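The module defines both a warning and an error class for dimension mismatches, and raise_on_mismatch defaults to True. A natural reading, assumed here rather than confirmed, is: compatible when nothing is stored or the dimensions agree; otherwise raise EmbeddingDimensionMismatchError, or, with raise_on_mismatch=False, issue EmbeddingDimensionMismatchWarning and return False. The classes below are local stand-ins (the UserWarning base and the message text are assumptions), and the hypothetical dimensions_compatible takes the stored dimension and part count directly instead of a session.

```python
import warnings
from typing import Optional


class EmbeddingDimensionMismatchWarning(UserWarning):
    """Local stand-in for the module's warning class."""


class EmbeddingDimensionMismatchError(Exception):
    """Local stand-in with the documented __init__ signature."""

    def __init__(self, stored_dim: int, provider_dim: int, part_count: int) -> None:
        self.stored_dim = stored_dim
        self.provider_dim = provider_dim
        self.part_count = part_count
        super().__init__(
            f"{part_count} stored embedding(s) have dimension {stored_dim}, "
            f"but the provider produces dimension {provider_dim}"
        )


def dimensions_compatible(
    stored_dim: Optional[int],
    part_count: int,
    provider_dim: int,
    *,
    raise_on_mismatch: bool = True,
) -> bool:
    # Nothing stored yet, or dimensions agree: the provider can be used as-is.
    if stored_dim is None or stored_dim == provider_dim:
        return True
    if raise_on_mismatch:
        raise EmbeddingDimensionMismatchError(stored_dim, provider_dim, part_count)
    # Soft mode: warn and let the caller decide, e.g. skip vector search.
    warnings.warn(
        f"stored dimension {stored_dim} != provider dimension {provider_dim}",
        EmbeddingDimensionMismatchWarning,
        stacklevel=2,
    )
    return False
```

Keeping stored_dim, provider_dim and part_count on the exception lets a caller report how many parts would need re-embedding before switching providers.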
clear_embeddings
clear_embeddings(session: Session, *, document_id: int | None = None, session_id: int | None = None) -> int
Clear embeddings for parts.
search_document_parts
search_document_parts(session: Session, query: str, document_id: int, limit: int = 50, language: str | None = None, embedding_provider: EmbeddingProvider | None = None) -> SearchResults[DocumentPartModel]
Search document parts for a query.
search_reference_parts
search_reference_parts(session: Session, query: str, limit: int = 50, language: str | None = None, source_id: str | None = None, embedding_provider: EmbeddingProvider | None = None) -> SearchResults[ReferencePartModel]
Search reference parts for a query.