vibe.embedding_providers.base

Base class for embedding providers.

Embedding providers generate dense vector representations of text for semantic similarity search. This is used for:
  • Matching requirements to document sections
  • Finding similar examples for few-shot learning
  • Hybrid retrieval (combining with BM25)

EmbeddingProviderConfig

Configuration for embedding providers.

Attributes:
  • model (str) –

    Model identifier (e.g., "multilingual-e5-large")

  • dimension (int) –

    Expected embedding dimension

  • batch_size (int) –

    Maximum texts per batch request

  • normalize (bool) –

    Whether to L2-normalize embeddings

  • prefix (str | None) –

    Optional prefix to add to texts (e.g., "query: " for E5)

  • api_key (str | None) –

    API key for cloud providers

  • base_url (str | None) –

    Base URL for API endpoint
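A configuration dictionary covering the attributes above might look like this. The concrete values (dimension, batch size, prefix) are illustrative assumptions, not defaults taken from the library:

```python
# Illustrative configuration for an E5-style provider.
# Keys mirror the EmbeddingProviderConfig attributes; values are assumptions.
config = {
    "model": "multilingual-e5-large",
    "dimension": 1024,        # assumed output dimension for this model
    "batch_size": 32,         # maximum texts per batch request
    "normalize": True,        # L2-normalize so cosine similarity is a dot product
    "prefix": "query: ",      # E5-family models expect a task prefix
    "api_key": None,          # not needed for local inference
    "base_url": None,         # default API endpoint
}
```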

from_dict

from_dict(config: dict[str, Any]) -> EmbeddingProviderConfig

Create from configuration dictionary.

EmbeddingProvider

Abstract base class for embedding providers.

Subclasses must implement:
  • embed(): Embed a single text
  • embed_batch(): Embed multiple texts efficiently

The base class provides:
  • Configuration handling
  • Logging setup
  • Batch splitting for large inputs
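A minimal concrete provider, sketched against the interface documented here. The hashing "embedding" is a toy stand-in so the example is self-contained; a real subclass would call a model or API instead:

```python
import hashlib
import math


class HashingProvider:  # stand-in for an EmbeddingProvider subclass
    def __init__(self, dimension: int = 8) -> None:
        self.dimension = dimension

    def embed(self, text: str) -> list[float]:
        # Toy deterministic embedding: hash tokens into buckets,
        # then L2-normalize so cosine similarity equals a dot product.
        vec = [0.0] * self.dimension
        for token in text.lower().split():
            h = int(hashlib.md5(token.encode()).hexdigest(), 16)
            vec[h % self.dimension] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]

    def embed_batch(self, texts: list[str]) -> list[list[float]]:
        # A real implementation would send one batched request here.
        return [self.embed(t) for t in texts]
```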

dimension

dimension: int

Expected embedding dimension.

model_name

model_name: str

Model identifier.

__init__

__init__(config: dict[str, Any] | None = None) -> None

Initialize the provider.

Parameters:
  • config (dict[str, Any] | None, default: None ) –

    Provider configuration dictionary

embed

embed(text: str) -> list[float]

Generate embedding for a single text.

Parameters:
  • text (str) –

    Input text to embed

Returns:
  • list[float]

    Embedding vector as list of floats

embed_batch

embed_batch(texts: list[str]) -> list[list[float]]

Generate embeddings for multiple texts.

Implementations should handle batching efficiently.

Parameters:
  • texts (list[str]) –

    List of texts to embed

Returns:
  • list[list[float]]

    List of embedding vectors

embed_batch_with_splitting

embed_batch_with_splitting(texts: list[str]) -> list[list[float]]

Embed texts, automatically splitting into batches if needed.

This is a utility method that subclasses can use to handle large inputs by splitting them into chunks of at most config.batch_size texts.
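The splitting logic can be sketched as a standalone helper. Here `embed_batch` is passed in as a callable for illustration; in the base class it would be the subclass's own method and the chunk size would come from `config.batch_size`:

```python
from typing import Callable


def embed_batch_with_splitting(
    texts: list[str],
    batch_size: int,
    embed_batch: Callable[[list[str]], list[list[float]]],
) -> list[list[float]]:
    # Process the input in chunks of at most batch_size texts,
    # concatenating results in the original order.
    out: list[list[float]] = []
    for i in range(0, len(texts), batch_size):
        out.extend(embed_batch(texts[i : i + batch_size]))
    return out
```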

similarity

similarity(embedding1: list[float], embedding2: list[float]) -> float

Compute cosine similarity between two embeddings.

If the embeddings are L2-normalized, this is equivalent to the dot product.
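The computation is standard cosine similarity: the dot product divided by the product of the two L2 norms. A plain-Python sketch:

```python
import math


def similarity(embedding1: list[float], embedding2: list[float]) -> float:
    # Cosine similarity: dot product over the product of L2 norms.
    dot = sum(a * b for a, b in zip(embedding1, embedding2))
    norm1 = math.sqrt(sum(a * a for a in embedding1))
    norm2 = math.sqrt(sum(b * b for b in embedding2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # degenerate zero vector; similarity is undefined
    return dot / (norm1 * norm2)
```

For unit-norm vectors the denominator is 1, so the result reduces to the dot product, which is why normalizing at embedding time (the `normalize` config flag) makes downstream scoring cheaper.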