lancedb_haystack
================

.. py:module:: lancedb_haystack


Subpackages
-----------

.. toctree::
   :maxdepth: 1

   /autoapi/lancedb_haystack/conversion/index
   /autoapi/lancedb_haystack/schema/index


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/lancedb_haystack/__about__/index
   /autoapi/lancedb_haystack/document_store/index
   /autoapi/lancedb_haystack/embedding_retriever/index
   /autoapi/lancedb_haystack/filters/index
   /autoapi/lancedb_haystack/fts_retriever/index


Classes
-------

.. autoapisummary::

   lancedb_haystack.LanceDBDocumentStore
   lancedb_haystack.LanceDBEmbeddingRetriever
   lancedb_haystack.LanceDBFTSRetriever


Package Contents
----------------

.. py:class:: LanceDBDocumentStore(database: str, table_name: str, metadata_schema: Optional[pyarrow.StructType] = None, embedding_dims: Optional[int] = None)

   Bases: :py:obj:`haystack.document_stores.types.DocumentStore`

   Stores data in LanceDB and leverages its built-in search features.


   .. py:attribute:: _database


   .. py:attribute:: _table_name


   .. py:attribute:: _metadata_schema


   .. py:attribute:: _embedding_dims


   .. py:attribute:: db


   .. py:method:: table_exists() -> bool

      Check whether the table this DocumentStore relies on already exists.

      :return: True if the table already exists in the LanceDB database backing this DocumentStore.


   .. py:method:: count_documents() -> int

      Returns how many documents are present in the document store.

      :return: the number of documents in the document store, or 0 if the table hasn't been created yet.


   .. py:method:: filter_documents(filters: Optional[Dict[str, Any]] = None) -> List[haystack.Document]

      Returns the documents that match the filters provided.

      Filters are defined as nested dictionaries that can be of two types:

      - Comparison
      - Logic

      Comparison dictionaries must contain the keys:

      - `field`
      - `operator`
      - `value`

      Logic dictionaries must contain the keys:

      - `operator`
      - `conditions`

      The `conditions` key must be a list of dictionaries, each of either the Comparison or the Logic type.
      The `operator` value in Comparison dictionaries must be one of:

      - `==`
      - `!=`
      - `>`
      - `>=`
      - `<`
      - `<=`
      - `in`
      - `not in`

      The `operator` value in Logic dictionaries must be one of:

      - `NOT`
      - `OR`
      - `AND`

      A simple filter:

      ```python
      filters = {"field": "meta.type", "operator": "==", "value": "article"}
      ```

      A more complex filter:

      ```python
      filters = {
          "operator": "AND",
          "conditions": [
              {"field": "meta.type", "operator": "==", "value": "article"},
              {"field": "meta.date", "operator": ">=", "value": 1420066800},
              {"field": "meta.date", "operator": "<", "value": 1609455600},
              {"field": "meta.rating", "operator": ">=", "value": 3},
              {
                  "operator": "OR",
                  "conditions": [
                      {"field": "meta.genre", "operator": "in", "value": ["economy", "politics"]},
                      {"field": "meta.publisher", "operator": "==", "value": "nytimes"},
                  ],
              },
          ],
      }
      ```

      :param filters: the filters to apply to the document list.
      :return: a list of Documents that match the given filters.


   .. py:method:: perform_query(query: Optional[Union[str, List[float]]] = None, filters: Optional[Dict[str, Any]] = None, top_k: Optional[int] = None) -> List[haystack.Document]

      Performs a query against the LanceDB table backing this DocumentStore.

      :param query: a query string for full-text search (FTS), a vector for vector search, or empty to apply only the filters.
      :param filters: filters to apply to the search. See: https://docs.haystack.deepset.ai/docs/metadata-filtering
      :param top_k: limit the results to the `top_k` most relevant documents. Default: no limit.
      :return: a list of Haystack Documents that match the search and filters.
      :raises ValueError: if an invalid `top_k` is given (i.e. a negative value).


   .. py:method:: write_documents(documents: List[haystack.Document], policy: haystack.document_stores.types.DuplicatePolicy = DuplicatePolicy.NONE) -> int

      Writes (or overwrites) documents into the store.

      :param documents: a list of documents.
      :param policy: documents with the same ID count as duplicates.
         When duplicates are met, the store can:

         - skip: keep the existing document and ignore the new one.
         - overwrite: remove the old document and write the new one.
         - fail: raise an error.

      :raises DuplicateDocumentError: if a duplicate document is found and `policy=DuplicatePolicy.FAIL`.
      :raises ValueError: if no documents are provided.
      :return: the number of documents created or updated.


   .. py:method:: delete_documents(object_ids: List[str]) -> None

      Deletes all documents whose IDs match the given `object_ids` from the document store.
      Fails with `MissingDocumentError` if no document with a given ID is present in the store.

      :param object_ids: the IDs of the documents to delete.


   .. py:method:: to_dict() -> Dict[str, Any]

      Serializes this store to a dictionary.


   .. py:method:: from_dict(data: Dict[str, Any]) -> LanceDBDocumentStore
      :classmethod:

      Deserializes the store from a dictionary.


.. py:class:: LanceDBEmbeddingRetriever(document_store: lancedb_haystack.document_store.LanceDBDocumentStore, filters: Optional[Dict[str, Any]] = None, top_k: Optional[int] = 10)

   A component for retrieving documents from a LanceDBDocumentStore using embeddings and vector similarity.


   .. py:attribute:: NAME
      :value: 'lancedb_haystack.embedding_retriever.LanceDBEmbeddingRetriever'


   .. py:attribute:: _document_store


   .. py:attribute:: _filters


   .. py:attribute:: _top_k


   .. py:method:: run(query_embedding: List[float], filters: Optional[Dict[str, Any]] = None, top_k: Optional[int] = None)

      Run the LanceDBEmbeddingRetriever on the given input data.

      :param query_embedding: Embedding of the query.
      :param filters: A dictionary with filters to narrow down the search space.
      :param top_k: The maximum number of documents to return.
      :return: The retrieved documents.


   .. py:method:: to_dict() -> Dict[str, Any]

      Serialize this component to a dictionary.


   .. py:method:: from_dict(data: Dict[str, Any]) -> LanceDBEmbeddingRetriever
      :classmethod:

      Deserialize this component from a dictionary.


.. py:class:: LanceDBFTSRetriever(document_store: lancedb_haystack.document_store.LanceDBDocumentStore, filters: Optional[Dict[str, Any]] = None, top_k: Optional[int] = 10)

   A component for retrieving documents from a LanceDBDocumentStore using full-text search (FTS).


   .. py:attribute:: NAME
      :value: 'lancedb_haystack.fts_retriever.LanceDBFTSRetriever'


   .. py:attribute:: _document_store


   .. py:attribute:: _filters


   .. py:attribute:: _top_k


   .. py:method:: run(query: str, filters: Optional[Dict[str, Any]] = None, top_k: Optional[int] = None)

      Run the LanceDBFTSRetriever on the given input data.

      :param query: The query string for the Retriever.
      :param filters: A dictionary with filters to narrow down the search space.
      :param top_k: The maximum number of documents to return.
      :return: The retrieved documents.
      :raises ValueError: If the specified DocumentStore is not found or is not a LanceDBDocumentStore instance.


   .. py:method:: to_dict() -> Dict[str, Any]

      Serialize this component to a dictionary.


   .. py:method:: from_dict(data: Dict[str, Any]) -> LanceDBFTSRetriever
      :classmethod:

      Deserialize this component from a dictionary.
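
The filter grammar documented for `LanceDBDocumentStore.filter_documents` is recursive: a Logic dictionary combines a list of Comparison or Logic dictionaries. The sketch below evaluates such a filter against a plain Python dict in memory, purely to illustrate the semantics; it is not the filter implementation used by `lancedb_haystack`. The helpers `get_field` and `matches` are hypothetical, and two behaviors are assumptions: a missing field resolves to None (so ordering comparisons on it are False), and `NOT` negates the conjunction of its conditions, as Haystack's in-memory filtering does.

```python
from typing import Any, Callable, Dict

# Comparison operators listed for filter_documents, mapped to Python predicates.
# Assumption: ordering comparisons against a missing (None) field are False.
COMPARISON_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">": lambda a, b: a is not None and a > b,
    ">=": lambda a, b: a is not None and a >= b,
    "<": lambda a, b: a is not None and a < b,
    "<=": lambda a, b: a is not None and a <= b,
    "in": lambda a, b: a in b,
    "not in": lambda a, b: a not in b,
}


def get_field(doc: Dict[str, Any], field: str) -> Any:
    """Hypothetical helper: resolve a dotted path such as "meta.type"; missing keys give None."""
    value: Any = doc
    for part in field.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(doc: Dict[str, Any], filters: Dict[str, Any]) -> bool:
    """Hypothetical helper: recursively evaluate a Comparison or Logic filter dictionary."""
    op = filters["operator"]
    if "conditions" in filters:  # Logic dictionary: operator + conditions
        results = [matches(doc, cond) for cond in filters["conditions"]]
        if op == "AND":
            return all(results)
        if op == "OR":
            return any(results)
        if op == "NOT":
            # Assumption: NOT negates the AND of its conditions.
            return not all(results)
        raise ValueError(f"Unknown logic operator: {op!r}")
    # Comparison dictionary: field + operator + value
    if op not in COMPARISON_OPS:
        raise ValueError(f"Unknown comparison operator: {op!r}")
    return COMPARISON_OPS[op](get_field(doc, filters["field"]), filters["value"])


# Usage: a document whose metadata satisfies the "more complex filter" example.
doc = {"meta": {"type": "article", "date": 1500000000, "rating": 4, "genre": "economy"}}
complex_filter = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.type", "operator": "==", "value": "article"},
        {"field": "meta.date", "operator": ">=", "value": 1420066800},
        {"field": "meta.date", "operator": "<", "value": 1609455600},
        {"field": "meta.rating", "operator": ">=", "value": 3},
        {
            "operator": "OR",
            "conditions": [
                {"field": "meta.genre", "operator": "in", "value": ["economy", "politics"]},
                {"field": "meta.publisher", "operator": "==", "value": "nytimes"},
            ],
        },
    ],
}
print(matches(doc, complex_filter))  # True
```

Because each Logic dictionary simply recurses into its `conditions`, filters can be nested to any depth without special handling.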
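
`LanceDBDocumentStore.perform_query` accepts three kinds of `query`: a string for full-text search, a vector for vector search, or an empty value to apply only the filters. It also rejects a negative `top_k`. The hypothetical helper `choose_search_mode` below sketches that dispatch in plain Python. Its mode names are illustrative and not part of the `lancedb_haystack` API, and treating any falsy query (None, an empty string, or an empty list) as "empty" is an assumption.

```python
from typing import List, Optional, Union

Query = Optional[Union[str, List[float]]]


def choose_search_mode(query: Query = None, top_k: Optional[int] = None) -> str:
    """Hypothetical helper: classify a perform_query-style call by its query argument."""
    # Documented behavior: a negative top_k is invalid.
    if top_k is not None and top_k < 0:
        raise ValueError(f"top_k must not be negative, got {top_k}")
    if not query:
        # Assumption: None, "" and [] all mean "no query", so only filters apply.
        return "filters-only"
    if isinstance(query, str):
        return "fts"  # a string is matched with full-text search
    if isinstance(query, list) and all(isinstance(x, (int, float)) for x in query):
        return "vector"  # a list of numbers is used as the query embedding
    raise TypeError(f"Unsupported query type: {type(query).__name__}")


print(choose_search_mode("vector databases"))        # fts
print(choose_search_mode([0.1, 0.2, 0.3], top_k=5))  # vector
print(choose_search_mode(None))                      # filters-only
```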
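
The duplicate policies described for `LanceDBDocumentStore.write_documents` (skip, overwrite, fail) can be illustrated with a dictionary standing in for the LanceDB table. In this sketch, `Policy` and `DuplicateDocumentError` are local stand-ins for Haystack's `DuplicatePolicy` and `DuplicateDocumentError`, and `write_documents` here is a hypothetical in-memory function, not the store's method. Resolving `NONE` to `FAIL` is an assumption modeled on Haystack's `InMemoryDocumentStore`; it is not documented behavior of `LanceDBDocumentStore`.

```python
from enum import Enum
from typing import Dict, List


class Policy(Enum):
    """Local stand-in mirroring the member names of Haystack's DuplicatePolicy."""
    NONE = "none"
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


class DuplicateDocumentError(Exception):
    """Local stand-in for Haystack's DuplicateDocumentError."""


def write_documents(table: Dict[str, str], documents: List[Dict[str, str]], policy: Policy = Policy.NONE) -> int:
    """Hypothetical in-memory version of the documented write_documents contract."""
    if not documents:
        raise ValueError("No documents provided")
    # Assumption: NONE falls back to FAIL, as Haystack's InMemoryDocumentStore does.
    if policy is Policy.NONE:
        policy = Policy.FAIL
    written = 0
    for doc in documents:
        doc_id = doc["id"]
        if doc_id in table:
            if policy is Policy.SKIP:
                continue  # keep the existing document, ignore the new one
            if policy is Policy.FAIL:
                raise DuplicateDocumentError(f"ID {doc_id!r} already exists")
        table[doc_id] = doc["content"]  # a new ID, or OVERWRITE replacing the old document
        written += 1
    return written  # number of documents created or updated


table: Dict[str, str] = {}
print(write_documents(table, [{"id": "a", "content": "first"}]))                      # 1
print(write_documents(table, [{"id": "a", "content": "second"}], Policy.SKIP))       # 0
print(write_documents(table, [{"id": "a", "content": "second"}], Policy.OVERWRITE))  # 1
print(table)  # {'a': 'second'}
```

The return value counts only documents that were actually created or updated, which is why a skipped duplicate contributes 0.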