85 changes: 84 additions & 1 deletion README.md
@@ -36,8 +36,9 @@ DocArray handles your data while integrating seamlessly with the rest of your **
> - [Coming from Pydantic](#coming-from-pydantic)
> - [Coming from FastAPI](#coming-from-fastapi)
> - [Coming from a vector database](#coming-from-vector-database)
> - [Coming from LangChain](#coming-from-langchain)

-DocArray was released under the open-source [Apache License 2.0](https://github.com/docarray/docarray/blob/main/LICENSE) in January 2022. It is currently a sandbox project under [LF AI & Data Foundation](https://lfaidata.foundation/).
+DocArray has been distributed under the open-source [Apache License 2.0](https://github.com/docarray/docarray/blob/main/LICENSE) since January 2022. It is currently a sandbox project under [LF AI & Data Foundation](https://lfaidata.foundation/).

## Represent

@@ -776,6 +777,88 @@ Of course this is only one of the things that DocArray can do, so we encourage y
</details>


## Coming from LangChain

<details markdown="1">
<summary>Click to expand</summary>

With DocArray, you can connect external data to LLMs through LangChain. DocArray lets you define
flexible document schemas and choose among several backends for document storage.
After creating your document index, you can connect it to your LangChain app using [DocArrayRetriever](https://python.langchain.com/docs/modules/data_connection/retrievers/integrations/docarray_retriever).

Install LangChain via:
```shell
pip install langchain
```

1. Define a schema and create documents:
```python
from docarray import BaseDoc, DocList
from docarray.typing import NdArray
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

# Define a document schema
class MovieDoc(BaseDoc):
    title: str
    description: str
    year: int
    embedding: NdArray[1536]


movies = [
    {"title": "#1 title", "description": "#1 description", "year": 1999},
    {"title": "#2 title", "description": "#2 description", "year": 2001},
]

# Embed `description` and create documents
docs = DocList[MovieDoc](
    MovieDoc(embedding=embeddings.embed_query(movie["description"]), **movie)
    for movie in movies
)
```

2. Initialize a document index using any supported backend:
```python
from docarray.index import (
    InMemoryExactNNIndex,
    HnswDocumentIndex,
    WeaviateDocumentIndex,
    QdrantDocumentIndex,
    ElasticDocIndex,
)

# Select a suitable backend and initialize it with data
db = InMemoryExactNNIndex[MovieDoc](docs)
```

3. Finally, initialize a retriever and integrate it into your chain!
```python
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.retrievers import DocArrayRetriever


# Create a retriever
retriever = DocArrayRetriever(
    index=db,
    embeddings=embeddings,
    search_field="embedding",
    content_field="description",
)

# Use the retriever in your chain
model = ChatOpenAI()
qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)
```
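
To make the roles of `search_field` and `content_field` concrete, here is a minimal, dependency-free sketch of the retrieval flow described in the steps above: embed the query, rank stored documents by similarity on the search field, and return the content field of the best matches. `ToyMovie`, `toy_embed`, and `ToyRetriever` are hypothetical stand-ins written for illustration only; they are not part of DocArray or LangChain, and the placeholder embedding is not a real model.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyMovie:
    # Hypothetical stand-in for MovieDoc; not a DocArray class.
    title: str
    description: str
    year: int
    embedding: list


def toy_embed(text, dim=8):
    """Deterministic placeholder embedding (character buckets), NOT a real model."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ToyRetriever:
    """Hypothetical retriever mirroring the search_field / content_field idea."""

    def __init__(self, docs, embed, search_field, content_field, top_k=1):
        self.docs = docs
        self.embed = embed
        self.search_field = search_field
        self.content_field = content_field
        self.top_k = top_k

    def get_relevant(self, query):
        # Embed the query, then rank every document by similarity (exact scan).
        q = self.embed(query)
        ranked = sorted(
            self.docs,
            key=lambda d: cosine(q, getattr(d, self.search_field)),
            reverse=True,
        )
        # Only the content field is handed on to the language model.
        return [getattr(d, self.content_field) for d in ranked[: self.top_k]]


descriptions = [
    ("Movie A", "A hacker discovers reality is a simulation", 1999),
    ("Movie B", "A wizard attends a school of magic", 2001),
]
toy_docs = [ToyMovie(t, d, y, toy_embed(d)) for t, d, y in descriptions]

toy_retriever = ToyRetriever(
    toy_docs, toy_embed, search_field="embedding", content_field="description"
)
result = toy_retriever.get_relevant("a wizard attends a school of magic")
```

In this sketch the ranking is a full scan over all documents, which is the general idea behind an exact nearest-neighbor index; the real backends may organize and search vectors quite differently.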

Alternatively, you can use the built-in vector stores. LangChain ships two DocArray-based vector stores: [DocArrayInMemorySearch](https://python.langchain.com/docs/modules/data_connection/vectorstores/integrations/docarray_in_memory) and [DocArrayHnswSearch](https://python.langchain.com/docs/modules/data_connection/vectorstores/integrations/docarray_hnsw).
Both are easy to set up and are best suited to small and medium-sized datasets.

</details>

## Installation

To install DocArray from the CLI, run the following command:
4 changes: 3 additions & 1 deletion tests/documentation/test_docs.py
@@ -70,5 +70,7 @@ def test_files_good(fpath):

 def test_readme():
     check_md_file(
-        fpath='README.md', memory=True, keyword_ignore=['tensorflow', 'fastapi', 'push']
+        fpath='README.md',
+        memory=True,
+        keyword_ignore=['tensorflow', 'fastapi', 'push', 'langchain', 'MovieDoc'],
     )
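
The change above adds `'langchain'` and `'MovieDoc'` to `keyword_ignore`, presumably so that the new README blocks, which need LangChain and OpenAI credentials, are not executed by the documentation tests. A rough sketch of how such a filter could work follows, assuming that a block is skipped when it contains any listed keyword as a substring. `select_blocks` is a hypothetical helper written for illustration; the real `check_md_file` may behave differently.

```python
import re

# The fence marker is built programmatically so this example can itself
# sit inside a fenced block without terminating it.
FENCE = "`" * 3
BLOCK_RE = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def select_blocks(markdown, keyword_ignore=()):
    """Return Python code blocks, skipping any that contain an ignored keyword.

    Hypothetical helper: substring matching is an assumption, not the
    documented behavior of check_md_file.
    """
    blocks = BLOCK_RE.findall(markdown)
    return [b for b in blocks if not any(k in b for k in keyword_ignore)]


sample = "\n".join(
    [
        "Intro text",
        FENCE + "python",
        "from langchain.retrievers import DocArrayRetriever",
        FENCE,
        FENCE + "python",
        "print(1 + 1)",
        FENCE,
    ]
)

kept = select_blocks(sample, keyword_ignore=["langchain", "MovieDoc"])
```

Under this assumption, only the second block would be passed on for execution, while the LangChain import block is left out.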