Release Note
This release contains 4 bug fixes, 1 refactoring and 2 documentation improvements.
Refactoring
Improve ElasticDocIndex logging (#1551)
More debugging logs have been added inside ElasticDocIndex.
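To see the additional debug messages, the log level has to be lowered with Python's standard logging module. The snippet below is a minimal sketch, assuming the index logs through a logger named "docarray"; that name is an assumption and is not stated in this release note.

```python
import logging

# Attach a handler to the root logger so messages are printed with timestamps.
logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")

# Assumption: DocArray's index classes log through a logger named "docarray".
docarray_logger = logging.getLogger("docarray")
docarray_logger.setLevel(logging.DEBUG)
```

If the logger name differs in your installed version, setting the root logger to DEBUG with logging.getLogger().setLevel(logging.DEBUG) shows the messages as well, at the cost of more noise from other libraries.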
Bug Fixes
Allow InMemoryExactNNIndex with Optional embedding tensors (#1575)
You can now index Documents whose tensor search_field is typed as Optional. Documents whose embedding is None are ignored when running a search.
import torch
from typing import Optional
from docarray import BaseDoc, DocList
from docarray.typing import TorchTensor
from docarray.index import InMemoryExactNNIndex

class EmbeddingDoc(BaseDoc):
    embedding: Optional[TorchTensor[768]]

# Every other Document is created without an embedding (None)
docs = DocList[EmbeddingDoc](
    [EmbeddingDoc(embedding=(torch.rand(768) if i % 2 else None)) for i in range(5)]
)
index = InMemoryExactNNIndex[EmbeddingDoc](docs)
index.find(torch.rand(768), search_field="embedding", limit=3)
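The idea of ignoring None embeddings can be illustrated without DocArray. The following is a toy sketch in plain Python, not the library's implementation: the function names are hypothetical, and cosine similarity is chosen only for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_skipping_none(
    query: Sequence[float],
    embeddings: Sequence[Optional[Sequence[float]]],
    limit: int,
) -> List[Tuple[int, float]]:
    """Return (position, score) pairs of the best matches, ignoring None entries."""
    scored = [
        (position, cosine_similarity(query, embedding))
        for position, embedding in enumerate(embeddings)
        if embedding is not None  # entries without an embedding never become candidates
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


embeddings = [None, [1.0, 0.0], None, [0.0, 1.0], [1.0, 1.0]]
results = find_skipping_none([1.0, 0.0], embeddings, limit=3)
print([position for position, _ in results])  # [1, 4, 3]
```

Only three of the five entries carry an embedding, so a search with limit=3 returns exactly those three, ordered by similarity to the query.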
Safe is_subclass check (#1569)
In DocArray, especially when dealing with indexers, field types are checked with Python's built-in issubclass function.
This call fails under some circumstances, for instance when the type being checked is a List or Tuple annotation. Starting with this release, we use a safe version that does not fail in these cases.
This enables the following usage, which would otherwise fail:
from typing import List
from docarray import BaseDoc
from docarray.index import HnswDocumentIndex

class MyDoc(BaseDoc):
    test: List[str]

index = HnswDocumentIndex[MyDoc]()
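The failure mode and one way to guard against it can be shown with the standard library alone. The helper below is a hypothetical sketch that uses try/except; it is not DocArray's actual implementation, which may detect these cases differently.

```python
from typing import Any, List, Tuple


def safe_issubclass_sketch(candidate: Any, parent: type) -> bool:
    """Like issubclass, but returns False instead of raising for non-class arguments."""
    try:
        return issubclass(candidate, parent)
    except TypeError:
        # Raised when `candidate` is not a plain class, e.g. a typing annotation.
        return False


print(safe_issubclass_sketch(bool, int))              # True
print(safe_issubclass_sketch(List[str], int))         # False
print(safe_issubclass_sketch(Tuple[int, str], int))   # False
```

With such a guard in place, a field annotated as List[str] is simply treated as "not a subclass" during type checks rather than aborting index creation.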
Fix AnyDoc deserialization (#1571)
AnyDoc is a special schema-less Document that adapts to the schema of the data it loads. However, when the data contained dictionaries or lists, deserialization failed. This is now fixed, so the following works:
from docarray.base_doc import AnyDoc, BaseDoc
from typing import Dict

class ConcreteDoc(BaseDoc):
    text: str
    tags: Dict[str, int]

doc = ConcreteDoc(text='text', tags={'type': 1})
any_doc = AnyDoc.from_protobuf(doc.to_protobuf())
assert any_doc.text == 'text'
assert any_doc.tags == {'type': 1}
dict method for Document view (#1559)
Prior to this fix, doc.dict() returned an empty dictionary when doc.is_view() was True:
from docarray import BaseDoc, DocVec

class MyDoc(BaseDoc):
    foo: int

vec = DocVec[MyDoc]([MyDoc(foo=3)])

# before
doc = vec[0]
assert doc.is_view()
print(doc.dict())
# > {}

# after
doc = vec[0]
assert doc.is_view()
print(doc.dict())
# > {'id': 'f285db406a949a7e7ab084032800f7d8', 'foo': 3}
Documentation Improvements
Explain the state of DocList in FastAPI (#1546)
Contributors
We would like to thank all contributors to this release: