chore: draft release note v0.34.0

# Release Note

This release contains 2 breaking changes, 3 new features, 1 refactoring, 11 bug fixes, and 2 documentation improvements.

## :bomb: Breaking Changes

### Terminate Python 3.7 support

:warning: :warning: DocArray now requires Python 3.8 or later. We can no longer ensure compatibility with Python 3.7.

We decided to drop it for two reasons:

* Several dependencies of DocArray require Python 3.8.
* Python [long-term support for 3.7 is ending](https://endoflife.date/python) this week. This means there will no longer 
  be security updates for Python 3.7, making this a good time for us to change our requirements.
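
If you want to verify an environment before upgrading, a minimal sketch of such a check follows. The helper name `meets_minimum_python` is hypothetical and not part of DocArray; it only compares an interpreter version against the 3.8 minimum stated above.

```python
import sys

# Minimum interpreter version required by DocArray v0.34.0, per this release note.
MINIMUM_PYTHON = (3, 8)


def meets_minimum_python(version_info=None):
    """Return True if the given (or current) interpreter version is at least 3.8."""
    if version_info is None:
        version_info = sys.version_info
    # Only the major and minor components matter for this check.
    return tuple(version_info[:2]) >= MINIMUM_PYTHON


print(meets_minimum_python())
```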

### Changes to `DocVec` Protobuf definition (#1639)

In order to fix a bug in the `DocVec` protobuf serialization described in [#1561](https://github.com/docarray/docarray/issues/1561),
we have changed the `DocVec` .proto definition.

This means that **`DocVec` objects serialized with DocArray v0.33.0 or earlier cannot be deserialized with DocArray
v0.34.0 or later, and vice versa**.

:warning: :warning: **We strongly recommend** that everyone using Protobuf with `DocVec` upgrade to DocArray v0.34.0 or 
later.
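
The compatibility rule above can be sketched as a small check for services that exchange `DocVec` Protobuf messages. This is an illustration only: `parse_version` and `crosses_docvec_proto_boundary` are hypothetical helpers, not DocArray functions, and the sketch assumes plain `X.Y.Z` version strings. It encodes only the v0.34.0 boundary described here and says nothing about any other format changes.

```python
def parse_version(version):
    """Turn a version string such as 'v0.34.0' or '0.33.1' into a tuple of ints."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


# First release that uses the new DocVec .proto definition, per this release note.
NEW_PROTO_SINCE = (0, 34, 0)


def crosses_docvec_proto_boundary(writer_version, reader_version):
    """Return True if one side is older than v0.34.0 and the other is not."""
    writer_is_new = parse_version(writer_version) >= NEW_PROTO_SINCE
    reader_is_new = parse_version(reader_version) >= NEW_PROTO_SINCE
    return writer_is_new != reader_is_new


# A message written by v0.33.0 cannot be read by v0.34.0, and vice versa.
print(crosses_docvec_proto_boundary("0.33.0", "0.34.0"))
```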

## 🆕 Features


### Allow users to check if a Document is already indexed in a DocIndex (#1633)

You can now check if a Document has already been indexed by using the `in` keyword:

```python
from docarray.index import InMemoryExactNNIndex
from docarray import BaseDoc, DocList
from docarray.typing import NdArray
import numpy as np

class MyDoc(BaseDoc):
    text: str
    embedding: NdArray[128]

docs = DocList[MyDoc](
        [MyDoc(text="Example text", embedding=np.random.rand(128))
         for _ in range(2000)])

index = InMemoryExactNNIndex[MyDoc](docs)
assert docs[0] in index
assert MyDoc(text='New text', embedding=np.random.rand(128)) not in index
```

### Support subindexes in `InMemoryExactNNIndex` (#1617)

You can now use the [find_subindex](https://docs.docarray.org/user_guide/storing/docindex/#nested-data-with-subindex) 
method with the `InMemoryExactNNIndex` Document Index.

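```python
from docarray import BaseDoc, DocList
from docarray.index import InMemoryExactNNIndex
from docarray.typing import NdArray
import numpy as np

# Illustrative nested schema: each MyDoc holds a DocList of ChunkDoc
class ChunkDoc(BaseDoc):
    text: str
    embedding: NdArray[128]

class MyDoc(BaseDoc):
    chunks: DocList[ChunkDoc]
    embedding: NdArray[128]

docs = DocList[MyDoc](
    [
        MyDoc(
            chunks=DocList[ChunkDoc](
                [ChunkDoc(text='Example text', embedding=np.random.rand(128)) for _ in range(5)]
            ),
            embedding=np.random.rand(128),
        )
        for _ in range(100)
    ]
)

index = InMemoryExactNNIndex[MyDoc]()
index.index(docs)

# Search the nested `chunks` subindex by the chunk-level embedding
root_docs, sub_docs, scores = index.find_subindex(
    np.random.rand(128), subindex='chunks', search_field='embedding', limit=3
)
```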

### Flexible tensor types for protobuf deserialization (#1645)

You can deserialize any `DocVec` protobuf message to any tensor type,
by passing the `tensor_type` parameter to `from_protobuf`.

This means that you can choose at deserialization time whether you are working with NumPy, PyTorch, or TensorFlow tensors.

```python
from docarray import BaseDoc, DocVec
from docarray.typing import TensorFlowTensor

class MyDoc(BaseDoc):
    tensor: TensorFlowTensor

da = DocVec[MyDoc](...)  # doesn't matter what tensor_type is here

proto = da.to_protobuf()
da_after = DocVec[MyDoc].from_protobuf(proto, tensor_type=TensorFlowTensor)

assert isinstance(da_after.tensor, TensorFlowTensor)
```

## ⚙ Refactoring

### Add `DBConfig` to `InMemoryExactNNIndex`

`InMemoryExactNNIndex` used to take a single constructor parameter, `index_file_path`, unlike the other Document Indexes, 
which each accept their own `DBConfig`. Now `index_file_path` is part of the `DBConfig`, so the index can be initialized 
from it. This will also allow us to extend the config if more parameters are needed.

The parameters of `DBConfig` can be passed at construction time as `**kwargs`, making this change compatible with old 
usage.

These two initializations are equivalent:

```python
from docarray.index import InMemoryExactNNIndex

# `MyDoc` is a `BaseDoc` schema such as the one defined in the examples above
db_config = InMemoryExactNNIndex.DBConfig(index_file_path='index.bin')

index = InMemoryExactNNIndex[MyDoc](db_config=db_config)
index = InMemoryExactNNIndex[MyDoc](index_file_path='index.bin')
```
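
To illustrate why both call styles can end up with the same configuration, here is a minimal sketch of the general pattern using plain dataclasses. It is not DocArray's implementation: `ToyDBConfig` and `ToyIndex` are hypothetical stand-ins, and the way keyword arguments are merged into the config is an assumption made for the example.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class ToyDBConfig:
    """Hypothetical stand-in for an index's DBConfig."""
    index_file_path: Optional[str] = None


class ToyIndex:
    """Hypothetical index that accepts a config object, keyword arguments, or both."""

    def __init__(self, db_config: Optional[ToyDBConfig] = None, **kwargs):
        base = db_config if db_config is not None else ToyDBConfig()
        # Keyword arguments fill in (or override) fields of the config.
        self.db_config = replace(base, **kwargs)


from_config = ToyIndex(db_config=ToyDBConfig(index_file_path="index.bin"))
from_kwargs = ToyIndex(index_file_path="index.bin")
print(from_config.db_config == from_kwargs.db_config)
```

Keeping all settings in one config object means new parameters can be added to the config without changing the constructor signature.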

## 🐞 Bug Fixes

### Allow protobuf deserialization of `BaseDoc` with `Union` type (#1655)

Serialization of `BaseDoc` types that have fields with a `Union` of Python native types is now supported.

```python
from typing import Union

from docarray import BaseDoc, DocList

class MyDoc(BaseDoc):
    union_field: Union[int, str]

docs1 = DocList[MyDoc]([MyDoc(union_field="hello")])
docs2 = DocList[MyDoc].from_dataframe(docs1.to_dataframe())
assert docs1 == docs2
```

When these `Union` types involve other `BaseDoc` types, an exception is raised.

```python
from typing import Union

from docarray import BaseDoc, DocList
from docarray.documents import ImageDoc, TextDoc

class CustomDoc(BaseDoc):
    ud: Union[TextDoc, ImageDoc] = TextDoc(text='union type')

docs = DocList[CustomDoc]([CustomDoc(ud=TextDoc(text='union type'))])

# raises an Exception
DocList[CustomDoc].from_dataframe(docs.to_dataframe())
```

### Cast limit to integer when passed to `HNSWDocumentIndex` (#1657, #1656)

If you call `find` or `find_batched` on an `HNSWDocumentIndex`, the `limit` parameter is now automatically cast to an 
integer.
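
As a rough illustration of why such a cast matters, the sketch below uses a plain Python list rather than an `HNSWDocumentIndex`. The helper `first_n` is hypothetical, and the float input is an assumption: this release note does not state which input types triggered the original issue.

```python
def first_n(items, limit):
    """Return the first `limit` items, accepting integer-valued floats such as 3.0."""
    limit = int(limit)  # list slicing rejects floats, so cast before slicing
    return items[:limit]


items = ["a", "b", "c", "d"]
# Slicing with 3.0 directly (items[:3.0]) would raise a TypeError.
print(first_n(items, 3.0))
```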

### Moved `default_column_config` from `RuntimeConfig` to `DBConfig` (#1648)

`default_column_config` contains specific configuration information about the columns and tables inside the backend's 
database. This was previously put inside `RuntimeConfig` which caused an error because this information is required at 
initialization time. This information has been moved inside `DBConfig` so you can edit it there.

```python
from docarray.index import HNSWDocumentIndex
import numpy as np

# `MyDoc` is a `BaseDoc` schema such as the one defined in the examples above
db_config = HNSWDocumentIndex.DBConfig()
db_config.default_column_config.get(np.ndarray).update({'ef': 2500})
index = HNSWDocumentIndex[MyDoc](db_config=db_config)
```

### Fix issue with Protobuf (de)serialization for `DocVec` (#1639)

This bug caused raw Protobuf objects to be stored as `DocVec` columns after they were deserialized from Protobuf, making the 
data essentially inaccessible. This has now been fixed, and `DocVec` objects are identical before and after (de)serialization.

### Fix order of returned matches when `find` and `filter` combination used in `InMemoryExactNNIndex` (#1642)

Hybrid search (`find` + `filter`) for `InMemoryExactNNIndex` was prioritizing low similarities (lower scores) in the 
returned matches. This was fixed by adding an option to sort matches in reverse order based on their scores.

```python
# prepare a query
q_doc = MyDoc(embedding=np.random.rand(128), text='query')

query = (
    db.build_query()
    .find(query=q_doc, search_field='embedding')
    .filter(filter_query={'text': {'$exists': True}})
    .build()
)

results = db.execute_query(query)
# Before: results was sorted from worst to best matches
# Now: It's sorted in the correct order, showing better matches first
```
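
The ordering issue can be pictured with a small standalone sketch. It is not the code from the fix: `rank_matches` and its `descending` flag are hypothetical, and the scores are made-up similarity values where higher means a better match.

```python
def rank_matches(docs, scores, limit, descending=True):
    """Pair each doc with its score and return the top `limit` pairs.

    With descending=True, higher (better) similarity scores come first.
    """
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=descending)
    return ranked[:limit]


docs = ["doc_a", "doc_b", "doc_c"]
scores = [0.12, 0.95, 0.57]

# Best matches first: doc_b (0.95), then doc_c (0.57)
print(rank_matches(docs, scores, limit=2))
```

Sorting in ascending order instead (`descending=False`) would return `doc_a` first, which mirrors the worst-to-best ordering described above.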

### Working with external Qdrant collections (#1632)

Using `QdrantDocumentIndex` to connect to a Qdrant DB initialized outside of DocArray raised a `KeyError`.
This has been fixed, and you can now use `QdrantDocumentIndex` to connect to externally initialized collections.

### Other bug fixes

- Update text search to match Weaviate client's new signature (#1654)
- Fix `DocVec` equality (#1641, #1663)
- Fix exception when `summary()` called for `LegacyDocument` (#1637)
- Fix `DocList` and `DocVec` coercion (#1568)
- Fix `update()` on `BaseDoc` with tensor fields (#1628)

## 📗 Documentation Improvements

- Enhance DocVec section (#1658)
- Qdrant in memory usage (#1634)

## 🤟 Contributors

We would like to thank all contributors to this release:
- Johannes Messner (@JohannesMessner)
- Nikolas Pitsillos (@npitsillos)
- Shukri (@hsm207)
- Kacper Łukawski (@kacperlukawski)
- Aman Agarwal (@agaraman0)
- maxwelljin (@maxwelljin)
- samsja (@samsja)
- Saba Sturua (@jupyterjazz)
- Joan Fontanals (@JoanFM)

