Merged
Changes from 1 commit
67 commits
a97b461
chore: group extras and add instructions for pip installs
Mar 23, 2023
c346ef8
fix: throw runtime error with install instructions for hnswlib
Mar 23, 2023
9f43985
feat: add instructions for video imports
Mar 24, 2023
d427c3f
feat: add instructions for audio imports
Mar 24, 2023
8690688
feat: add instructions for 3d imports
Mar 24, 2023
c4b6673
feat: add instructions for image imports
Mar 24, 2023
2a0c350
fix: import only audiosegment from pydub
Mar 24, 2023
3242801
fix: generalize audio and image imports
Mar 24, 2023
391215d
fix: add instructions for web imports
Mar 24, 2023
e4119b3
fix: add instructions for web imports
Mar 27, 2023
3eedfec
fix: add instructions for protobuf imports
Mar 27, 2023
d027213
fix: add instructions for lz4 imports
Mar 27, 2023
e5a5f01
fix: fastapi import
Mar 27, 2023
9ffd548
fix: revert changes in protobuf import
Mar 27, 2023
840b0bc
fix: add instructions for torch, without raising error
Mar 27, 2023
e5da8c0
fix: add instructions for torch, with raising error
Mar 27, 2023
41392bc
fix: add instructions for tensorflow
Mar 27, 2023
581206a
fix: base doc io imports
Mar 27, 2023
59452b8
fix: tf in doc index abstract
Mar 27, 2023
49e6759
fix: tf in doc index abstract
Mar 27, 2023
3979340
fix: clean up imports
Mar 27, 2023
9d14836
fix: tf import in doc index
Mar 27, 2023
8437866
fix: add getattr on module level
Mar 27, 2023
d26dae4
fix: import torch for type checking
Mar 27, 2023
0e868f3
fix: add type checking
Mar 27, 2023
58b6c52
fix: test cross backend
Mar 27, 2023
090382d
fix: add missing return statement
Mar 28, 2023
7a6783b
fix: clean up
Mar 28, 2023
f87e3b4
fix: update error message
Mar 28, 2023
91f8da6
fix: remove base document init
Mar 28, 2023
96087bb
fix: clean up
Mar 28, 2023
be07278
fix: add trimesh easy extra
Mar 28, 2023
d6605e2
fix: pil immage importfix: clean up
Mar 28, 2023
a368483
chore: add lz4 to mypy missing type hint section
Mar 28, 2023
1246a4d
docs: add instructions to doc index tutorial
Mar 28, 2023
34ef472
chore: extra pandas and condense module where missing imports ignore
Mar 28, 2023
b26bd14
fix: update poetry lock
Mar 28, 2023
280a5cd
fix: missed imports
Mar 28, 2023
33f78e9
fix: clean up
Mar 28, 2023
a725c3b
fix: revert last commit
Mar 28, 2023
f24042e
revert "fix: missed imports"
Mar 28, 2023
08e3d46
fix: missed imports
Mar 28, 2023
1fe5a1e
wip
Mar 28, 2023
cf19fa5
fix: rename DocArrayProto to DocumentArrayProto (#1297)
samsja Mar 28, 2023
5f237e7
fix: docstring polish typing (#1299)
samsja Mar 28, 2023
305fff9
fix: fix for doc_string test
Mar 29, 2023
bf055c5
fix: try short version in typing init getattr
Mar 29, 2023
80a2dab
fix: shorter version in getattr
Mar 29, 2023
075ca2d
fix: remove files (#1305)
samsja Mar 29, 2023
2e65d00
fix: flatten schema of abstract index (#1294)
AnneYang720 Mar 29, 2023
fde056d
fix: add type hint for lib
Mar 29, 2023
ec5c4f4
fix: add import error to inits getattrs
Mar 29, 2023
550981d
docs: add utils section (#1307)
samsja Mar 29, 2023
e1cf96b
docs: fix docstring example of find_batched (#1308)
JohannesMessner Mar 29, 2023
3010741
docs: fix map docstring (#1311)
samsja Mar 29, 2023
9da5624
feat: elasticsearch document index (#1196)
AnneYang720 Mar 29, 2023
2570be0
fix: add case for elastic search
Mar 29, 2023
9fef701
refactor: map_docs_batch to map_docs_batched (#1312)
Mar 29, 2023
a0264f0
refactor: map_docs_batch to map_docs_batched (#1312)
Mar 29, 2023
fefbe86
fix: clean up
Mar 29, 2023
2557a61
feat: torch backend basic operation tests (#1306)
agaraman0 Mar 29, 2023
3509d27
fix: ci add --fix-missing to apt-get
Mar 30, 2023
81de695
fix: revert "fix: ci add --fix-missing to apt-get"
Mar 30, 2023
b418462
fix: ci apt-get update
Mar 30, 2023
33ff6ba
Merge remote-tracking branch 'origin/feat-rewrite-v2' into chore-pip-…
Mar 30, 2023
fb01b6b
fix: apply samis suggestions from code review
Mar 30, 2023
a40c44f
fix: apply samis suggestions from code review
Mar 30, 2023
docs: add utils section (#1307)
* feat: add utils for map to docs and fix docstring

Signed-off-by: samsja <sami.jaghouar@hotmail.fr>

* feat: add utils for map to docs and fix docstring

Signed-off-by: samsja <sami.jaghouar@hotmail.fr>

* feat: add utils for find and fix docstring

Signed-off-by: samsja <sami.jaghouar@hotmail.fr>

* fix: fix video ndaray docstrng

Signed-off-by: samsja <sami.jaghouar@hotmail.fr>

* fix: fix video find docstrng

Signed-off-by: samsja <sami.jaghouar@hotmail.fr>

* fix: fix map docstring

Signed-off-by: samsja <sami.jaghouar@hotmail.fr>

* fix: fix fileter docstring

Signed-off-by: samsja <sami.jaghouar@hotmail.fr>

* fix: fix add reduce

Signed-off-by: samsja <sami.jaghouar@hotmail.fr>

---------

Signed-off-by: samsja <sami.jaghouar@hotmail.fr>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
samsja authored and anna-charlotte committed Mar 30, 2023
commit 550981de89c9be049b6215c13b91d002e87ca08f
6 changes: 3 additions & 3 deletions docarray/typing/tensor/video/video_ndarray.py
@@ -42,15 +42,15 @@ class MyVideoDoc(BaseDoc):
video_tensor=np.random.random((100, 224, 224, 3)),
)

doc_1.video_tensor.save(file_path='file_1.mp4')
doc_1.video_tensor.save(file_path='/tmp/file_1.mp4')

doc_2 = MyVideoDoc(
title='my_second_video_doc',
url='file_1.mp4',
url='/tmp/file_1.mp4',
)

doc_2.video_tensor = parse_obj_as(VideoNdArray, doc_2.url.load().video)
doc_2.video_tensor.save(file_path='file_2.mp4')
doc_2.video_tensor.save(file_path='/tmp/file_2.mp4')
```

---
79 changes: 43 additions & 36 deletions docarray/utils/filter.py
@@ -1,3 +1,5 @@
__all__ = ['filter_docs']

import json
from typing import Dict, List, Union

@@ -13,50 +15,55 @@ def filter_docs(
Filter the Documents in the index according to the given filter query.


EXAMPLE USAGE

.. code-block:: python
---

from docarray import DocArray, BaseDoc
from docarray.documents import Text, Image
from docarray.util.filter import filter_docs
```python
from docarray import DocArray, BaseDoc
from docarray.documents import TextDoc, ImageDoc
from docarray.utils.filter import filter_docs


class MyDocument(BaseDoc):
caption: Text
image: Image
price: int
class MyDocument(BaseDoc):
caption: TextDoc
ImageDoc: ImageDoc
price: int


docs = DocArray[MyDocument](
[
MyDocument(
caption='A tiger in the jungle',
image=Image(url='tigerphoto.png'),
price=100,
),
MyDocument(
caption='A swimming turtle', image=Image(url='turtlepic.png'), price=50
),
MyDocument(
caption='A couple birdwatching with binoculars',
image=Image(url='binocularsphoto.png'),
price=30,
),
]
)
query = {
'$and': {
'image__url': {'$regex': 'photo'},
'price': {'$lte': 50},
}
docs = DocArray[MyDocument](
[
MyDocument(
caption='A tiger in the jungle',
ImageDoc=ImageDoc(url='tigerphoto.png'),
price=100,
),
MyDocument(
caption='A swimming turtle',
ImageDoc=ImageDoc(url='turtlepic.png'),
price=50,
),
MyDocument(
caption='A couple birdwatching with binoculars',
ImageDoc=ImageDoc(url='binocularsphoto.png'),
price=30,
),
]
)
query = {
'$and': {
'ImageDoc__url': {'$regex': 'photo'},
'price': {'$lte': 50},
}
}

results = filter_docs(docs, query)
assert len(results) == 1
assert results[0].price == 30
assert results[0].caption == 'A couple birdwatching with binoculars'
assert results[0].ImageDoc.url == 'binocularsphoto.png'
```

results = filter_docs(docs, query)
assert len(results) == 1
assert results[0].price == 30
assert results[0].caption == 'A couple birdwatching with binoculars'
assert results[0].image.url == 'binocularsphoto.png'
---

:param docs: the DocArray where to apply the filter
:param query: the query to filter by
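
The query in the docstring above combines a `$and` block, a `$regex` match on a nested field addressed with a double underscore, and a `$lte` comparison. As a rough illustration of how such a query could be evaluated, here is a minimal sketch on plain dictionaries. It is not the `filter_docs` implementation: `get_nested`, `matches`, and `filter_dicts` are hypothetical helpers, the documents are plain dicts standing in for `MyDocument`, and only the three operators used in the docstring are handled.

```python
import re
from typing import Any, Dict, List


def get_nested(doc: Dict[str, Any], path: str) -> Any:
    """Resolve a double-underscore path such as 'image__url' on nested dicts."""
    value: Any = doc
    for key in path.split('__'):
        value = value[key]
    return value


def matches(doc: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """Return True if doc satisfies query (hypothetical; $and, $regex, $lte only)."""
    for key, condition in query.items():
        if key == '$and':
            # Every sub-condition inside the $and block must hold.
            if not all(matches(doc, {k: v}) for k, v in condition.items()):
                return False
            continue
        value = get_nested(doc, key)
        for op, operand in condition.items():
            if op == '$regex':
                if re.search(operand, str(value)) is None:
                    return False
            elif op == '$lte':
                if not value <= operand:
                    return False
            else:
                raise ValueError(f'unsupported operator: {op}')
    return True


def filter_dicts(docs: List[Dict[str, Any]], query: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep only the documents that satisfy the query."""
    return [doc for doc in docs if matches(doc, query)]


docs = [
    {'caption': 'A tiger in the jungle', 'image': {'url': 'tigerphoto.png'}, 'price': 100},
    {'caption': 'A swimming turtle', 'image': {'url': 'turtlepic.png'}, 'price': 50},
    {
        'caption': 'A couple birdwatching with binoculars',
        'image': {'url': 'binocularsphoto.png'},
        'price': 30,
    },
]
query = {'$and': {'image__url': {'$regex': 'photo'}, 'price': {'$lte': 50}}}
results = filter_dicts(docs, query)
```

Only the birdwatching document passes both conditions, which is the same outcome the docstring asserts for `filter_docs`.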
165 changes: 80 additions & 85 deletions docarray/utils/find.py
@@ -1,3 +1,5 @@
__all__ = ['find', 'find_batched']

from typing import Any, Dict, List, NamedTuple, Optional, Type, Union, cast

from typing_inspect import is_union_type
@@ -34,52 +36,48 @@ def find(
Find the closest Documents in the index to the query.
Supports PyTorch and NumPy embeddings.

.. note::
This utility function is likely to be removed once
Document Stores are available.
At that point, and in-memory Document Store will serve the same purpose
by exposing a .find() method.

.. note::
This is a simple implementation that assumes the same embedding field name for
both query and index, does not support nested search, and does not support
hybrid (multi-vector) search. These shortcoming will be addressed in future
versions.
!!! note
This is a simple implementation of exact search. If you need advanced
search, such as approximate nearest neighbour search, hybrid search, or
multi-vector search, please take a look at the [BaseDoc][docarray.base_doc.doc.BaseDoc]

EXAMPLE USAGE
---

.. code-block:: python
```python
from docarray import DocArray, BaseDoc
from docarray.typing import TorchTensor
from docarray.utils.find import find
import torch

from docarray import DocArray, BaseDoc
from docarray.typing import TorchTensor
from docarray.util.find import find

class MyDocument(BaseDoc):
embedding: TorchTensor

class MyDocument(BaseDoc):
embedding: TorchTensor

index = DocArray[MyDocument](
[MyDocument(embedding=torch.rand(128)) for _ in range(100)]
)

index = DocArray[MyDocument](
[MyDocument(embedding=torch.rand(128)) for _ in range(100)]
)
# use Document as query
query = MyDocument(embedding=torch.rand(128))
top_matches, scores = find(
index=index,
query=query,
embedding_field='embedding',
metric='cosine_sim',
)

# use Document as query
query = MyDocument(embedding=torch.rand(128))
top_matches, scores = find(
index=index,
query=query,
embedding_field='tensor',
metric='cosine_sim',
)
# use tensor as query
query = torch.rand(128)
top_matches, scores = find(
index=index,
query=query,
embedding_field='embedding',
metric='cosine_sim',
)
```

# use tensor as query
query = torch.rand(128)
top_matches, scores = find(
index=index,
query=query,
embedding_field='tensor',
metric='cosine_sim',
)
---

:param index: the index of Documents to search in
:param query: the query to search for
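
The note in the `find` docstring describes it as exact search: every indexed embedding is compared with the query and the best scores are kept. A minimal pure-Python sketch of that idea with the `cosine_sim` metric could look as follows. This is an illustration under assumptions, not the library code: `cosine_sim` and `exact_find` are hypothetical helpers, embeddings are plain lists of floats instead of tensors, and the function returns positions in the index rather than documents.

```python
import math
from typing import List, Sequence, Tuple


def cosine_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_find(
    index: Sequence[Sequence[float]], query: Sequence[float], limit: int = 3
) -> Tuple[List[int], List[float]]:
    """Score every indexed vector against the query and keep the best `limit`."""
    scored = [(pos, cosine_sim(vec, query)) for pos, vec in enumerate(index)]
    # Higher cosine similarity means a closer match, so sort descending.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    top = scored[:limit]
    return [pos for pos, _ in top], [score for _, score in top]


index = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
positions, scores = exact_find(index, [1.0, 0.0], limit=2)
```

Because every vector is scored, the cost grows linearly with the index size, which is the trade-off the note contrasts with approximate nearest neighbour search.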
@@ -123,54 +121,51 @@ def find_batched(
Find the closest Documents in the index to the queries.
Supports PyTorch and NumPy embeddings.

.. note::
This utility function is likely to be removed once
Document Stores are available.
At that point, and in-memory Document Store will serve the same purpose
by exposing a .find() method.

.. note::
This is a simple implementation that assumes the same embedding field name for
both query and index, does not support nested search, and does not support
hybrid (multi-vector) search. These shortcoming will be addressed in future
versions.

EXAMPLE USAGE

.. code-block:: python

from docarray import DocArray, BaseDoc
from docarray.typing import TorchTensor
from docarray.util.find import find


class MyDocument(BaseDoc):
embedding: TorchTensor


index = DocArray[MyDocument](
[MyDocument(embedding=torch.rand(128)) for _ in range(100)]
)

# use DocArray as query
query = DocArray[MyDocument]([MyDocument(embedding=torch.rand(128)) for _ in range(3)])
results = find(
index=index,
query=query,
embedding_field='tensor',
metric='cosine_sim',
)
top_matches, scores = results[0]

# use tensor as query
query = torch.rand(3, 128)
results, scores = find(
index=index,
query=query,
embedding_field='tensor',
metric='cosine_sim',
)
top_matches, scores = results[0]
!!! note
This is a simple implementation of exact search. If you need advanced
search, such as approximate nearest neighbour search, hybrid search, or
multi-vector search, please take a look at the [BaseDoc][docarray.base_doc.doc.BaseDoc]


---

```python
# from docarray import DocArray, BaseDoc
# from docarray.typing import TorchTensor
# from docarray.utils.find import find
# import torch
#
#
# class MyDocument(BaseDoc):
# embedding: TorchTensor
#
#
# index = DocArray[MyDocument](
# [MyDocument(embedding=torch.rand(128)) for _ in range(100)]
# )
#
# # use DocArray as query
# query = DocArray[MyDocument]([MyDocument(embedding=torch.rand(128)) for _ in range(3)])
# results = find(
# index=index,
# query=query,
# embedding_field='embedding',
# metric='cosine_sim',
# )
# top_matches, scores = results[0]
#
# # use tensor as query
# query = torch.rand(3, 128)
# results, scores = find(
# index=index,
# query=query,
# embedding_field='embedding',
# metric='cosine_sim',
# )
# top_matches, scores = results[0]
```

---

:param index: the index of Documents to search in
:param query: the query to search for
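
The batched variant takes several queries at once and, judging by the `top_matches, scores = results[0]` unpacking in the docstring example, yields one result per query. A minimal sketch of that shape, again in plain Python and under assumptions: `_normalize` and `exact_find_batched` are hypothetical names, vectors are lists of floats, and each result is a pair of index positions and cosine scores.

```python
import math
from typing import List, Sequence, Tuple

Result = Tuple[List[int], List[float]]


def _normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def exact_find_batched(
    index: Sequence[Sequence[float]],
    queries: Sequence[Sequence[float]],
    limit: int = 2,
) -> List[Result]:
    """Return one (positions, scores) pair per query, best matches first."""
    # Normalize the index once so it can be reused for every query.
    unit_index = [_normalize(vec) for vec in index]
    results: List[Result] = []
    for query in queries:
        unit_query = _normalize(query)
        # The dot product of unit vectors equals their cosine similarity.
        scores = [sum(a * b for a, b in zip(vec, unit_query)) for vec in unit_index]
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        order = order[:limit]
        results.append((order, [scores[i] for i in order]))
    return results


index = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[2.0, 0.0], [0.0, 3.0]]
results = exact_find_batched(index, queries, limit=1)
top_positions, top_scores = results[0]
```

Normalizing the index a single time is the main saving over calling a single-query search in a loop; a tensor-based implementation would typically express the same thing as one matrix product.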