I created a @dataclass with a custom field that holds a collection of images, each with two embeddings. Since it is better to wrap the image collection in a Document, I started with the following model:
from typing import TypeVar, List

from docarray import dataclass, field, Document
from docarray.typing import Image, Text, JSON

TrialImages = TypeVar('TrialImages', bound=str)


def trial_images_setter(value: List[str]) -> Document:
    # TODO: this is still not a multi-modal document, but I think it is fine for now
    doc = Document(modality='trial_images')
    # one image chunk per URI, with the tensor loaded eagerly
    doc.chunks = [Document(uri=uri, modality='image').load_uri_to_image_tensor() for uri in value]
    return doc


def trial_images_getter(doc: Document) -> List[str]:
    return [d.uri for d in doc.chunks]


@dataclass
class Lipstick:
    brand: Text
    color: Text
    nickname: Text
    meta: JSON
    product_image: Image
    # nested document for embeddings of skin and lip colors from all the trials
    # default_factory must be a callable, so pass list rather than []
    trial_images: TrialImages = field(setter=trial_images_setter, getter=trial_images_getter, default_factory=list)
While this modeling works, I wasn't able to get the trial_images from the DocumentArray.
Document Summary & Error Log:
📄 Document: 6ec9e9fe6ea097253cedcb8b938a303f
└── 💠 Chunks
├── 📄 Document: 4ef01f5798e058fa9b4df48791414c20
│ ╭──────────────────────┬───────────────────────────────────────────────────────╮
│ │ Attribute │ Value │
│ ├──────────────────────┼───────────────────────────────────────────────────────┤
│ │ parent_id │ 6ec9e9fe6ea097253cedcb8b938a303f │
│ │ granularity │ 1 │
│ │ text │ MAC │
│ │ modality │ text │
│ ╰──────────────────────┴───────────────────────────────────────────────────────╯
├── 📄 Document: 112c0f846fc7d2133ca92e21e675ae6b
│ ╭──────────────────────┬───────────────────────────────────────────────────────╮
│ │ Attribute │ Value │
│ ├──────────────────────┼───────────────────────────────────────────────────────┤
│ │ parent_id │ 6ec9e9fe6ea097253cedcb8b938a303f │
│ │ granularity │ 1 │
│ │ text │ Marrakesh │
│ │ modality │ text │
│ ╰──────────────────────┴───────────────────────────────────────────────────────╯
├── 📄 Document: 7d35324e75a8a4d0af507b90d733584d
│ ╭──────────────────────┬───────────────────────────────────────────────────────╮
│ │ Attribute │ Value │
│ ├──────────────────────┼───────────────────────────────────────────────────────┤
│ │ parent_id │ 6ec9e9fe6ea097253cedcb8b938a303f │
│ │ granularity │ 1 │
│ │ text │ 麻辣鸡丝 │
│ │ modality │ text │
│ ╰──────────────────────┴───────────────────────────────────────────────────────╯
├── 📄 Document: f2105af51c6b56f7b1a8bd2af94e28fe
│ ╭──────────────────────┬───────────────────────────────────────────────────────╮
│ │ Attribute │ Value │
│ ├──────────────────────┼───────────────────────────────────────────────────────┤
│ │ parent_id │ 6ec9e9fe6ea097253cedcb8b938a303f │
│ │ granularity │ 1 │
│ │ tags │ {'type': ['口红', '哑光']} │
│ │ modality │ json │
│ ╰──────────────────────┴───────────────────────────────────────────────────────╯
├── 📄 Document: 42b6effc1ed1fe7c75c907b212d0e63d
│ ╭─────────────┬────────────────────────────────────────────────────────────────╮
│ │ Attribute │ Value │
│ ├─────────────┼────────────────────────────────────────────────────────────────┤
│ │ parent_id │ 6ec9e9fe6ea097253cedcb8b938a303f │
│ │ granularity │ 1 │
│ │ tensor │ <class 'numpy.ndarray'> in shape (600, 640, 3), dtype: uint8 │
│ │ uri │ https://s1.vika.cn/space/2022/06/08/3709d7793d964e5393bcabfc9… │
│ │ modality │ image │
│ ╰─────────────┴────────────────────────────────────────────────────────────────╯
└── 📄 Document: 80dd7ee3fd6a6b9c258baf714376b937
╭──────────────────────┬───────────────────────────────────────────────────────╮
│ Attribute │ Value │
├──────────────────────┼───────────────────────────────────────────────────────┤
│ parent_id │ 6ec9e9fe6ea097253cedcb8b938a303f │
│ granularity │ 1 │
│ modality │ trial_images │
╰──────────────────────┴───────────────────────────────────────────────────────╯
└── 💠 Chunks
├── 📄 Document: e2ff2813f6e7e5eebefdb95b241db382
│ ╭─────────────┬────────────────────────────────────────────────────────────────╮
│ │ Attribute │ Value │
│ ├─────────────┼────────────────────────────────────────────────────────────────┤
│ │ parent_id │ 80dd7ee3fd6a6b9c258baf714376b937 │
│ │ granularity │ 1 │
│ │ tensor │ <class 'numpy.ndarray'> in shape (283, 175, 3), dtype: uint8 │
│ │ uri │ https://s1.vika.cn/space/2022/06/08/7f3680ba48114ebeba83b2938… │
│ │ modality │ image │
│ ╰─────────────┴────────────────────────────────────────────────────────────────╯
├── 📄 Document: 27e552e852f86eb8d349a213c1d87755
│ ╭─────────────┬────────────────────────────────────────────────────────────────╮
│ │ Attribute │ Value │
│ ├─────────────┼────────────────────────────────────────────────────────────────┤
│ │ parent_id │ 80dd7ee3fd6a6b9c258baf714376b937 │
│ │ granularity │ 1 │
│ │ tensor │ <class 'numpy.ndarray'> in shape (260, 146, 3), dtype: uint8 │
│ │ uri │ https://s1.vika.cn/space/2022/06/08/5577cf7ebfa6459eafa0c2cd9… │
│ │ modality │ image │
│ ╰─────────────┴────────────────────────────────────────────────────────────────╯
├── 📄 Document: 88294367d4ed0dd3019909e9a55350dd
│ ╭─────────────┬────────────────────────────────────────────────────────────────╮
│ │ Attribute │ Value │
│ ├─────────────┼────────────────────────────────────────────────────────────────┤
│ │ parent_id │ 80dd7ee3fd6a6b9c258baf714376b937 │
│ │ granularity │ 1 │
│ │ tensor │ <class 'numpy.ndarray'> in shape (606, 1075, 3), dtype: uint8 │
│ │ uri │ https://s1.vika.cn/space/2022/06/08/104b47a6afec4ef68472747ef… │
│ │ modality │ image │
│ ╰─────────────┴────────────────────────────────────────────────────────────────╯
└── 📄 Document: c721a22350a76e73952ccba0a5cf9e82
╭─────────────┬────────────────────────────────────────────────────────────────╮
│ Attribute │ Value │
├─────────────┼────────────────────────────────────────────────────────────────┤
│ parent_id │ 80dd7ee3fd6a6b9c258baf714376b937 │
│ granularity │ 1 │
│ tensor │ <class 'numpy.ndarray'> in shape (602, 1076, 3), dtype: uint8 │
│ uri │ https://s1.vika.cn/space/2022/06/08/b16b5bf313294cbeb1434bc5d… │
│ modality │ image │
╰─────────────┴────────────────────────────────────────────────────────────────╯
╭───────────────────── Documents Summary ─────────────────────╮
│ │
│ Length 1 │
│ Homogenous Documents True │
│ Has nested Documents in ('chunks',) │
│ Common Attributes ('id', 'embedding', 'chunks') │
│ Multimodal dataclass True │
│ │
╰─────────────────────────────────────────────────────────────╯
╭──────────────────────── Attributes Summary ────────────────────────╮
│ │
│ Attribute Data type #Unique values Has empty value │
│ ──────────────────────────────────────────────────────────────── │
│ chunks ('ChunkArray',) 1 False │
│ embedding ('ndarray',) 1 False │
│ id ('str',) 1 False │
│ │
╰────────────────────────────────────────────────────────────────────╯
╭─ DocumentArrayAnnlite Config ─╮
│ │
│ n_dim 50 │
│ metric cosine │
│ serialize_config {} │
│ data_path ./data │
│ ef_construction None │
│ ef_search None │
│ max_connection None │
│ columns [] │
│ │
╰───────────────────────────────╯
⠹ Working... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:00 0% ETA: -:--:-- ERROR lipstickTrialImageExecutor/rep-0@89864 IndexError('Unsupported index type builtins.float: 5.0') [06/09/22 12:10:58]
add "--quiet-error" to suppress the exception details
╭─────────────────────────────────────────────────── Traceback (most recent call last) ────────────────────────────────────────────────────╮
│ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/jina/serve/runtimes/worker/__init__.py:162 in │
│ process_data │
│ │
│ 159 │ │ │ │ if self.logger.debug_enabled: │
│ 160 │ │ │ │ │ self._log_data_request(requests[0]) │
│ 161 │ │ │ │ │
│ ❱ 162 │ │ │ │ return await self._data_request_handler.handle(requests=requests) │
│ 163 │ │ │ except (RuntimeError, Exception) as ex: │
│ 164 │ │ │ │ self.logger.error( │
│ 165 │ │ │ │ │ f'{ex!r}' │
│ │
│ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/jina/serve/runtimes/request_handlers/data_reques… │
│ in handle │
│ │
│ 147 │ │ ) │
│ 148 │ │ │
│ 149 │ │ # executor logic │
│ ❱ 150 │ │ return_data = await self._executor.__acall__( │
│ 151 │ │ │ req_endpoint=requests[0].header.exec_endpoint, │
│ 152 │ │ │ docs=docs, │
│ 153 │ │ │ parameters=params, │
│ │
│ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/jina/serve/executors/__init__.py:272 in __acall__ │
│ │
│ 269 │ │ # noqa: DAR201 │
│ 270 │ │ """ │
│ 271 │ │ if req_endpoint in self.requests: │
│ ❱ 272 │ │ │ return await self.__acall_endpoint__(req_endpoint, **kwargs) │
│ 273 │ │ elif __default_endpoint__ in self.requests: │
│ 274 │ │ │ return await self.__acall_endpoint__(__default_endpoint__, **kwargs) │
│ 275 │
│ │
│ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/jina/serve/executors/__init__.py:295 in │
│ __acall_endpoint__ │
│ │
│ 292 │ │ │ if iscoroutinefunction(func): │
│ 293 │ │ │ │ return await func(self, **kwargs) │
│ 294 │ │ │ else: │
│ ❱ 295 │ │ │ │ return func(self, **kwargs) │
│ 296 │ │
│ 297 │ @property │
│ 298 │ def workspace(self) -> Optional[str]: │
│ │
│ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/jina/serve/executors/decorators.py:180 in │
│ arg_wrapper │
│ │
│ 177 │ │ │ │ def arg_wrapper( │
│ 178 │ │ │ │ │ executor_instance, *args, **kwargs │
│ 179 │ │ │ │ ): # we need to get the summary from the executor, so we need to access │
│ the self │
│ ❱ 180 │ │ │ │ │ return fn(executor_instance, *args, **kwargs) │
│ 181 │ │ │ │ │
│ 182 │ │ │ │ self.fn = arg_wrapper │
│ 183 │
│ │
│ /Users/simon/Documents/git-repo/cool/lipstick-db/executor.py:46 in index │
│ │
│ 43 class LipstickTrialImageExecutor(Executor): │
│ 44 │ @requests(on='/index') │
│ 45 │ def index(self, docs: DocumentArray, **kwargs): │
│ ❱ 46 │ │ trial_images = docs['@.[trial_images]c'] │
│ 47 │ │ return trial_images │
│ 48 │ │
│ 49 │ def _get_face_mesh(self, img_rgb): │
│ │
│ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/docarray/array/mixins/getitem.py:55 in │
│ __getitem__ │
│ │
│ 52 │ │ │ return self._get_doc_by_offset(int(index)) │
│ 53 │ │ elif isinstance(index, str): │
│ 54 │ │ │ if index.startswith('@'): │
│ ❱ 55 │ │ │ │ return self.traverse_flat(index[1:]) │
│ 56 │ │ │ else: │
│ 57 │ │ │ │ return self._get_doc_by_id(index) │
│ 58 │ │ elif isinstance(index, slice): │
│ │
│ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/docarray/array/mixins/traverse.py:195 in │
│ traverse_flat │
│ │
│ 192 │ │ │ return self │
│ 193 │ │ │
│ 194 │ │ leaves = self.traverse(traversal_paths, filter_fn=filter_fn) │
│ ❱ 195 │ │ return self._flatten(leaves) │
│ 196 │ │
│ 197 │ def flatten(self) -> 'DocumentArray': │
│ 198 │ │ """Flatten all nested chunks and matches into one :class:`DocumentArray`. │
│ │
│ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/docarray/array/mixins/traverse.py:234 in _flatten │
│ │
│ 231 │ def _flatten(sequence) -> 'DocumentArray': │
│ 232 │ │ from ... import DocumentArray │
│ 233 │ │ │
│ ❱ 234 │ │ return DocumentArray(list(itertools.chain.from_iterable(sequence))) │
│ 235 │
│ 236 │
│ 237 def _parse_path_string(p: str) -> Dict[str, str]: │
│ │
│ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/docarray/array/mixins/traverse.py:108 in traverse │
│ │
│ 105 │ │ """ │
│ 106 │ │ traversal_paths = re.sub(r'\s+', '', traversal_paths) │
│ 107 │ │ for p in _re_traversal_path_split(traversal_paths): │
│ ❱ 108 │ │ │ yield from self._traverse(self, p, filter_fn=filter_fn) │
│ 109 │ │
│ 110 │ @staticmethod │
│ 111 │ def _traverse( │
│ │
│ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/docarray/array/mixins/traverse.py:141 in │
│ _traverse │
│ │
│ 138 │ │ │ │ for d in docs: │
│ 139 │ │ │ │ │ for attribute in group_dict['attributes']: │
│ 140 │ │ │ │ │ │ yield from TraverseMixin._traverse( │
│ ❱ 141 │ │ │ │ │ │ │ d.get_multi_modal_attribute(attribute)[cur_slice], │
│ 142 │ │ │ │ │ │ │ remainder, │
│ 143 │ │ │ │ │ │ │ filter_fn=filter_fn, │
│ 144 │ │ │ │ │ │ ) │
│ │
│ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/docarray/document/mixins/multimodal.py:119 in │
│ get_multi_modal_attribute │
│ │
│ 116 │ │ position = self._metadata['multi_modal_schema'][attribute].get('position') │
│ 117 │ │ │
│ 118 │ │ if attribute_type in [AttributeType.DOCUMENT, AttributeType.NESTED]: │
│ ❱ 119 │ │ │ return DocumentArray([self.chunks[position]]) │
│ 120 │ │ elif attribute_type in [ │
│ 121 │ │ │ AttributeType.ITERABLE_DOCUMENT, │
│ 122 │ │ │ AttributeType.ITERABLE_NESTED, │
│ │
│ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/docarray/array/mixins/getitem.py:108 in │
│ __getitem__ │
│ │
│ 105 │ │ │ │ raise IndexError( │
│ 106 │ │ │ │ │ f'When using np.ndarray as index, its `ndim` must =1. However, │
│ receiving ndim={index.ndim}' │
│ 107 │ │ │ │ ) │
│ ❱ 108 │ │ raise IndexError(f'Unsupported index type {typename(index)}: {index}') │
│ 109 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
IndexError: Unsupported index type builtins.float: 5.0
After checking the code, I found the position attribute of the multi_modal_schema is a float:
{'nickname': {'type': 'Text', 'attribute_type': 'document', 'position': 2.0}, 'product_image': {'position': 4.0, 'type': 'Image', 'attribute_type': 'document'}, 'trial_images': {'attribute_type': 'document', 'position': 5.0, 'type': 'TrialImages'}, 'brand': {'type': 'Text', 'attribute_type': 'document', 'position': 0.0}, 'meta': {'attribute_type': 'document', 'position': 3.0, 'type': 'JSON'}, 'color': {'position': 1.0, 'attribute_type': 'document', 'type': 'Text'}}
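A quick way to see how widespread this is, as a rough sketch: the snippet below scans the schema shown above and reports which positions are stored as floats and whether they are all whole numbers (so a cast to int would be lossless). `float_positions` is a hypothetical helper written for this issue, not part of docarray.

```python
# Schema copied from the issue; every position arrives as a float.
schema = {
    'nickname': {'type': 'Text', 'attribute_type': 'document', 'position': 2.0},
    'product_image': {'position': 4.0, 'type': 'Image', 'attribute_type': 'document'},
    'trial_images': {'attribute_type': 'document', 'position': 5.0, 'type': 'TrialImages'},
    'brand': {'type': 'Text', 'attribute_type': 'document', 'position': 0.0},
    'meta': {'attribute_type': 'document', 'position': 3.0, 'type': 'JSON'},
    'color': {'position': 1.0, 'attribute_type': 'document', 'type': 'Text'},
}


def float_positions(multi_modal_schema):
    """Hypothetical helper: map attribute name -> position for float positions."""
    return {
        name: spec['position']
        for name, spec in multi_modal_schema.items()
        if isinstance(spec.get('position'), float)
    }


floats = float_positions(schema)
# True when every float position is a whole number, i.e. int() loses nothing.
all_integral = all(p.is_integer() for p in floats.values())

print(sorted(floats))
print(all_integral)
```

For the schema above, all six attributes are affected, and every position is a whole number.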
I'm not sure why the positions turn into floats, but I think we should cast them to int when retrieving them, to be safe. I will provide a PR with a fix; I was able to verify in my repository that the fix works.
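To illustrate the idea, here is a minimal sketch of the cast, not the actual patch. It uses a plain Python list as a stand-in for `doc.chunks` (an assumption, so it runs without docarray), and `position_as_int` is a hypothetical helper name. A plain list raises TypeError on a float index, whereas DocumentArray raises the IndexError shown in the traceback.

```python
def position_as_int(multi_modal_schema, attribute):
    """Hypothetical helper: return the stored position as an int.

    Whole-number floats such as 5.0 are cast; anything else is rejected
    rather than silently truncated.
    """
    position = multi_modal_schema[attribute]['position']
    if isinstance(position, float):
        if not position.is_integer():
            raise ValueError(f'position of {attribute!r} is not a whole number: {position}')
        return int(position)
    return position


schema = {'trial_images': {'attribute_type': 'document', 'position': 5.0, 'type': 'TrialImages'}}
# Stand-in for doc.chunks, in the order shown in the document summary.
chunks = ['brand', 'color', 'nickname', 'meta', 'product_image', 'trial_images']

# Indexing with the raw float fails.
error = None
try:
    chunks[schema['trial_images']['position']]
except TypeError as exc:
    error = str(exc)

# Indexing with the cast position succeeds.
selected = chunks[position_as_int(schema, 'trial_images')]
print(selected)
```

Rejecting non-integral floats is a design choice of this sketch; a bare `int(position)` would also work for the values seen here.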