Multi-modal Document has a float position #393

@lhr0909

I created a @dataclass with a custom field that holds a collection of images, each with 2 embeddings. Since it seems better to wrap the image collection in a Document, I started with the following model:

from typing import TypeVar, List
from docarray import dataclass, field, Document
from docarray.typing import Image, Text, JSON

TrialImages = TypeVar('TrialImages', bound=str)

def trial_images_setter(value: List[str]) -> Document:
    # TODO: this is still not a multi-modal document, but I think it is fine for now
    doc = Document(modality='trial_images')
    doc.chunks = [Document(uri=uri, modality='image').load_uri_to_image_tensor() for uri in value]
    return doc

def trial_images_getter(doc: Document) -> List[str]:
    return [d.uri for d in doc.chunks]

@dataclass
class Lipstick:
    brand: Text
    color: Text
    nickname: Text
    meta: JSON
    product_image: Image
    # nested document for embeddings of skin and lip colors from all the trials
    trial_images: TrialImages = field(setter=trial_images_setter, getter=trial_images_getter, default_factory=list)

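This models the data well, but I wasn't able to get the trial_images back out of the DocumentArray: the traversal docs['@.[trial_images]c'] in my Executor raises IndexError('Unsupported index type builtins.float: 5.0').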
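To make the failure mode concrete, here is a minimal sketch in plain Python with no docarray involved. The error in the log reports an index of 5.0, and trial_images is the sixth chunk in the summary, so the stored position looks like the integer 5 turned into a float somewhere along the way. The names StrictList and as_position are hypothetical stand-ins invented for this illustration; they are not part of docarray, and the sketch does not claim to show where the float actually comes from.

```python
import operator
from typing import Any, List, Sequence


class StrictList:
    """Hypothetical stand-in for a container that only accepts int indices."""

    def __init__(self, items: Sequence[Any]) -> None:
        self._items: List[Any] = list(items)

    def __getitem__(self, index: Any) -> Any:
        if isinstance(index, int):
            return self._items[index]
        # Mirrors the shape of the reported error; not docarray's actual code.
        raise IndexError(f'Unsupported index type {type(index).__name__}: {index!r}')


def as_position(value: Any) -> int:
    """Coerce a whole-number float to int; refuse anything that would lose data."""
    if isinstance(value, float):
        if not value.is_integer():
            raise ValueError(f'position {value!r} is not a whole number')
        return int(value)
    # operator.index accepts ints (and int-like objects) and rejects the rest.
    return operator.index(value)


# Same field order as the Lipstick dataclass, so trial_images sits at index 5.
chunks = StrictList(
    ['brand', 'color', 'nickname', 'meta', 'product_image', 'trial_images']
)
position = 5.0  # the value reported in the IndexError

try:
    chunks[position]
    failure = None
except IndexError as err:
    failure = str(err)

recovered = chunks[as_position(position)]
print(failure)
print(recovered)
```

Casting only whole-number floats keeps the workaround honest: a position such as 2.5 would point at a real bug elsewhere and should still fail loudly rather than be silently truncated.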

Document Summary & Error Log:

📄 Document: 6ec9e9fe6ea097253cedcb8b938a303f
└── 💠 Chunks
    ├── 📄 Document: 4ef01f5798e058fa9b4df48791414c20
    │   ╭──────────────────────┬───────────────────────────────────────────────────────╮
    │   │ Attribute            │ Value                                                 │
    │   ├──────────────────────┼───────────────────────────────────────────────────────┤
    │   │ parent_id            │ 6ec9e9fe6ea097253cedcb8b938a303f                      │
    │   │ granularity          │ 1                                                     │
    │   │ text                 │ MAC                                                   │
    │   │ modality             │ text                                                  │
    │   ╰──────────────────────┴───────────────────────────────────────────────────────╯
    ├── 📄 Document: 112c0f846fc7d2133ca92e21e675ae6b
    │   ╭──────────────────────┬───────────────────────────────────────────────────────╮
    │   │ Attribute            │ Value                                                 │
    │   ├──────────────────────┼───────────────────────────────────────────────────────┤
    │   │ parent_id            │ 6ec9e9fe6ea097253cedcb8b938a303f                      │
    │   │ granularity          │ 1                                                     │
    │   │ text                 │ Marrakesh                                             │
    │   │ modality             │ text                                                  │
    │   ╰──────────────────────┴───────────────────────────────────────────────────────╯
    ├── 📄 Document: 7d35324e75a8a4d0af507b90d733584d
    │   ╭──────────────────────┬───────────────────────────────────────────────────────╮
    │   │ Attribute            │ Value                                                 │
    │   ├──────────────────────┼───────────────────────────────────────────────────────┤
    │   │ parent_id            │ 6ec9e9fe6ea097253cedcb8b938a303f                      │
    │   │ granularity          │ 1                                                     │
    │   │ text                 │ 麻辣鸡丝                                              │
    │   │ modality             │ text                                                  │
    │   ╰──────────────────────┴───────────────────────────────────────────────────────╯
    ├── 📄 Document: f2105af51c6b56f7b1a8bd2af94e28fe
    │   ╭──────────────────────┬───────────────────────────────────────────────────────╮
    │   │ Attribute            │ Value                                                 │
    │   ├──────────────────────┼───────────────────────────────────────────────────────┤
    │   │ parent_id            │ 6ec9e9fe6ea097253cedcb8b938a303f                      │
    │   │ granularity          │ 1                                                     │
    │   │ tags                 │ {'type': ['口红', '哑光']}                            │
    │   │ modality             │ json                                                  │
    │   ╰──────────────────────┴───────────────────────────────────────────────────────╯
    ├── 📄 Document: 42b6effc1ed1fe7c75c907b212d0e63d
    │   ╭─────────────┬────────────────────────────────────────────────────────────────╮
    │   │ Attribute   │ Value                                                          │
    │   ├─────────────┼────────────────────────────────────────────────────────────────┤
    │   │ parent_id   │ 6ec9e9fe6ea097253cedcb8b938a303f                               │
    │   │ granularity │ 1                                                              │
    │   │ tensor      │ <class 'numpy.ndarray'> in shape (600, 640, 3), dtype: uint8   │
    │   │ uri         │ https://s1.vika.cn/space/2022/06/08/3709d7793d964e5393bcabfc9… │
    │   │ modality    │ image                                                          │
    │   ╰─────────────┴────────────────────────────────────────────────────────────────╯
    └── 📄 Document: 80dd7ee3fd6a6b9c258baf714376b937
        ╭──────────────────────┬───────────────────────────────────────────────────────╮
        │ Attribute            │ Value                                                 │
        ├──────────────────────┼───────────────────────────────────────────────────────┤
        │ parent_id            │ 6ec9e9fe6ea097253cedcb8b938a303f                      │
        │ granularity          │ 1                                                     │
        │ modality             │ trial_images                                          │
        ╰──────────────────────┴───────────────────────────────────────────────────────╯
        └── 💠 Chunks
            ├── 📄 Document: e2ff2813f6e7e5eebefdb95b241db382
            │   ╭─────────────┬────────────────────────────────────────────────────────────────╮
            │   │ Attribute   │ Value                                                          │
            │   ├─────────────┼────────────────────────────────────────────────────────────────┤
            │   │ parent_id   │ 80dd7ee3fd6a6b9c258baf714376b937                               │
            │   │ granularity │ 1                                                              │
            │   │ tensor      │ <class 'numpy.ndarray'> in shape (283, 175, 3), dtype: uint8   │
            │   │ uri         │ https://s1.vika.cn/space/2022/06/08/7f3680ba48114ebeba83b2938… │
            │   │ modality    │ image                                                          │
            │   ╰─────────────┴────────────────────────────────────────────────────────────────╯
            ├── 📄 Document: 27e552e852f86eb8d349a213c1d87755
            │   ╭─────────────┬────────────────────────────────────────────────────────────────╮
            │   │ Attribute   │ Value                                                          │
            │   ├─────────────┼────────────────────────────────────────────────────────────────┤
            │   │ parent_id   │ 80dd7ee3fd6a6b9c258baf714376b937                               │
            │   │ granularity │ 1                                                              │
            │   │ tensor      │ <class 'numpy.ndarray'> in shape (260, 146, 3), dtype: uint8   │
            │   │ uri         │ https://s1.vika.cn/space/2022/06/08/5577cf7ebfa6459eafa0c2cd9… │
            │   │ modality    │ image                                                          │
            │   ╰─────────────┴────────────────────────────────────────────────────────────────╯
            ├── 📄 Document: 88294367d4ed0dd3019909e9a55350dd
            │   ╭─────────────┬────────────────────────────────────────────────────────────────╮
            │   │ Attribute   │ Value                                                          │
            │   ├─────────────┼────────────────────────────────────────────────────────────────┤
            │   │ parent_id   │ 80dd7ee3fd6a6b9c258baf714376b937                               │
            │   │ granularity │ 1                                                              │
            │   │ tensor      │ <class 'numpy.ndarray'> in shape (606, 1075, 3), dtype: uint8  │
            │   │ uri         │ https://s1.vika.cn/space/2022/06/08/104b47a6afec4ef68472747ef… │
            │   │ modality    │ image                                                          │
            │   ╰─────────────┴────────────────────────────────────────────────────────────────╯
            └── 📄 Document: c721a22350a76e73952ccba0a5cf9e82
                ╭─────────────┬────────────────────────────────────────────────────────────────╮
                │ Attribute   │ Value                                                          │
                ├─────────────┼────────────────────────────────────────────────────────────────┤
                │ parent_id   │ 80dd7ee3fd6a6b9c258baf714376b937                               │
                │ granularity │ 1                                                              │
                │ tensor      │ <class 'numpy.ndarray'> in shape (602, 1076, 3), dtype: uint8  │
                │ uri         │ https://s1.vika.cn/space/2022/06/08/b16b5bf313294cbeb1434bc5d… │
                │ modality    │ image                                                          │
                ╰─────────────┴────────────────────────────────────────────────────────────────╯
╭───────────────────── Documents Summary ─────────────────────╮
│                                                             │
│   Length                    1                               │
│   Homogenous Documents      True                            │
│   Has nested Documents in   ('chunks',)                     │
│   Common Attributes         ('id', 'embedding', 'chunks')   │
│   Multimodal dataclass      True                            │
│                                                             │
╰─────────────────────────────────────────────────────────────╯
╭──────────────────────── Attributes Summary ────────────────────────╮
│                                                                    │
│   Attribute   Data type         #Unique values   Has empty value   │
│  ────────────────────────────────────────────────────────────────  │
│   chunks      ('ChunkArray',)   1                False             │
│   embedding   ('ndarray',)      1                False             │
│   id          ('str',)          1                False             │
│                                                                    │
╰────────────────────────────────────────────────────────────────────╯
╭─ DocumentArrayAnnlite Config ─╮
│                               │
│   n_dim              50       │
│   metric             cosine   │
│   serialize_config   {}       │
│   data_path          ./data   │
│   ef_construction    None     │
│   ef_search          None     │
│   max_connection     None     │
│   columns            []       │
│                               │
╰───────────────────────────────╯
ERROR  lipstickTrialImageExecutor/rep-0@89864 IndexError('Unsupported index type builtins.float: 5.0')                                              [06/09/22 12:10:58]
        add "--quiet-error" to suppress the exception details                                                                                                          
       ╭─────────────────────────────────────────────────── Traceback (most recent call last) ────────────────────────────────────────────────────╮                    
       │ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/jina/serve/runtimes/worker/__init__.py:162 in     │                    
       │ process_data                                                                                                                             │                    
       │                                                                                                                                          │                    
       │   159 │   │   │   │   if self.logger.debug_enabled:                                                                                      │                    
       │   160 │   │   │   │   │   self._log_data_request(requests[0])                                                                            │                    
       │   161 │   │   │   │                                                                                                                      │                    
       │ ❱ 162 │   │   │   │   return await self._data_request_handler.handle(requests=requests)                                                  │                    
       │   163 │   │   │   except (RuntimeError, Exception) as ex:                                                                                │                    
       │   164 │   │   │   │   self.logger.error(                                                                                                 │                    
       │   165 │   │   │   │   │   f'{ex!r}'                                                                                                      │                    
       │                                                                                                                                          │                    
       │ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/jina/serve/runtimes/request_handlers/data_reques… │                    
       │ in handle                                                                                                                                │                    
       │                                                                                                                                          │                    
       │   147 │   │   )                                                                                                                          │                    
       │   148 │   │                                                                                                                              │                    
       │   149 │   │   # executor logic                                                                                                           │                    
       │ ❱ 150 │   │   return_data = await self._executor.__acall__(                                                                              │                    
       │   151 │   │   │   req_endpoint=requests[0].header.exec_endpoint,                                                                         │                    
       │   152 │   │   │   docs=docs,                                                                                                             │                    
       │   153 │   │   │   parameters=params,                                                                                                     │                    
       │                                                                                                                                          │                    
       │ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/jina/serve/executors/__init__.py:272 in __acall__ │                    
       │                                                                                                                                          │                    
       │   269 │   │   # noqa: DAR201                                                                                                             │                    
       │   270 │   │   """                                                                                                                        │                    
       │   271 │   │   if req_endpoint in self.requests:                                                                                          │                    
       │ ❱ 272 │   │   │   return await self.__acall_endpoint__(req_endpoint, **kwargs)                                                           │                    
       │   273 │   │   elif __default_endpoint__ in self.requests:                                                                                │                    
       │   274 │   │   │   return await self.__acall_endpoint__(__default_endpoint__, **kwargs)                                                   │                    
       │   275                                                                                                                                    │                    
       │                                                                                                                                          │                    
       │ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/jina/serve/executors/__init__.py:295 in           │                    
       │ __acall_endpoint__                                                                                                                       │                    
       │                                                                                                                                          │                    
       │   292 │   │   │   if iscoroutinefunction(func):                                                                                          │                    
       │   293 │   │   │   │   return await func(self, **kwargs)                                                                                  │                    
       │   294 │   │   │   else:                                                                                                                  │                    
       │ ❱ 295 │   │   │   │   return func(self, **kwargs)                                                                                        │                    
       │   296 │                                                                                                                                  │                    
       │   297 │   @property                                                                                                                      │                    
       │   298 │   def workspace(self) -> Optional[str]:                                                                                          │                    
       │                                                                                                                                          │                    
       │ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/jina/serve/executors/decorators.py:180 in         │                    
       │ arg_wrapper                                                                                                                              │                    
       │                                                                                                                                          │                    
       │   177 │   │   │   │   def arg_wrapper(                                                                                                   │                    
       │   178 │   │   │   │   │   executor_instance, *args, **kwargs                                                                             │                    
       │   179 │   │   │   │   ):  # we need to get the summary from the executor, so we need to access                                           │                    
       │       the self                                                                                                                           │                    
       │ ❱ 180 │   │   │   │   │   return fn(executor_instance, *args, **kwargs)                                                                  │                    
       │   181 │   │   │   │                                                                                                                      │                    
       │   182 │   │   │   │   self.fn = arg_wrapper                                                                                              │                    
       │   183                                                                                                                                    │                    
       │                                                                                                                                          │                    
       │ /Users/simon/Documents/git-repo/cool/lipstick-db/executor.py:46 in index                                                                 │                    
       │                                                                                                                                          │                    
       │    43 class LipstickTrialImageExecutor(Executor):                                                                                        │                    
       │    44 │   @requests(on='/index')                                                                                                         │                    
       │    45 │   def index(self, docs: DocumentArray, **kwargs):                                                                                │                    
       │ ❱  46 │   │   trial_images = docs['@.[trial_images]c']                                                                                   │                    
       │    47 │   │   return trial_images                                                                                                        │                    
       │    48 │                                                                                                                                  │                    
       │    49 │   def _get_face_mesh(self, img_rgb):                                                                                             │                    
       │                                                                                                                                          │                    
       │ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/docarray/array/mixins/getitem.py:55 in            │                    
       │ __getitem__                                                                                                                              │                    
       │                                                                                                                                          │                    
       │    52 │   │   │   return self._get_doc_by_offset(int(index))                                                                             │                    
       │    53 │   │   elif isinstance(index, str):                                                                                               │                    
       │    54 │   │   │   if index.startswith('@'):                                                                                              │                    
       │ ❱  55 │   │   │   │   return self.traverse_flat(index[1:])                                                                               │                    
       │    56 │   │   │   else:                                                                                                                  │                    
       │    57 │   │   │   │   return self._get_doc_by_id(index)                                                                                  │                    
       │    58 │   │   elif isinstance(index, slice):                                                                                             │                    
       │                                                                                                                                          │                    
       │ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/docarray/array/mixins/traverse.py:195 in          │                    
       │ traverse_flat                                                                                                                            │                    
       │                                                                                                                                          │                    
       │   192 │   │   │   return self                                                                                                            │                    
       │   193 │   │                                                                                                                              │                    
       │   194 │   │   leaves = self.traverse(traversal_paths, filter_fn=filter_fn)                                                               │                    
       │ ❱ 195 │   │   return self._flatten(leaves)                                                                                               │                    
       │   196 │                                                                                                                                  │                    
       │   197 │   def flatten(self) -> 'DocumentArray':                                                                                          │                    
       │   198 │   │   """Flatten all nested chunks and matches into one :class:`DocumentArray`.                                                  │                    
       │                                                                                                                                          │                    
       │ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/docarray/array/mixins/traverse.py:234 in _flatten │                    
       │                                                                                                                                          │                    
       │   231 │   def _flatten(sequence) -> 'DocumentArray':                                                                                     │                    
       │   232 │   │   from ... import DocumentArray                                                                                              │                    
       │   233 │   │                                                                                                                              │                    
       │ ❱ 234 │   │   return DocumentArray(list(itertools.chain.from_iterable(sequence)))                                                        │                    
       │   235                                                                                                                                    │                    
       │   236                                                                                                                                    │                    
       │   237 def _parse_path_string(p: str) -> Dict[str, str]:                                                                                  │                    
       │                                                                                                                                          │                    
       │ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/docarray/array/mixins/traverse.py:108 in traverse │                    
       │                                                                                                                                          │                    
       │   105 │   │   """                                                                                                                        │                    
       │   106 │   │   traversal_paths = re.sub(r'\s+', '', traversal_paths)                                                                      │                    
       │   107 │   │   for p in _re_traversal_path_split(traversal_paths):                                                                        │                    
       │ ❱ 108 │   │   │   yield from self._traverse(self, p, filter_fn=filter_fn)                                                                │                    
       │   109 │                                                                                                                                  │                    
       │   110 │   @staticmethod                                                                                                                  │                    
       │   111 │   def _traverse(                                                                                                                 │                    
       │                                                                                                                                          │                    
       │ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/docarray/array/mixins/traverse.py:141 in          │                    
       │ _traverse                                                                                                                                │                    
       │                                                                                                                                          │                    
       │   138 │   │   │   │   for d in docs:                                                                                                     │                    
       │   139 │   │   │   │   │   for attribute in group_dict['attributes']:                                                                     │                    
       │   140 │   │   │   │   │   │   yield from TraverseMixin._traverse(                                                                        │                    
       │ ❱ 141 │   │   │   │   │   │   │   d.get_multi_modal_attribute(attribute)[cur_slice],                                                     │                    
       │   142 │   │   │   │   │   │   │   remainder,                                                                                             │                    
       │   143 │   │   │   │   │   │   │   filter_fn=filter_fn,                                                                                   │                    
       │   144 │   │   │   │   │   │   )                                                                                                          │                    
       │                                                                                                                                          │                    
       │ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/docarray/document/mixins/multimodal.py:119 in     │                    
       │ get_multi_modal_attribute                                                                                                                │                    
       │                                                                                                                                          │                    
       │   116 │   │   position = self._metadata['multi_modal_schema'][attribute].get('position')                                                 │                    
       │   117 │   │                                                                                                                              │                    
       │   118 │   │   if attribute_type in [AttributeType.DOCUMENT, AttributeType.NESTED]:                                                       │                    
       │ ❱ 119 │   │   │   return DocumentArray([self.chunks[position]])                                                                          │                    
       │   120 │   │   elif attribute_type in [                                                                                                   │                    
       │   121 │   │   │   AttributeType.ITERABLE_DOCUMENT,                                                                                       │                    
       │   122 │   │   │   AttributeType.ITERABLE_NESTED,                                                                                         │                    
       │                                                                                                                                          │                    
       │ /Users/simon/.local/share/virtualenvs/lipstick-db-nPH6fbgy/lib/python3.8/site-packages/docarray/array/mixins/getitem.py:108 in           │                    
       │ __getitem__                                                                                                                              │                    
       │                                                                                                                                          │                    
       │   105 │   │   │   │   raise IndexError(                                                                                                  │                    
       │   106 │   │   │   │   │   f'When using np.ndarray as index, its `ndim` must =1. However,                                                 │                    
       │       receiving ndim={index.ndim}'                                                                                                       │                    
       │   107 │   │   │   │   )                                                                                                                  │                    
       │ ❱ 108 │   │   raise IndexError(f'Unsupported index type {typename(index)}: {index}')                                                     │                    
       │   109                                                                                                                                    │                    
       ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯                    
       IndexError: Unsupported index type builtins.float: 5.0

After checking the code, I found that every position value in the multi_modal_schema is a float rather than an int:

{
    'nickname': {'type': 'Text', 'attribute_type': 'document', 'position': 2.0},
    'product_image': {'position': 4.0, 'type': 'Image', 'attribute_type': 'document'},
    'trial_images': {'attribute_type': 'document', 'position': 5.0, 'type': 'TrialImages'},
    'brand': {'type': 'Text', 'attribute_type': 'document', 'position': 0.0},
    'meta': {'attribute_type': 'document', 'position': 3.0, 'type': 'JSON'},
    'color': {'position': 1.0, 'attribute_type': 'document', 'type': 'Text'},
}

I'm not sure why the positions turn into floats, but to be safe I think we need to cast the position to int when it is retrieved, before it is used to index self.chunks. I will open a PR with the fix; I have already verified in my own repository that it works.
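To illustrate the idea, here is a minimal standalone sketch of the cast. It does not import docarray: the schema is a trimmed copy of the dict above, a plain list stands in for the document's chunks, and get_position is a hypothetical helper name used only for this example, not part of the library.

```python
from typing import Any, Dict, List


def get_position(schema: Dict[str, Dict[str, Any]], attribute: str) -> int:
    """Hypothetical helper: read an attribute's position and coerce it to int.

    Raises ValueError if the stored value is not a whole number, so a
    corrupted schema fails loudly instead of being silently truncated.
    """
    position = schema[attribute].get('position')
    if position is None:
        raise KeyError(f'no position stored for attribute {attribute!r}')
    as_int = int(position)
    if as_int != position:
        raise ValueError(f'position {position!r} is not a whole number')
    return as_int


# Trimmed copy of the schema from this issue: positions arrive as floats.
schema = {
    'brand': {'type': 'Text', 'attribute_type': 'document', 'position': 0.0},
    'trial_images': {'type': 'TrialImages', 'attribute_type': 'document', 'position': 5.0},
}

# A plain list standing in for the document's chunks (assumption for the demo).
chunks: List[str] = ['brand', 'color', 'nickname', 'meta', 'product_image', 'trial_images']

# Indexing with the raw float fails, much like the IndexError in the traceback.
try:
    chunks[schema['trial_images']['position']]
    raw_lookup_failed = False
except TypeError:
    raw_lookup_failed = True

# Indexing with the coerced position works.
selected = chunks[get_position(schema, 'trial_images')]
```

Rejecting non-whole values rather than truncating them is a design choice for the sketch; a plain int(position) at the lookup site would also avoid the error reported here.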
