Skip to content

feat: multi modal document#188

Merged
hanxiao merged 66 commits into
mainfrom
feat-multi-modal
Mar 23, 2022
Merged

feat: multi modal document#188
hanxiao merged 66 commits into
mainfrom
feat-multi-modal

Conversation

@alaeddine-13

@alaeddine-13 alaeddine-13 commented Mar 10, 2022

Copy link
Copy Markdown
Member

feat: multi modal document

  • add type annotations
  • add from_dataclass method
  • store schema in root document meta tags (possible also move other attributes to meta tags)
  • build a sub attribute accessor using schema stored in root document
  • extend traversal path syntax
  • protobuf serialization/deserialization
  • modifications to traversal path grammar:
  • [] for attributes WARNING: this entails that the grammer becomes ambiguous
  • [] optional for slice
  • tolerate white space
  • accept offset index not just slice
  • test protobuf serialization/deserialization
  • extend dataclass decorator + add method from_document to decorated classes [optional]
  • test coverage

Usage:
create schema:

from docarray.types import dataclass, Text, Image, Audio
@dataclass
class MMDocument:
    title: Text
    image: Image
    version: int

create mm_doc:

obj = MMDocument(title='hello world', image='img.png'), version=20)

translate to Document:

doc = Document.from_dataclass(obj)

traverse a DocumentArray of mm_docs:

da = DocumentArray([obj, ...])
title_docs = da['@r.[title]']
image_docs = da['@r.[image]']
image_title_docs = da['@r.[image,title]']

traverse with slice:

da = DocumentArray([obj, ...])
first_3_title_docs = da['@r[:3].[title]']
first_image_docs = da['@r[1].[image]']
last_image_title_docs = da['@r[-1].[image,title]']

Note: the syntax of traversal paths is extended, there should be no breaking changes (e.g, old r1:5c:3 syntax is still accepted)

Comment thread docarray/document/mixins/multimodal.py
@codecov

codecov Bot commented Mar 10, 2022

Copy link
Copy Markdown

Codecov Report

Merging #188 (70e565d) into main (60d1665) will increase coverage by 0.39%.
The diff coverage is 90.84%.

@@            Coverage Diff             @@
##             main     #188      +/-   ##
==========================================
+ Coverage   84.91%   85.30%   +0.39%     
==========================================
  Files         119      123       +4     
  Lines        5383     5677     +294     
==========================================
+ Hits         4571     4843     +272     
- Misses        812      834      +22     
Flag Coverage Δ
docarray 85.30% <90.84%> (+0.39%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
docarray/types/__init__.py 10.52% <0.00%> (ø)
docarray/types/deserializers.py 73.91% <73.91%> (ø)
docarray/document/mixins/_property.py 85.80% <85.71%> (+2.02%) ⬆️
docarray/types/multimodal.py 88.54% <88.54%> (ø)
docarray/document/mixins/multimodal.py 91.13% <91.13%> (ø)
docarray/types/serializers.py 97.05% <97.05%> (ø)
docarray/array/mixins/traverse.py 95.86% <98.30%> (+1.42%) ⬆️
docarray/document/data.py 91.48% <100.00%> (+0.09%) ⬆️
docarray/document/mixins/__init__.py 100.00% <100.00%> (ø)
docarray/document/pydantic_model.py 95.83% <100.00%> (+0.08%) ⬆️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 60d1665...70e565d. Read the comment docs.

@winstonww winstonww left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder what's the best way to handle optional parameters? For instance, for img: Optional[ImageDocument], should we append an empty chunk document or not in this case?

Comment thread docarray/document/mixins/multimodal.py
Comment thread docarray/types.py Outdated
@github-actions github-actions Bot added size/l and removed size/xl labels Mar 22, 2022

@alaeddine-13 alaeddine-13 left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also add test about support of default values in dataclass and see if it works

Comment thread docarray/array/mixins/traverse.py Outdated
Comment thread docarray/document/mixins/multimodal.py Outdated
Comment thread docarray/document/mixins/multimodal.py
}

elif isinstance(field.type, typing._GenericAlias):
if field.type._name in ['List', 'Iterable']:

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it better to stick to the type that the user explicitly provided.
If we rely on the dynamic type, we will introduce an implicit behaviour to the user

Comment thread docarray/types/multimodal.py
Comment thread docarray/document/mixins/multimodal.py
@github-actions github-actions Bot added size/xl and removed size/l labels Mar 22, 2022
@github-actions

Copy link
Copy Markdown

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions

Copy link
Copy Markdown

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions

Copy link
Copy Markdown

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions

Copy link
Copy Markdown

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions

Copy link
Copy Markdown

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@hanxiao hanxiao merged commit 5a8544a into main Mar 23, 2022
@hanxiao hanxiao deleted the feat-multi-modal branch March 23, 2022 14:06
@hanxiao hanxiao mentioned this pull request Mar 28, 2022
21 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants