Skip to content

fix: torch store embedding#247

Merged
alaeddine-13 merged 25 commits into
mainfrom
fix-torch-store-embedding
Apr 5, 2022
Merged

fix: torch store embedding#247
alaeddine-13 merged 25 commits into
mainfrom
fix-torch-store-embedding

Conversation

@davidbp

@davidbp davidbp commented Mar 31, 2022

Copy link
Copy Markdown
Contributor

@slettner showed this snippet

from docarray import Document, DocumentArray
import torch
import numpy as np

arr = np.ones((100, 512), dtype=np.float32)
tensor = torch.from_numpy(arr)


docsSmall = DocumentArray([Document() for _ in range(100)])
docsBig = DocumentArray([Document() for _ in range(100)])

docsBig.embeddings = tensor
docsSmall.embeddings = arr

print(f'Size docA {len(docsSmall.to_bytes())}')
print(f'Size docB {len(docsBig.to_bytes())}')

That resulted with 1500x memory usage for pytorch

Size docA 3444946
Size docB 4608772715

in this PR

Size docA 3321569
Size docB 3705623

@hanxiao hanxiao left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please pull main before making pr, also, this breaks the Pythonic design of .attr =

Comment thread docarray/array/storage/base/getsetdel.py Outdated
@codecov

codecov Bot commented Mar 31, 2022

Copy link
Copy Markdown

Codecov Report

Merging #247 (a266adc) into main (1762acc) will increase coverage by 0.03%.
The diff coverage is 94.73%.

@@            Coverage Diff             @@
##             main     #247      +/-   ##
==========================================
+ Coverage   85.84%   85.87%   +0.03%     
==========================================
  Files         131      131              
  Lines        6018     6033      +15     
==========================================
+ Hits         5166     5181      +15     
  Misses        852      852              
Flag Coverage Δ
docarray 85.87% <94.73%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
docarray/array/mixins/setitem.py 81.81% <ø> (ø)
docarray/math/ndarray.py 89.65% <90.90%> (+0.60%) ⬆️
docarray/__init__.py 100.00% <100.00%> (ø)
docarray/document/__init__.py 78.57% <100.00%> (+4.28%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 527ffcc...a266adc. Read the comment docs.

@davidbp

davidbp commented Mar 31, 2022

Copy link
Copy Markdown
Contributor Author

please pull main before making pr, also, this breaks the Pythonic design of .attr =

Note that the data is not changed at all though

import torch
import numpy as np

arr = np.ones((1500, 512), dtype=np.float32)
tensor = torch.from_numpy(arr)

x = torch.tensor(tensor.detach().numpy())
torch.equal(x, tensor) # returns true

Comment thread docarray/array/storage/base/getsetdel.py Outdated
@JoanFM JoanFM changed the title Fix: torch store embedding fix: torch store embedding Apr 1, 2022
Comment thread docarray/math/ndarray.py
@alaeddine-13 alaeddine-13 merged commit 5721391 into main Apr 5, 2022
@alaeddine-13 alaeddine-13 deleted the fix-torch-store-embedding branch April 5, 2022 09:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants