Skip to content

feat: add support for labeled dataset to evaluate function#617

Merged
JoanFM merged 14 commits into
mainfrom
feat-support-labels-in-evaluate
Oct 18, 2022
Merged

feat: add support for labeled dataset to evaluate function#617
JoanFM merged 14 commits into
mainfrom
feat-support-labels-in-evaluate

Conversation

@guenthermi

@guenthermi guenthermi commented Oct 12, 2022

Copy link
Copy Markdown
Contributor

Goals:

  • Allow the user to add labels to the documents instead of passing an addtional ground truth DocumentArray

  • check and update documentation, if required. See guide

@guenthermi

Copy link
Copy Markdown
Contributor Author

The Wikipedia article linked in the r_precision docstring does not seems to be correct. According to the article the metric is calculated based on the Top-N results, where N is determined by the number of relevant documents in the dataset. However, the implementation sets N as the last position of a relevant document as done here

@codecov

codecov Bot commented Oct 12, 2022

Copy link
Copy Markdown

Codecov Report

Merging #617 (5dbf1a5) into main (125eb3a) will decrease coverage by 0.02%.
The diff coverage is 78.78%.

@@            Coverage Diff             @@
##             main     #617      +/-   ##
==========================================
- Coverage   85.02%   84.99%   -0.03%     
==========================================
  Files         133      133              
  Lines        6718     6745      +27     
==========================================
+ Hits         5712     5733      +21     
- Misses       1006     1012       +6     
Flag Coverage Δ
docarray 84.99% <78.78%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
docarray/array/mixins/evaluation.py 84.12% <78.78%> (-4.77%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@guenthermi guenthermi marked this pull request as ready for review October 13, 2022 08:50
self,
other: 'DocumentArray',
metric: Union[str, Callable[..., float]],
ground_truth: Optional['DocumentArray'] = None,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep other to avoid breaking change

@guenthermi guenthermi Oct 13, 2022

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a breaking change anyway, because we need to turn it from a mandatory into an optional attribute, but I can change it back to other if you think it makes a difference. I just find this name confusing, especially as an keyword argument.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not breaking, passing from mandatory to Optional will not break other's code, it just adds more options

self,
other: 'DocumentArray',
metric: Union[str, Callable[..., float]],
ground_truth: Optional['DocumentArray'] = None,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please avoid breaking the interface, ground_truth can be added as an alias, u can have a deprecation decorator that has other equivalent to groundtruth

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is, that the function has atm two mandatory attributes other and metric (in this order). It does not make sense to keep other mandatory if the DocumentArray and its matches itself have labels. So I have to make it optional. If I do this I can not preserve the order since metric as a mandatory attribute has to be placed before other.

@guenthermi guenthermi Oct 13, 2022

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only way how it might work is to make metric optional as well by setting a default metric. However, then this syntax will not work:

da.evaluate('precision_at_k')

you always have to do:

da.evaluate(metric='precision_at_k')

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other and ground_truth need to be equivalent, this 100% sure

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be achieved by a decorator

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why metric is precision_at_k? shouldn't be precision, k=10?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bwanglzu , the function in docarray.math is called precision_at_k, setting a k is possible but not mandatory since the matches usually contain only the Top-K

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for documentation: we agreed on checking the type of the first argument. If it is a DocumentArray we set ground_truth to metric and raise a deprecation warning. We also handle cases where someone provides a other keyword argument and raise a deprecation warning.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i remembered

Comment thread docarray/array/mixins/evaluation.py Outdated
on comparing the `matches` of `documents` inside the `DocumentArray.
This implementation expects the documents and their matches to have labels
annotated inside the tag with the key specified in the `label_tag` attribute.
Alternatively, one can provide a `ground_truth` DocumentArray that is

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This in my opinion should be the principal case, and the other the alternative

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, yes I can change this in the docstring

Comment thread docarray/array/mixins/evaluation.py Outdated
hash_fn: Optional[Callable[['Document'], str]] = None,
metric_name: Optional[str] = None,
strict: bool = True,
label_tag='label',

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

label_tag should be an Optional, plus please provide type hints

Comment thread docarray/array/mixins/evaluation.py Outdated
elif label_tag in self[0].tags:
if ground_truth:
warnings.warn(
'An ground_truth attribute is provided but does not '

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'An ground_truth attribute is provided but does not '
'A ground_truth attribute is provided but does not '

@bwanglzu bwanglzu left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some comments

Comment thread docarray/array/mixins/evaluation.py Outdated
hash_fn: Optional[Callable[['Document'], str]] = None,
metric_name: Optional[str] = None,
strict: bool = True,
label_tag='label',

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:str

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and optional is also missing.

results = []
caller_max_rel = kwargs.pop('max_rel', None)
for d, gd in zip(self, other):
for d, gd in zip(self, ground_truth):

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we check the length should be identical?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done if strict=True

self,
other: 'DocumentArray',
metric: Union[str, Callable[..., float]],
ground_truth: Optional['DocumentArray'] = None,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why metric is precision_at_k? shouldn't be precision, k=10?

Comment thread docarray/array/mixins/evaluation.py Outdated
1 if m.tags[label_tag] == d.tags[label_tag] else 0 for m in targets
]
else:
raise RuntimeError(f'Unsupported groundtruth type {ground_truth_type}')

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this error message will be very confusing, user do not know what is a ground_truth_type, it's not a paramter passed from user ,but something you interpreted from self or ground_truth.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be unreachable code, but probably it is better to just remove it

@samsja samsja Oct 14, 2022

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say better to keep it but just have a better error message that does not talk about an internal object that the user is not aware of. Smth like

Suggested change
raise RuntimeError(f'Unsupported groundtruth type {ground_truth_type}')
raise RuntimeError(f'Something went wrong with the groudtruth')

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Describe what can be wrong, without talking about the internal name, but something the user can relate to

@guenthermi guenthermi force-pushed the feat-support-labels-in-evaluate branch from d12cb03 to 15ea32d Compare October 17, 2022 12:42
@guenthermi guenthermi requested a review from bwanglzu October 17, 2022 12:50
Comment on lines +79 to +80
for d in da1_index:
d.tags = {'label': 'A'}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this still needed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you are right, I think I can remove it.

assert isinstance(r, float)
assert r == 0.0
for d in da1:
d: Document

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I don't know, I will remove it

@bwanglzu bwanglzu left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments, make sure da team agree ground_truth as a breaking change

raise ValueError('It is not possible to evaluate an empty DocumentArray')
if ground_truth and len(ground_truth) > 0 and ground_truth[0].matches:
ground_truth_type = 'matches'
elif label_tag in self[0].tags:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens here if label_tag is None? is label_tag really Optional?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah sorry, yes it should not be optional

@guenthermi guenthermi requested a review from JoanFM October 18, 2022 07:32
metric_name: Optional[str] = None,
strict: bool = True,
label_tag: Optional[str] = 'label',
label_tag: str = 'label',

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather check what happens with None. One should be able to put None

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so do you mean, that it should be optional and default to label. If a user does not want to use labels, i.e., pass a ground_truth it should be ok to pass None. However, if it is set to None and no ground_truth is passed, an exception is raised.

Comment on lines +393 to +394
for d in da1_index:
d.tags = {'label': 'A'}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor issue

@bwanglzu bwanglzu left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@github-actions

Copy link
Copy Markdown

📝 Docs are deployed on https://ft-feat-support-labels-in-evaluate--jina-docs.netlify.app 🎉

@JoanFM JoanFM merged commit ea1ccf2 into main Oct 18, 2022
@JoanFM JoanFM deleted the feat-support-labels-in-evaluate branch October 18, 2022 11:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

add support for labeled datasets in evaluate function

5 participants