feat: add support for labeled dataset to evaluate function by guenthermi · Pull Request #617 · docarray/docarray

guenthermi · 2022-10-12T15:16:01Z

Goals:

Allow the user to add labels to the documents instead of passing an additional ground truth DocumentArray
Check and update the documentation if required. See guide

guenthermi · 2022-10-12T15:25:15Z

The Wikipedia article linked in the r_precision docstring does not seem to be correct. According to the article, the metric is calculated on the Top-N results, where N is the number of relevant documents in the dataset. However, the implementation sets N to the position of the last relevant document, as done here.
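To make the discrepancy concrete, here is a minimal sketch that computes both readings on the same binary relevance list. The function names are hypothetical, `num_relevant` (the total number of relevant documents in the dataset) is assumed to be known, and neither function is the docarray implementation.

```python
from typing import List


def r_precision_wikipedia(binary_relevance: List[int], num_relevant: int) -> float:
    """Precision over the top-R results, where R is the number of relevant
    documents in the dataset (the definition in the Wikipedia article)."""
    if num_relevant == 0:
        return 0.0
    # results beyond the end of the list count as non-relevant
    return sum(binary_relevance[:num_relevant]) / num_relevant


def r_precision_last_hit(binary_relevance: List[int]) -> float:
    """Precision over the top-N results, where N is the position of the last
    relevant result (the behaviour described above for the implementation)."""
    hits = [i for i, rel in enumerate(binary_relevance) if rel]
    if not hits:
        return 0.0
    n = hits[-1] + 1  # 1-based position of the last relevant result
    return sum(binary_relevance[:n]) / n


ranking = [1, 0, 1, 0, 0]  # relevant results at ranks 1 and 3
print(r_precision_wikipedia(ranking, num_relevant=2))  # 0.5
print(r_precision_last_hit(ranking))  # 0.6666666666666666
```

With two relevant documents, the article's definition only looks at the top 2 results, whereas the last-hit variant stretches the cutoff to rank 3 and therefore reports a higher score.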

codecov · 2022-10-12T15:31:43Z

Codecov Report

Merging #617 (5dbf1a5) into main (125eb3a) will decrease coverage by 0.02%.
The diff coverage is 78.78%.

@@            Coverage Diff             @@
##             main     #617      +/-   ##
==========================================
- Coverage   85.02%   84.99%   -0.03%     
==========================================
  Files         133      133              
  Lines        6718     6745      +27     
==========================================
+ Hits         5712     5733      +21     
- Misses       1006     1012       +6

Flag	Coverage Δ
docarray	`84.99% <78.78%> (-0.03%)`	⬇️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
docarray/array/mixins/evaluation.py	`84.12% <78.78%> (-4.77%)`	⬇️


JoanFM · 2022-10-13T11:17:07Z

        self,
-        other: 'DocumentArray',
        metric: Union[str, Callable[..., float]],
+        ground_truth: Optional['DocumentArray'] = None,


keep other to avoid breaking change

It is a breaking change anyway, because we need to turn it from a mandatory into an optional argument, but I can change it back to other if you think it makes a difference. I just find this name confusing, especially as a keyword argument.

It is not breaking: going from mandatory to Optional will not break others' code, it just adds more options.

JoanFM · 2022-10-13T14:43:15Z

        self,
-        other: 'DocumentArray',
        metric: Union[str, Callable[..., float]],
+        ground_truth: Optional['DocumentArray'] = None,


Please avoid breaking the interface. ground_truth can be added as an alias; you can have a deprecation decorator that treats other as equivalent to ground_truth.

The problem is that the function currently has two mandatory arguments, other and metric (in this order). It does not make sense to keep other mandatory if the DocumentArray and its matches already have labels, so I have to make it optional. If I do this, I cannot preserve the order, since metric, as a mandatory argument, has to be placed before other.

The only way it might work is to make metric optional as well by setting a default metric. However, then this syntax will no longer work:

da.evaluate('precision_at_k')

you always have to do:

da.evaluate(metric='precision_at_k')

other and ground_truth need to be equivalent, this is 100% sure

this can be achieved by a decorator

why metric is precision_at_k? shouldn't be precision, k=10?

@bwanglzu, the function in docarray.math is called precision_at_k. Setting k is possible but not mandatory, since the matches usually contain only the Top-K.
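A minimal sketch of why k can be left unset: if the matches already hold only the Top-K results, precision over all of them is precision@K. This is an illustrative reimplementation, not the code in docarray.math; the signature shown here is an assumption.

```python
from typing import List, Optional


def precision_at_k(binary_relevance: List[int], k: Optional[int] = None) -> float:
    """Fraction of relevant results among the first k positions.

    If k is None, every result is used, which equals precision@K when the
    matches already contain only the Top-K results.
    """
    if k is None:
        k = len(binary_relevance)
    if k <= 0:
        return 0.0
    # positions beyond the end of the list count as non-relevant
    return sum(binary_relevance[:k]) / k


matches_relevance = [1, 1, 0, 1]  # relevance of the Top-4 matches
print(precision_at_k(matches_relevance))  # 0.75, same as k=4
print(precision_at_k(matches_relevance, k=2))  # 1.0
```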

Just for documentation: we agreed on checking the type of the first argument. If it is a DocumentArray, we set ground_truth to the value passed as metric and raise a deprecation warning. We also handle cases where someone provides an other keyword argument, and raise a deprecation warning there as well.
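The agreed handling can be sketched as a small argument-resolution step. This is a minimal sketch under stated assumptions: `resolve_evaluate_args` is a hypothetical helper, `DocumentArray` here is a stand-in subclass of `list` so the example runs without docarray, and the warning texts are made up for illustration.

```python
import warnings
from typing import Any, Callable, Optional, Tuple, Union


class DocumentArray(list):
    """Hypothetical stand-in for docarray's DocumentArray, so the sketch runs on its own."""


def resolve_evaluate_args(
    metric: Union[str, Callable[..., float], DocumentArray, None] = None,
    ground_truth: Optional[Any] = None,
    **kwargs: Any,
) -> Tuple[Union[str, Callable[..., float]], Optional[DocumentArray]]:
    """Map old-style calls onto the new (metric, ground_truth) signature."""
    # Old positional style: evaluate(other_da, 'precision_at_k').
    # The DocumentArray lands in `metric` and the metric lands in `ground_truth`.
    if isinstance(metric, DocumentArray):
        warnings.warn(
            'Passing the ground truth as the first positional argument is deprecated, '
            'use the `ground_truth` keyword argument instead',
            DeprecationWarning,
        )
        metric, ground_truth = ground_truth, metric
    # Old keyword style: evaluate(other=other_da, metric='precision_at_k')
    if 'other' in kwargs:
        warnings.warn(
            'The `other` argument is deprecated, use `ground_truth` instead',
            DeprecationWarning,
        )
        ground_truth = kwargs.pop('other')
    if metric is None:
        raise TypeError('A metric (name or callable) must be provided')
    return metric, ground_truth
```

Checking the type of the first argument keeps da.evaluate('precision_at_k') working for labeled datasets while old call sites still resolve correctly, at the cost of a runtime isinstance check on every call.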

Oh, I remember.

JoanFM · 2022-10-13T14:43:52Z

-        on comparing the `matches` of `documents` inside the `DocumentArray.
+        This implementation expects the documents and their matches to have labels
+        annotated inside the tag with the key specified in the `label_tag` attribute.
+        Alternatively, one can provide a `ground_truth` DocumentArray that is


This in my opinion should be the principal case, and the other the alternative

ok, yes I can change this in the docstring
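With labels as the principal case, binary relevance can be derived by comparing each match's label with the label of its query document, in the same way as the `1 if m.tags[label_tag] == d.tags[label_tag] else 0` comparison in this PR. A minimal sketch, assuming plain dicts stand in for Document tags and using a hypothetical helper name:

```python
from typing import Dict, List


def binary_relevance_from_labels(
    query_tags: Dict[str, str],
    match_tags: List[Dict[str, str]],
    label_tag: str = 'label',
) -> List[int]:
    """Return 1 for each match whose label equals the query's label, else 0."""
    query_label = query_tags[label_tag]
    return [1 if m[label_tag] == query_label else 0 for m in match_tags]


query = {'label': 'A'}
matches = [{'label': 'A'}, {'label': 'B'}, {'label': 'A'}]
print(binary_relevance_from_labels(query, matches))  # [1, 0, 1]
```

The resulting list can be passed directly to a metric such as precision_at_k, so no separate ground truth DocumentArray is needed.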

JoanFM · 2022-10-13T14:44:10Z

        hash_fn: Optional[Callable[['Document'], str]] = None,
        metric_name: Optional[str] = None,
        strict: bool = True,
+        label_tag='label',


label_tag should be an Optional, plus please provide type hints

JoanFM · 2022-10-13T14:44:47Z

+        elif label_tag in self[0].tags:
+            if ground_truth:
+                warnings.warn(
+                    'An ground_truth attribute is provided but does not '


Suggested change

'An ground_truth attribute is provided but does not '

'A ground_truth attribute is provided but does not '

bwanglzu

left some comments

bwanglzu · 2022-10-13T11:12:36Z

        hash_fn: Optional[Callable[['Document'], str]] = None,
        metric_name: Optional[str] = None,
        strict: bool = True,
+        label_tag='label',


Yes and optional is also missing.

bwanglzu · 2022-10-13T15:57:41Z

        results = []
        caller_max_rel = kwargs.pop('max_rel', None)
-        for d, gd in zip(self, other):
+        for d, gd in zip(self, ground_truth):


Do we check that both have the same length?

This is done if strict=True
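A length check of this kind can be sketched as follows. The strict flag mirrors the parameter in the diff; the helper name `paired`, the error type and the message are assumptions.

```python
from typing import Iterator, Sequence, Tuple, TypeVar

T = TypeVar('T')
U = TypeVar('U')


def paired(
    queries: Sequence[T], ground_truth: Sequence[U], strict: bool = True
) -> Iterator[Tuple[T, U]]:
    """Pair each query with its ground-truth entry; in strict mode, refuse
    to silently truncate when the two sequences differ in length."""
    if strict and len(queries) != len(ground_truth):
        raise ValueError(
            f'The ground truth has {len(ground_truth)} documents, '
            f'but {len(queries)} documents are being evaluated'
        )
    return zip(queries, ground_truth)
```

Without the check, zip stops at the shorter input, so a mismatch would quietly drop documents from the evaluation.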

bwanglzu · 2022-10-13T15:59:00Z

        self,
-        other: 'DocumentArray',
        metric: Union[str, Callable[..., float]],
+        ground_truth: Optional['DocumentArray'] = None,


why metric is precision_at_k? shouldn't be precision, k=10?

bwanglzu · 2022-10-13T16:05:37Z

+                    1 if m.tags[label_tag] == d.tags[label_tag] else 0 for m in targets
+                ]
+            else:
+                raise RuntimeError(f'Unsupported groundtruth type {ground_truth_type}')


This error message will be very confusing: users do not know what a ground_truth_type is. It's not a parameter passed by the user, but something you inferred from self or ground_truth.

It should be unreachable code, so it is probably better to just remove it.

I would say it's better to keep it, but with a better error message that does not mention an internal object the user is not aware of. Something like:

Suggested change

raise RuntimeError(f'Unsupported groundtruth type {ground_truth_type}')

raise RuntimeError('Something went wrong with the ground truth')

Describe what could be wrong in terms the user can relate to, without mentioning the internal name.

bwanglzu · 2022-10-17T15:45:25Z

+    for d in da1_index:
+        d.tags = {'label': 'A'}


is this still needed?

No, you are right, I think I can remove it.

bwanglzu · 2022-10-17T15:47:07Z

+    assert isinstance(r, float)
+    assert r == 0.0
+    for d in da1:
+        d: Document


what is this?

oh I don't know, I will remove it

bwanglzu

Minor comments; make sure the DA team agrees on ground_truth as a breaking change.

JoanFM · 2022-10-18T07:06:27Z

+            raise ValueError('It is not possible to evaluate an empty DocumentArray')
+        if ground_truth and len(ground_truth) > 0 and ground_truth[0].matches:
+            ground_truth_type = 'matches'
+        elif label_tag in self[0].tags:


what happens here if label_tag is None? is label_tag really Optional?

ah sorry, yes it should not be optional

JoanFM · 2022-10-18T07:34:57Z

        metric_name: Optional[str] = None,
        strict: bool = True,
-        label_tag: Optional[str] = 'label',
+        label_tag: str = 'label',


I'd rather check what happens with None. One should be able to pass None.

OK, so you mean that it should be Optional and default to 'label'? If a user does not want to use labels, i.e., passes a ground_truth, it should be fine to pass None. However, if it is set to None and no ground_truth is passed, an exception is raised.
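The behaviour agreed on here, combined with the detection logic in the diff above, can be sketched as a small dispatch function. Everything in it is hypothetical: documents are plain dicts instead of Document objects, and the function name and error messages are made up. The point is that each failure is described in terms of arguments the user actually passed, as requested earlier in the review.

```python
from typing import Any, Dict, List, Optional


def detect_ground_truth_source(
    docs: List[Dict[str, Any]],
    ground_truth: Optional[List[Dict[str, Any]]] = None,
    label_tag: Optional[str] = 'label',
) -> str:
    """Return 'matches' if a ground truth with matches is given, 'labels' if
    the documents carry labels under `label_tag`, otherwise raise."""
    if not docs:
        raise ValueError('It is not possible to evaluate an empty DocumentArray')
    # an explicit ground truth with matches takes precedence
    if ground_truth and ground_truth[0].get('matches'):
        return 'matches'
    # label_tag=None means the user opted out of labels
    if label_tag is not None and label_tag in docs[0].get('tags', {}):
        return 'labels'
    if label_tag is None:
        raise ValueError(
            '`label_tag` is None, so a `ground_truth` DocumentArray with matches must be provided'
        )
    raise ValueError(
        'No ground truth found: pass a `ground_truth` DocumentArray with matches, '
        f'or annotate the documents with labels under the tag {label_tag!r}'
    )
```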

bwanglzu · 2022-10-18T07:36:43Z

+    for d in da1_index:
+        d.tags = {'label': 'A'}


minor issue

bwanglzu

LGTM!

github-actions · 2022-10-18T08:11:12Z

📝 Docs are deployed on https://ft-feat-support-labels-in-evaluate--jina-docs.netlify.app 🎉

feat: add support for labeled dataset to evaluate function

94d17c9

github-actions Bot added size/m area/core area/docs area/testing component/array labels Oct 12, 2022

guenthermi added 3 commits October 12, 2022 18:24

refactor: add matches before transfering to backend, add example to docs

99e3e66

fix: missing initialization of d2

bedfc7a

test: add tests to check if exceptions and warnings are raised

23fafd3

guenthermi marked this pull request as ready for review October 13, 2022 08:50

fix: duplicate test name

dfa4aa3

guenthermi requested review from JohannesMessner, LMMilliken, bwanglzu, gmastrapas and samsja October 13, 2022 11:06

guenthermi linked an issue Oct 13, 2022 that may be closed by this pull request

add support for labeled datasets in evaluate function #620

Closed

JoanFM requested changes Oct 13, 2022

guenthermi mentioned this pull request Oct 13, 2022

refactor: change r precision calculation and documentation #621

Merged

JoanFM requested changes Oct 13, 2022

bwanglzu suggested changes Oct 13, 2022

LMMilliken approved these changes Oct 14, 2022

guenthermi added 4 commits October 17, 2022 14:20

refactor: implement review notes

cfb6b2d

Merge branch 'main' into feat-support-labels-in-evaluate

4df372a

fix: update r_precision test

c499634

refactor: remove comment

15ea32d

guenthermi force-pushed the feat-support-labels-in-evaluate branch from d12cb03 to 15ea32d October 17, 2022 12:42

guenthermi requested a review from bwanglzu October 17, 2022 12:50

guenthermi requested review from JoanFM and LMMilliken October 17, 2022 12:50

docs: fix ground_truth attribute name

8266904

bwanglzu reviewed Oct 17, 2022

refactor: implement review comments

eaf4e6f

guenthermi requested a review from bwanglzu October 17, 2022 16:00

guenthermi mentioned this pull request Oct 18, 2022

feat: multiple metrics in evaluate #643

Merged

refactor: change 'next release' to 'soon'

485db68

JoanFM requested changes Oct 18, 2022

fix: type hint for label attribute

829ff44

guenthermi requested a review from JoanFM October 18, 2022 07:32

JoanFM requested changes Oct 18, 2022

bwanglzu reviewed Oct 18, 2022

bwanglzu approved these changes Oct 18, 2022

JoanFM approved these changes Oct 18, 2022

refactor: remove unneeded code

5dbf1a5

JoanFM merged commit ea1ccf2 into main Oct 18, 2022

JoanFM deleted the feat-support-labels-in-evaluate branch October 18, 2022 11:43

NicholasDunham mentioned this pull request Oct 18, 2022

Chore: draft release note v0.18.0 #648

Closed
