I found some minor issues in math.evaluation:
- The Wikipedia article linked in the r_precision docstring does not seem to match the implementation. According to the article, the metric is calculated over the top-N results, where N is the number of relevant documents in the dataset. However, the implementation sets N to the position of the last relevant document, as done here. All descriptions of R-Precision that I found explain the metric the same way as the Wikipedia article, so the implementation should be changed to behave like that.
- The ndcg metric calculates the DCG score relative to the optimal DCG score. The implementation itself is correct, but in most cases it is wrong to apply it in docarray.
The problem is that the optimal value needs to be calculated based on all possible results, whereas docarray's match function only returns the top-k ranking. Therefore, the optimal DCG value calculated by our implementation is wrong. For example, if none of the top-20 results is relevant, the computed maximum score is 0, but the actual maximum score might be higher because there are relevant documents outside of the top-20. This should at least be explained in the documentation.
- R-Precision also requires knowledge of the whole ranking, so this should be noted in the documentation as well.
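
To make both points concrete, here is a minimal standalone sketch. It is not the docarray code: the function names are hypothetical, relevance is assumed to be binary, and the DCG uses the common log2 discount, which may differ from the formula in math.evaluation. It compares R-Precision with N taken from the dataset against N taken from the last relevant position, and nDCG with the ideal ranking built from all results against one built from the returned top-k only.

```python
import math


def r_precision_textbook(binary_relevance, num_relevant):
    """Precision over the top-R results, R = number of relevant docs in the dataset."""
    if num_relevant <= 0:
        return 0.0
    top_r = binary_relevance[:num_relevant]
    return sum(1 for rel in top_r if rel > 0) / num_relevant


def r_precision_last_relevant(binary_relevance):
    """Variant described above: cut off at the position of the last relevant result."""
    positions = [i for i, rel in enumerate(binary_relevance) if rel > 0]
    if not positions:
        return 0.0
    cutoff = positions[-1] + 1
    return len(positions) / cutoff


def dcg(relevance):
    """Discounted cumulative gain with a log2 discount (assumed formula)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevance))


def ndcg(returned, ideal_pool):
    """DCG of the returned ranking divided by the best DCG achievable from ideal_pool."""
    ideal = sorted(ideal_pool, reverse=True)[: len(returned)]
    ideal_dcg = dcg(ideal)
    return dcg(returned) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical data: a top-5 ranking, with two more relevant documents outside it.
top_k = [1, 0, 1, 0, 0]
all_relevances = top_k + [1, 1]
num_relevant = sum(all_relevances)

print(r_precision_textbook(top_k, num_relevant))  # N from the dataset
print(r_precision_last_relevant(top_k))           # N from the last relevant position
print(ndcg(top_k, top_k))                         # ideal built from the top-k only
print(ndcg(top_k, all_relevances))                # ideal built from all results
```

With this data the two R-Precision variants disagree, and the nDCG whose ideal ranking is built from the top-k alone comes out higher than the one built from the full result set, because the relevant documents outside the top-5 are never counted in the ideal DCG.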