The evaluation_measures from skondo

evaluation_measures's Introduction

About

evaluation_measures is a framework that implements evaluation measures for IR systems. The following algorithms are implemented.

MRR (Mean Reciprocal Rank)

E. M. Voorhees (1999). "The TREC-8 Question Answering Track Report". In Proceedings of the 8th Text Retrieval Conference (TREC-8). pp. 77–82.
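As a rough illustration of the measure (not this framework's own API), MRR averages, over queries, the reciprocal of the rank at which the first relevant result appears. The function names and the input layout (one list of relevance values per query, in ranked order) are assumptions made for this sketch.

```python
def reciprocal_rank(relevances):
    """Return 1/rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings):
    """Average the reciprocal rank over all queries (hypothetical helper)."""
    if not rankings:
        return 0.0
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


# Three queries: first hit at rank 2, at rank 1, and no hit at all.
print(mean_reciprocal_rank([[0, 1, 0], [1, 0, 0], [0, 0, 0]]))
```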

DCG (Discounted cumulative gain) and nDCG (Normalized Discounted cumulative gain)

Kalervo Järvelin, Jaana Kekäläinen: Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems 20(4), 422–446 (2002)
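A minimal sketch of DCG and nDCG follows. It assumes the widely used variant that divides each gain by log2(rank + 1); the cited paper parameterizes the logarithm base and leaves early ranks undiscounted, and this framework may use yet another variant. The function names are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount (assumed variant)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG divided by the DCG of the same gains sorted in descending order."""
    ideal = dcg(sorted(gains, reverse=True))
    if ideal == 0:
        return 0.0
    return dcg(gains) / ideal


print(dcg([3, 2, 3, 0, 1, 2]))
print(ndcg([3, 2, 3, 0, 1, 2]))
```

Note that this sketch builds the ideal ranking from the retrieved gains only. An evaluation against a full set of judgments would normally build it from all judged documents for the query.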

ERR (Expected Reciprocal Rank for Graded Relevance)

Olivier Chapelle, Donald Metzler, Ya Zhang, and Pierre Grinspan. 2009. Expected reciprocal rank for graded relevance. In Proceedings of the 18th ACM conference on Information and knowledge management (CIKM '09). ACM, New York, NY, USA, 621-630. DOI=10.1145/1645953.1646033 http://doi.acm.org/10.1145/1645953.1646033
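The cascade model behind ERR can be sketched as follows. Each grade g is mapped to a satisfaction probability (2^g - 1) / 2^max_grade, and the user is assumed to scan down the list until satisfied. The function name and its signature are assumptions for illustration, not this framework's interface, and grades are assumed to be non-negative integers no larger than max_grade.

```python
def expected_reciprocal_rank(grades, max_grade):
    """ERR under the cascade model; grades are integers in [0, max_grade]."""
    err = 0.0
    p_continue = 1.0  # probability the user reaches the current rank
    for rank, grade in enumerate(grades, start=1):
        p_satisfied = (2 ** grade - 1) / (2 ** max_grade)
        err += p_continue * p_satisfied / rank
        p_continue *= 1.0 - p_satisfied
    return err


# Two documents of grade 1 on a binary scale (max_grade = 1).
print(expected_reciprocal_rank([1, 1], max_grade=1))
```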

session nDCG

K. Järvelin, S. L. Price, L. M. L. Delcambre, and M. L. Nielsen. Discounted cumulated gain based evaluation of multiple-query IR sessions. In ECIR, pages 4–15, 2008.
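A hedged sketch of the session-level discount: the DCG of the q-th query in a session is divided by 1 + log_bq(q), so results found after more reformulations count for less. The per-query DCG here uses a log2(rank + 1) discount for brevity, which is an assumption and differs from the cited paper's formulation. The default bq = 4 and the function names are also assumptions, and the normalization step is omitted.

```python
import math


def dcg(gains):
    """Per-query DCG with a log2(rank + 1) discount (assumed variant)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def session_dcg(session, bq=4):
    """Sum per-query DCG, discounting the q-th query by 1 + log_bq(q)."""
    total = 0.0
    for q, gains in enumerate(session, start=1):
        total += dcg(gains) / (1.0 + math.log(q, bq))
    return total


# A session of two queries, each returning one relevant document at rank 1.
print(session_dcg([[1], [1]]))
```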

session ERR

Our original method.

Q-measure

Tetsuya Sakai. 2004. Ranking the NTCIR systems based on multigrade relevance. In Proceedings of the 2004 international conference on Asian Information Retrieval Technology (AIRS'04), Sung Hyon Myaeng, Ming Zhou, Kam-Fai Wong, and Hong-Jiang Zhang (Eds.). Springer-Verlag, Berlin, Heidelberg, 251-262. DOI=10.1007/978-3-540-31871-2_22 http://dx.doi.org/10.1007/978-3-540-31871-2_22
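As an illustration, Q-measure can be sketched as an average over relevant ranks of the blended ratio (count(r) + beta * cg(r)) / (r + beta * cg_ideal(r)), where count(r) is the number of relevant documents seen so far and cg, cg_ideal are cumulative gains of the system and ideal rankings. The function name, the argument layout, and the default beta = 1 are assumptions for this sketch rather than the framework's API.

```python
def q_measure(gains, ideal_gains, beta=1.0):
    """Q-measure of a ranked list.

    gains: gain of each retrieved document in rank order (0 = non-relevant).
    ideal_gains: gains of all relevant documents for the topic, in any order.
    """
    ideal = sorted((g for g in ideal_gains if g > 0), reverse=True)
    num_relevant = len(ideal)
    if num_relevant == 0:
        return 0.0
    cg = 0.0        # cumulative gain of the system ranking
    cg_ideal = 0.0  # cumulative gain of the ideal ranking
    count = 0       # relevant documents seen so far
    total = 0.0
    for rank, gain in enumerate(gains, start=1):
        if rank <= num_relevant:
            cg_ideal += ideal[rank - 1]
        if gain > 0:
            count += 1
            cg += gain
            total += (count + beta * cg) / (rank + beta * cg_ideal)
    return total / num_relevant


print(q_measure([0, 3], [3]))
```

With beta set to 0 the blended ratio reduces to precision at each relevant rank, so the sketch returns average precision.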

Risk-sensitive measure

L. Wang, P. N. Bennett, and K. Collins-Thompson. Robust Ranking Models via Risk-Sensitive Optimization. In Proc. of SIGIR 2012. See also TREC Web Track 2013 http://research.microsoft.com/en-us/projects/trec-web-2013/
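The risk-sensitive idea can be sketched as follows, assuming the utility form used in the TREC 2013 Web Track: per-topic differences against a baseline are averaged, with losses weighted by an extra factor of (1 + alpha). The function name and the input layout (two parallel lists of per-topic scores) are assumptions for illustration.

```python
def risk_sensitive_utility(run_scores, baseline_scores, alpha=1.0):
    """Average per-topic gain over a baseline, penalizing losses by (1 + alpha)."""
    if len(run_scores) != len(baseline_scores):
        raise ValueError("run and baseline must cover the same topics")
    if not run_scores:
        return 0.0
    total = 0.0
    for run, base in zip(run_scores, baseline_scores):
        delta = run - base
        # Wins count once; losses are amplified by the risk-aversion factor.
        total += delta if delta >= 0 else (1.0 + alpha) * delta
    return total / len(run_scores)


# One win of 0.2 and one loss of 0.2 against the baseline.
print(risk_sensitive_utility([0.5, 0.2], [0.3, 0.4], alpha=1.0))
```

With alpha = 0 the measure reduces to the plain average difference between the run and the baseline.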

==================

License

evaluation_measures is BSD 2-Clause licensed.

evaluation_measures's Issues

session ERR

In your list of evaluation measures you included a variant of the expected reciprocal rank that allows for evaluating sessions. Could you please give me a hint, where to find the reference paper where the adaptation of ERR for sessions is further described?

Thank you.

Why should max_grade be 2 for ERR?

There is a comment in the source,

# NOTE: max_grade should be *2

However, max_grade is a configurable parameter. Furthermore, the ERR paper does not seem to imply that there is an acceptable range of grades.
What is meant by this comment?

Thanks!


The example might be misleading
