Giter Site home page Giter Site logo

rbo's Introduction

Rank-biased Overlap (RBO)

CircleCI PyPI version

This project contains a Python implementation of Rank-Biased Overlap (RBO) from: Webber, William, Alistair Moffat, and Justin Zobel. "A similarity measure for indefinite rankings." ACM Transactions on Information Systems (TOIS) 28.4 (2010): 20." (Download).

Introduction

For a more general introduction, please refer to this blog post.

RBO compares two ranked lists, and returns a numeric value between zero and one to quantify their similarity. A RBO value of zero indicates the lists are completely different, and a RBO of one means completely identical. The terms 'different' and 'identical' require a little more clarification.

Given two ranked lists:

A = ["a", "b", "c", "d", "e"]
B = ["e", "d", "c", "b", "a"]

We can see that both of them rank 5 items ("a", "b", "c", "d" and "e"), but with completely opposite order. In this case the similarity between A and B should be larger than 0 (as they contain the same items, namely, conjoint), but smaller than 1 (as the order of the items are different). If there is third ranked list

C = ["f", "g", "h", "i", "j"]

which ranks 5 totally different items, then if we ask for the similarity between A and C, we should expect a value of 0. In such a non-conjoint case, we need to be able to calculate a similarity as well.

The RBO measure can handle ranked lists with different lengths as well, with proper extrapolation. For example, the RBO between the list A and list

D = ["a", "b", "c", "d", "e", "f", "g"]

will be 1.

Usage

Installation using pip

To install the RBO module to the current interpreter with Pip:

pip install rbo

Computing RBO

The RankingSimilarity class contains the calculation for the different flavours of RBO, with clear reference to the corresponding equations in the paper. Below shows how to compute the similarity of two ranked lists S and T:

In [1]: import rbo

In [2]: S = [1, 2, 3]

In [3]: T = [1, 3, 2]

In [4]: rbo.RankingSimilarity(S, T).rbo()
Out[4]: 0.8333333333333334

Accepted data types are Python lists and Numpy arrays. Using Pandas series is possible using the underlying Numpy array as shown below. This restriction is necessary, because using [] on a Pandas series queries the index, which might not number items contiguously, or might even be non-numeric.

In [1]: import pandas as pd

In [2]: import rbo

In [3]: S = [1, 2, 3]

In [4]: U = pd.Series([1, 3, 2])

In [5]: rbo.RankingSimilarity(S, U.values).rbo()
Out[5]: 0.8333333333333334

Computing extrapolated RBO

There is an extension of the vanilla RBO implementation, in which we extrapolate from the visible lists, and assume that the degree of agreement seen up to depth $k$ is continued indefinitely.

This extrapolated version is implemented as the RankingSimilarity.rbo_ext() method.

Development

Refer to the Makefile for supplementary tasks to development, e.g., executing unit tests, or checking for proper packaging. Please let me know if there is any issue.

rbo's People

Contributors

changyaochen avatar f4lco avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

rbo's Issues

Parameterize p

I see that self.p of the RankingSimilarity Class is locked into 0.5. Could we parameterize that value before the next update, or perhaps get reasoning why it’s locked at that value?

Publish to pypi

Would it be possible to publish this to PyPI? The reason is that git dependencies cannot be used from packages published to PyPI so it means this can only be used from applications and not libraries.

Something seems wrong in your implementation

If I do the following

import rbo 
S = [1,2,3,4,5,6,7,8,0,9]; T = [1,2,3,4,5,6,7,8,9,0]
print(pip_rbo.RankingSimilarity(S, T).rbo(p=.9))

I get 0.6465385908999999

but with https://github.com/dlukes/rbo and (https://towardsdatascience.com/rbo-v-s-kendall-tau-to-compare-ranked-lists-of-items-8776c5182899)[https://towardsdatascience.com/rbo-v-s-kendall-tau-to-compare-ranked-lists-of-items-8776c5182899], I am getting 0.9952170310000001 and 0.995458491249349 respectively.

It might be just that your p is inverted, because

 print(rbo.RankingSimilarity(S, T).rbo(p=.1))

would be 0.9999999989. Still not quite yet, because for the value of p = .9, .995.. seems to makes more sense.

Output not consistent with readme description and documentation?

Hi there,

thank you for your work on this package!

I'm a little confused by the output from the rbo function, given your documentation:

You write in the readme's first example

"We can see that both of them rank 5 items ("a", "b", "c", "d" and "e"), but with completely opposite order. In this case the similarity between A and B should (and will) be 0.

However, when you run the rbo function on the lists from your example, then the score is not 0?

In [1]: import rbo

In [2]: A = ["a", "b", "c", "d", "e"]; B = ["e", "d", "c", "b", "a"]

In [3]: rbo.RankingSimilarity(A, B).rbo()
Out[3]: 0.41666666666666663

The comments in rbo.py suggests that when calling rbo() with no arguments, then the p parameter defaults to 1, and the function should simply return the average overlap:

When set to 1.0, there is no weight, the rbo returns to average overlap.

However, if I calculate it by hand, then the average overlap should be simply 0.4, not 0.4167? Am I misunderstanding something? :)

image

evaluation depth goes only to min length

In your implementation the loop only goes to

k = min(self.N_S, self.N_T, k)

However, I think it should go to the maximum.
For example $S=[1,2,3,4]$ and $T=[2]$ results in $rbo = 0$ in your implementation, although there clearly is an overlap at depth 2.

RBO greater one in extrapolated RBO and top-weightedness calculation

While the rbo of identical lists is 1, the result of rbo_ext, as well as the top-weightedness calculation, is technically outside of the [0, 1] interval.

In [1]: import rbo

In [2]: import numpy as np

In [3]: rbo.RankingSimilarity(np.arange(100), np.arange(100)).rbo_ext()
Out[3]: 1.0000000000000002

In [4]: np.isclose(1, rbo.RankingSimilarity(np.arange(100), np.arange(100)).rbo_ext())
Out[4]: True

In [5]: rbo.RankingSimilarity(np.arange(100), np.arange(100)).top_weightness()
Out[5]: 1.0000000000000222

In [6]: np.isclose(1, rbo.RankingSimilarity(np.arange(100), np.arange(100)).top_weightness())
Out[6]: True

In [7]: rbo.RankingSimilarity(np.arange(100), np.arange(100)).rbo() # correct behavior of normal rbo
Out[7]: 1.0

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.