lucasmaystre / choix
Inference algorithms for models based on Luce's choice axiom
License: MIT License
Hi,
I have about 30,000 images that I need to rank on a linear scale (let's say from 1 to 10). The naive approach would be a full pairwise comparison (n² comparisons), which makes the task impossible. Instead, I'm searching for a method that would let me do only about n log n comparisons by "wisely" choosing the pairs to compare. Of course, with such a method we can only hope to get "near" the ground-truth ranking, but that should be sufficient for my needs.
I'm under the impression that choix could be used to attain my goal, but I can't determine how exactly at this point. Any pointers that might help? Thanks!
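One way to stay near n log n comparisons, sketched here as an assumption rather than as a choix feature, is to let a comparison sort decide which pairs to ask about and to record every answer as a (winner, loser) pair. The names `collect_comparisons`, `better`, and `hidden` are hypothetical; `better` stands in for whatever human or model judgment compares two images.

```python
import functools
import math
import random

def collect_comparisons(items, better):
    """Sort items best-first and record each comparison as (winner, loser)."""
    pairs = []

    def cmp(a, b):
        # Ask the (hypothetical) judge once per pair the sort requests.
        if better(a, b):
            pairs.append((a, b))
            return -1
        pairs.append((b, a))
        return 1

    order = sorted(items, key=functools.cmp_to_key(cmp))
    return order, pairs

# Hypothetical ground truth: a lower hidden score means a better image.
random.seed(0)
n = 100
hidden = list(range(n))
random.shuffle(hidden)

order, pairs = collect_comparisons(range(n), lambda a, b: hidden[a] < hidden[b])
print(len(pairs), "comparisons instead of", n * (n - 1) // 2)
```

The recorded pairs have the same (winner, loser) shape as the pairwise data shown elsewhere on this page. With noisy judges a plain sort can be misled, so the pairs would typically be fed to a regularized estimator afterwards rather than trusting the sorted order directly.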
A nice feature would be adding a weights vector alongside the data vector, in order to be able to assign a different "importance" to each observation.
Using this suggested feature, it would also be easy to implement regularisation: add another observation for each pair (in the pairwise comparison models) and give it a small weight.
Currently, when the directed graph is acyclic, the maximum-likelihood estimate gives the root essentially infinite strength, and regularisation fixes that.
Thanks for the great package!
Is it possible in choix for each observation (or pair if you wish) to attribute and use some feature vector? The feature vectors, not outcomes alone, will be used to learn the preferences.
Thanks
No real issue and you can close this issue, but thank you so much for the library! I knew I needed a Bradley-Terry implementation in Python and went searching expecting to find just code snippets, not a whole nice library! The API and documentation are great.
Hi @lucasmaystre,
Thanks for taking the time to build this package. It's been of great help.
We are struggling a bit with a particular problem and hope you can assist. We are given a matrix of pairwise probabilities and we need to compute, for each item, the probability of winning against all other items in the matrix (i.e. the probability of ranking first). How can we use your package for this problem?
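A minimal sketch of one possible answer, assuming the matrix is exactly consistent with a Bradley-Terry model, i.e. `pairwise[i][j] = w_i / (w_i + w_j)` for some positive strengths `w`. Under that assumption the ratio `pairwise[i][j] / pairwise[j][i]` equals `w_i / w_j`, and Luce's choice axiom gives the probability of item i being chosen first among all items as `w_i / sum(w)`. The function name and the `ref` parameter are hypothetical, and a real matrix from a classifier will usually not satisfy the assumption exactly.

```python
def first_place_probabilities(pairwise, ref=0):
    """Recover first-place probabilities from a Bradley-Terry-consistent matrix.

    pairwise[i][j] is the probability that item i beats item j.
    Strengths are expressed relative to the reference item `ref`.
    """
    n = len(pairwise)
    weights = [
        1.0 if i == ref else pairwise[i][ref] / pairwise[ref][i]
        for i in range(n)
    ]
    total = sum(weights)
    return [w / total for w in weights]

# Matrix generated from strengths 3, 2, 1 (diagonal entries are unused).
pairwise = [
    [0.5, 3 / 5, 3 / 4],
    [2 / 5, 0.5, 2 / 3],
    [1 / 4, 1 / 3, 0.5],
]
print(first_place_probabilities(pairwise))
```

When the matrix is only approximately consistent, fitting strengths to all entries at once is likely more robust than relying on a single reference column.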
Thanks.
Can you provide documentation for the functions?
Thanks,
Do you have a paper I can cite if I use choix in my work?
Hi! Thanks for the great package. Is there a way to train the Bradley-Terry model starting from probabilistic data? E.g. one training sample might be: "item A is preferred to item B with probability 0.7."
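To illustrate the idea independently of choix, here is a minimal pure-Python sketch that treats each probability as a fractional win count and fits Bradley-Terry strengths with the classic minorization-maximization (Zermelo) update. It is not the library's implementation; `fit_bradley_terry` and its parameters are hypothetical, and the sketch assumes every item has at least one comparison and a positive total of fractional wins.

```python
def fit_bradley_terry(wins, n_iter=1000, tol=1e-12):
    """Fit Bradley-Terry strengths from a matrix of (fractional) win counts.

    wins[i][j] is how often item i beat item j; it may be a probability.
    Returns strengths normalized to sum to 1.
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Each pair contributes its number of comparisons over p_i + p_j.
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n) if j != i
            )
            new_p.append(total_wins / denom)
        scale = sum(new_p)
        new_p = [x / scale for x in new_p]
        converged = max(abs(a - b) for a, b in zip(p, new_p)) < tol
        p = new_p
        if converged:
            break
    return p

# "Item A is preferred to item B with probability 0.7."
strengths = fit_bradley_terry([[0.0, 0.7], [0.3, 0.0]])
print(strengths[0] / (strengths[0] + strengths[1]))
```

For the single training sample above, the fitted model reproduces the stated preference probability.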
I'm trying to determine if Choix can be used to compare two rows of data that look like this:
Item 1 2 3 4 5 6 7 8
0 A 369248.0 12757.0 3.45% 0.83 10569.60 104.0 101.63 0.820%
1 B 35621.0 245.0 0.69% 0.90 219.74 3.0 73.25 1.220%
What I want to do is add my own weight (bias) to several of these columns to say that one feature is more important than another.
I looked at your examples and the data is in integer format. I was thinking maybe I could rank the data above within each column to get values from 0 to n-1 for each column, but then how would I use choix to compare the rows and get an output ordered from "Best" to "Worst"?
In your paper you claim to extend the BTL model to ties using the paper by P. V. Rao and L. L. Kupper. I was wondering if it's possible to implement that here.
Thanks!
I followed the code in your intro notebook
est = choix.ilsr_pairwise(n_items, data, alpha=1e-3)
ranking = np.argsort(est)
using the following data:
data = [(0, 1),
        (0, 2),
        (0, 3),
        (1, 2),
        (1, 3),
        (2, 3)]
which produced the following output:
[ 0.02440682 0.00827314 -0.00812072 -0.02478326 0.00011201 0.00011201]
('ranking (worst to best):', array([3, 2, 4, 5, 1, 0]))
I am mainly wondering how the ranking part of the algorithm works:
My understanding was that this algorithm should produce a rank of [0, 1, 2], since 0 is the best player in this case. Clearly this is not what is happening, so what are the params actually calculating, and how would I get the ranking as I want above?
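A short sketch of how the printed parameters map to a ranking. Each entry is an estimated strength for one item, and sorting in ascending order (as np.argsort does) lists the weakest item first, which matches the "worst to best" label in the output. The output has six entries while the data only mentions items 0 to 3, which suggests n_items was set to 6; that is an inference from the printout, not something the post states.

```python
# Parameters copied from the output above, one strength per item.
params = [0.02440682, 0.00827314, -0.00812072, -0.02478326,
          0.00011201, 0.00011201]

# Sort item indices by strength, strongest first (stable for ties).
best_to_worst = sorted(range(len(params)), key=lambda i: params[i], reverse=True)
print(best_to_worst)  # [0, 1, 4, 5, 2, 3]

# Keep only the items that actually appear in the comparison data.
seen = {0, 1, 2, 3}
print([i for i in best_to_worst if i in seen])  # [0, 1, 2, 3]
```

Items 4 and 5 never appear in a comparison, so their near-zero values presumably reflect only the regularization and carry no information about their rank.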
First, congratulations on your paper and the awesome piece of code you provided!
It looks like Plackett-Luce ratings can outperform other ranking algorithms such as TrueSkill (see here).
One nice feature of TrueSkill is the ability to rank any number of "teams" of any size and to update the ratings of a team's members according to the team's results.
TrueSkill can also handle "partial play" for each member of a team and adjust the ratings according to the members' "participation ratio".
Do you think such behaviors, especially the ability to build teams, are possible with LSR / I-LSR, or do you have any idea how I could do that?
Edit: for partial play, I guess we could simply multiply the initial weight of an edge by the participation ratio...
The link to the example (https://pypi.org/project/choix/notebooks/ep-example.ipynb) doesn't exist.
Hi Author,
I came to this library from the Stack Exchange question that you answered at https://datascience.stackexchange.com/questions/18828/from-pairwise-comparisons-to-ranking-python .
I have probabilities for each pairwise comparison. Can those also be used as input to any of the lsr_pairwise / ilsr_pairwise methods? For example, probabilities coming from a pairwise classifier trained separately.
Thanks,
Hasan
The documentation states that to represent a top-1 list, a Python list with an integer and a Python set should be used. This leads to a TypeError:
% python3 -m venv venv
% . venv/bin/activate
% pip install choix
Collecting choix
Using cached choix-0.3.5.tar.gz (63 kB)
Preparing metadata (setup.py) ... done
Collecting numpy
Using cached numpy-1.22.1-cp39-cp39-macosx_11_0_arm64.whl (12.8 MB)
Collecting scipy
Using cached scipy-1.7.3-1-cp39-cp39-macosx_12_0_arm64.whl (27.0 MB)
Using legacy 'setup.py install' for choix, since package 'wheel' is not installed.
Installing collected packages: numpy, scipy, choix
Running setup.py install for choix ... done
Successfully installed choix-0.3.5 numpy-1.22.1 scipy-1.7.3
(venv) michi@lappy ~ % python
Python 3.9.10 (main, Jan 20 2022, 11:41:00)
[Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import choix
>>> choix.ilsr_top1(3, [[0, {1, 2}]], alpha=0.1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/michi/venv/lib/python3.9/site-packages/choix/lsr.py", line 391, in ilsr_top1
return _ilsr(fun, initial_params, max_iter, tol)
File "/Users/michi/venv/lib/python3.9/site-packages/choix/lsr.py", line 30, in _ilsr
params = fun(initial_params=params)
File "/Users/michi/venv/lib/python3.9/site-packages/choix/lsr.py", line 350, in lsr_top1
val = 1 / (weights.take(losers).sum() + weights[winner])
TypeError: int() argument must be a string, a bytes-like object or a number, not 'set'
Using another list instead works:
>>> choix.ilsr_top1(3, [[0, [1, 2]]], alpha=0.1)
array([ 0.97755805, -0.48877902, -0.48877902])
Also, IMHO a tuple is a better fit to represent a top-1 list, e.g. [(0, {1, 2})]. This can be accurately represented using the types from the typing module, i.e. List[Tuple[int, Set[int]]], where the currently documented convention can't.
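As a workaround sketch, the set-based observations can be converted to the list-of-lists form that the session above shows working. `normalize_top1` and the `Top1` alias are hypothetical names introduced for illustration; they are not part of choix.

```python
from typing import Iterable, List, Set, Tuple

# The typed convention proposed above: (winner, set of losers).
Top1 = Tuple[int, Set[int]]

def normalize_top1(data: Iterable[Top1]) -> List[list]:
    """Convert (winner, {losers}) pairs to [winner, [losers]] lists."""
    return [[winner, sorted(losers)] for winner, losers in data]

observations = normalize_top1([(0, {1, 2})])
print(observations)  # [[0, [1, 2]]]
```

The result has the same shape as the `[[0, [1, 2]]]` argument that ilsr_top1 accepted in the session above.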
I see in #17 some suggestions for handling ties in pairwise comparisons. Is there a way to do that in rankings? Specifically, I have a case where several competitors in a race may not finish, and all of them should be ranked as number of finishers + 1. Is there a way to handle such a condition?
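One possible way to model this, offered as an assumption and not as the maintainer's answer, is to treat the race as a partial (top-k) ranking: each finisher is a "winner" chosen over everyone who placed behind them, including all non-finishers, while the non-finishers are never compared with each other. The helper `race_to_top1` is hypothetical and produces top-1 observations in the `[winner, [losers]]` form used elsewhere on this page.

```python
def race_to_top1(finish_order, did_not_finish):
    """Break a race result into top-1 choices.

    finish_order: item ids of finishers, best first.
    did_not_finish: item ids that all share the last place.
    """
    observations = []
    remaining = list(finish_order) + list(did_not_finish)
    for winner in finish_order:
        remaining.remove(winner)
        if remaining:  # the last item standing has nobody left to beat
            observations.append([winner, list(remaining)])
    return observations

# Items 2 and 0 finished (in that order); items 1 and 3 did not finish.
print(race_to_top1([2, 0], [1, 3]))
# [[2, [0, 1, 3]], [0, [1, 3]]]
```

This keeps the tie among non-finishers neutral: the data says only that every finisher beat them, and nothing about their relative order.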
networkx version == 2.5
choix version == 0.3.4
Running the code cell in 1. Generating sample data led to the following error:
This can be fixed using:
neighbors = list(graph.successors(src))
The same error also occurred for me in 2. Estimating transitions using choicerank and can be fixed in the same way.
I still do not understand which part/algorithm I should use for my problem.
I have several partial ranking lists for n objects (some objects may be missing from a given list) and I want to aggregate the lists into a final ranking.
Which method should I use, and how?
How should I represent the items in each ranking, both those that are present and those that are missing?
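A sketch of one way to prepare such data, under the assumption that each partial ranking simply lists the objects it covers, best first, and leaves out the missing ones. The helper `encode_rankings` is hypothetical; it maps object names to the consecutive integer ids that the choix examples on this page use.

```python
def encode_rankings(rankings):
    """Map object names to integer ids and encode each partial ranking.

    rankings: lists of names, best first; missing objects are just omitted.
    Returns (name -> id mapping, list of encoded rankings).
    """
    index = {}
    encoded = []
    for ranking in rankings:
        row = []
        for name in ranking:
            if name not in index:
                index[name] = len(index)
            row.append(index[name])
        encoded.append(row)
    return index, encoded

index, encoded = encode_rankings([
    ["a", "b", "c"],   # covers a, b, c
    ["c", "a"],        # b and d are missing from this list
    ["b", "d"],
])
print(index)    # {'a': 0, 'b': 1, 'c': 2, 'd': 3}
print(encoded)  # [[0, 1, 2], [2, 0], [1, 3]]
```

The encoded lists, together with `len(index)` as the number of items, are the kind of input a rankings estimator such as choix.ilsr_rankings expects; a small regularization value is likely needed when some objects appear in few lists.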
I'd like to build a Bradley Terry model. Could you provide an example please?
Currently the various parameter inference algorithms are implemented in pure Python (with some vectorized operations via NumPy).
Since most of these algorithms are iterative, the implementation is still relatively slow and inefficient. From experience, Numba could potentially speed up the code by several orders of magnitude.
I hence intend to speed up some of the inference algorithms with Numba, starting with the opt_* functions.
Hi @lucasmaystre,
When using choix.opt_pairwise, setting alpha raises an error:
TypeError: opt_pairwise() got an unexpected keyword argument 'alpha'
But the doc suggests it should be possible to set it :)
Example code:
import choix
n_items = 5
data = [
(1, 0), (0, 4), (3, 1),
(0, 2), (2, 4), (4, 3),
]
choix.opt_pairwise(n_items, data, alpha=0.1)