online-go / goratings

This repository contains the (future) official rating and ranking system for online-go.com, as well as analysis code and data to develop that system and compare it to other reference systems.

License: MIT License

Makefile 0.93% Python 51.51% JavaScript 1.45% HTML 0.09% Stylus 0.26% TypeScript 9.80% C++ 35.57% C 0.38%

goratings's People

Contributors

flovo angelsesma bhostetler18 benjaminpjones animiral dexonsmith

goratings's Issues

Should we ignore handicap games when benchmarking different rating algorithms?

The discussion about ranks and handicap made me wonder whether the assumption 1 rank = 1 handicap stone is correct. So I tested what happens when handicap games are excluded from the rating calculation. I ran analyze_glicko2_daily_windows.py twice: once in its original version (including handicap games) and once skipping games with handicap != 0. The result baffled me: for all inspected users, the ranks are lower without handicap games (in many cases about 10 ranks lower).

pypy3 analyze_glicko2_daily_windows.py

Inspected users from ../analysis/players_to_inspect.ini
               anoek     7k     1810.17 +-  64.72 (0.060035)
               flovo     5k     1930.38 +-  50.48 (0.061096)
            Uberdude     8d     2832.08 +-  61.26 (0.059962)
                 nrx     3d     2441.80 +-  43.41 (0.060813)
            mark5000     5d     2594.46 +-  61.71 (0.061132)
               xhu98     9d     2893.92 +-  69.48 (0.063021)
          RoyalLeela     9d     2954.81 +-  82.60 (0.149770)

pypy3 analyze_glicko2_daily_windows_no_handicap.py

Inspected users from analysis/players_to_inspect.ini
               anoek    16k     1370.96 +-  63.21 (0.059982)
               flovo    15k     1379.88 +-  50.12 (0.060633)
            Uberdude     4d     2468.15 +-  63.60 (0.060043)
                 nrx     6k     1883.50 +-  43.20 (0.060516)
            mark5000     4k     1977.77 +-  61.24 (0.060536)
               xhu98     3k     2034.02 +-  65.31 (0.061660)
          RoyalLeela     4k     1985.14 +-  77.92 (0.149608)
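
For reference, the "skip games with handicap != 0" step can be sketched as a small filter. This is only an illustration: GameRecord and without_handicap_games are hypothetical names, not the repository's actual data model or the code in analyze_glicko2_daily_windows_no_handicap.py.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class GameRecord:
    """Hypothetical minimal game record, used only for this sketch."""
    black_id: int
    white_id: int
    handicap: int
    black_won: bool


def without_handicap_games(games: Iterable[GameRecord]) -> Iterator[GameRecord]:
    """Yield only even games, i.e. skip every game with handicap != 0."""
    for game in games:
        if game.handicap != 0:
            continue
        yield game


games = [
    GameRecord(1, 2, 0, True),
    GameRecord(1, 3, 2, False),  # handicap game, skipped
    GameRecord(2, 3, 0, False),
]
even_games = list(without_handicap_games(games))
```

Feeding only even_games into the rating run would correspond to the second set of results above.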

Small error in glicko2 implementation

goratings/goratings/math/glicko2.py

Line 136 in 50b5443

if fC * fB < 0:

Should be fC * fB <= 0 according to the paper's step 5.

http://www.glicko.net/glicko/glicko2.pdf page 3:

If fCfB ≤ 0, then set A ← B and fA ← fB; otherwise, just set fA ← fA/2.
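
To show where the comparison sits, here is a minimal sketch of the volatility iteration from step 5 of the paper, written from the paper's description rather than copied from glicko2.py. The function name new_volatility and its parameter names are assumptions made for this example.

```python
import math


def new_volatility(sigma, phi, v, delta, tau=0.5, eps=1e-6):
    """Step 5 of the Glicko-2 paper: iterate to the new volatility sigma'."""
    a = math.log(sigma ** 2)

    def f(x):
        ex = math.exp(x)
        num = ex * (delta ** 2 - phi ** 2 - v - ex)
        den = 2.0 * (phi ** 2 + v + ex) ** 2
        return num / den - (x - a) / tau ** 2

    # Initial bracket [A, B] around the root of f.
    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau

    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:  # "<=" as quoted from the paper, not "<"
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    return math.exp(A / 2.0)


# Inputs taken from the worked example in the paper.
sigma_prime = new_volatility(sigma=0.06, phi=1.1513, v=1.7785, delta=-0.4834)
```

The two variants only differ when fC * fB is exactly zero, so the practical effect of the fix is likely small.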

Shippable CI

Commits on GitHub show errors (red Xs), due to:

Shippable will no longer be accessible as of May 3rd 2021.

Perhaps consider shifting to https://github.com/features/actions

Number of games in a rating window depending on window size in days

According to Prof. Glickman, glicko2 works best when there are 10-15 games in a rating period. In our player base, reaching that median takes a window of anywhere between 27 and 55 days.

I calculated the median number of games based on the window size in days. I put the results here for reference. Only players with at least 30 rated games are considered. Only periods with at least 1 game are used when calculating the median.

window size in days	median number of games	rating days (number of rating periods with at least one game times window width)
1	2	5301557
2	3	7460412
3	3	8927646
4	4	10045968
5	4	10964225
6	4	11728704
7	5	12404581
8	5	12980944
9	5	13513113
10	5	13989680
11	6	14422034
12	6	14808804
13	6	15210078
14	7	15604414
15	7	15939495
16	7	16241776
17	7	16528879
18	8	16859700
19	8	17121945
20	8	17467900
21	8	17727759
22	8	17983724
23	9	18160938
24	9	18437232
25	9	18652825
26	9	18938790
27	10	19153611
28	10	19324872
29	10	19479851
30	10	19722210
31	11	19874782
32	11	20198176
33	11	20278071
34	11	20584892
35	11	20597080
36	12	21018852
37	12	21139099
38	12	21336772
39	12	21373833
40	12	21740400
41	13	21724629
42	13	22132488
43	13	22047390
44	13	22446776
45	13	22403070
46	14	22547866
47	14	22880070
48	14	23090880
49	14	23206890
50	15	23171950
51	15	23451483
52	15	23559172
53	15	23542971
54	15	23934150
55	15	24195215
56	16	24314024
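
The two columns above can be computed roughly as follows. This is a sketch under assumptions: window_stats is a hypothetical helper, game dates are given as integer day indices per player, and windows are aligned to day 0 for everyone, which may differ from how the original numbers were produced.

```python
from collections import Counter
from statistics import median


def window_stats(game_days_by_player, window_days, min_games=30):
    """Return (median games per non-empty period, rating days)."""
    games_per_period = []
    for days in game_days_by_player.values():
        if len(days) < min_games:
            continue  # only players with at least min_games rated games
        # Count games per rating period; empty periods never appear here.
        per_period = Counter(day // window_days for day in days)
        games_per_period.extend(per_period.values())
    if not games_per_period:
        return None, 0
    # Rating days: non-empty periods times window width.
    rating_days = len(games_per_period) * window_days
    return median(games_per_period), rating_days


# Toy data: player "a" plays one game a day for 30 days, "b" is filtered out.
sample = {"a": list(range(30)), "b": [0, 1, 2]}
weekly = window_stats(sample, window_days=7)
daily = window_stats(sample, window_days=1)
```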

Ratings should consider komi on small boards

Ratings currently don't consider komi on small boards, but should, since usually it's the komi that changes (not the number of handicap stones) as handicap increases.

Proposal: Breakdown Ratings: Use opponent's overall rating when calculating breakdown ratings

Right now we use a separate rating pool when calculating each rating in the breakdown chart. This has the big disadvantage that the breakdown ratings are not comparable to each other. We can make them comparable by using the opponent's overall rating when calculating them.

When updating a player's ratings, we use the player's category rating as the base rating, as we do at the moment. For the opponent's rating we always use their overall rating. The update algorithm stays the same for all breakdowns.

By using the opponent's overall rating, we keep the breakdown ratings on the same scale as the overall rating.

For a player playing only one board size + speed combination, all 4 breakdown ratings will be the same, while they can be quite different at the moment.
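
A minimal sketch of the proposal: the update takes the player's category rating as the base, but always reads the opponent's overall rating. To keep the example short, a plain Elo update stands in for the real glicko2 update. Player, elo_update, update_breakdown and the category key are hypothetical names, and falling back to the overall rating for a category without games is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Player:
    overall: float = 1500.0
    breakdown: Dict[str, float] = field(default_factory=dict)


def elo_update(own, opponent, score, k=32.0):
    """Stand-in update rule (plain Elo), not the glicko2 update."""
    expected = 1.0 / (1.0 + 10 ** ((opponent - own) / 400.0))
    return own + k * (score - expected)


def update_breakdown(player, opponent, category, score, update=elo_update):
    """Update one breakdown rating against the opponent's OVERALL rating."""
    base = player.breakdown.get(category, player.overall)
    # The opponent's own category rating is deliberately ignored.
    player.breakdown[category] = update(base, opponent.overall, score)
    return player.breakdown[category]


me = Player(overall=1500.0, breakdown={"19x19-live": 1500.0})
# The opponent's category rating (2000) differs a lot from their overall (1500).
opp = Player(overall=1500.0, breakdown={"19x19-live": 2000.0})
new_rating = update_breakdown(me, opp, "19x19-live", score=1.0)
```

Because only opp.overall enters the update, the breakdown rating moves exactly as the overall rating would for the same game.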

Handicap adjustment should consider "proper handicap"

There's a page about "proper handicap" over at Sensei's Library. At first I didn't get it, but now I think this could affect OGS's ranking system.

Komi is worth half a stone

Consider this: Black moves first and is therefore one stone ahead, then White follows and closes the gap. Without komi that's unfair, because the game keeps switching between "black one stone ahead" and "equal number of stones". To make this fairer, White starts off with points worth half a stone (about 7 points). This way the game switches between "white half a stone ahead" and "black half a stone ahead", and whoever is half a stone behind is the one to move next.

Handicap 1 is actually handicap 0.5, handicap 2 is 1.5 and so on

On OGS a game with "handicap 1" means the two players are one rank apart and the stronger player takes white with almost no komi (0.5). White gives an additional stone per additional rank difference. This does not fit the 1 rank = 1 stone rule, as the default komi (6.5 on OGS) is only worth half a stone.

This should be considered in get_handicap_adjustment()

The players' ranks are adjusted in handicap games when estimating win rates. Currently the weaker player is considered x ranks stronger due to handicap, when they should only be considered x - 0.5 ranks stronger. Therefore Black's strength and win rate are always overestimated, and a loss affects Black's ranking more than it should, while the opposite is true for White. I don't know how much it would contribute to improving the rating system, but to me it seems like changing get_handicap_adjustment() to reflect this issue could help reduce the volatility of ratings.
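
As an illustration of the proposed change, the rank compensation could look like the sketch below. effective_handicap and adjusted_black_rank are hypothetical helpers, not the signature of get_handicap_adjustment(), and the sketch assumes handicap games are played with 0.5 komi as described above.

```python
def effective_handicap(handicap: int) -> float:
    """Rank difference actually compensated by an OGS handicap setting.

    Assumes handicap games use 0.5 komi, so "handicap 1" is worth 0.5 ranks,
    "handicap 2" is worth 1.5 ranks, and so on.
    """
    if handicap <= 0:
        return 0.0  # even game with normal komi: no adjustment
    return handicap - 0.5


def adjusted_black_rank(black_rank: float, handicap: int) -> float:
    """Black's rank as it would be used when estimating the win rate."""
    return black_rank + effective_handicap(handicap)


table = {h: effective_handicap(h) for h in range(0, 4)}
```

With this mapping, Black is credited half a rank less than the handicap setting in every handicap game.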

Additionally one could think of changing the handicap system OGS is using, but that should maybe be discussed separately.

Ratings sometimes increase after a loss.

online-go/online-go.com#1376

This might be related to rating recalculation after annulments.
