
online-go / goratings

This repository contains the (future) official rating and ranking system for online-go.com, as well as analysis code and data to develop that system and compare it to other reference systems.

License: MIT License

Languages: Makefile 0.93%, Python 51.51%, JavaScript 1.45%, HTML 0.09%, Stylus 0.26%, TypeScript 9.80%, C++ 35.57%, C 0.38%

Contributors: anoek, dependabot[bot], dexonsmith, flovo

goratings's Issues

Should we ignore handicap games when benchmarking different rating algorithms?

The discussion about ranks and handicaps made me wonder whether 1 rank = 1 handicap stone is correct. So I tested what happens when handicap games are excluded from the rating calculation: I ran analyze_glicko2_daily_windows.py once in its original form (including handicap games) and once skipping all games with handicap != 0. The result baffled me: every inspected user ends up with a lower rank, in many cases about 10 ranks lower.
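The change amounts to dropping every game with a non-zero handicap before it reaches the rating update. A minimal sketch, assuming a hypothetical Game record with a handicap field; the internals of the actual scripts are not shown here and their data structures may differ.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Game:
    """Hypothetical game record; the field names are illustrative assumptions."""
    black_id: int
    white_id: int
    handicap: int
    black_won: bool


def skip_handicap_games(games: Iterable[Game]) -> Iterator[Game]:
    """Yield only even games (handicap == 0), as in the no-handicap run."""
    for game in games:
        if game.handicap == 0:
            yield game


if __name__ == "__main__":
    games: List[Game] = [
        Game(black_id=1, white_id=2, handicap=0, black_won=True),
        Game(black_id=1, white_id=3, handicap=2, black_won=False),
        Game(black_id=2, white_id=3, handicap=0, black_won=False),
    ]
    kept = list(skip_handicap_games(games))
    print(f"kept {len(kept)} of {len(games)} games")
```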

pypy3 analyze_glicko2_daily_windows.py

Inspected users from ../analysis/players_to_inspect.ini
               anoek     7k     1810.17 +-  64.72 (0.060035)
               flovo     5k     1930.38 +-  50.48 (0.061096)
            Uberdude     8d     2832.08 +-  61.26 (0.059962)
                 nrx     3d     2441.80 +-  43.41 (0.060813)
            mark5000     5d     2594.46 +-  61.71 (0.061132)
               xhu98     9d     2893.92 +-  69.48 (0.063021)
          RoyalLeela     9d     2954.81 +-  82.60 (0.149770)

pypy3 analyze_glicko2_daily_windows_no_handicap.py

Inspected users from analysis/players_to_inspect.ini
               anoek    16k     1370.96 +-  63.21 (0.059982)
               flovo    15k     1379.88 +-  50.12 (0.060633)
            Uberdude     4d     2468.15 +-  63.60 (0.060043)
                 nrx     6k     1883.50 +-  43.20 (0.060516)
            mark5000     4k     1977.77 +-  61.24 (0.060536)
               xhu98     3k     2034.02 +-  65.31 (0.061660)
          RoyalLeela     4k     1985.14 +-  77.92 (0.149608)
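To put a number on "about 10 ranks lower", the ranks from the two runs can be placed on one scale and subtracted. The rank labels below are copied from the output above; the helper names and the mapping (30k = 0, 1k = 29, 1d = 30) are illustrative assumptions, and only the differences matter.

```python
import re

# Ranks reported by the two runs above (with and without handicap games).
WITH_HANDICAP = {"anoek": "7k", "flovo": "5k", "Uberdude": "8d", "nrx": "3d",
                 "mark5000": "5d", "xhu98": "9d", "RoyalLeela": "9d"}
WITHOUT_HANDICAP = {"anoek": "16k", "flovo": "15k", "Uberdude": "4d", "nrx": "6k",
                    "mark5000": "4k", "xhu98": "3k", "RoyalLeela": "4k"}


def rank_index(label: str) -> int:
    """Hypothetical helper: map a kyu/dan label to one scale (30k -> 0, 1k -> 29, 1d -> 30)."""
    match = re.fullmatch(r"(\d+)([kd])", label)
    if match is None:
        raise ValueError(f"unrecognised rank label: {label!r}")
    number, kind = int(match.group(1)), match.group(2)
    # There is no "0k"/"0d": 1k and 1d are adjacent ranks.
    return 30 - number if kind == "k" else 29 + number


def rank_drops(before, after):
    """Number of ranks each player lost between the two runs."""
    return {name: rank_index(before[name]) - rank_index(after[name]) for name in before}


if __name__ == "__main__":
    for name, drop in rank_drops(WITH_HANDICAP, WITHOUT_HANDICAP).items():
        print(f"{name:>12} {WITH_HANDICAP[name]:>4} -> {WITHOUT_HANDICAP[name]:>4}: {drop} ranks lower")
```

With this mapping the drops range from 4 ranks (Uberdude) to 12 ranks (RoyalLeela), with a median of 9.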

Number of games in a rating window depending on window size in days

According to Prof. Glickman, Glicko-2 works best when there are 10 to 15 games in a rating period. For our player base, that corresponds to a rating period of anywhere between 27 and 55 days.

I calculated the median number of games per rating period for each window size from 1 to 56 days; the results are listed below for reference. Only players with at least 30 rated games are considered, and only periods with at least one game are used when calculating the median.
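The counting described above can be sketched as follows. This is a minimal sketch under stated assumptions: each player's games are given as integer day numbers, rating periods are aligned to day 0 of that numbering, and the median is taken over the pooled non-empty periods of all qualifying players. The function name and input format are hypothetical, not taken from the repository.

```python
from collections import Counter
from statistics import median
from typing import Dict, List, Tuple


def window_stats(games_by_player: Dict[str, List[int]], window_days: int,
                 min_games: int = 30) -> Tuple[float, int]:
    """Median games per non-empty rating period, plus total "rating days".

    games_by_player maps a player to the day numbers of their rated games.
    Periods are [0, w), [w, 2w), ...; only periods with at least one game count.
    """
    counts: List[int] = []
    for days in games_by_player.values():
        if len(days) < min_games:
            continue  # only players with at least `min_games` rated games
        per_period = Counter(day // window_days for day in days)
        counts.extend(per_period.values())  # Counter holds only non-empty periods
    if not counts:
        return 0.0, 0
    rating_days = len(counts) * window_days  # non-empty periods times window width
    return median(counts), rating_days


if __name__ == "__main__":
    # Toy data: player "a" plays 10 games on each of days 0, 1 and 5;
    # player "b" has only 5 games and is ignored.
    data = {"a": [0] * 10 + [1] * 10 + [5] * 10, "b": [0, 1, 2, 3, 4]}
    for w in (1, 2, 7):
        print(w, *window_stats(data, w))
```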

Columns: window size in days, median number of games per rating period, rating days (number of rating periods with at least one game, times the window width)
1 2 5301557
2 3 7460412
3 3 8927646
4 4 10045968
5 4 10964225
6 4 11728704
7 5 12404581
8 5 12980944
9 5 13513113
10 5 13989680
11 6 14422034
12 6 14808804
13 6 15210078
14 7 15604414
15 7 15939495
16 7 16241776
17 7 16528879
18 8 16859700
19 8 17121945
20 8 17467900
21 8 17727759
22 8 17983724
23 9 18160938
24 9 18437232
25 9 18652825
26 9 18938790
27 10 19153611
28 10 19324872
29 10 19479851
30 10 19722210
31 11 19874782
32 11 20198176
33 11 20278071
34 11 20584892
35 11 20597080
36 12 21018852
37 12 21139099
38 12 21336772
39 12 21373833
40 12 21740400
41 13 21724629
42 13 22132488
43 13 22047390
44 13 22446776
45 13 22403070
46 14 22547866
47 14 22880070
48 14 23090880
49 14 23206890
50 15 23171950
51 15 23451483
52 15 23559172
53 15 23542971
54 15 23934150
55 15 24195215
56 16 24314024
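Reading the 10 to 15 game target off the table can be done mechanically. The medians below are copied from the table; the helper name is hypothetical.

```python
# Median games per rating period for window sizes 1..56 days, copied from the table above.
MEDIANS = [
    2, 3, 3, 4, 4, 4, 5, 5, 5, 5,            # 1-10 days
    6, 6, 6, 7, 7, 7, 7, 8, 8, 8,            # 11-20 days
    8, 8, 9, 9, 9, 9, 10, 10, 10, 10,        # 21-30 days
    11, 11, 11, 11, 11, 12, 12, 12, 12, 12,  # 31-40 days
    13, 13, 13, 13, 13, 14, 14, 14, 14, 15,  # 41-50 days
    15, 15, 15, 15, 15, 16,                  # 51-56 days
]


def windows_in_target(medians, low=10, high=15):
    """Window sizes (in days) whose median games per period lies in [low, high]."""
    return [days for days, m in enumerate(medians, start=1) if low <= m <= high]


if __name__ == "__main__":
    ok = windows_in_target(MEDIANS)
    print(f"{min(ok)} to {max(ok)} days")
```

The result, 27 to 55 days, matches the range quoted at the start of this issue.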

Proposal: Breakdown Ratings: Use opponent's overall rating when calculating breakdown ratings

Right now we use a separate rating pool when calculating each rating in the breakdown chart. This has the big disadvantage of making the ratings incomparable with each other. We can make the breakdown ratings comparable by using the opponent's overall rating when calculating them.

When updating the ratings of a player, we use the player's category rating as the base rating, as we do at the moment. For the opponent's rating we always use her overall rating. The update algorithm stays the same for all breakdowns.

By using the opponent's overall rating, we keep the breakdown ratings on the same scale as the overall rating.

For a player playing only one board size + speed combination, all 4 breakdown ratings will be the same, while they can be quite different at the moment.
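A minimal sketch of the proposal follows. It uses a plain Elo update purely as a stand-in for Glicko-2 (the proposal keeps whatever update algorithm is already in use), and the Player class, category keys and rating constants are hypothetical. The only point it illustrates is which ratings feed the update: the player's own category rating, and the opponent's overall rating.

```python
from dataclasses import dataclass, field
from typing import Dict, List

K_FACTOR = 32.0          # Elo K-factor; Elo is only a stand-in for the real Glicko-2 update
DEFAULT_RATING = 1500.0  # assumed starting rating for every category


def expected_score(rating: float, opponent_rating: float) -> float:
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def elo_update(rating: float, opponent_rating: float, score: float) -> float:
    return rating + K_FACTOR * (score - expected_score(rating, opponent_rating))


@dataclass
class Player:
    ratings: Dict[str, float] = field(default_factory=dict)  # category -> rating

    def get(self, category: str) -> float:
        return self.ratings.get(category, DEFAULT_RATING)


def categories_for(size: int, speed: str) -> List[str]:
    """Hypothetical breakdown keys: overall, board size, speed, size + speed."""
    return ["overall", f"{size}x{size}", speed, f"{size}x{size}-{speed}"]


def record_game(black: Player, white: Player, size: int, speed: str, black_score: float) -> None:
    """Update each affected category rating of both players.

    Each player's own category rating is the base; the opponent is always
    represented by their overall rating, taken before this game's updates.
    """
    black_overall, white_overall = black.get("overall"), white.get("overall")
    for category in categories_for(size, speed):
        black.ratings[category] = elo_update(black.get(category), white_overall, black_score)
        white.ratings[category] = elo_update(white.get(category), black_overall, 1.0 - black_score)


if __name__ == "__main__":
    a, b = Player(), Player()
    for result in (1.0, 0.0, 1.0):  # a (black) wins, loses, wins on 19x19 live
        record_game(a, b, 19, "live", result)
    print(a.ratings)  # all four breakdown ratings of a are identical
```

Because every category of a single-format player starts from the same value and sees the same opponent ratings and results, the four ratings move in lockstep, which is the behaviour described in the last paragraph.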

Handicap adjustment should consider "proper handicap"

There's a page about "proper handicap" over at Sensei's Library. At first I didn't get it, but now I think this could affect OGS's ranking system.

Komi is worth half a stone

Consider this: black moves first and is therefore one stone ahead; then white moves and closes the gap. Without komi this is unfair, because the game keeps switching between "black one stone ahead" and "equal number of stones". To make it fairer, white starts with points worth half a stone (about 7 points). The game then switches between "white half a stone ahead" and "black half a stone ahead", and whoever is half a stone behind is always the one to move next.
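The argument can be checked with a toy model in which every move adds one stone for the player who moves and komi is a fixed number of stones in white's favour. This is only an illustration of the reasoning above, not a model of real game values; the function names are hypothetical.

```python
from typing import List


def black_leads(initial_black_stones: float, black_moves_first: bool,
                komi_stones: float, plies: int) -> List[float]:
    """Black's lead in stones (komi included) after each of the first `plies` moves.

    Toy model: each move adds one stone for the player who moves, and komi
    counts as a fixed number of stones in white's favour.
    """
    lead = initial_black_stones - komi_stones
    black_to_move = black_moves_first
    leads = []
    for _ in range(plies):
        lead += 1 if black_to_move else -1
        leads.append(lead)
        black_to_move = not black_to_move
    return leads


def average_lead(leads: List[float]) -> float:
    return sum(leads) / len(leads)


if __name__ == "__main__":
    print(black_leads(0, True, 0.0, 4))   # even game, no komi
    print(black_leads(0, True, 0.5, 4))   # even game, komi worth half a stone
    print(black_leads(2, False, 0.0, 4))  # two handicap stones, white moves first
```

Without komi black's lead alternates between 1 and 0 stones (average 0.5); with half a stone of komi it alternates between +0.5 and -0.5 (average 0). With two handicap stones and white moving first (treating the 0.5-point komi as zero) it alternates between 1 and 2, an average of 1.5 stones.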

Handicap 1 is actually handicap 0.5, handicap 2 is 1.5 and so on

On OGS a game with "handicap 1" means the two players are one rank apart and the stronger player takes white with almost no komi (0.5). White gives an additional stone per additional rank difference. This does not fit the 1 rank = 1 stone rule, as the default komi (6.5 on OGS) is only worth half a stone.

This should be considered in get_handicap_adjustment().

The players' ranks are adjusted in handicap games when estimating win rates, but currently the weaker player is considered x ranks stronger for a handicap of x, when they should only be considered x - 0.5 ranks stronger. As a result, black's strength and win rate are always overestimated, so a loss affects black's ranking more than it should, while the opposite is true for white. I don't know how much this would improve the rating system, but changing get_handicap_adjustment() to reflect it could help reduce the volatility of ratings.
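A minimal sketch of the proposed rule follows. It does not reproduce the actual get_handicap_adjustment() in goratings, whose signature and internals are not shown here; both helper names are hypothetical, and the "current" behaviour is simply the x-ranks rule as described in this issue.

```python
def current_rank_bonus(handicap: int) -> float:
    """Behaviour described in the issue: black is treated as `handicap` ranks stronger."""
    return float(handicap)


def proposed_rank_bonus(handicap: int) -> float:
    """Proposed rule (hypothetical helper): OGS handicap x is worth about x - 0.5 ranks.

    Handicap 1 (black moves first, 0.5 komi) is worth about half a stone,
    handicap 2 about 1.5 stones, and so on; handicap 0 is an even game.
    """
    if handicap < 0:
        raise ValueError("handicap must be non-negative")
    return 0.0 if handicap == 0 else handicap - 0.5


if __name__ == "__main__":
    for h in range(5):
        over = current_rank_bonus(h) - proposed_rank_bonus(h)
        print(f"handicap {h}: current {current_rank_bonus(h):.1f}, "
              f"proposed {proposed_rank_bonus(h):.1f}, black overrated by {over:.1f} ranks")
```

Under this rule black is overrated by exactly half a rank in every handicap game and not at all in even games.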

Additionally one could think of changing the handicap system OGS is using, but that should maybe be discussed separately.
