Player Elo Rating

CarpeGuitarrem · April 29, 2021, 7:37pm

It also gets more complicated because not every tournament sees every player piloting their best deck. For example, I don’t have enough self-motivation to ask for games that aren’t assigned to me, so tournament play is the bulk of my play. This means that in some tournaments I’m piloting a new deck or monocolor in order to get experience.

It’s probably safest to just treat deck selection as irrelevant, or mess with K value to accommodate?

bansa · April 29, 2021, 8:02pm

I agree. There is already @charnel_mouse’s model that tracks and analyzes the performance of a given spec or deck, so for simplicity, I think it is best to ignore deck fairness when computing Elo ratings.

EricF · April 30, 2021, 1:35pm

Elo doesn’t account for player skill varying in a random way. After each result, it either says “yeah, that’s what I predicted (so minimal change),” “thanks for the info, I wasn’t sure before (moderate change),” or “omg I was totally wrong about one or both of your skills, but I won’t update directly to winner > loser because the truth might be winner > loser current, or winner current > loser, so split the difference with a big change to each.”
Note that Chess ratings are a modified Elo, where players with a lot of history have a reduced K value (under the assumption that if they lose to a low-ranked player, it’s not that their skill has gotten worse, but rather that the low ranked player just doesn’t have enough history to be ranked correctly).
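
For reference, the three cases described above fall out of the standard Elo update. This is only a minimal sketch: the 400-point scale and K=32 are the usual chess-style defaults, not values agreed for Codex, and the function names are made up for illustration.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score (between 0 and 1) for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b); score_a is 1 if A won, 0 if A lost."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Favourite wins, as predicted: minimal change.
print(update(1700, 1500, 1))
# Evenly rated players: moderate change.
print(update(1500, 1500, 1))
# Underdog wins: big change for both players.
print(update(1500, 1700, 1))
```

With a single constant K, the winner gains exactly what the loser drops. The reduced K for veterans mentioned above breaks that symmetry, because each player's change is then scaled by their own K.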

charnel_mouse · April 30, 2021, 1:43pm

Elo doesn’t account for change in a player’s mean skill, which is what it tries to estimate. It does account for the player’s actual skill level / performance varying around that mean.

bansa · April 30, 2021, 4:40pm

That is a good point and also the concern I had. I’m currently seeking advice from @vengefulpickle on finding the right K value for Codex, and on a two-tier system that changes the K value after a certain number of games, just like Chess and Yomi.

bansa · May 2, 2021, 5:00am

My brother and I tested it with a constant K=32, and there are a couple of issues that I want to discuss here.

Elo range is too compressed with the leader being just slightly above 1700. I think this is because our data size is too small compared to other established Elo ratings. Both MTG and Yomi have their highest Elo at around 2400 and we know Chess is 2400+ for their GMs. So, I think we should at least see our leader at 2000+.

I think the fix would be increasing K value but the question is how high. I started with K=32 and also considered K=40 since those are the K values Chess gives to a beginner. MTG uses K=36. We would need a much higher K value than these.

Another issue is with the constant K number. As EricF mentioned, it can be disadvantageous to existing players. So, the fix would be introducing tiered K values.

I asked vengefulpickle about their K value for Yomi Elo: they use K=97 for a player’s first 55 games and K=40 after. This seems like something we want to base ours on, although K=97 and 55 games were picked for Yomi data, so we don’t have to use these exact numbers.

Let me know if you guys think I’m heading in the right direction or not.

bansa · May 5, 2021, 5:25pm

I was playing with the two-tier system and realized that the results look quite different depending on the parameter values. It seems that the break point where the K value changes has a bigger impact than the actual K values. I started with the break at the first 30 games, since that is what Chess uses, but players with around 30 total games played appear to score higher in this scenario. We can increase this to 50 or more, but everyone has different peaks, so it will affect people differently. I wanted to get your input on this before I go further. Any suggestions?

bansa · May 5, 2021, 5:52pm

Here are some of the findings in the process.
Classic data sorted by most wins and highest win rate for your viewing pleasure.
CAFS16 through CAWS20

Players with 20+ tourney wins, sorted by most wins

player	games	wins	win rate
zhavier	146	94	64%
FrozenStorm	139	88	63%
EricF	116	82	71%
Bomber678	79	37	47%
Marto	48	34	71%
cstick	51	28	55%
codexnewb	59	27	46%
bansa	35	24	69%
charnel_mouse	57	24	42%
Jadiel	35	23	66%
Dreamfire	36	23	64%
bolyarich	31	21	68%
petE	35	21	60%

Players with 60%+ win rate, sorted by highest win rate

player	games	wins	win rate
Marto	48	34	71%
EricF	116	82	71%
bansa	35	24	69%
bolyarich	31	21	68%
Jadiel	35	23	66%
zhavier	146	94	64%
Dreamfire	36	23	64%
FrozenStorm	139	88	63%
Anemone	16	10	63%
petE	35	21	60%
Zejety	20	12	60%

bansa · May 6, 2021, 5:27pm

So I did some research on Elo K value and the break. I also did some testing with our data and here are my findings and proposal.

Using a higher K value to raise the leaderboard ratings was not a good idea. “If the K-factor coefficient is set too large, there will be too much sensitivity to just a few, recent events.” This is from the Elo rating wiki under “Most accurate K-factor”, and multiple articles pointed out the same issue. It holds for our data too: if we set the K factor too high, the swings are so large that the rating only reflects performance in the most recent 10 to 20 or so games. For the normal K factor (after the break), even 40 seems too high. My read from multiple studies and references is that 20 to 24 is the sweet spot. The K for the first x games (before the break) can be set higher, but not too high. I believe 40 is a good number, since it is used in lots of Elo systems including Chess and is probably close to the upper limit, beyond which it becomes too much.

This leaves us with the issue of the range being too compressed, with no way to get to 2000+ due to our small data set (2000+ is not necessary, just easier to read since we are used to seeing other leaderboards above 2000). I don’t have a solution for the compression. I think it is okay though. We are a small community after all. The 2000 part can be addressed by just changing the initial rating to 2000 instead of 1500.

Now let’s talk about the break. I started at 30 because, again, that is what Chess uses, FIDE specifically. USCF uses a slightly different system that I don’t have information on. I think 30 makes sense for our data: quite a lot of our players have game counts sitting just above 30. Here is my take: a player with a better than 50% win rate will likely go through 6+ games in a tourney. Let’s say it’s 7 or 8. With four tourneys a year, that averages to about 30 games a year. I think one year is probably a good duration to gauge a new player’s performance. We could increase this number, similar to Yomi’s approach, but players with a large number of games have different peaks, so where the break sits noticeably affects who benefits and who doesn’t. I believe 30 is a good number to use here.

To conclude, I propose taking FIDE’s settings and applying them to Codex. I mean, they’re tried and true.
Only difference is that we will start at 2000 instead of 1500.
FIDE uses the following ranges:
K = 40, for a player new to the rating list until the completion of events with a total of 30 games.
K = 20, for players with a rating always under 2400.
K = 10 for players with a rating of at least 2400 and at least 30 games played. Thereafter it remains permanently at 10. (This will not apply to us at the moment but can be implemented later as necessary)
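
The three ranges above could be expressed as a small lookup. This is a rough sketch only: it applies the break per game rather than per completed event as FIDE does, the function and parameter names are made up for illustration, and the caller is assumed to keep track of whether a player has ever reached 2400.

```python
def k_factor(games_played: int, rating: float, ever_reached_2400: bool = False) -> int:
    """K for a player's next game under the FIDE-style ranges listed above.

    games_played counts rated games completed before this one.
    ever_reached_2400 is a sticky flag maintained by the caller (assumption).
    """
    if games_played < 30:
        return 40  # new to the rating list
    if ever_reached_2400 or rating >= 2400:
        return 10  # stays at 10 permanently once 2400 has been reached
    return 20      # established player who has always been under 2400


print(k_factor(12, 2000))        # 40
print(k_factor(45, 2150))        # 20
print(k_factor(80, 2410))        # 10
print(k_factor(80, 2380, True))  # 10
```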

Let me know what you guys think.

Bryce_The_Rice · May 6, 2021, 6:02pm

There’s no need to start at 2000. The reason we see people over 2000 is just because of the number of people playing most games.
We will quickly figure out where the high skill number lies.

bansa · May 7, 2021, 4:22pm

True, but it also means we can start pretty much anywhere we want.

There are a few advantages to starting at 2000, in my opinion.

This allows us to simply follow the FIDE settings and set the second break at 2400.
This is my main point and I think there is value in this. There is a commonly shared notion of an Elo rating above 2000 as high tier, or expert so to speak. Yes, we will quickly know where the high skill number lies no matter where we start, but it won’t be obvious to newcomers. If we shift our compressed range upward and align the skilled rating numbers with those of other systems, we can benefit from the established knowledge of a typical Elo system. The numbers 2100, 2200, 2300 and 2400 will carry intrinsic value that requires little explanation to people who are familiar with other Elo systems. For example, Chess grants Master titles in these ranges, and MTG and Yomi have their best players sitting around 2300 and 2400.

If our intent is to break away from existing Elo systems, I would say starting at 1000 is a cleaner and better approach. I know a community with a small number of active players that started their Elo at 1000; it currently has its top 5 players above 1200, and only one person has ever reached 1400. We are going to see a similar spread, and this is why I propose starting at 2000. But if for some reason people don’t want to see our ratings above 2000, then I think 1000 is better than 1500 to make a clean separation from other Elos.
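
A quick way to check that the starting number is only a label: the update depends on rating differences alone, so replaying the same results from 1000, 1500 or 2000 should give the same spread. The match list and player names below are invented for illustration, and a constant K=20 is assumed. A rating-based tier such as the 2400 break would make the start value matter.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def replay(games, start: float, k: float = 20.0) -> dict:
    """Replay (winner, loser) pairs in order, with every player starting at `start`."""
    ratings = {}
    for winner, loser in games:
        rw = ratings.get(winner, start)
        rl = ratings.get(loser, start)
        delta = k * (1.0 - expected_score(rw, rl))
        ratings[winner] = rw + delta
        ratings[loser] = rl - delta
    return ratings


# Hypothetical results, not real tournament data.
games = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("A", "B")]

for start in (1000, 1500, 2000):
    ratings = replay(games, start)
    # Offsets from the starting value come out the same for every start.
    print(start, {p: round(r - start, 1) for p, r in sorted(ratings.items())})
```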

charnel_mouse · May 8, 2021, 7:57pm

Something else about 2400+ ratings in Chess is that, for veteran players, they will probably be based on a lot more games. I’m shocked that you’ve played as many tournament games as Dreamfire and Jadiel already.

zhavier · May 8, 2021, 8:29pm

im amused to see I am king of playing the most games. I am at least persistent.

vengefulpickle · May 10, 2021, 4:17am

One thing worth pointing out here is that the value set for the denominator when computing Qa and Qb (following the Wikipedia terminology) controls what a difference of X Elo points means in terms of expected outcome. I mentioned this to @bansa, but for Yomi I use 1135.77, which I think amounts to some nice round number for what a difference of 100 Elo means.

Reverse engineering my math, I think it was picked so that a 200-point difference is a 60/40 matchup. I think I picked that in particular because Yomi, given the variance involved, seemed significantly less likely than chess to be completely one-sided, even at very high skill differentials.
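
The reverse engineering can be checked numerically. In this sketch, `scale_for` is a made-up helper that solves for the denominator D in Qa = 10^(Ra/D) so that a given point gap yields a given expected score. The 400 used for comparison is the usual chess denominator.

```python
import math


def scale_for(point_gap: float, win_prob: float) -> float:
    """Denominator D such that a lead of `point_gap` points gives `win_prob`."""
    odds = win_prob / (1.0 - win_prob)
    return point_gap / math.log10(odds)


def expected_score(rating_a: float, rating_b: float, scale: float) -> float:
    """Expected score for A using Qa / (Qa + Qb), with Q = 10 ** (rating / scale)."""
    qa = 10 ** (rating_a / scale)
    qb = 10 ** (rating_b / scale)
    return qa / (qa + qb)


yomi_scale = scale_for(200, 0.60)
print(round(yomi_scale, 2))  # 1135.77

# The same 200-point lead under the two denominators.
print(round(expected_score(1700, 1500, yomi_scale), 2))  # 0.6
print(round(expected_score(1700, 1500, 400.0), 2))       # 0.76
```

So with the chess denominator a 200-point lead is roughly a 76/24 matchup, while 1135.77 flattens it to 60/40.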

bansa · May 10, 2021, 5:14am

I’ve played with different parameters and also tried other Elo settings, but couldn’t find one that behaves better than the basic FIDE Elo. IMO, Codex is the closest to Chess amongst its cousins, so I’m comfortable using their settings as a base point. If someone wants to take this further and find customized values for Codex, I’d be happy to share my findings and help.

So, here is my first pass at it and how it looks with starting Elo at 2000.
CAFS16 thru CAWS20
K=40 for first 30 games
K=20 thereafter

Players with Elo above 2100

player	games	wins	win rate	elo
bansa	35	24	69%	2231
Marto	48	34	71%	2208
bolyarich	31	21	68%	2205
EricF	116	82	71%	2198
zhavier	146	94	64%	2193
FrozenStorm	139	88	63%	2183
Dreamfire	36	23	64%	2175
Jadiel	35	23	66%	2172
petE	35	21	60%	2111
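
For anyone who wants to reproduce this kind of table, the settings listed above (start at 2000, K=40 for the first 30 games, K=20 thereafter) can be replayed over match results roughly like this. This is a sketch, not the script used for the numbers above. It assumes a 400-point scale, that each player's change uses their own K, and that games are processed one at a time in chronological order. The match list is invented.

```python
from collections import defaultdict

START_RATING = 2000.0  # starting Elo from the settings above
BREAK_GAMES = 30       # games played at K=40 before dropping to K=20


def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def k_factor(games_played: int) -> int:
    return 40 if games_played < BREAK_GAMES else 20


def rate(matches):
    """Replay (winner, loser) pairs in order; return (ratings, games played)."""
    ratings = defaultdict(lambda: START_RATING)
    games = defaultdict(int)
    for winner, loser in matches:
        surprise = 1.0 - expected_score(ratings[winner], ratings[loser])
        # Compute both changes from the pre-game ratings, each with its own K.
        gain = k_factor(games[winner]) * surprise
        loss = k_factor(games[loser]) * surprise
        ratings[winner] += gain
        ratings[loser] -= loss
        games[winner] += 1
        games[loser] += 1
    return dict(ratings), dict(games)


# Hypothetical results, not real tournament data.
matches = [("ann", "bob"), ("ann", "cy"), ("bob", "cy"), ("cy", "ann")]
ratings, games = rate(matches)
for player in sorted(ratings, key=ratings.get, reverse=True):
    print(f"{player}\t{games[player]}\t{ratings[player]:.0f}")
```

Because a newcomer at K=40 and a veteran at K=20 move by different amounts in the same game, the ratings are no longer zero-sum once players cross the break.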