Player Elo Rating

bansa · April 26, 2021, 11:35pm

Thanks for your input guys. I’m revising the original post as below per information I’ve gathered.

Here are the conditions of the Elo rating I’m proposing:

Everyone starts with 1500 rating.
K = 32.
Includes all tournament match results starting with CAFS16.
RACE 2 results are considered prehistoric and are excluded.
MMM matches are excluded since they are not structured as an open tournament.
Data set is based on @charnel_mouse’s spreadsheet provided in (Codex data thread).
Games that did not start due to inactivity are omitted.
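
A minimal sketch of how the conditions above could translate into code. It assumes the standard logistic Elo formula with a 400-point scale (the scale is not specified in this thread) and a hypothetical list of (winner, loser) pairs already sorted chronologically. The function and player names are illustrative only.

```python
from collections import defaultdict

START_RATING = 1500.0  # everyone starts at 1500
K = 32.0               # K-factor from the proposal


def expected_score(rating_a, rating_b):
    """Expected score of A against B (standard logistic curve, 400-point scale assumed)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(matches, k=K):
    """Process (winner, loser) pairs in chronological order and return final ratings."""
    ratings = defaultdict(lambda: START_RATING)
    for winner, loser in matches:
        # Compute the expectation before either rating is changed.
        exp_win = expected_score(ratings[winner], ratings[loser])
        delta = k * (1.0 - exp_win)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


# Hypothetical player names, for illustration only.
example = [("alice", "bob"), ("bob", "carol"), ("alice", "carol")]
print(update_ratings(example))
```

Because the winner gains exactly what the loser drops, the total number of rating points in the pool stays constant in this sketch.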

charnel_mouse · April 27, 2021, 7:36am

I don’t think we do! Feel free to use my tournament match data to spare yourself some of the grunt work.

Nopethebard · April 27, 2021, 2:17pm

I think Vengefulpickle messed with a system like that. Might want to ask him

mysticjuicer · April 27, 2021, 4:24pm

Yeah @vengefulpickle put together a Google Sheet that uses tournament results to calculate player Elo. Definitely worth checking in with him about it, though I know he’s been quite busy with a new job recently.

bansa · April 27, 2021, 4:28pm

Thanks guys for your input. That would be wonderful if there is an existing spreadsheet I can use to simply update the recent match results.

Are you referring to the data on your website (didn’t know about this, I just checked it out, awesome!) or is there a post or a tracking spreadsheet for the tourney match results?

charnel_mouse · April 27, 2021, 4:30pm

I have a link to a match data spreadsheet at the start of the data thread.

bansa · April 27, 2021, 4:35pm

OMG!!! How the heck did I not see this? This is treasure!!! Wait, this changes everything… I don’t need to dig through old posts for results anymore. I will see if my brother-in-law can write some code based on this data.

bansa · April 27, 2021, 4:37pm

Hold on. I’m much more flexible if this is automated. Open to suggestions guys. Should we include the experimental tourneys, MMMs and RACE?? in this Elo rating? What was RACE?

charnel_mouse · April 27, 2021, 4:54pm

You can find the RACE tournament in the resources list, but the short version is that people were trying to win as many matches as possible during the time limit.

I’m not sure whether it’s what you’re asking, but the data collection isn’t automated. Anything labelled as a “forfeit” type victory is a match I chose not to count, for various reasons. It might be worth looking at the comments I give for those matches, so you can decide for yourself whether to include them.
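
For anyone scripting this, a rough sketch of skipping the “forfeit” matches. It assumes the spreadsheet has been exported to CSV and has a column called `victory` holding the victory type; that column name and the sample rows are made up for illustration, so check them against the real sheet.

```python
import csv
import io


def load_counted_matches(csv_text, victory_column="victory"):
    """Return rows whose victory type does not mention 'forfeit'.

    The column name is an assumption, not the spreadsheet's actual header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader
            if "forfeit" not in row[victory_column].strip().lower()]


# Made-up sample data; not taken from the real spreadsheet.
sample = """tournament,player1,player2,victory
CAFS16,alice,bob,normal
CAFS16,carol,dave,forfeit
"""
print(load_counted_matches(sample))
```

Matches you decide to count after reading the comments could be relabelled in the sheet before running a filter like this.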

bansa · April 27, 2021, 5:04pm

Yes, the data collection part is all manual. I was talking about automating the Elo calculation part with code. So do you guys think RACE, MMM and XCAS tourney results are worth including? I’m inclined to include them all, since we already have a record of them and more data is always better, unless you guys think they are a poor representation of player skill.

Edit: I’m leaning towards including all CAS tourneys, regular or experimental, starting with CAFS16. This will include LDT & LLL, and also RACE 3. I’m leaving RACE 2 and MMM matches out due to insufficient data and MMM not really being a tourney format.

EricF · April 29, 2021, 6:29pm

RACE is legit - it’s the same basic rules as CAS, just with more matches played simultaneously. MMM had deck selection effects, but is still probably fine.

Elo is based on the assumption that the true odds of the better player winning are 100%, and it treats the outcome of each game as evidence of which player was better (weighted by K value). This works for chess (100%) and duplicate bridge (~90%), less well for MTG (~70%), and not at all for Rock/Paper/Scissors (~51%).

So, for XCAPS, I would use a lower K value to reflect the increased random factors (e.g. did you happen to pick a deck that benefited from the randomness?).
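
A small sketch of what a per-tournament K could look like. The value 16 is an arbitrary placeholder rather than a number anyone in the thread proposed, the prefix check on the tournament name is a hypothetical way of spotting experimental events, and the 400-point logistic scale is assumed.

```python
STANDARD_K = 32.0
EXPERIMENTAL_K = 16.0  # arbitrary placeholder, not a value proposed in the thread


def k_for(tournament, experimental_prefixes=("XCAPS",)):
    """Pick a K-factor from the tournament name (prefix matching is an assumption)."""
    if tournament.upper().startswith(experimental_prefixes):
        return EXPERIMENTAL_K
    return STANDARD_K


def rating_change(winner_rating, loser_rating, k):
    """Points the winner gains (and the loser drops) for one match."""
    expected = 1.0 / (1.0 + 10 ** ((loser_rating - winner_rating) / 400.0))
    return k * (1.0 - expected)


# Same result, different tournaments: the experimental one moves ratings less.
print(rating_change(1500, 1500, k_for("CAFS16")))
print(rating_change(1500, 1500, k_for("XCAPS")))
```

With a smaller K, each individual result counts as weaker evidence, so a lucky deck draw shifts the ratings less.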

charnel_mouse · April 29, 2021, 6:58pm

Sort of. The better player wins, but each player’s skill level has a slight random element between games, and Elo just tries to track their mean skill. The main assumption to worry about is that it assumes there’s no random factor in the game itself.

bansa · April 29, 2021, 6:59pm

Can you elaborate on this?

bansa · April 29, 2021, 7:03pm

As far as I know, an Elo rating represents the mean of a bell curve. So there is a chance that the better player’s performance lands at the very low end of his bell curve in a given game while the underdog performs at the high end of his, and that is when an upset happens, if I’m understanding the logic correctly.

bansa · April 29, 2021, 7:06pm

I don’t know what you guys think, but my guess is that Codex is somewhere between chess and MTG, and closer to chess, so Elo should be a good metric to use for Codex?

charnel_mouse · April 29, 2021, 7:11pm

Roughly speaking, yes. You have some mean difference in skill, and that’s affected by some random variance. Whoever’s skill ends up larger after that wins. Elo considered this to be a variation in the player’s skill, rather than a luck element in the game, but mathematically they’re equivalent. The difference is that, if there’s luck in the game, the variance is larger relative to the mean skill values, since that’s in addition to the player’s own skill variation. You account for that by using a smaller K.
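
To put rough numbers on that, here is a sketch under simplifying assumptions: each player’s performance in a game is an independent normal draw around their mean skill, both players have the same skill spread, and game luck is modelled as extra independent normal noise on each player’s performance. The parameter names and example values are made up for illustration.

```python
import math


def win_probability(mean_diff, skill_sd, luck_sd=0.0):
    """P(stronger player wins) when performances are normally distributed.

    Each player's performance has variance skill_sd**2 + luck_sd**2, so the
    difference between the two performances has twice that variance.
    """
    diff_sd = math.sqrt(2.0 * (skill_sd ** 2 + luck_sd ** 2))
    z = mean_diff / diff_sd
    # Standard normal CDF written in terms of the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Arbitrary illustrative numbers: same skill gap, with and without game luck.
print(win_probability(mean_diff=100, skill_sd=200))
print(win_probability(mean_diff=100, skill_sd=200, luck_sd=200))
```

Adding luck widens the spread of the performance difference, which pulls the win probability for the same skill gap back towards 50%. That is the effect a smaller K is meant to compensate for.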

bansa · April 29, 2021, 7:15pm

That is a good idea, except it would be more work, so I was thinking we start with the basics, and maybe we can play with the tweaks and pick that up in future updates.

bansa · April 29, 2021, 7:19pm

Yes, RACE 3 will be included, along with anything from CAFS16 onwards, which we have a good record of. Not MMMs though; I think it should be an open tournament format at the very least.

charnel_mouse · April 29, 2021, 7:20pm

Something else here is that, since players usually pick their own deck, you’re usually looking at both skill at playing and skill at choosing a deck. Random-deck tournaments get rid of the latter in favour of a higher luck factor, which calls for a lower K. That’s something to worry about later, though.

bansa · April 29, 2021, 7:26pm

I generally agree, and I think tuning down the whole XCAS series might be a good approach, as EricF suggested. But at the same time, looking back at past XCAS tourneys, I think the results were a fairly good representation of player skill, just as much as the regular tourneys. Also, as we know, Codex is designed for fairness amongst decks, whether or not that is true or even close to it. I think the intent was that any random deck should have a chance of winning against a roughly equally skilled opponent.