Yeah @vengefulpickle put together a Google Sheet that uses tournament results to calculate player Elo. Definitely worth checking in with him about it, though I know he’s been quite busy with a new job recently.
Thanks for your input, guys. It would be wonderful if there is an existing spreadsheet that I can simply update with the recent match results.
Are you referring to the data on your website (didn’t know about this, I just checked it out, awesome!) or is there a post or a tracking spreadsheet for the tourney match results?
OMG!!! How the heck did I not see this? This is a treasure!!! Wait, this changes everything… I don’t need to dig through old posts for results anymore. I will see if my brother-in-law can write some code based on this data.
Hold on. I’m much more flexible if this is automated. Open to suggestions, guys. Should we include the experimental tourneys, MMMs and RACE in this Elo rating? What was RACE?
You can find the RACE tournament in the resources list, but the short version is that people were trying to win as many matches as possible during the time limit.
I’m not sure whether it’s what you’re asking, but the data collection isn’t automated. Anything labelled as a “forfeit” type victory is a match I chose not to count, for various reasons. It might be worth looking at the comments I give for those matches, so you can decide for yourself whether to include them.
Yes, the data collection part is all manual. I was talking about automating the Elo calculation part with code. So do you guys think RACE, MMM and XCAS tourney results are worth including? I’m inclined to include them all, since we already have a record of them and more data is always better, unless you guys think they are a poor representation of player skill.
Edit: I’m leaning towards including all CAS tourneys, regular or experimental, starting with CAFS16. This will include LDT & LLL, and also RACE 3. I’m leaving RACE 2 and MMM matches out due to insufficient data and because MMM isn’t really a tourney format.
RACE is legit - it’s the same basic rules as CAS, just with more matches played simultaneously. MMM had deck selection effects, but is still probably fine.
Elo is based on the assumption that the true odds of the better player winning are 100%, and it treats the outcome of each game as evidence of which player was better (weighted by K value). This works for chess (100%) and duplicate bridge (~90%), less well for MTG (~70%), and not at all for Rock/Paper/Scissors (~51%).
So, for XCAPS, I would use a lower K value, to reflect the increased random factors (e.g., did you happen to pick a deck that benefited from the randomness?).
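For anyone coding this up, here is a minimal sketch of the standard Elo update with a per-tournament K. The function names and the two K values (32 for regular events, 16 for experimental ones) are placeholders picked for illustration, not numbers anyone in this thread has agreed on.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one game.

    score_a is 1 if A won and 0 if A lost.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Hypothetical K values: a smaller K means each result moves ratings less.
K_REGULAR = 32.0
K_EXPERIMENTAL = 16.0

print(update_elo(1500, 1500, 1, K_REGULAR))       # (1516.0, 1484.0)
print(update_elo(1500, 1500, 1, K_EXPERIMENTAL))  # (1508.0, 1492.0)
```

The same win is worth half as many points under the smaller K, which is the whole effect of "tuning down" a tournament.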
Sort of. The better player wins, but each player’s skill level has a slight random element between games, and Elo just tries to track their mean skill. The main assumption to worry about is that it assumes there’s no random factor in the game itself.
As far as I know, an Elo rating represents the mean of a bell curve, so there is a chance that the better player’s performance lands at the very low end of his bell curve in a given game while the underdog performs at the high end of his, and that is when an upset happens, if I’m understanding the logic correctly.
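A rough way to check that intuition is to simulate it. The sketch below assumes each player's performance in a game is drawn from a normal distribution centred on their rating; the spread of 200 points, the ratings, and the function name are all assumptions made for illustration.

```python
import random


def simulate_upset_rate(favourite: float, underdog: float,
                        performance_sd: float = 200.0,
                        games: int = 20_000, seed: int = 0) -> float:
    """Fraction of simulated games in which the lower-rated player performs better."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    upsets = 0
    for _ in range(games):
        fav_perf = rng.gauss(favourite, performance_sd)  # favourite's draw this game
        dog_perf = rng.gauss(underdog, performance_sd)   # underdog's draw this game
        if dog_perf > fav_perf:
            upsets += 1
    return upsets / games


# With a 200-point gap and these assumptions, upsets happen roughly 24% of the time.
print(simulate_upset_rate(1600, 1400))
```

So under this model an upset does not need anything unusual, just a below-average game from one player and an above-average game from the other.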
I don’t know what you guys think, but my guess is Codex is somewhere between chess and MTG, closer to chess, so Elo should be a good metric to use for Codex?
Roughly speaking, yes. You have some mean difference in skill, and that’s affected by some random variance. Whoever’s skill ends up larger after that wins. Elo considered this to be a variation in the player’s skill, rather than a luck element in the game, but mathematically they’re equivalent. The difference is that, if there’s luck in the game, the variance is larger relative to the mean skill values, since that’s in addition to the player’s own skill variation. You account for that by using a smaller K.
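To put rough numbers on that, here is a small sketch that treats game luck as extra normally distributed noise added on top of each player's own variation. That way of modelling luck, the 200-point skill spread, and the function name are assumptions for illustration only.

```python
import math


def win_probability(skill_gap: float, skill_sd: float = 200.0,
                    luck_sd: float = 0.0) -> float:
    """Chance that the better player wins, given the mean skill gap.

    The outcome is the difference of two independent performances
    (variance 2 * skill_sd**2) plus independent game luck (variance luck_sd**2).
    """
    total_sd = math.sqrt(2.0 * skill_sd ** 2 + luck_sd ** 2)
    # Normal CDF evaluated at skill_gap / total_sd, written with erf.
    return 0.5 * (1.0 + math.erf(skill_gap / (total_sd * math.sqrt(2.0))))


# Same 200-point skill gap, increasing amounts of game luck.
for luck_sd in (0.0, 200.0, 400.0):
    print(luck_sd, round(win_probability(200.0, luck_sd=luck_sd), 3))
```

As the luck term grows, the better player's win chance drifts towards 50%, so each individual result says less about skill. That is the argument for letting each result move the ratings less, i.e. a smaller K.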
That is a good idea, except it would be more work, so I was thinking we start with the basics, and maybe we can play with the tweaks and pick that up in future updates.
Yes, RACE 3 will be included, along with anything from CAFS16 onward that we have a good record of. Not MMMs though; I think it should be an open tournament format at the very least.
Something else here is that, since players usually pick their own deck, you’re usually looking at both skill at playing and skill at choosing a deck. Random-deck tournaments get rid of the latter in favour of a higher luck factor, which asks for a lower K. That’s something to worry about later, though.
I generally agree, and I think tuning down the whole XCAS series might be a good approach, as EricF suggested. But at the same time, looking back at past XCAS tourneys, I think the results were a fairly good representation of player skill, just as much as the regular tourneys. Also, as we know, Codex is designed for fairness amongst decks, whether or not that is true or even close to it. I think the intent was that any random deck should have a chance of winning against a roughly equally skilled opponent.