
More Yomi Data


I think it shows more that people only play certain characters in tournament enough to be logged as high level. For example, I’m pretty sure deluks is high level with Argagarg. It doesn’t matter, though, because he never plays Argagarg in tournament. The ones that are verifiably not at the highest level are the ones whose distributions are shorter and wider, with the bubble not near the top. For example (and I’m not calling him out, it’s just the first one I saw): djister’s Rook. He plays Rook enough in tournament for the model to have a level for him, and it’s not at the top. So not universally having flatties doesn’t mean they aren’t universally high level; it just means they don’t play everyone all the time.

Another example: if you were to look at my chart, the only character I’ve played enough in tournament for the model to get a good idea of where I stand, and that also ISN’T near the top, is Jaina (and maybe Lum). At least, that’s if I’m reading these correctly. @vengefulpickle, tell me if I’m wrong here.


You’ve got it exactly right. The model can only understand your skill based on the data available. So, no tournament plays means no knowledge in the model, which leads to the really long tails you see.

I’m considering trying to use color as another way of reproducing the same information that’s already in the plots, just to make them more understandable. I might also play with opacity and/or saturation to try and capture the confidence level. So, maybe lower skill tends towards blue, and high skill towards red, and lower confidence tends towards grey and higher confidence tends towards full saturation.
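If it helps to picture that encoding, here’s a tiny sketch of the idea (pure illustration: the endpoints, skill range, and the choice of a plain blue-to-red blend are my guesses, not the actual implementation):

```python
def skill_color(skill, confidence, lo=-2.0, hi=2.0):
    """Map a skill estimate to an RGBA tuple.

    Low skill -> blue, high skill -> red (simple linear blend);
    low confidence -> transparent/washed out, high confidence -> opaque.
    The skill range [lo, hi] is an arbitrary choice for the sketch.
    """
    t = max(0.0, min(1.0, (skill - lo) / (hi - lo)))
    r, g, b = t, 0.0, 1.0 - t        # blend blue (0,0,1) -> red (1,0,0)
    return (r, g, b, confidence)     # confidence in [0, 1] drives opacity

high = skill_color(1.5, 0.9)   # reddish, nearly opaque
low = skill_color(-1.5, 0.3)   # bluish, mostly transparent
```

A real version would probably use a proper diverging colormap rather than a raw RGB blend, but the mapping of two variables onto hue and opacity is the same.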


I have a question. Does this skill rating give a greater weighting to winning a bad matchup and vice versa? I apologize if this has been addressed. I’m the type of guy that eats the stew without concerning myself too much about how it is made (i.e. I didn’t take the time to read everything).


I think it would, yes. Your expected chance to win a match is based on the sum of the matchup value, your skill with your character, your opponent’s skill with theirs, and the difference in your Elo ratings. So, if the model predicted that you would lose (because it’s a hard matchup) and you won, that would tend to push up your skill with your character, push down your opponent’s skill with theirs, and push the matchup rating in the direction of your character.
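For the curious, that sum-then-squash structure looks roughly like this (a sketch; the term names mirror the `elo_logit_scale` and skill terms mentioned in this thread, but the numbers are made up):

```python
import math

def win_probability(mu_logit, char_skill_diff, elo_logit_scale, elo_logit):
    """Sketch of the model structure described above: the win chance is
    the logistic of a sum of a matchup term, the character-skill
    difference, and a scaled Elo term."""
    logit = char_skill_diff + mu_logit + elo_logit_scale * elo_logit
    return 1.0 / (1.0 + math.exp(-logit))

# A hard matchup (negative mu) with otherwise-equal players:
p_hard = win_probability(mu_logit=-0.4, char_skill_diff=0.0,
                         elo_logit_scale=0.1, elo_logit=0.0)  # ~0.40
```

Winning a game the model gave you only ~40% odds on is exactly the kind of surprise that, on the next fit, pulls your character skill up and the matchup value toward your side.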


It seems like you used organic vegetables and grass fed beef in your stew. Carry on.


I have a few more quick questions. Is the calculation static or does it change over time? Does each new match result change the values in all the relevant old results?


Ok, gory details time: I’m using MCMC to find the parameters that best fit all of the available data. So, any time I run the process, it fits the model across every game recorded up to that point, which means it should evolve as players play more. (An interesting side outcome of this process would be to see what happens if I generated the MU chart month-by-month since @mysticjuicer started recording data.)
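For anyone who hasn’t run into MCMC before, here’s a toy version of the idea applied to a single skill parameter (this is only an illustration of the mechanism; the real model has far more parameters and uses a proper sampler, not this hand-rolled one):

```python
import math
import random

def metropolis_skill(results, n_samples=2000, step=0.5, seed=0):
    """Toy random-walk Metropolis sampler for one skill parameter,
    assuming each win is Bernoulli(logistic(skill)) with a N(0,1) prior.
    Returns the chain of sampled skill values."""
    rng = random.Random(seed)

    def log_post(s):
        lp = -0.5 * s * s                      # N(0,1) log-prior
        p = 1.0 / (1.0 + math.exp(-s))         # win probability at skill s
        for won in results:
            lp += math.log(p if won else 1.0 - p)
        return lp

    s, samples = 0.0, []
    for _ in range(n_samples):
        prop = s + rng.gauss(0, step)          # propose a nearby skill
        # Accept with probability min(1, posterior ratio):
        if math.log(rng.random()) < log_post(prop) - log_post(s):
            s = prop
        samples.append(s)
    return samples

# 7 wins out of 10 games -> posterior mass on a positive skill value
samples = metropolis_skill([1] * 7 + [0] * 3)
mean = sum(samples[500:]) / len(samples[500:])   # discard burn-in
```

The real fit does this jointly over every player, character, and matchup parameter at once, which is why each new batch of data can shift old estimates.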


Ahh… so it’s not live, but a process that has to be run. This brings me to another question (I am now in the stew-making business): if you run the routine, does it use my endpoint Elo or my Elo at the time of the match?


It uses your Elo as it was when you played the game. Because the dataset is over such a long period, I 100% expect players to change skill levels over time. The Elo captures that, and is computed per-match (not per-game) over in my Elo google sheet. I don’t model character skill change over time simply because we don’t have enough data to support it.
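For reference, the per-match Elo update is the standard one (sketch below; `k=32` is a common default, not necessarily the value used in the actual sheet):

```python
def elo_update(rating_a, rating_b, a_won, k=32):
    """Standard Elo update, applied once per match (not per game),
    matching the description above."""
    # Expected score for A under the usual logistic-in-base-10 curve:
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_a)
    # Rating points flow from the loser to the winner:
    return rating_a + delta, rating_b - delta

new_a, new_b = elo_update(1500, 1600, a_won=True)
```

Because the update runs match-by-match through the history, the rating the model sees for any given game is whatever your Elo was at that point in time, not your final value.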


I thought it was done that way. I appreciate you taking the time to explain. This type of analysis is new to me. Thanks.


Big props to @charnel_mouse for getting me started on it. These techniques were new to me as well, before I started analyzing this dataset. (I’d done some basic frequentist stats, and used the bootstrap to get confidence intervals, but had never used a Bayesian approach before.)


Happy to help :slight_smile:

I’m curious, what sort of distribution are you getting for elo_logit_scale, compared to the per-character skill levels?


I didn’t check it in my latest run, but in a prior run it was in the neighborhood of 0.1, IIRC. In other experimentation, I already knew that the match-level Elo wasn’t a great predictor of game-level performance, so I’m not super surprised that the Elo contribution got scaled down a fair amount.
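To see what a scale factor around 0.1 does in practice (rough numbers, under the assumption that the logit of the standard Elo expected score is what feeds in as `elo_logit`):

```python
import math

def prob(logit):
    """Logistic function: log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-logit))

# Standard Elo expected score for a 200-point favourite, per match:
elo_expected = 1.0 / (1.0 + 10 ** (-200 / 400.0))        # ~0.76
elo_logit = math.log(elo_expected / (1 - elo_expected))  # ~1.15

p_unscaled = prob(elo_logit)        # ~0.76 if Elo fed in at full weight
p_scaled = prob(0.1 * elo_logit)    # ~0.53 after the model scales it down
```

So a 200-point Elo gap, which predicts roughly a 76/24 match, only nudges the game-level prediction a few points from even once the fitted scale is applied.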

It’ll be interesting to see what it looks like for the hierarchical weighted model that I’m running now, though.


One quality-of-life thing that you could add to the charts is to have the scale go up to 10 or something, instead of running from negative numbers to zero. When I looked at my graph, I felt like I was playing the Biggest Loser. Yay, I went from a negative player to close to zero.


Yeah, good call. I can re-normalize the charts.
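Something like a simple affine rescale would do it (purely cosmetic; the 0-10 endpoints are arbitrary and the underlying log-odds values are unchanged):

```python
def renormalize(skills, lo=0.0, hi=10.0):
    """Rescale skill estimates onto [lo, hi] for display, so the chart
    reads 0-10 instead of negative-to-zero. Only the axis changes;
    relative differences are preserved."""
    mn, mx = min(skills), max(skills)
    return [lo + (s - mn) * (hi - lo) / (mx - mn) for s in skills]

renormalize([-2.0, -1.0, 0.0])  # -> [0.0, 5.0, 10.0]
```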


Remind us (or at least me) again what these numbers mean specifically? I.e., what is the predicted difference between a 0, a -1, and a -2?


Well… that depends on other factors. Assuming all else is equal (a 5-5 matchup, against an opponent with a matching Elo), you can get a sense by looking at the Log Odds table in the first post in this thread (More Yomi Data).

But as you can see from that table, a constant difference in log odds (say, a delta of 0.5) causes a non-linear change in the probability estimate. It seems like I could probably build out a table that would cover matchup + elo difference (including the model scale factor) + char skill difference, but it would end up a bit of a monster.
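To illustrate that non-linearity, compare the same 0.5 log-odds delta applied at two different starting points (quick sketch):

```python
import math

def p(logit):
    """Logistic function: log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-logit))

# The same +0.5 in log odds moves the probability by different amounts
# depending on where you start:
near_even = p(0.5) - p(0.0)   # ~0.122: big swing near a 50/50 game
lopsided = p(2.5) - p(2.0)    # ~0.043: much smaller swing in a blowout
```

That’s why a single “one skill point equals X% win chance” conversion doesn’t exist: the answer depends on everything else in the sum.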


Thanks. So the answer I was looking for was just “it’s a modifier to the log odds of winning a game with that character, compared to perfect theory (but the relationship between log odds and win probability is complex)”?


Yes, exactly.

EDIT: The key line in the model for this is:

```stan
win_chance_logit = (player_char_skill1 - player_char_skill2)
                 + non_mirror .* mu[mup]
                 + elo_logit_scale * elo_logit;
```


Well. That was humbling.