What's a Tier List? A mathematical approach

Yes, because their great matchup is against a bad character that isn’t expected to see play often, and their much more frequent matchup is disfavoured. Good matchups don’t help much if you’ll rarely see them.

I’ve thought about this a bit. Humour me for a moment, and suppose that the Nash equilibrium is the “correct” way to calculate the optimal meta.

Character 1 has a low weight in that meta, but it’s also true that the meta is drastically changed by that character being removed, so the character has a large role in shaping / warping that meta. How much would you prefer it if that was accounted for when trying to enumerate character strength?

A difficulty I’m having with this is actually answering the question of: What do we mean by a tier list? What is character strength? Do we mean “this is a character that it’s important to be able to play in order to win tournaments”? Or “this is a character that can win against most other characters”? Or “this is a character who warps the meta by existing”? Each of those is going to lead to a different analysis. I think in most FGs, where most players main a single character, the question being answered is “What one character can I play that will win most often?”, which, I think, leads to an analysis like the one I posted in Discord (basically, just counting the fraction of winning MUs, and then quantizing those into tiers) being reasonable.

For Yomi, though, I think it’s more interesting/useful to answer something closer to “Which characters will be relevant in the tournament meta?”. For that, I think an analysis that finds the Nash Equilibrium using the Bo5 or Bo7 standard-CP payoff matrix, and then from those nash percentages, looks at what characters are used during counterpicks, would be an interesting one (I feel like I did that analysis at some point, but I don’t recall the results).
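For what it’s worth, here’s a minimal sketch of how a Nash equilibrium could be computed from a matchup matrix via linear programming. The 3×3 matrix below is made up purely for illustration, not real Yomi data:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical matchup matrix: M[i, j] = P(character i beats character j).
M = np.array([
    [0.5, 0.6, 0.4],
    [0.4, 0.5, 0.7],
    [0.6, 0.3, 0.5],
])
n = M.shape[0]

# Variables: a mixed strategy x (n entries) plus the guaranteed game value v.
# Maximize v subject to: for every opponent pure pick j, sum_i x_i * M[i, j] >= v.
c = np.zeros(n + 1)
c[-1] = -1.0                                      # linprog minimizes, so minimize -v
A_ub = np.hstack([-M.T, np.ones((n, 1))])         # v - (M^T x)_j <= 0 for each column j
b_ub = np.zeros(n)
A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)  # the probabilities sum to 1
b_eq = [1.0]
bounds = [(0, 1)] * (n + 1)                       # payoffs are win probabilities

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("Nash mix:", np.round(res.x[:n], 3), "value:", round(res.x[-1], 3))
```

For a symmetric matchup chart the game value comes out to 0.5; the interesting output is the mix itself.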


Let’s set aside Tier Lists for a bit, and let’s look at a little something.

What can I do to maximize my chances of winning a tournament?
In a BO5 where you can counterpick at every round, a very important factor is winning the first game of the BO5, since that puts you in an advantaged state (if the opponent wins the next by counterpicking, and you win the following one by counter-counterpicking and so on, you win in the end).
In order to maximize your chances of winning the first game, you want to have a favorable MU against the opponent’s character. Not knowing the opponent’s choice because of double-blind, though, limits your choice to a guessing game.
More specifically, having an estimate of how many players are playing characters A, B and C gives you an opportunity to pick a character that has a high a priori winning probability (which might not mean using the character with the best MU against the most played character, but rather one with a good MU overall against the most used characters).
If you have an estimate of which characters are going to be played (i.e. historical data), just using the matrix method gives you the character with the highest a priori chance to win (step 1 on my sheet).
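To make that concrete, step 1 is just a matrix–vector product. A minimal sketch, with a made-up matchup matrix and pick distribution standing in for the real data:

```python
import numpy as np

# Hypothetical matchup matrix (M[i, j] = P(character i beats character j))
# and a hypothetical historical pick distribution.
M = np.array([
    [0.5, 0.6, 0.4],
    [0.4, 0.5, 0.7],
    [0.6, 0.3, 0.5],
])
hist = np.array([0.5, 0.3, 0.2])   # fraction of players on each character

# Step 1: each character's a priori win probability against the historical field.
step1 = M @ hist
print("a priori win chances:", np.round(step1, 3))
print("best blind pick:", int(np.argmax(step1)))
```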

However, the first pick isn’t only influenced by the apriori chance of winning the first game, but also by the chance of winning against the possible counterpicks and the chances of your counter-counterpicks winning against the counterpick.
This can be calculated too, with a similar method, although it isn’t covered in my sheets. To be precise, it should be possible to calculate a BO3, BO5 or BO7 matrix that tells you which first character has a higher chance of making you win the overall match.
If there’s any interest, I might work on that and see if I get reasonable results.

The problem with Nash Equilibrium (which is also present in eigenvector methods, although it is much stronger in NE) is that it assumes completely rational players across the whole community. As stated previously, while that assumption might somehow work for just two players, it definitely doesn’t hold for larger groups, because there are going to be more Rook players or fewer BBB players than the math asks for, simply because people like Rook’s design and dislike BBB’s.
That’s why, for large rosters, NE is usually not the most solid of choices if you want a practical definition of the metagame. One step of the matrix method gives a better answer to the question “what’s the optimal choice to win game 1?”, while two steps give a good answer to the question “what’s the optimal choice to win game 1, assuming other players are using a step 1 approach?” and so on. The eigenvector approach is just this reasoning carried on forever. By design, no strategy has 50% or more a priori winning probability against the eigenvector strategy (assuming equally skilled players and true randomization in picking the character), although step N has a higher chance to win against step N-1 than the eigenvector has against any other step because, by design, step N is the optimal counter to step N-1.
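As a rough sketch (toy matrix again), iterating those steps just reweights each character by its expected win rate against the previous field, and running the iteration long enough converges to the eigenvector strategy:

```python
import numpy as np

# Hypothetical matchup matrix: M[i, j] = P(character i beats character j).
M = np.array([
    [0.5, 0.6, 0.4],
    [0.4, 0.5, 0.7],
    [0.6, 0.3, 0.5],
])

def iterate_step(dist, steps=1):
    """Step N field: each character weighted by its expected win rate
    against the step N-1 field, renormalized to sum to 1."""
    for _ in range(steps):
        payoff = M @ dist
        dist = payoff / payoff.sum()
    return dist

uniform = np.full(M.shape[0], 1.0 / M.shape[0])
print("step 1:", np.round(iterate_step(uniform, 1), 3))
print("step 50 (~eigenvector):", np.round(iterate_step(uniform, 50), 3))
```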

TL;DR: eigenvector solutions are the ultimate “no u”, which can’t be “no u”'d but, at most, “no we”'d on game 1. A similar eigenvector solution may be computed for BO3/5/7 without additional historical data.

If we define “tier list” as “a list of the most played characters assuming everyone is using the eigenvector strategy”, then here you have your tier list. You might agree with me that that’s also synonymous with how “relevant” a character is (although, as @charnel_mouse pointed out, even the least relevant character has an impact on the metagame), which we could misname as how “powerful” that character is.

Yup, I’ve done that calculation in the past. I’ve got the results squirreled away in an ipython notebook somewhere. (I might have posted them in the other Yomi Data thread, too? I forgot.)

It’s probably this one, isn’t it?

Yeah, that’s definitely one. It doesn’t actually list out the payoffs for each of the best-of-N choices, sadly, so we can’t do the eigenvector method directly.

God, I’ve done so much math about this game…

This isn’t true, though. The best strategy against step N-1 is the character with the highest weight in step N. This can have win probability greater than 50% a priori, as your spreadsheet shows for Zane and DeGrey.

Apart from rationality, the main problem with any approach like this, even if we had perfect matchup knowledge, is that player skill effects will be much stronger than the effect of character matchups, and that a player’s skill level can depend on the characters involved. This can change the “optimal” meta completely. I don’t remember how venge’s Yomi stuff worked out, but for Codex my work has player effects much larger than deck effects, and that’s just among players playing on the forum.

This isn’t true, though. The best strategy against step N-1 is the character with the highest weight in step N. This can have win probability greater than 50% a priori, as your spreadsheet shows for Zane and DeGrey.

I may have said that a bit unclearly, but I think we are saying more or less the same thing.
Let’s say the step N-1 strategy is 50%, 30% and 20% for three options, while step N is 40%, 45%, 15%. This means that by choosing option 2 you’ll win 45% of the time, which is higher than 40% and 15%, so it’s of course the best choice (which is what you said).
On the other hand, I read it the opposite way: step N-1 says that the best strategy is playing option 1 50% of the time, option 2 30% of the time and option 3 20% of the time. For each of these options, the best option in step N changes. To be precise, step N says that the better option will be option 1 40% of the time, option 2 45% of the time and option 3 15% of the time. Overall, if both players pick randomly according to their strategy, step N will have more than a 50% likelihood of winning. If, for some reason, you know that the opponent will play the highest-rated character of step N-1, the best strategy is not the step distribution but rather a modified version of it, which can be written as 100%, 0, 0.

On one hand, picking the best option from your strategy looks like an improved version of that strategy. On the other hand, randomizing your choices will increase your likelihood of winning but it comes with two downsides:

  1. You NEED to know how to play all the match-ups
  2. You often won’t get extremely favorable MUs, but rather only mildly favorable (in which case the skills you have with that specific character matter the most, which makes point 1 even more problematic)

Apart from rationality, the main problem with any approach like this, even if we had perfect matchup knowledge, is that player skill effects will be much stronger than the effect of character matchups, and that a player’s skill level can depend on the characters involved.

In my calculations, I’m assuming you don’t know who your opponent is before your match. Of course this is suboptimal: I’m not using data I could probably use, just for the sake of generalization.
I’m really bad against Setsukis, for example, and really bad at playing Gwen. This needs to be accounted for when you want to compute your best choices for the next step.
A simple way to do that is to modify your own chart, creating a personalized MU chart with which you can do all the computations (however, as I said earlier, you can’t compute more than 1 step, since you’d be assuming your opponent has the same traits as you).
You COULD do this for your opponent too: if everyone’s MU chart were public (i.e. we knew everyone perfectly), you could use that as additional info.
Finally, there’s some coupling happening because of specific interactions: for example, I’m pretty gutsy in some of my plays (i.e. aggressively blocking even when it’s highly likely my opponent wants to throw me, making my opponent think that dodging into throw would be better than raw throwing) but much more conservative in other plays (i.e. dodge confirming into highly-damaging attacks). A player who is more methodical in neutral and in the early game and more gutsy in the late game is basically my nemesis, and I have worse MUs against that player with most characters.
However, as with all coupling effects, those are extremely tough to factor in, because they are highly non-linear and usually strongly reliant on specific situations.

I purposely chose to ignore most of these factors to get a player-agnostic chart (which assumes all players are equally strong and rational), which can be customized by each player based on their own traits.

@TonyHole: I’m not sure if you’ve already seen it, but Yomi Skill actually lets you pick out two different players, and it’ll give you a personalized estimate of MU results for all characters between those two players. I’ve used it to guide my CP strategy at times.

This is the part where we disagree, I think. The distribution for step N is proportional to each character’s probability of success against the entire random distribution at N-1, not their probability of being the best counterpick. Therefore, the character with the highest weight in step N is exactly the character that does best against the entire random distribution at step N-1. The optimal strategy against step N-1 is a pure strategy where you only take that character; using the distribution at step N instead is strictly worse.
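A toy calculation makes the point (matchup matrix and step N-1 field made up for illustration): the payoff of the mixed step-N strategy is a weighted average of the per-character payoffs, so it can never beat the single best payoff.

```python
import numpy as np

M = np.array([                     # hypothetical matchup matrix, row's character vs column's
    [0.5, 0.6, 0.4],
    [0.4, 0.5, 0.7],
    [0.6, 0.3, 0.5],
])
prev = np.array([0.5, 0.3, 0.2])   # hypothetical step N-1 field

payoff = M @ prev                  # each character's win rate vs that field
step_n = payoff / payoff.sum()     # the step N mixed distribution

print("mixed step N vs step N-1: ", round(float(step_n @ payoff), 3))
print("pure best response vs N-1:", round(float(payoff.max()), 3))
# The weighted average (first line) can't exceed the maximum (second line).
```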

The distribution for step N is proportional to each character’s probability of success against the entire random distribution at N-1, not their probability of being the best counter-pick.

We agree on this. Usually, the highest rated characters are not the best counterpicks to some relevant characters, but rather good counterpicks against most relevant characters.

Therefore, the character with the highest weight in step N is exactly the character that does best against the entire random distribution at step N-1. The optimal strategy against step N-1 is a pure strategy where you only take that character; using the distribution at step N instead is strictly worse.

The character with the highest weight in step N is indeed the character with the highest expected payoff, but if its weight is under 50% it means that when it fails it fails hard.

Just in case I botched the math, I set up a quick Python script to simulate both scenarios:

  1. In scenario 1, one player plays according to the historical distribution (the one I have set up in my sheet), and the other plays according to step 1 applied to the historical distribution.
  2. In scenario 2, one player plays according to the historical distribution, and the other plays only the highest-rated character in step 1 applied to the historical distribution.
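(For reference, here is a rough sketch of that kind of simulation, with a toy matchup matrix and pick distribution standing in for data.csv and histdata.txt; this is not the actual script linked further down.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the real MU chart and historical pick distribution.
M = np.array([
    [0.5, 0.6, 0.4],
    [0.4, 0.5, 0.7],
    [0.6, 0.3, 0.5],
])
hist = np.array([0.5, 0.3, 0.2])

payoff = M @ hist
step1 = payoff / payoff.sum()            # scenario 1: the mixed step 1 strategy
pure = np.zeros_like(step1)
pure[np.argmax(payoff)] = 1.0            # scenario 2: only the top step 1 character

def win_rate(strategy, n_games=100_000):
    """Simulated win rate when picking from `strategy` against the historical field."""
    a = rng.choice(len(hist), size=n_games, p=strategy)
    b = rng.choice(len(hist), size=n_games, p=hist)
    return float(np.mean(rng.random(n_games) < M[a, b]))

print("scenario 1 (mixed step 1):", win_rate(step1))
print("scenario 2 (pure top pick):", win_rate(pure))
```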

Scenario 1, over the course of 10 simulations (with 100,000 games played per simulation), returned the following win rates:

  • 50.304%
  • 50.357%
  • 50.495%
  • 50.254%
  • 50.071%
  • 50.285%
  • 50.515%
  • 50.216%
  • 50.148%
  • 50.235%

Which, as expected, is always over 50% (actually, in testing I did get one result under 50%, something like 49.7%, but still…).

On the other hand, although it shouldn’t need simulations to tell, scenario 2 gave the following results:

  • 47.42%
  • 47.592%
  • 47.393%
  • 47.392%
  • 47.361%
  • 47.327%
  • 47.306%
  • 47.257%
  • 47.402%
  • 47.266%

Which proves two things:

  1. I DID botch the math, since I expected a winrate of 52% or something (Troq’s winrate). I’ll have to double check this, although it feels strange to me
  2. A mixed strategy can be better than a pure strategy, and it always is for non-pathological cases in which you apply the matrix method.

In case anyone wants to check the script I used, you can find it at this link. (Mind you, I’m not a programmer by profession, so you’ll find ugly code in there)
data.csv is Scymrian’s corrected MU chart (the one on my sheets), and histdata.txt is basically copy-paste of the games column in the historical data I also have in my sheets.

I’m not sure what’s happened, but it looks like it’s giving BBB’s win probability.

I was chatting with mastrblastr. I like his idea of reporting two numbers: the first is ‘mu sum’, the second is ‘mu sum vs the top 6 characters’ (ranked by normal mu sum). This works well for a lot of games.

In general, I think mu sum works a lot better than NE methods. If a character has a good mu sum there is probably a reason. They do powerful things. Maybe they have some unfortunate mus but they are well-tuned.

Ok, so, I botched more than one thing.
Despite triple-checking the randomization, I still messed up the way I computed whether a game was a win or a loss, and I basically inverted the results. In hindsight, this was obvious, since the percentages I was getting were complementary to Troq’s.

On the other hand, if anything, the new simulations validate my math, so I’m 100% sure the step is computed correctly.

I also have one more correction to what I said: yeah, OF COURSE the mixed strategy works worse than the pure strategy. I don’t exactly know what I was thinking of when I said that, but the reason is mathematically quite obvious. Your objective is maximizing b·(Ma), where a is the step-0 distribution and b is your strategy. We have no control over M or a, and the only way to maximize the product is to put all of b’s weight at the index where Ma is largest, which, of course, means a vector of zeros with a 1 at the best index.

If we make the assumption that characters are played in proportion to their power (a priori winning probability), you can still compute the eigenvector solution and get a steady-state solution (i.e. a stable meta that won’t change as tournaments go on), although that’s only theoretically interesting and not useful in the least (unless you want to invest in learning a character that will become relevant in a couple of years, but eh).

Now, all that’s left is finding a quick way to compute BO5 strategies. I’m pretty sure I can manage programming it.

@deluks917 the MU sum idea is a good concept overall, since it doesn’t need any historical data (aside from the MU chart). On the other hand, it looks like a first-order approximation of the eigenvector method, so it should yield similar results (the eigenvector method lists as most popular more or less the same characters that are generally considered top tier, so it should work about the same). I can definitely try out your approach and add it to the sheets, though.
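For completeness, here is a quick sketch of the two numbers deluks917 described, with a toy matrix (and “top 2” instead of “top 6”, since the toy roster only has three characters):

```python
import numpy as np

# Hypothetical matchup matrix: M[i, j] = P(character i beats character j).
M = np.array([
    [0.5, 0.6, 0.4],
    [0.4, 0.5, 0.7],
    [0.6, 0.3, 0.5],
])

mu_sum = M.sum(axis=1)                   # each character's total MU score
top = np.argsort(mu_sum)[::-1][:2]       # indices of the top characters by MU sum
mu_sum_vs_top = M[:, top].sum(axis=1)    # MU sum restricted to those top characters

print("MU sum:", mu_sum)
print("MU sum vs top characters:", mu_sum_vs_top)
```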

Maybe a 2-D tier plot, then, similar to this? But with average matchup against top quartile, instead of matchup variance, for the horizontal axis.

Update: I should have gotten matches with counterpicks right. I don’t have an external reference to validate my numbers against, but the code I use for BO3 and BO5 works for BO1, and it also works on a toy dataset I built for which I could manually compute both BO3 and BO5.
I computed BO3 and BO5 (and I’m waiting for the results of BO7, which is taking about 200 times longer than a single MU because I’m computing the whole MU chart, and 200 times a long computation is a lot of time. I also forgot to implement a progress tracker that tells me how far into the calculation the code is, so eh). I would have loved to implement the calculations in Sheets (in order to be able to change the results dynamically), but that’s not too practical to do (it requires a huge number of sheets).
I added them to the sheets file (under the sheets BO3 and BO5), where there’s an adjusted MU chart (i.e. what is your probability of winning the set if you pick a certain character and the opponent picks another?) and an a priori probability of winning given the distribution your opponent is using to pick their first character. You can compute the eigenvector solution to this, too, using the adjusted MU chart instead of the current MU chart, but I don’t think it’d be a very interesting thing to factor in.

I do have a function (in Python) that tells you the best counterpick depending on the current scores for each player, but that’s not feasible to implement in sheets.
I might make a Colab (an interactive Python document that can be run in the browser) with all the data in the future, so that you get the computing power of Python, a decent presentation of the data (with explanations for each step written out) and interactivity if you want to edit your own MU chart in or compute the best counterpick on the fly.

I think the BoN computations should be sheets-computable using a dynamic-programming type strategy, actually. For any matchup A vs B, the recursion ends up being BO(A,B,X,Y) = MU(A,B) * MIN(BO(A, *, X-1, Y)) + (1-MU(A,B))*MAX(BO(*, B, X, Y-1)). Each level only depends on the two levels below it, so you end up computing from the bottom up.
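A memoized sketch of that recursion in Python (toy matrix again; bo(a, b, x, y) is the probability of winning the set when you still need x wins, the opponent needs y, and the loser of each game counterpicks):

```python
import numpy as np
from functools import lru_cache

# Hypothetical matchup matrix: M[i, j] = P(character i beats character j).
M = np.array([
    [0.5, 0.6, 0.4],
    [0.4, 0.5, 0.7],
    [0.6, 0.3, 0.5],
])
n = M.shape[0]

@lru_cache(maxsize=None)
def bo(a, b, x, y):
    """P(we win the set) playing character a vs b, needing x more wins
    while the opponent needs y more. The loser counterpicks freely."""
    if x == 0:
        return 1.0
    if y == 0:
        return 0.0
    p = float(M[a, b])
    after_win = min(bo(a, c, x - 1, y) for c in range(n))   # opponent counterpicks against us
    after_loss = max(bo(c, b, x, y - 1) for c in range(n))  # we counterpick against them
    return p * after_win + (1 - p) * after_loss

# Adjusted BO5 "MU chart": win probability of a first-to-3 set for each opening pair.
bo5 = np.array([[bo(a, b, 3, 3) for b in range(n)] for a in range(n)])
print(np.round(bo5, 3))
```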

I haven’t bothered, honestly, to try it in Sheets, since I had to stop the BO7 computation in Python after it had run for a while because it was taking way too long.
The problem was, of course, that in deep trees (which is the case for a BO7) a lot of states (i.e. character you want to counterpick optimally, number of wins you need to win the match, number of wins the opponent needs to win the match) are repeated, and even vectorizing the math as much as possible was very inefficient. After switching to a method that saves the states and their results to speed up the recursion, the computation for BO7 finished in ~1 second.
The closest thing in Sheets would be to write an Apps Script that auto-creates a sheet for each score state (2-1, 1-1, 2-0, etc…) and saves the MU chart for that specific score there.

Also, I’ve already started the Colab notebook: that lets me show stuff more freely than a spreadsheet and also add extensive comments to what we are doing, so that you don’t need to reference this thread to understand the data you are looking at.

Yeah, that was what I meant by using a dynamic programming approach (well, effectively the same). Computing from the bottom up, and saving all the lower level calculations, like you found.

Turns out, it is pretty annoying in sheets.