Codex data thread

Yeah, I can’t really evaluate how well they’d face off without doing something much more complicated. The mere presence of a spec could shape the match without it ever being used, though, and the hero can affect things too. The meaning of the Present vs. Growth component is not how well those specs face off; it’s a partial evaluation of how well decks including them face off. You couldn’t just take the neutral components to evaluate the 1-spec starter game format, for example.

2 Likes

This was a ton of work! Thanks for attempting to do this. Also, I would be interested in testing 10-match mono-color intervals (with advice) to further strengthen match-up assertions.

1 Like

Thanks! I’m going to wait and see how the results from the current tournament affect things; they should reduce the confounding between strong players and strong decks a bit. Then I’ll put the “lopsided” matches, minus Black vs. Blue, up in a thread, and see who’s interested. If you mean advice from other players, I was thinking of suggesting some warm-up matches, à la MMM1, but not enforcing it, so that could be a good time for that sort of thing.

I’m not sure about the exact format yet. @FrozenStorm was your suggestion above to take one matchup at a time, and have different pairs of players play one game each for a total series of ten? If we do it like that, I could put up the lopsided matches and do a poll on which one to do first/next, and it’s a bit less time-demanding than two players doing ten matches.

1 Like

@charnel_mouse re-reading my previous post, I think the format I had in mind was something more akin to MMM’s format, where a sign-up sheet is posted, either just to get a pool of players to be assigned decks, or more exactly like MMM, for players to sign up for specific matchups (I think I prefer the latter, as this is meant in my eyes to be an opportunity to “challenge” what’s listed as a “bad matchup” by proving it more even).

So something like:

  1. You compile a list of the 5-10 most “interesting” pieces of your data (perhaps we can have a Discord call w/ some experienced players to sort out which results from the data are most surprising to us, or put up a poll)
  2. Players are then invited to posit a challenge to those data (e.g. you have Purple vs Green as 2-8 and I think it’s much closer to even, so I volunteer to play Purple. Or perhaps your data has Blue vs Red at 6-4 and someone thinks it’s closer to 3-7 the opposite way. I’m making those numbers up, but you catch my drift)
  3. Players w/ at least some experience on either deck are invited to help play out a 10-game series (I’d bring @Shadow_Night_Black in to help out with Purple maybe, for example, and I think @Mooseknuckles has played a fair share of Green starter at least? And @EricF and @zhavier are just really strong players who have played a fair bit of Growth and could probably puzzle out great plays?). We could play in council with one another, or just split up who plays how many games on which decks.

A bit sloppy of an idea, I’ll admit, but at least a means of seeing w/ experienced players “is this data reflective of more targeted testing?”

4 Likes

That sounds reasonable. Would it help if I also drew up the model’s matchup predictions when the players are taken into account? Re: Discord calls, if you wanted me on the call too we’d have to work around my being on GMT.

1 Like

While we’re waiting for XCAFRR19 to finish up, I thought I’d put up what the model retroactively considers to be highlights from CAMS19 – the most recent standard tournament – and MMM1. The model currently considers player skills to be about twice as important as the decks in a matchup, so the MMM1 highlights might be helpful to provide context to some of the recent monocolour deck results.

CAMS19 fairest matches

match name result probability fairness
CAMS19 R4 FrozenStorm [Future/Past]/Finesse vs. charnel_mouse [Balance]/Blood/Strength, won by charnel_mouse 0.50 0.99
CAMS19 R3 bansa [Law/Peace]/Finesse vs. zhavier Miracle Grow, won by bansa 0.51 0.98
CAMS19 R10 bansa [Law/Peace]/Finesse vs. zhavier Miracle Grow, won by zhavier 0.49 0.98
CAMS19 R3 codexnewb Nightmare vs. Legion Miracle Grow, won by codexnewb 0.53 0.95
CAMS19 R6 Persephone MonoGreen vs. bansa [Law/Peace]/Finesse, won by bansa 0.47 0.94

CAMS19 upsets

match name result probability fairness
CAMS19 R2 Legion Miracle Grow vs. EricF [Anarchy/Blood]/Demonology, won by Legion 0.39 0.79
CAMS19 R9 EricF [Anarchy/Blood]/Demonology vs. zhavier Miracle Grow, won by zhavier 0.46 0.92
CAMS19 R6 Persephone MonoGreen vs. bansa [Law/Peace]/Finesse, won by bansa 0.47 0.94
CAMS19 R10 bansa [Law/Peace]/Finesse vs. zhavier Miracle Grow, won by zhavier 0.49 0.98
CAMS19 R4 FrozenStorm [Future/Past]/Finesse vs. charnel_mouse [Balance]/Blood/Strength, won by charnel_mouse 0.50 0.99

MMM1 fairest matches

match name P1 win probability fairness observed P1 win rate
zhavier MonoGreen vs. EricF MonoWhite 0.51 0.99 0.6 (3/5)
FrozenStorm MonoBlue vs. Dreamfire MonoRed 0.49 0.99 0.6 (3/5)
Bob199 MonoBlack vs. FrozenStorm MonoWhite 0.47 0.95 0.6 (3/5)
cstick MonoGreen vs. codexnewb MonoBlack 0.53 0.94 0.6 (3/5)
codexnewb MonoBlack vs. cstick MonoGreen 0.45 0.90 0.4 (2/5)

I’d recommend looking at the P1 Black vs. P2 White matchup, and both Black/Green matchups, in the latest monocolour matchup plot, to see how much the players involved can change a matchup.

MMM1 unfairest matchups

match name P1 win probability fairness observed P1 win rate
HolyTispon MonoPurple vs. Dreamfire MonoBlue 0.08 0.15 0.0 (0/5)
Nekoatl MonoBlue vs. FrozenStorm MonoBlack 0.09 0.17 0.0 (0/5)
FrozenStorm MonoBlack vs. Nekoatl MonoBlue 0.90 0.20 1.0 (5/5)
Shadow_Night_Black MonoPurple vs. Bob199 MonoWhite 0.87 0.27 1.0 (5/5)
Bob199 MonoWhite vs. Shadow_Night_Black MonoPurple 0.16 0.32 0.0 (0/5)
Dreamfire MonoBlue vs. HolyTispon MonoPurple 0.80 0.40 0.8 (4/5)
EricF MonoWhite vs. zhavier MonoGreen 0.78 0.43 1.0 (5/5)

A quick update after adding the results from XCAFRR19.

The player chart is getting a little crowded with all the new players recently – hooray! – so I’ve trimmed it to only show players that were active in 2018–2019.

Player skill

Player skill is currently not dependent on turn order, time, or decks used.

Monocolour matchups

Row is for P1, column is for P2. Player skills are assumed to be equal.

Monocolour matchup details

Sorted by matchup fairness.

P1 deck P2 deck P1 win probability (as a score out of 10) matchup fairness
Green Purple 0.501 5.0-5.0 1.00
Red Black 0.503 5.0-5.0 0.99
Green Red 0.493 4.9-5.1 0.99
White Blue 0.484 4.8-5.2 0.97
Purple Purple 0.522 5.2-4.8 0.96
Blue Red 0.541 5.4-4.6 0.92
Purple Red 0.459 4.6-5.4 0.92
Blue Blue 0.453 4.5-5.5 0.91
White Black 0.443 4.4-5.6 0.89
Black Black 0.426 4.3-5.7 0.85
Blue Purple 0.427 4.3-5.7 0.85
Red Green 0.581 5.8-4.2 0.84
Blue White 0.583 5.8-4.2 0.83
White White 0.584 5.8-4.2 0.83
Red Red 0.595 6.0-4.0 0.81
Green White 0.593 5.9-4.1 0.81
Black Purple 0.602 6.0-4.0 0.80
Green Green 0.378 3.8-6.2 0.76
Red Blue 0.631 6.3-3.7 0.74
Red White 0.367 3.7-6.3 0.74
Blue Green 0.635 6.4-3.6 0.73
Black Blue 0.668 6.7-3.3 0.66
Black Red 0.670 6.7-3.3 0.66
White Red 0.677 6.8-3.2 0.65
Purple Blue 0.308 3.1-6.9 0.62
Blue Black 0.306 3.1-6.9 0.61
Red Purple 0.712 7.1-2.9 0.58
Black White 0.709 7.1-2.9 0.58
Green Blue 0.721 7.2-2.8 0.56
White Green 0.726 7.3-2.7 0.55
Purple Black 0.253 2.5-7.5 0.51
Black Green 0.745 7.5-2.5 0.51
White Purple 0.241 2.4-7.6 0.48
Green Black 0.237 2.4-7.6 0.47
Purple Green 0.203 2.0-8.0 0.41
Purple White 0.798 8.0-2.0 0.40

Some highlights from XCAFRR19. These are retrospective, i.e. calculated after including its results in the (training) data.

Fairest XCAFRR19 matches
match name result probability fairness
XCAFRR19 R1 FrozenStorm [Discipline]/Past/Peace vs. codexnewb [Future]/Anarchy/Peace, won by codexnewb 0.50 1.00
XCAFRR19 R7 Bomber678 [Feral/Growth]/Disease vs. James MonoBlack, won by James 0.50 1.00
XCAFRR19 R4 Nekoatl [Balance/Growth]/Disease vs. James [Disease/Necromancy]/Law, won by Nekoatl 0.49 0.99
XCAFRR19 R8 Nekoatl [Feral/Growth]/Disease vs. charnel_mouse [Discipline]/Law/Necromancy, won by charnel_mouse 0.52 0.96
XCAFRR19 R3 bolyarich [Feral/Growth]/Disease vs. charnel_mouse [Balance/Growth]/Disease, won by charnel_mouse 0.52 0.95
XCAFRR19 R4 EricF [Fire]/Disease/Truth vs. bolyarich MonoBlack, won by bolyarich 0.47 0.94
XCAFRR19 R5 codexnewb [Future]/Anarchy/Peace vs. OffKilter [Fire]/Growth/Present, won by OffKilter 0.46 0.93
XCAFRR19 R1 UrbanVelvet [Anarchy]/Past/Strength vs. CarpeGuitarrem Nightmare, won by UrbanVelvet 0.54 0.93
XCAFRR19 R2 codexnewb Nightmare vs. dwarddd MonoPurple, won by codexnewb 0.54 0.92
XCAFRR19 R5 Leaky MonoPurple vs. CarpeGuitarrem [Demonology]/Growth/Strength, won by CarpeGuitarrem 0.55 0.91

XCAFRR19 upsets
match name result probability fairness
XCAFRR19 R7 codexnewb MonoPurple vs. zhavier [Anarchy]/Past/Strength, won by codexnewb 0.36 0.73
XCAFRR19 R6 zhavier [Anarchy]/Past/Strength vs. FrozenStorm MonoPurple, won by FrozenStorm 0.40 0.80
XCAFRR19 R3 codexnewb [Future]/Anarchy/Peace vs. Leaky [Discipline]/Disease/Law, won by Leaky 0.41 0.81
XCAFRR19 R5 codexnewb [Future]/Anarchy/Peace vs. OffKilter [Fire]/Growth/Present, won by OffKilter 0.46 0.93
XCAFRR19 R4 EricF [Fire]/Disease/Truth vs. bolyarich MonoBlack, won by bolyarich 0.47 0.94
XCAFRR19 R4 Nekoatl [Balance/Growth]/Disease vs. James [Disease/Necromancy]/Law, won by Nekoatl 0.49 0.99
XCAFRR19 R1 FrozenStorm [Discipline]/Past/Peace vs. codexnewb [Future]/Anarchy/Peace, won by codexnewb 0.50 1.00

Here’s something new: the model estimates the variance in player skills and opposed deck components, in terms of their effect on the matchup. This means that I can directly compare the effect different things have on a matchup:

Matchup variance breakdown

The boxes are scaled according to how many elements of that type go into each matchup: 2 players, 1 starter vs. starter, 6 starter vs. spec, 9 spec vs. spec. On average, the players’ skill levels have twice the effect on the matchup that the decks do.
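In case it’s useful, here’s a minimal sketch of where those counts come from, assuming a multicolour deck is one starter plus three specs (the decks named here are just an illustration):

```r
# Opposed components in a single multicolour matchup: 2 players, plus the
# pairwise starter/spec combinations between the two decks.
p1 <- list(starter = "Black", specs = c("Demonology", "Disease", "Necromancy"))
p2 <- list(starter = "Green", specs = c("Balance", "Feral", "Growth"))

c(players            = 2,
  starter_vs_starter = 1,
  starter_vs_spec    = length(p1$specs) + length(p2$specs),  # 6
  spec_vs_spec       = length(p1$specs) * length(p2$specs))  # 9
```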

Roughly speaking, that means that, if we took the multi-colour decks with the most lopsided matchup in the model, and gave the weak deck to an extremely strong player, e.g. EricF, and gave the strong deck to an average-skill player, e.g. codexnewb (average by forum-player standards, remember), we’d expect the matchup to be roughly even.

As always, comments or criticism about the results above are welcome. Let me know if there are particular games you’d like the results for too, although those should be available to view on the new site soon.


I’m in the process of making a small personal website, so I can stick the model results somewhere where I have more presentation options. In particular, I look at the model’s match predictions using DataTables (a JavaScript library), where you can sort and search on different fields (match name, model used, match fairness, etc.), so it would be nice if other people could use them too.

It will also let me make the model’s inner workings more transparent. When I get time, I’m planning to add ways to examine cases of interest, like the Nash equilibrium among a given set of decks, or a way to view the chances of different decks against a given opposing deck and with given players. The latter, in particular, would be useful for letting other people evaluate the model for matches that they’re playing.

I have other versions of the model that use Metalize’s data, but I’m going to delay showing those until I’ve finished putting up the site, and have tidied up the data a bit.

4 Likes

OK, it’s pretty rough, but I now have a first site version up here. I had to do some grappling with DataTables and Hugo, so I should be able to add things more easily after this first version.

Things I’m planning to add first:

  • Results for other versions of the model, to show how it’s improved since the start of the project.
  • A match data CSV, and R/Stan code.
  • Downloadable simulation results data for people to play with, if anyone’s interested.
  • Interactive plots. I’d like to have something to put in players and decks and see the predicted outcome distribution, like vengefulpickle’s Yomi page, instead of having to fiddle with column filters on the DataTable in the Opposed component effects section.
  • Tighter wording, to make things clearer to people who haven’t been following the thread.

I’ll still do update posts here about model improvements, interesting results etc.; the site’s there for people to play around with the model themselves.

2 Likes

Oh, and I’ve developed things to the point where I can easily plug in players/decks in a tournament and get matchup predictions. I’ll post up the predictions for CAWS19 matchups after it’s finished – I don’t want to prime anyone (else) by posting them beforehand – and take a look at how good they were.

Next tournament, my aim is to let the model tell me which deck to use. Not to play to win, more to find the model’s flaws.

3 Likes

The version on the site now has a section for Nash equilibria: against an evenly-matched opponent with double-blind deck picks, how should you randomise your deck choice? I’ve currently done this for monocolour decks, and for all multicolour decks used in a recorded game. Doing this for all possible multicolour decks, the way I currently compute it, would require several hundred GB of RAM; I’m working on a version that doesn’t.

There are results for the usual case of not knowing who’ll go first / alternating turn order in a series, and for the case where you know before picks whether you’re going first or second.
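For anyone curious about the mechanics, here’s a rough sketch of how a deck-pick equilibrium can be computed once you have a matrix of win probabilities, using linear programming via the lpSolve package. This isn’t the model’s actual code – the real calculation has to average over posterior draws of the matchup matrix, which is where the memory problem comes from – and the 3×3 matrix below is made up purely for illustration:

```r
library(lpSolve)

# Made-up P1 win probabilities: rows are your deck, columns are the opponent's.
win_prob <- matrix(c(0.50, 0.60, 0.45,
                     0.40, 0.50, 0.65,
                     0.55, 0.35, 0.50),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(paste0("deck", 1:3), paste0("deck", 1:3)))

nash_pick_weights <- function(A) {
  n <- nrow(A)
  # Variables: mixing weights x_1..x_n, plus the game value v. Maximise v
  # subject to t(A) %*% x >= v against every opposing deck, and sum(x) == 1.
  # lpSolve's default non-negativity is fine here, since A is in [0, 1].
  objective <- c(rep(0, n), 1)
  const_mat <- rbind(cbind(t(A), -1),    # A^T x - v >= 0 for each column
                     c(rep(1, n), 0))    # weights sum to 1
  const_dir <- c(rep(">=", ncol(A)), "==")
  const_rhs <- c(rep(0, ncol(A)), 1)
  sol <- lp("max", objective, const_mat, const_dir, const_rhs)
  list(weights = setNames(sol$solution[1:n], rownames(A)),
       value   = sol$solution[n + 1])
}

nash_pick_weights(win_prob)
```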

A lot of the results – which I will try to make more intelligible soon – come down to “the model isn’t very certain about anything.” Go to the site if you want to stare at a bunch of tables. Here are a few highlights, though:

  • For monocolour games, if you’re going first, half the time you’d take Black or Red. If you’re going second, half the time you’d take Black or sometimes Purple. If you don’t know or are alternating turn order, most of the time you’d take Black. All six colours are still present in the picks in all three cases, so still some uncertainty here.
  • For multicolour games with unknown/alternating turn order, the most favoured deck of those recorded is [Necromancy]/Finesse/Strength, but that’s only picked about 4% of the time, so there’s no dominating strategy. In fact, of the 105 decks in recorded matches, all of them have a non-zero chance to be picked. There’s still a lot of uncertainty here, perhaps made most obvious by the fact that there are two decks with Bashing in the ten highest-weighted decks: [Bashing]/Demonology/Necromancy (#2, 3%) and [Bashing]/Fire/Peace (#7, 2%). Compare to MonoBlack (#21, 1.3%) and Nightmare (#30, 1.1%).
  • In the recorded multicolour case, for unknown/alternating turn order, MonoBlack is the highest-weighted monocolour deck, at 1.3% of total weight.
  • MonoGreen generally has the lowest weight in both monocolour and multicolour. Yes, lower than MonoBlue.
  • Nash equilibria in both cases are slightly in favour of Player 1, so slightly that I don’t think it matters much.

I don’t know who was playing Bashing/Fire/Peace, but the #2 deck is being skewed by having a 100% win rate and only getting played in one event.

I know you’re doing something to tease out the player vs deck contributions, but intuitively I would expect a deck with this profile:
Player Average Win % = 55%
Player + Deck Average Win % = 60%
to be a better choice than this profile:
Player Average Win % = 85%
Player + Deck Average Win % = 100%

It should be stressed that none of the decks are particularly strongly favoured, I was just using Bashing’s presence as an example. I could pull out the sampled Nash weights for the #2 deck, and I’d expect them to have a large variance, with the weight often being zero. I expect this for most of the decks, actually: most matchups will have no recorded matches at all, and variability in a small number of them can result in very different Nash equilibria. I especially expect it for any decks containing Bashing, since it’s so rarely used.

Even for decks with a lot of representation, like Nightmare, I’m not sure how helpful looking at their average win rates would be. They’ll still have highly uncertain matchups, and it wouldn’t take many matchups flipping out of a deck’s favour in a sample for it to become dominated by other decks. For example, if you look at the “greedy” Nash equilibria, which just use the mean predicted matchups, most of the decks drop out.

EDIT: Even without everything I mentioned above, from my intuition I’d expect the latter deck in your example to be better, at least in the model, because the player and deck effects are modelled as additive on the log-odds scale. It’s therefore much harder to get from 85% to 100%, or even 90%, than it is to get from 55% to 60%. It’s differences in sample size that would make the former deck more attractive.
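To put some numbers on that, here’s the log-odds point as a quick R calculation (the +0.5 deck effect is an arbitrary example, not a fitted value):

```r
logit     <- function(p) log(p / (1 - p))
inv_logit <- function(x) 1 / (1 + exp(-x))

logit(0.60) - logit(0.55)  # ~0.20 log-odds to go from 55% to 60%
logit(0.90) - logit(0.85)  # ~0.46 log-odds to go from 85% to 90%
logit(1.00) - logit(0.85)  # infinite: no finite effect reaches 100%

# The same +0.5 deck effect added to both baselines:
inv_logit(logit(0.55) + 0.5)  # ~0.67
inv_logit(logit(0.85) + 0.5)  # ~0.90
```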

EDIT2: The first table on the model page should let you search for a deck’s name to find the matches where it was used. Looks like [Bashing]/Fire/Peace was used by flagrantangles in CAWS18.

EDIT3: Final comment. Having little-played decks highly weighted is actually a feature I’d expect to see. You can see the same thing happening in the plot of each player’s probability of being the best player. Having a high mean skill helps, but so does having a high skill variance. Compare FrozenStorm (low variance, so a small probability despite a high mean) to various players like deluks917 (low mean, but high variance, so a higher probability).

The Codex page has a new URL while I change the site around. It now has the Codex logo! You can also have a look at how the CAWS19 page is going, now that we’re into the final few rounds. (Its layout quality might vary widely, depending on your screen size.) I’ll go through the results here after the tournament’s finished.

Time to evaluate the model’s CAWS19 predictions. Join me as I despair at the model’s lack of progress.

I’ll be skipping over some other things I calculated; head over to the CAWS19 model page if you want to poke around. It will stay up until the start of CAMS20, when I’ll start tracking the model’s performance for that.

Predictions

Matchups

I’ve split the matchups into two cases. First, we ignore the player skill levels and just look at the decks:

Second, we add on the player skill levels:

Matchups by round

Using the mean matchups from the plots above, the model made the following predictions for the matches that took place:

[Plots: the model’s predicted matchups for each round]

I like plotting the matches by round like this, but it does mean you can’t find a particular match easily. For that, you want the interactive version.

There are two main things I can see here. Firstly, the match fairness tends to improve, especially in the last few rounds. Given that we use a hybrid Swiss / triple elimination tournament, this is what we’d expect to happen, so the model predicting it too is a good sign.

Secondly, and less of a good sign, is the comparison of the predicted P1 win probabilities to the actual results: within the 25%-75% range for predicted matchups, there are far more upsets than there should be, for both players.

Model (Brier) score

I’ve somehow not talked about scoring rules yet, so I’ll briefly do that now.

We’d like some way to evaluate how well our model is doing with predicting match outcomes, while accounting for those predictions being probabilistic. If we had a model that correctly predicted a match as having a 75% chance for a P1 win, we’d still see a P2 win 25% of the time. It’s therefore misleading to evaluate a model just in terms of minimising the number of upsets. We’d like to reward high predicted probabilities for observed outcomes, while not encouraging overcertainty.

The most obvious way to do this is to assign the model a score penalty equal to its predicted probability of the observed outcome not happening. In the above case, it would get 0.25 for a P1 win, and 0.75 for a P2 win. A smaller score is better. Unfortunately, this scoring rule encourages extreme predictions of 0% and 100%, since these minimise the expected penalty. The model might think P1 has a 75% chance of winning, but to minimise its score penalty it should claim 100%.
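Here’s a quick numerical check of that point (the 75% belief is just the running example, nothing to do with the model):

```r
# Expected penalty under the naive linear rule, if the model truly believes
# P1 wins with probability 0.75 but reports some other probability.
believe <- 0.75
expected_linear <- function(report) believe * (1 - report) + (1 - believe) * report

expected_linear(0.75)  # honest report: expected penalty 0.375
expected_linear(1.00)  # extreme report: expected penalty 0.25, so lying pays
```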

Instead, we can use a proper scoring rule, one where the expected penalty is minimised by reporting the probabilities you actually believe. In this case, I’ll be using the Brier score, where the penalty is the square of the model’s probability for the observed event not happening. In the above example, a P1 win gives a penalty of (0.25)^2 = 1/16. Again, a smaller score is better.

The Brier score has the advantage that we have an easy baseline to compare to: if we predict a match as being exactly 5-5, then we always take a score penalty of 0.25, no matter what the outcome, no matter what the “true” chance of a P1 win is.
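For concreteness, here’s what that calculation looks like in R, with made-up predictions and outcomes rather than the real CAWS19 data:

```r
pred   <- c(0.75, 0.60, 0.50, 0.30)   # predicted P1 win probabilities
p1_won <- c(TRUE, FALSE, TRUE, TRUE)  # observed outcomes

# Brier penalty: squared probability assigned to the outcome that didn't happen.
brier_score <- function(p, won) mean((ifelse(won, 1, 0) - p)^2)

brier_score(pred, p1_won)         # the model's score
brier_score(rep(0.5, 4), p1_won)  # "everything is 5-5" baseline: always 0.25
```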

So, how did the model do for CAWS19, compared to saying everything is 5-5, and compared to how it did on its training data?

prediction used Brier score Calibrated-equivalent forecasts
always predict 5-5 0.250 0.5
model’s prediction 0.225 0.341, 0.659
expected score if model’s perfectly calibrated 0.207 0.293, 0.707
model’s training data performance 0.123 0.144, 0.856

The second column is the Brier score; the third gives the predicted probabilities that would be expected to produce that score if they were perfectly calibrated, i.e. if the predicted probabilities were exactly correct. The third row looks at the case where the model gave the same predictions as it does now, but those predictions happened to be perfectly calibrated.
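If it helps, I believe the third column can be reproduced by noting that a perfectly calibrated forecast of p has an expected Brier penalty of p(1-p)^2 + (1-p)p^2 = p(1-p), and then inverting that:

```r
# Solve p * (1 - p) = score for p; the pair (p, 1 - p) are the two forecasts
# consistent with that score.
calibrated_equivalent <- function(score) {
  p <- (1 - sqrt(1 - 4 * score)) / 2
  c(p, 1 - p)
}

calibrated_equivalent(0.225)  # roughly 0.34 and 0.66
calibrated_equivalent(0.207)  # roughly 0.29 and 0.71
```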

The model’s doing better than saying everything is 5-5, at least. It doesn’t look like much of an improvement, but then neither is the expected score if it were perfectly calibrated. So maybe the model’s doing OK; it could certainly be doing much worse. It did much better on its training data, but, since that’s the data it was trained on, that’s not a fair comparison.

Highlights

Some of the most-balanced matches and the biggest upsets, as I’ve done before.

Fairest CAWS19 matches

match result probability fairness
R4 dwarddd [Past/Present]/Peace vs. bolyarich Nightmare, won by bolyarich 0.507 0.985
R2 dwarddd [Past/Present]/Peace vs. Akiata [Past]/Anarchy/Peace, won by dwarddd 0.518 0.964
R1 EricF [Feral]/Fire/Law vs. bolyarich Nightmare, won by bolyarich 0.481 0.963
R4 charnel_mouse [Balance]/Blood/Strength vs. codexnewb Miracle Grow, won by codexnewb 0.528 0.944
R8 codexnewb Miracle Grow vs. bolyarich Nightmare, won by bolyarich 0.542 0.916

Biggest CAWS19 upsets

match result probability fairness
R1 FrozenStorm [Future]/Necromancy/Peace vs. dwarddd [Past/Present]/Peace, won by dwarddd 0.272 0.544
R4 EricF [Feral]/Fire/Law vs. FrozenStorm [Future]/Necromancy/Peace, won by FrozenStorm 0.295 0.590
R5 Bomber678 MonoRed vs. EricF [Feral]/Fire/Law, won by Bomber678 0.307 0.614
R3 dwarddd [Past/Present]/Peace vs. charnel_mouse [Balance]/Blood/Strength, won by dwarddd 0.338 0.675
R7 codexnewb Miracle Grow vs. zhavier [Balance/Growth]/Finesse, won by codexnewb 0.361 0.722

What’s next

Adding the CAWS19 data to the main training set, as well as the MMM1 matches that have been happening in the meantime.

I said I’d let the model choose a deck for me next tournament. I’m going to postpone that to CAMS20, since the upcoming tournament is using the map cards, so all bets are off and the model won’t have a clue how to handle it.

I’ve written a function for listing decks by how good they should be to counter a given opposing deck, but I need to think about how to add it to the site, because the number of possible decks makes the resulting table huge. In the meantime, here’s the top few entries for opposing Nightmare as an example:

deck probability best (n = 4000) mean win probability
[Blood]/Finesse/Peace 0.01275 0.691
[Blood]/Finesse/Strength 0.01025 0.703
[Blood]/Bashing/Finesse 0.01000 0.633
[Finesse]/Peace/Strength 0.00975 0.663
[Finesse]/Blood/Strength 0.00750 0.662

Note that the probabilities of being best are all very small, so high uncertainty here.
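For anyone who wants to reproduce this sort of summary, here’s a hypothetical sketch (not the site’s actual code): given posterior draws of each candidate deck’s win probability against the fixed opposing deck, count how often each deck comes out on top. The draws here are random stand-in data:

```r
set.seed(1)
# Rows are candidate decks, columns are 4000 posterior draws of the win
# probability against the fixed opposing deck (made-up numbers).
win_draws <- matrix(runif(5 * 4000, min = 0.4, max = 0.8), nrow = 5,
                    dimnames = list(paste0("deck", 1:5), NULL))

counter_summary <- function(draws) {
  best <- apply(draws, 2, which.max)   # index of the best counter in each draw
  data.frame(deck          = rownames(draws),
             prob_best     = tabulate(best, nbins = nrow(draws)) / ncol(draws),
             mean_win_prob = rowMeans(draws))
}

res <- counter_summary(win_draws)
res[order(res$prob_best, decreasing = TRUE), ]
```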

The next feature to add is either inter-deck interactions, or having player skill depend on the deck they’re using. I’m leaning towards doing the former first.

Again, the most important thing here is more data. I did some posterior predictive checks to look for any clear systematic biases, but the low number of matches per player/deck pairing – often just one outside of monocolour – means that it’s hard to draw any firm conclusions.

Before that, though, I want to make it easier to look at the model’s matchup predictions for monocolour decks, accounting for players. I’ll try to make this easily accessible on the site; that should be doable for monocolour.

The data sheet’s been updated to include CAWS19, and the MMM1 matches that have been happening in the meantime. I’ve also gone back and fixed some mistakes: I hadn’t realised that the players had swapped round a few times in the Green vs. White MMM1 set. I’ve updated the CAWS19 “predictions”; they haven’t changed very much.

2 Likes

I’ve added the Nash equilibrium over all multicolour-legal decks. It only took four days to run! Ugh. Anyway, here are the ten decks with the highest weight if you pick before finding out whether you go first (Both):

Deck P1 P2 Both
[Necromancy]/Finesse/Strength 0.0021 0.0017 0.0040
[Necromancy]/Blood/Strength 0.0020 0.0024 0.0038
[Necromancy]/Blood/Finesse 0.0035 0.0004 0.0034
[Necromancy]/Blood/Peace 0.0023 0.0013 0.0032
[Necromancy]/Finesse/Peace 0.0024 0.0011 0.0032
[Necromancy]/Bashing/Finesse 0.0020 0.0012 0.0030
[Strength]/Demonology/Necromancy 0.0021 0.0011 0.0029
[Necromancy]/Balance/Strength 0.0007 0.0018 0.0028
[Necromancy]/Peace/Strength 0.0007 0.0026 0.0028
[Strength]/Blood/Necromancy 0.0029 0.0006 0.0026

Garth reigns supreme.

I’ve also added a section for finding good counter-picks to specific decks. The full table is huge, so I’ve picked out the most-likely-best counter-picks for Nightmare as an example:

Deck Probability best counter Counter win probability
[Blood]/Finesse/Strength 0.01625 0.727
[Finesse]/Blood/Strength 0.01575 0.703
[Blood]/Bashing/Finesse 0.01050 0.639
[Blood]/Finesse/Peace 0.01000 0.658
[Peace]/Finesse/Strength 0.00900 0.654

Given how small the best-counter probabilities are, there’s obviously still a very high level of uncertainty.

I’ve got a few more examples on the site, plus the full counter-pick table for monocolour decks (Black is dominant, except against Red and White). I can add a link to the full counter-pick table if anyone’s interested in looking at it, just be aware that it contains all 3084^2 matchups so is rather large (about 80Mb when compressed).

2 Likes

Oh, and here’s how much weight each component gets in the multicolour Nash equilibrium, which gives a clearer indication of what’s favoured:

Starter P1 P2 Both
Black 0.251 0.280 0.336
White 0.199 0.119 0.152
Purple 0.088 0.168 0.117
Blue 0.119 0.113 0.114
Neutral 0.107 0.120 0.107
Red 0.161 0.075 0.094
Green 0.076 0.125 0.079

Spec P1 P2 Both
Necromancy 0.306 0.249 0.331
Strength 0.182 0.277 0.262
Finesse 0.278 0.144 0.227
Demonology 0.202 0.180 0.224
Blood 0.255 0.123 0.210
Peace 0.154 0.195 0.190
Bashing 0.119 0.174 0.150
Disease 0.115 0.169 0.150
Anarchy 0.208 0.121 0.141
Present 0.097 0.169 0.129
Balance 0.111 0.161 0.129
Past 0.093 0.174 0.123
Growth 0.146 0.112 0.113
Future 0.080 0.159 0.109
Truth 0.111 0.126 0.101
Law 0.115 0.093 0.092
Discipline 0.152 0.068 0.090
Ninjitsu 0.098 0.109 0.082
Fire 0.079 0.123 0.079
Feral 0.101 0.075 0.069
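These component weights behave like the total Nash weight of every deck containing the component: the starter column sums to about 1, and the spec column to about 3 (one starter and three specs per deck). Here’s a minimal sketch of that aggregation with made-up deck weights:

```r
# Made-up equilibrium weights for three decks, purely for illustration.
deck_weights <- data.frame(
  starter = c("Black", "Black", "White"),
  spec1   = c("Necromancy", "Necromancy", "Peace"),
  spec2   = c("Finesse", "Blood", "Finesse"),
  spec3   = c("Strength", "Strength", "Strength"),
  weight  = c(0.0040, 0.0038, 0.0010))

spec_cols   <- c("spec1", "spec2", "spec3")
spec_weight <- tapply(rep(deck_weights$weight, times = length(spec_cols)),
                      unlist(deck_weights[spec_cols]),
                      sum)
sort(spec_weight, decreasing = TRUE)
```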
2 Likes

OK, I’m putting this project on hiatus until we get close to the beginning of CAMS 2020. That previous post seems like a nice point to pause on; it’s the closest thing the model will have to a tier list for the specs any time soon.

If there’s anything else people want to see in the meantime, let me know. Tell me if you use the site at all; I don’t know how easy it is to find stuff on there.

We now have all Seasonal Swiss tournaments on this forum recorded! Only LDT 1 & 2, RACE 3, and LLL 2 still to go. Unfortunately, that will be as far back in time as I can record, with the death of the old forum.

Given @Castanietzsche’s performance in the 2016 Swiss tournaments, and their lack of tournament presence elsewhere, I suspect the model will now think Bashing and MonoBlue are really good. I hope to get the time to read the matches and find out how they pulled that off.

2 Likes

We now have all tournament matches recorded! I threw in the semi-finals and final for RACE 2 too, I assume the rest happened on the old forum.

The meta looked very different in 2016, so next update will be some plots for how the starter / spec choices changed over time.

4 Likes