[Strategy Guide] Game theory fundamentals of YOMI

This guide was written by DrFaustus

Preface

Games like Yomi or Kongai have a clear relation to game theory and require playing a mixed strategy. That much is roughly clear to almost anyone, but many people don’t have a precise understanding of the underlying concepts.

So I thought I’d give a small lecture on the topic that summarises the important definitions and explains things once and for all, so that it can simply be referenced should the topic arise again, without derailing other threads. I also collected some fallacious statements and discuss why I think they are wrong.

I do not claim to be 100% correct or complete. If you think I got something wrong here, I’d be happy to hear about it and discuss it, or to add anything you think is missing.

Scope

This thread will focus on Rock-Paper-Scissors-like situations in two-player games, that is, choices that are made in a double-blind manner, where for any option I might take my opponent has access to a counter option.

The simplest game of that type is, of course, Rock-Paper-Scissors. However, it’s so simple that the solution is trivial, which might be the source of some misconceptions (they might be extrapolated from simple RPS to more complicated cases). Sirlin wrote a great article on how to make this more interesting. First, you can make the pay-offs for Rock, Paper and Scissors unequal. Note that even Sirlin made a mistake when he gave the game-theoretical solution to his easy example (see the first comment below the article). That shows how tricky it can be to find such a solution.

A further step away from simple RPS that Sirlin doesn’t even mention is asymmetry (the options for the players are different). On top of that, the pay-offs might be unclear. Take Yomi, for example: the opponent’s hand is hidden. So you do not know what his fastest possible attack is, or how much damage he can combo into, or whether he has a Joker available. You have to make assumptions based on the opponent’s hand size, his discard pile and the way he has been playing so far.

Also, you can’t simply take damage as the pay-off of your moves in Yomi. What exactly is the value of landing a knock-down? Or of drawing extra cards? Or of keeping a useful move in hand and saving it for its ability? So even getting a feeling for how good each move is, is a skill in itself (valuation). And of course, over the course of a game of Yomi the pay-offs shift constantly due to different life totals, changing hand sizes, powering up, and effects like knock-down or abilities in play (e.g. Midori’s Dragon Form or Geiger’s temporal distortion).

That’s all quite interesting, but now let’s have a look at what game theory can teach us about strategy in games like this.

Definitions from game theory

Game theory tries to solve games. For some games the solutions are pure strategies, which means that at any given decision point in the game there is exactly one move that a player should make in order to win. But due to the very design of RPS-like games there is always a hard counter-strategy to any pure strategy, so pure strategies can be ruled out as solutions.

Instead, we have to consider mixed strategies. That is, at any given decision point we assign a probability distribution (often called a range) to the available moves, according to which we (randomly) choose our move.
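As a small practical aside: actually rolling a move from a range is a one-liner in most programming languages. A minimal sketch in Python; the move names and weights are of course made up for illustration.

```python
# Draw one move from a mixed strategy: a weighted random choice.
import random

moves = ["attack", "throw", "block"]  # hypothetical options
mix = [0.5, 0.3, 0.2]                 # hypothetical range, sums to 1
print(random.choices(moves, weights=mix)[0])
```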

In a given situation, game theory asks us to set up a pay-off matrix, which lists the pay-offs for all possible combinations of moves the players might make. From that, game theory computes the Nash equilibrium (or equilibria): a pair of strategies for the two players such that no player, knowing that the opponent plays his equilibrium strategy, can increase his chances of winning by deviating from his own.
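For two-player zero-sum games, computing such an equilibrium boils down to a linear program. Here is a minimal sketch in Python, assuming the game is given as a pay-off matrix for the row player; using scipy here is my choice for illustration, nothing more.

```python
# Solve a two-player zero-sum game for the row player's equilibrium
# mix via linear programming: maximize the game value v subject to
# the mix guaranteeing at least v against every column move.
import numpy as np
from scipy.optimize import linprog

def equilibrium_mix(payoff):
    """Return (mix, value) for the row player, where payoff[i][j] is
    the row player's pay-off when row plays i and column plays j."""
    A = np.asarray(payoff, dtype=float)
    m, n = A.shape
    # Variables: x_0..x_{m-1} (probabilities) and v (game value).
    c = np.zeros(m + 1)
    c[-1] = -1.0                      # minimize -v  ==  maximize v
    # For every column j: v - sum_i x_i * A[i][j] <= 0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Probabilities must sum to 1; v is a free variable.
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * m + [(None, None)])
    return res.x[:m], res.x[-1]

# Sanity check on standard RPS (win = +1, loss = -1, draw = 0):
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
mix, value = equilibrium_mix(rps)
print(mix, value)  # ~[1/3, 1/3, 1/3], value ~0
```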

For more formal definitions and more details, check Wikipedia. The traffic-game example in particular illustrates quite well the possible existence of multiple Nash equilibria and the difference between stable and unstable ones.

Note that in order to be in a Nash equilibrium, both players have to know what the equilibrium strategies are, both players have to play these strategies, and both players have to know that their opponent plays his equilibrium strategy.

While there are games with multiple different equilibria, the cases I consider here typically have a unique Nash equilibrium, which consists of mixed strategies for the two players. In those cases, playing the range determined by game theory is called optimal play. If one player chooses to play optimally, he makes sure that his expected (averaged) pay-off is at least that of the Nash equilibrium, no matter what the opponent does. In other words, he minimizes his opponent’s possibilities to exploit him. But by doing so he passes up the opportunity to exploit his opponent in turn. And since it is typically infeasible A) to calculate the equilibrium exactly and B) to play that range in a truly random way, those possibilities for exploitation do exist even when a player tries to play optimally.

How to exploit suboptimal play

Due to unclear pay-offs, it is difficult to set up an accurate pay-off matrix. Solving it for optimal play is mathematically complex, and the results are usually not intuitive. And even once a range is calculated, humans are bad at acting randomly, even when they try. So the chances are high that your opponent doesn’t play optimally. How do you exploit that? It depends on which mistake your opponent makes and on your ability to identify that mistake (reading him).

Bias

If your opponent plays randomly according to a range that differs from optimal play, you can adapt. Of course, to detect that, you first need an idea of what his range should be. If he is playing a certain option too often, you can play the counter to that option and beat him. If you simply play the counter every turn, though, you will only beat him as long as he doesn’t change his range, and he likely will change it once he realizes that you are countering him, because now you are highly exploitable yourself. So instead you might play a range that is a mix of what you think is optimal and a bias towards the option that beats your opponent’s favourite, or just play the counter for one or two turns and then switch your strategy.
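A toy sketch of that blending idea; the 15% shift and the numbers are completely made up, the point is just the mechanics.

```python
# Shift some probability mass from your estimated optimal mix onto
# the option that counters the opponent's over-played move.
def biased_mix(optimal, counter_index, shift=0.15):
    mix = [p * (1 - shift) for p in optimal]  # scale everything down
    mix[counter_index] += shift               # pile the mass on the counter
    return mix

# If the opponent attacks too often, lean towards the counter option:
print(biased_mix([0.40, 0.35, 0.25], counter_index=1))
```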

The really difficult thing here is detecting the bias in the first place. Let’s say the opponent should attack with 40% probability, but you suspect that he attacks with 30%. If you want to sample enough data to confirm that at a 5% significance level, you will need to watch him for well over 20 turns, and 20 turns is already roughly the length of a single match of Yomi, so the game might be over before you are sure. Not to mention that the situation changes constantly during a match, so most of the observed turns aren’t even comparable. In practice, you will have to trust your instincts to detect such biases unless they are really strong.
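To put a number on this, here is a rough check with a one-sided binomial test, using the 30%-vs-40% figures from above. Treating the turns as independent, comparable samples is of course a big simplification.

```python
# How many clean observations do you need before a 30% attack rate is
# significantly below a claimed 40% at the 5% level?
from scipy.stats import binomtest

def detectable(n_turns, observed_rate=0.3, claimed_rate=0.4, alpha=0.05):
    attacks = round(n_turns * observed_rate)
    return binomtest(attacks, n_turns, claimed_rate,
                     alternative="less").pvalue < alpha

for n in (20, 50, 100, 150):
    print(n, detectable(n))  # roughly: False, False, True, True
```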

The good news is that this also makes it difficult for your opponent to detect that you have changed your range to counter him. Losing two or three combats in a row might still be due to chance if both players play the NE.

Patterns I: Humans are bad RNGs

In contrast to having a bias, your opponent might play his options at reasonable frequencies but show a pattern in the way he tries to make his play random. For example, if he tries to block 25% of the time but hasn’t blocked for the last 4 turns, he might have an increased probability to block now. A perfect random number generator (RNG) wouldn’t have that, but people often try to stay too close to the expected frequencies. Or maybe, while he tries to attack with 50% probability, he never attacks three turns in a row. People have such patterns, and they are difficult to suppress. So if you identify one, go ahead and exploit it. But be aware: humans are also prone to seeing patterns in places where there actually are none!
There is a nice article on what randomness really looks like in contrast to what people expect.
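If you do have a move log, say from replays, a crude way to hunt for such a pattern is to compare conditional frequencies. A hedged sketch with an invented history:

```python
# Does the opponent block more often right after a streak of
# non-blocking turns than he does overall?
def block_rate_after_streak(history, streak=2):
    overall = history.count("block") / len(history)
    hits = trials = 0
    for i in range(streak, len(history)):
        if all(m != "block" for m in history[i - streak:i]):
            trials += 1
            hits += history[i] == "block"
    return overall, (hits / trials if trials else None)

history = ["attack", "throw", "attack", "block", "attack", "attack",
           "throw", "block", "attack", "throw", "attack", "block"]
print(block_rate_after_streak(history))  # (0.25, 0.5) for this toy log
```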

Patterns II: Reactions to events

People also tend to over- or underreact to certain events. For example, last turn I got hit by a high-damage attack. Of course, the changed life totals should have an impact on my range, but maybe my reaction is to become paranoid and overplay block/dodge for the next few turns, while actually I could play much more relaxed, as the dangerous attack is no longer in my opponent’s hand.

Another great example is powering up: if I see my opponent’s Grave fetch three Aces for his super, I should play more carefully. But maybe I noticed in previous games that my opponent never uses those Aces on the turn directly after powering up for them, but is highly likely to play them two or three turns later. This knowledge gives me a great advantage, while the opponent might not even be aware that he is predictable.

Intuition and mental modelling

Sometimes you cannot clearly put your finger on how you figured something out, but your intuition tells you that your opponent is highly likely to play a certain move next. Don’t ignore that feeling just because you cannot explain logically where the prediction came from. Your subconscious is able to detect patterns that your rational analysis is unable to find. However, your subconscious sometimes also suggests impulsive reactions, out of fear or as a response to getting countered, and those impulsive reactions are highly exploitable by your opponent. So you have to learn when to listen to your intuition.

Beyond those observations, you can try to build a model of your opponent’s thought process or draw conclusions about his tendencies. Maybe he shies away from high-risk/high-reward moves? Then he probably won’t play his AAAA-super directly, but will try to dodge into it. Put yourself in your opponent’s position and ask yourself what his thoughts might look like. If you manage to master this technique, you can predict your opponent’s behavior even in situations that haven’t come up before. Here again, your intuition is your friend.

Yomi-traps

This is a special tactic that only works if your opponent tries to exploit you. Therefore you should first make sure that your opponent is not just playing according to what he thinks is best in the match-up, but actually reacts to the way you play.

A Yomi-trap consists of two steps. The first one, the conditioning, is to deliberately play exploitably (for example, never play one of the options, such as never throwing; or play too many attacks; or show a rather obvious pattern). Ideally you do this in a way where getting exploited doesn’t hurt too much, but there is some risk you have to accept here. It is also most efficient if your pattern or bias requires the opponent to play one specific option to counter it. The second step is getting the timing right: estimate how long it takes your opponent to identify your pattern or bias, and then, the moment he goes for the exploit, switch gears and play the counter to his counter.

Donkey Space

Now we have everything together: two players trying to estimate what optimal play might be, trying to find flaws in their opponent’s play and to exploit them, which in turn makes them exploitable themselves. Or was that just the trap?

This space of ever-changing ranges in which the players dance around each other is sometimes called Donkey Space. This article illustrates how this concept makes poker interesting.

Miscellaneous fallacies

Finally, I want to explicitly address some fallacious statements that have appeared throughout the forum.

“It’s solvable, therefore it’s boring.”

This one somehow popped up regularly on Kongregate in discussions about Kongai. It is actually wrong in two ways: first, almost any game is “solvable”, but it is often a long way to actually solve it. And second, the game-theoretical solution would only be a mixed strategy that minimizes the danger of being exploited. Playing this way ignores the possibility of exploiting the opponent and thus does not maximize the chances of winning (which, in two-player zero-sum games, is equivalent to minimizing the chances of losing).

“By adapting your range to your opponent you can make anything at least 50:50.”

This one probably comes from the fact that in standard (symmetric) Rock-Paper-Scissors, optimal play gives you a 50% chance of winning (if you repeat after draws). But in asymmetric situations that is no longer true! Imagine an RPS variant where player A’s moves win if B plays the same move, but lose to the other two moves. From symmetry we can identify the Nash equilibrium as both players playing all moves with probability 1/3. But that means that A will lose 2 out of 3 rounds. That’s why balancing Yomi is so important.
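The numbers are easy to verify. A quick check in Python, using the uniform mix identified by symmetry:

```python
# Row player A wins (+1) on a matched move and loses (-1) otherwise.
import numpy as np

variant = np.array([[ 1, -1, -1],
                    [-1,  1, -1],
                    [-1, -1,  1]])
uniform = np.full(3, 1/3)
print(uniform @ variant @ uniform)  # -1/3: A loses 2 out of 3 rounds
```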

Also note that due to different hand sizes, different cards in hand and different life totals, even mirror matches in Yomi are hardly ever completely symmetric.

“Optimal doesn’t mean best way to play? So game theory is useless!”

I guess this one is clear by now: if I want to exploit my opponent, I need to be able to tell whether he strays from optimal play. But how could I do that if I have no idea what optimal play looks like in the first place? And in situations where you have no idea what to expect from your opponent, optimal play keeps you safe from being exploited while you look for flaws in your opponent’s play.

“At high level players just play the Nash equilibrium, and whoever gets closer to it wins.”

I think I have said enough about how difficult it is to actually find the equilibrium. So let’s assume that in a perfectly balanced match-up, both players play according to what they think optimal play is. Player A’s idea of optimal play is really close to the true range, but has a tiny bias towards throw. Player B happens to play a bit further away from the true equilibrium, but his bias happens to be towards attack. In this situation B is more likely to win, because he happens to (accidentally) play the counter to A’s bias more often. A should change his range here, but not towards the equilibrium: he should actually introduce a bias towards more blocks/dodges and away from throws.
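Here is an invented numeric version of this scenario, using the attack/throw/block triangle as the pay-off structure. A sits measurably closer to the uniform equilibrium, yet ends up with a negative expectation.

```python
# Attack beats throw, throw beats block, block beats attack.
import numpy as np

M = np.array([[ 0,  1, -1],   # attack vs (attack, throw, block)
              [-1,  0,  1],   # throw
              [ 1, -1,  0]])  # block
a = np.array([0.32, 0.36, 0.32])  # player A: tiny bias towards throw
b = np.array([0.40, 0.30, 0.30])  # player B: larger bias towards attack
print(a @ M @ b)  # ~ -0.004: B wins slightly more often than A
```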

“If my opponent is better than me at reading, I should play the Nash equilibrium.”

I’m not sure why I should base this decision on a comparison of reading skills. As long as I do identify something exploitable in my opponent’s play, I should use this information to increase my chances of winning. Of course, the opponent might be trying to mislead me into a Yomi-trap, but that’s a judgment call I have to make. As long as I do manage to predict some of my opponent’s plays, I should try to counter them, because that is what maximizes my chances of winning. The only situation where I should retreat to optimal play is when I have absolutely no clue about what I could exploit in my opponent (i.e. I think he already plays optimally, or I have no idea what to expect from him).

Also note that unless you use an RNG, playing optimally only protects you from developing an exploitable bias. You might still unconsciously show a pattern in the way you try to randomise your moves, and thus can still be read.

“On the internet, where you do not see your opponent, reading is only possible because people are bad RNGs.”

This fallacy is rooted in the assumption that finding the equilibrium is trivial, or doesn’t matter. But since the equilibrium is hard to find, even play assisted by an RNG is most likely biased and thus exploitable.

“I play exactly this [insert range here] in that MU (using an RNG), but get beaten regularly. So that MU is unbalanced!”

First, as explained above, playing optimally doesn’t maximize your chances of winning. As long as the opponent plays no moves that have zero probability in the optimal range (i.e. no sub-optimal moves), you restrict your expected pay-off to that of the equilibrium (50:50 in a balanced situation). But you could increase that expected pay-off by taking reads on your opponent’s play into account.

That only explains why you don’t win more than half of the time; but why do you still get beaten so often? Because the optimal range is not determined by the MU alone! Even on your first turn you should consider your individual options: if your fastest throw in hand is on a T that has an ability, you probably don’t want to throw at all right now. Also, different attacks might have very different pay-offs (a normal attack without follow-up builds your hand, a 4-card normal string gives you Aces, a 0.0 speed ender has a higher chance to win combat but doesn’t do that much damage, etc.). So those options shouldn’t be lumped together under a single option “Attack 40%”.

Each of those options should get its own weight. Then, of course, you have to update that valuation after each combat, because your hand now looks different. Also, you shouldn’t neglect the knowledge you have about your opponent (hand size, revealed cards, etc.).
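In code terms, the range then stops being three numbers and becomes a weight per concrete option. Purely hypothetical names and weights:

```python
# One weight per concrete option instead of a single "Attack 40%" bucket.
range_ = {
    "attack_2_string_starter": 0.20,  # fast, leads into the full string
    "attack_0speed_ender":     0.10,  # wins combat more often, less damage
    "throw":                   0.10,
    "block":                   0.35,
    "dodge":                   0.25,
}
assert abs(sum(range_.values()) - 1.0) < 1e-9  # still a probability range
```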

“100% reads in Yomi are impossible, because you draw a random card at the start of your turn.”

Some good players claim to get reads (against weaker players) so good that they are 100% sure about the exact card the opponent will play next. I’m not sure whether perfect prediction is really possible, but there are situations where you become very sure about what the opponent will do, and the unknown card drawn at the beginning of the turn doesn’t change that. If you have figured out that a Grave player always plays a Q when he is knocked down and has a Q in hand, and you know he has one (because he powered up for it earlier), then the card that Grave just drew doesn’t change a thing: he already has exactly the thing he wants to play, and so he plays it.

“Playing optimally cannot exceed the expected, average pay-off of the Nash equilibrium.”

This one makes optimal play look worse than it really is. There is usually a multitude of different moves available to each player, but the Nash equilibrium will typically consist of only a few of them. All other options are overshadowed by the strictly better range that constitutes the equilibrium.

For example, if I have a string of normal attacks 2,3,4,5 in my hand, I want to start it by attacking with the 2. First, because it’s the only way to play all those cards in one string and get both Aces, and second, because the 2 is the fastest of these and thus has the best chance of winning combat. The 4-attack won’t be part of the optimal range. In this example some moves are completely dominated by a single other move. There are also situations where a move, while not completely dominated by any single other move, is still dominated by a mix of other moves and therefore is not part of the equilibrium.
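Strict dominance by a single move is easy to check mechanically; dominance by a mix of moves would need a linear program instead of this simple loop. The pay-off numbers here are invented:

```python
# A row i is strictly dominated if some other row k beats it against
# every column move.
import numpy as np

def strictly_dominated_rows(payoff):
    A = np.asarray(payoff, dtype=float)
    return [i for i in range(len(A))
            if any(np.all(A[k] > A[i]) for k in range(len(A)) if k != i)]

example = [[2, 5, -1],   # e.g. starting the string with the 2
           [1, 3, -2],   # the 4-attack: worse in every case
           [4, -2, 0]]
print(strictly_dominated_rows(example))  # [1]
```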

Now if I play the equilibrium range and my opponent makes the mistake of playing some of those sub-optimal moves, I automatically gain an advantage over him by “just” playing optimally.
