Minimaxing the Minimax Algorithm in Tic-Tac-Toe
In Praise of Hardcoding
27 September 2015
For a recent coding challenge, I had to implement an AI for a Tic-Tac-Toe game. This AI should not be able to lose a match. A quick Google search reveals that the most efficent way is to use a minimax algorithm to determine what moves are likely to cause the AI to either tie or win against another player. However, nobody I found seemed to analyze what decisions the AI actually makes.
The minimax algorimth works by having the computer play Tic-Tac-Toe against itself, figuring out the best moves to use against itself…and then figuring out the best counters to those moves, and so on and so forth. It will then rank all the possible moves it can take:
-10 | - The AI will lose if it chooses this move. Avoid it. |
0 | - It's a tie. |
10 | - The AI will win. |
The AI will pick the move that has the highest score.
Therefore, the computer will play perfectly and never lose. It may never win either though, if the human it is playing against is also playing perfectly.
But there is an extreme amount of trust in how the Minimax Algorithm works. There is a lot of literature examining how to program the algorithm, and very little literature on what choices the algorithm ends up making. I assume that most people assume that since the algorithm will choose the best possible move for Tic-Tac-Toe, examining what moves it actually picks would be a waste of time.
But the problem is that the computer has to play through every possible game of Tic-Tac-Toe before it can make its choice. This is not an instantaneous process. For the worst case scenario (where the AI goes first, and thus must play through all Tic-Tac-Toe games), I had to wait 114 seconds for a response from the AI. This is absurd.
There are ways to reduce the number of games a computer has to play through (since many Tic-Tac-Toe games are really just duplicates of each other, with the boards being rotated)…but there had to be a better way to make the game go faster.
I decided to look “under the hood” of the AI, because I was curious in seeing how the AI thought for itself. I soon realize that if I know that the AI is thinking, I could ‘hardcode’ in the AI’s choices. The AI will just end up doing its move rather than spending 114 seconds learning to do that same move.
In the following terminal outputs, O is human and X is the computer. I also output a Hash showing the computer’s thought process (“MOVE”=>SCORE).
So here’s three starting scenarios:
O goes first and pick a corner space.
0 | 1 | 2 |
3 | 4 | 5 |
6 | 7 | O |
It is now X’s Turn.
{“0”=>-10, “1”=>-10, “2”=>-10, “3”=>-10, “4”=>0, “5”=>-10, “6”=>-10, “7”=>-10}
X has picked Spot 4. </code>
The computer will choose the space that has the highest score. In this case, every space other than 4 has a score of -10. The only space that allows the computer to tie the human player is 4, the spot right in the middle. So X picks that.
O goes first and pick a edge space.
0 | 1 | 2 |
3 | 4 | 5 |
6 | O | 8 |
It is now X’s Turn.
{“0”=>-10, “1”=>0, “2”=>-10, “3”=>-10, “4”=>0, “5”=>-10, “6”=>0, “8”=>0}
X has picked Spot 1. </code>
In this case, now, there are four moves that may lead to a tie: 1, 4, 6, and 8. Any other move will lead to the computer’s loss.
But it does not matter how the computer gets to that tie. A tie is a tie. So the computer can easily pick any move that leads to a tie, and the outcome will always be the same. In this case, the computer just choses Spot 1.
What happens if I let the computer go first?
0 | 1 | 2 |
3 | 4 | 5 |
6 | 7 | 8 |
{“0”=>0, “1”=>0, “2”=>0, “3”=>0, “4”=>0, “5”=>0, “6”=>0, “7”=>0, “8”=>0}
X has picked Spot 0. </code>
None of the spaces are any more advantageous than the others. No matter what move the computer picks, a tie will result. So the computer could pick literally any move and be ensured that the other player won’t win. In this case, the computer choses Spot 0.
I now know that the computer sees the space “4” as an excellent move to make, no matter whether O goes first or second. (Other moves can be just as excellent as “4”, but “4” will always be excellent.)
I then programmed the computer to always pick “4” as its first move. The minimax algorithm would still be used after the first move though. After the first few moves, the number of possible Tic-Tac-Toe games decreases, and so the minimax algorithm is speedy enough for the user to not be annoyed.
Result: The AI is still unbeatable, but now the game is faster to play! It takes less than a second for the computer to make its first move, and its second move against the player is still just as fast.
It is cool to have the AI think how to beat its opponent. But just because something is cool does not mean that it is necessary or worthwhile in the real world. Thinking takes time, and a user is not going to be happy waiting for 114 seconds while the computer figure out what it should do next. Hardcoding in the AI choice ensure that the game runs faster. And since the hardcoded choice is actually based on the results of the minimax algorithm, we are not risking the chance of the AI losing.