Making tic-tac-toe interesting with AI
Tic-tac-toe, also known as noughts and crosses, is a simple two-player game played on a 3x3 grid. Each player alternates marking a space in the grid with their symbol: the first player uses 'X' and the second 'O'. The goal is to be the first to get three of their symbols along a row, column, or diagonal. The game is a draw if all nine squares are filled and no player has achieved this goal. Under these rules, tic tac toe is a perfect information zero-sum game. This page includes my implementation of tic-tac-toe above where both X and O can be optionally controlled by a custom designed AI algorithm. I explain below how I designed the AI to work with varying degrees of proficiency, and how this feature can inform on how optimal strategies can depend on the skill level of each player.
It is possible to calculate an upper bound on the total number of achievable unique board states as , since there are nine squares, each with three possible states: empty, X, or O. However, most of these states are unreachable, such as those where two players have three in a row simultaneously, or where player O has taken more turns than player X. Therefore, the actual number of unique accessible board states is only . This means a brute force algorithm which explores all possibilities when calculating the optimal move for each board state is feasible. A typical approach to solving a two-player zero-sum game is finding the Nash equilibrium using the minimax algorithm, which identifies the optimal move for each player, assuming the opponent also plays optimally. This involves recursively exploring all possible moves and their outcomes, with player X aiming to maximise the score and player O aiming to minimise it. Scores are assigned to each board state: a win for X as , a win for O as , and a draw as .
In tic-tac-toe, both players choosing moves this way always results in a draw. However, optimal play by one player can depend on the other player's strategy, and many human players do not choose moves optimally. To make the game more interesting, I implemented an AI using a softmax-based probabilistic variant of the minimax algorithm, where the AI considers actions as if both players take them randomly, with probabilities based on the expected scores. More specifically, the log-probability of a move depends linearly on its expected score, meaning the probability of a move is calculated using the softmax function with a temperature parameter. This means that moves with higher expected scores are more likely to be chosen, but not every time. The temperature parameter controls the randomness of the AI's play. Lower temperatures mean the AI is more likely to choose the best move, while higher temperatures makes all moves more equally likely.
To compute the expected resulting score of a move, the AI needs to predict how its opponent will play. Therefore, the AI actually utilises two temperature parameters, one for choosing its own moves, and the other for how it models the other player's strategy. Interpreting temperature as a kind of proxy for skill, then computing expected scores with two paramters in this way lets the algorithm account for the different 'skill' levels of each player. Given a starting board state, the algorithm for computing the expected score works as follows:
- Check if the game is over. If so, then return 1 if X won, -1 if O won, or 0 if the game is a tie.
- Identify all available moves (all empty cells).
- Recursively compute expected scores of each move based on player temperatures.
- Compute probabilities of each move using the softmax function.
- Return the average score weighted by their respective probabilities.
Let's get into the details of the mathematics of the algorithm. For a vector with components and a temperature , the softmax function is
This function normalises the vector so that all probabilitis are strictly positive and sum to 1 (try prove it yourself by adding the components). The probability, , that player, , chooses to move to the board state, from the board state is
where is the set of all board states that can be reached in one move from , when it is player X's turn, and when it is player O's turn. is the expected score given a board state and whos turn it is not. If the game is over then the score is if X wins, if O wins, and if it is a draw. Otherwise, we can calculate the expected score as
These two expressions provide a recursive relationship for calculating the expected score of any board state. The AI uses this recursive function to compute the expected scores of all its available next moves. It then uses the softmax function to convert these scores into probabilities and randomly chooses a move based on those probabilities. The figure below illustrates an exmaple where player X is playing less randomly than player O with and .
In this example, if X is controlled by the AI, then it has a 97.46% chance of picking the winning move immediately. It also has a 0.02% chance of choosing the bottom right corner which would allow O to win next turn. However, O has a high temperature of 1, so it still has a 26.89% chance of not choosing that winning move.
I mentioned before that the temperature parameter can be interpreted as a kind of proxy for skill. To illustrate this, I simulated games between to AI controlled players with varying temperatures. For each combination of temperatures, I simulated 1000 games and estimated the win probabilities of each player. This produced some interesting results, which are illustrated in the figure below.
There are two key take-aways from these results. Firstly, player X has a large advantage by going first. Secondly, players do perform better when they play with a lower temperature. When X plays with a temperature of 1 then there is approximately a 20% chance of a draw regardless of O's temperature. In this case, if O plays with a low temperature, then X will almost never win. However, if O plays with a temperature of 1 also, then X has more than 50% chance of winning. Both players have an equal chance of winning when O plays with a temperature of about 0.5. When X plays with a temperature of 0.1, then the probability of O winning is almost zero regardless of O's temperature. This is because X is almost always choosing the best move. When O's temperature is low, the game usually results in a draw, but when O's temperature is high, X wins roughly 90% of the time.
One thing I found interesting was how the optimal strategy depends on both temperature parameters. For example, at the start of the game, X has three options: choosing a corner, the centre, or an edge. The figures below illustrate how the expected scores of these moves depend on the player temperatures.
If X plays with a high temperature then their best first move is always the centre, probably because it allows for the most possible winning options. However, if X plays with a low temperature then their best first move is usually a corner, probably because it can guarentee a win if O does not subsequently choose the centre. When O plays with a high temperature then the expected score is slightly higher when X chooses the centre.
Below is my Python code to compute these results.
import matplotlib.pyplot as plt
import numpy as np
_scores = {}
def score(board_state, player, temperatures, return_probs=False):
pos_str = stringify_board(board_state)
if not return_probs and pos_str in _scores:
return _scores[pos_str]
game_over, winner = check_gameover(board_state)
if game_over:
_scores[pos_str] = winner
return _scores[pos_str]
moves = get_moves(board_state)
values = np.array(
[
score(make_move(board_state, player, move), -player, temperatures)
for move in moves
]
)
probs = softmax(player * values, temperatures[player])
_scores[pos_str] = np.dot(values, probs)
if return_probs:
return moves, values, probs
return _scores[pos_str]
def stringify_board(board_state):
return "\n".join("".join("_XO"[cell] for cell in row) for row in board_state)
def softmax(z, T):
z = np.array(z)
e_z = np.exp(z / T)
return e_z / e_z.sum(axis=0)
def get_moves(board_state):
return list(zip(*np.where(board_state == 0)))
def make_move(board_state, player, move):
new_board = board_state.copy()
new_board[move] = player
return new_board
def check_gameover(board_state):
sums = np.stack(
[
*board_state, # Rows
*board_state.T, # Columns
board_state.diagonal(), # Main diagonal
np.fliplr(board_state).diagonal(), # Anti-diagonal
]
).sum(1)
if 3 in sums:
return (True, 1)
if -3 in sums:
return (True, -1)
return (0 not in board_state, 0)
def print_probs(board_state, player, temperatures):
move_scores = score(board_state, player, temperatures, return_probs=True)
board_score = score(board_state, player, temperatures)
print("Current board: {0:.4f}".format(board_score))
print(stringify_board(board_state))
for m, v, p in zip(*move_scores):
print("Move [{0},{1}]: {2:.4f}, {3:.4f}%".format(*m, v, p * 100))
def simulate_game(temperatures, print_game=False):
board = np.zeros([3, 3], dtype=int)
player = 1
game_over = False
while not game_over:
moves, _, probs = score(board, player, temperatures, return_probs=True)
move = moves[np.random.choice(len(moves), p=probs)]
board = make_move(board, player, move)
game_over, winner = check_gameover(board)
player = -player
if print_game:
print("\n" + stringify_board(board))
return winner
def count_wins(temperatures, trials):
X, O = 0, 0
for t in range(trials):
winner = simulate_game(temperatures)
if winner > 0:
X += 1
elif winner < 0:
O += 1
return X / trials, O / trials, (trials - X - O) / trials
# print example
board = np.array([[1, 1, 0], [-1, -1, 0], [1, -1, 0]])
print_probs(board, 1, {1: 0.2, -1: 1})
# plot win probabilities via simulation
temps = np.arange(0.1, 1.1, 0.1)
num_trials = 1000
np.random.seed(745)
for T in [0.1, 1]:
counts = []
for t in temps:
_scores = {}
counts.append(count_wins({1: T, -1: t}, num_trials))
plt.figure(figsize=[4, 3])
plt.plot(temps, counts, label=["X Win", "O Win", "Tie"])
plt.axis([0, 1.1, 0, 1])
plt.xlabel("$T_O$")
plt.ylabel("Probability")
plt.title(f"Win Estimates with $T_X$={T}")
plt.legend()
plt.savefig(f"p_estimates_T{T}.svg", bbox_inches="tight")
# plot expected scores based on first move
board_corner = np.array([[1, 0, 0], [0, 0, 0], [0, 0, 0]])
board_centre = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]])
board_edge = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]])
for T in [0.1, 1]:
scores = []
for t in temps:
_scores = {}
scores.append(
[
score(board_corner, -1, {1: T, -1: t}),
score(board_centre, -1, {1: T, -1: t}),
score(board_edge, -1, {1: T, -1: t}),
]
)
plt.figure(figsize=[4, 3])
plt.plot(temps, scores, label=["Corner", "Centre", "Edge"])
plt.axis([0, 1.1, -1, 1])
plt.grid()
plt.xlabel("$T_O$")
plt.ylabel("Score")
plt.title(f"Expected Scores with $T_X$={T}")
plt.legend(loc="lower right")
plt.savefig(f"scores_T{T}.svg", bbox_inches="tight")