Authors: *Zhiyang Zhang, Ningcong Chen*

Part 1: Parallelized Gomoku environment

Part 2: Monte Carlo Tree Search (MCTS) & Self-Play

Part 3: Neural Network & MCTS

Code: https://github.com/Zhiyang-Z/AZ

You can run play_gomoku.py to play Gomoku in the terminal. Trained parameters can be downloaded here: AZ_gomoku_000984.zip

0. Intention

I think you're like me: astonished by AlphaZero and its predecessor AlphaGo, which defeated a world champion at Go, a game often considered the most complex board game.

https://youtu.be/WXuK6gekU1Y?si=ngCG1fMyJuRm2qbv

This brought me mixed emotions. It feels like a shame that one of the smartest human minds was defeated by a machine, diminishing the glory of the human mind. At the same time, I'm curious: how could this happen? What is the mechanism behind AlphaZero?

What shocked me even more is that another AI, AlphaProof, recently reached a silver-medal standard at the International Mathematical Olympiad (IMO). I studied Olympiad math in high school, so I know firsthand how subtle these problems are: some can take days just to understand the solution. How could a machine reach that level on its very first attempt?

AI achieves silver-medal standard solving International Mathematical Olympiad problems

Behind AlphaProof lies the influence of AlphaZero: AlphaProof combines Large Language Models (LLMs) with AlphaZero and works in Lean, a formal language framework for verifying mathematical proofs. These two AIs motivated me to explore the core of AlphaZero, and that is why I decided to write this article: to uncover the details of how AlphaZero works.

In this blog, I will dive into its implementation and show how we can train a superhuman-level Gomoku (also called five in a row) model within one week. The exact time depends on the machine you use; 10 hours of training on two NVIDIA L40S GPUs is enough for a decent model. Gomoku shares the same board structure as Go and has simpler rules, yet it is still hard for a human to master, which makes it an ideal choice for learning.

Gomoku

Play gomoku online with 2 player or multiplayer - papergames.io

Try a few games on an online platform to get familiar with the rules.
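For readers who prefer code to prose, the winning condition can be sketched in a few lines of plain Python. This is only an illustration: the board encoding, the 15x15 size, the function name `has_five`, and the "five or more in a row wins" rule are assumptions made for this sketch, and the environment built later in this series may differ.

```python
# Illustrative sketch only: encoding, board size, and names are assumptions,
# not taken from the article's repository.
EMPTY, BLACK, WHITE = 0, 1, 2


def has_five(board, player, n=5):
    """Return True if `player` has at least `n` stones in an unbroken line."""
    size = len(board)
    # Four directions cover every line: right, down, and the two diagonals.
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    for r in range(size):
        for c in range(size):
            for dr, dc in directions:
                end_r, end_c = r + (n - 1) * dr, c + (n - 1) * dc
                # Skip runs that would leave the board.
                if not (0 <= end_r < size and 0 <= end_c < size):
                    continue
                if all(board[r + k * dr][c + k * dc] == player for k in range(n)):
                    return True
    return False


board = [[EMPTY] * 15 for _ in range(15)]
for k in range(5):
    board[3 + k][4 + k] = WHITE  # five white stones on a diagonal
print(has_five(board, WHITE), has_five(board, BLACK))
```

The check scans every cell as a possible start of a run, which is simple rather than fast; it is meant to make the rule concrete, not to serve as a training environment.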

In the end, you will be able to train a Gomoku master. I trained one myself and tested it on online platforms, where it defeated human players with ease using expert-level moves. In the game below, for example, the AI planned an impressive sequence of moves that led to a winning position, which completely surprised me (the machine is white, and the human is black).

machine is white, and human is black


Before starting the journey, please install the libraries below:

conda create --name gomoku python=3.12.7
conda activate gomoku
pip install -U "jax[cuda12]"
pip install mctx
pip install chex flax
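After installing, a quick sanity check can confirm that the packages are visible to Python. The snippet below is a minimal sketch; the helper name `check_packages` is made up for this example, and the package list is simply copied from the pip commands above.

```python
import importlib.util

# Package names taken from the pip commands above.
REQUIRED = ("jax", "mctx", "chex", "flax")


def check_packages(names=REQUIRED):
    """Map each package name to True if it can be found on the import path."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")

# If JAX is available, list the devices it can see (GPUs should appear here
# when the CUDA build was installed correctly).
if status["jax"]:
    import jax

    print(jax.devices())
```

If the device list shows only a CPU device, JAX is probably not picking up the GPU, and training will be much slower than the timings mentioned earlier.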

1. Parallelized Environment

A Buddhist saying: “The universe consists of three thousand great thousand worlds.” (三千大千世界)