A chess engine in Go: part 6 - play vs. MOCHI on Lichess!

2024-12-05

MOCHI improvements

I’ve taken up the development of my toy chess engine once more. Check out the → MOCHI repository on Codeberg.

Recent improvements were:

UCI protocol support is now more robust (there was a bug where it would get out of sync with the GUI in case it got a STOP command before it could compute any move).
The move generation is a lot quicker. This mostly comes from removing a string field from the board struct and storing the same information in a uint16 field instead.
MOCHI plays faster: instead of a fixed search depth of 5 plies (half-moves), it now searches 4 plies deep and extends some branches up to 6 plies.
The evaluation function was extended with some positional elements, e.g. rooks should be connected, and many more.
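To illustrate the string-to-uint16 idea from the list above, here is a minimal sketch of packing small pieces of board state into a single uint16. The post does not say which information the string held, so the choice of castling rights plus en passant square, the bit layout, and all names (State, CanCastle, WithEnPassant and so on) are assumptions made for illustration, not MOCHI's actual code.

```go
package main

import "fmt"

// Hypothetical layout, NOT MOCHI's actual one:
//   bits 0-3:  castling rights (one bit each)
//   bits 4-10: en passant target square + 1 (0 means "none")
const (
	WhiteKingside uint16 = 1 << iota
	WhiteQueenside
	BlackKingside
	BlackQueenside
)

const (
	epShift        = 4
	epMask  uint16 = 0x7F << epShift
)

// State packs the flags into one small value that is cheap to copy and compare.
type State uint16

// CanCastle reports whether the given castling right is still available.
func (s State) CanCastle(right uint16) bool { return uint16(s)&right != 0 }

// WithoutCastle returns a copy with the given castling right removed.
func (s State) WithoutCastle(right uint16) State { return s &^ State(right) }

// ClearEnPassant returns a copy with no en passant target.
func (s State) ClearEnPassant() State { return s &^ State(epMask) }

// WithEnPassant returns a copy with the en passant target set to sq (0..63).
func (s State) WithEnPassant(sq int) State {
	return s.ClearEnPassant() | State(uint16(sq+1)<<epShift)
}

// EnPassant returns the en passant target square, if one is set.
func (s State) EnPassant() (sq int, ok bool) {
	v := int((uint16(s) & epMask) >> epShift)
	return v - 1, v != 0
}

func main() {
	s := State(WhiteKingside | WhiteQueenside | BlackKingside | BlackQueenside)
	s = s.WithEnPassant(20) // square index 20 is e3 when a1 = 0
	s = s.WithoutCastle(WhiteKingside)
	sq, ok := s.EnPassant()
	fmt.Println(s.CanCastle(WhiteKingside), s.CanCastle(BlackKingside), sq, ok)
}
```

A value like this is copied with the board at no extra cost and is read with a mask and a shift, which is one plausible reason a move generator that copies boards frequently gets faster once a string field is gone.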

Lichess

Lichess is libre chess: a free and Open Source chess server where people can play chess against each other. If you play chess, you should absolutely check it out and have an account there: → Lichess.org.

Lichess also offers an API for bots to join and play against humans and against each other. Bots are, of course, clearly marked with a BOT tag. Using computer assistance as a player is not allowed (and gets you banned).

I’ve set up a server and used the nice → lichess-bot bridge to turn MOCHI into a bot you can challenge on Lichess! Go to → BOT mochi_bot’s page and click on the hamburger menu on the right to challenge it to a game. BOT mochi_bot accepts games for time controls between ~~3+0 and 15+10 (blitz to rapid)~~ 1+0 and 15+10 (bullet, blitz or rapid). It will only play ~~one game~~ two games at a time, so you need to be patient if it’s playing somebody else at the moment.

Challenge → MOCHI to a game on Lichess!

Update: apparently one needs to be logged into Lichess for the challenge to go through.

Benchmarking

So, how well does MOCHI play?

If you play chess, you know a player can be rated by her Elo (a rating system invented by Arpad Elo).

First of all, Elo ratings are not absolute. They’re valid only within a certain pool of players who play each other. So Elo is not really comparable between different servers or different leagues in real life. Moreover, humans and engines play differently. It’s common for a weaker engine to beat a human in tactical play in blitz, while performing very poorly in an endgame in longer time controls.

That being said, Elo is well suited to measuring improvements. There is a nice engine testing tool, → cutechess-cli, that can run a match between two engines and compute the relative Elo difference.

So one can pit an engine against another one with known Elo to estimate its strength and see how a change to the code influences the rating. One can also pit an engine against an older version of itself to test improvements. This is, by the way, how development of Stockfish (currently the best engine) is done.

Out of curiosity I’ve benchmarked the current version of MOCHI (v.20241203_1800) against Stockfish 17 with a set Elo limit. Stockfish Elo limits → have been recently calibrated.

Here’s a match against Stockfish 17 instructed to simulate 1400 Elo:

./cutechess-cli/cutechess-cli \
    -rounds 200 \
    -each proto=uci \
    -ratinginterval 1 \
    -outcomeinterval 1 \
    -engine cmd=./stockfish-17-avx512 option.UCI_LimitStrength=true option.UCI_Elo=1400 tc=2:00+0:01 \
    -engine cmd=mochi-20241203_1800 restart=on option.Threads=16 tc=5:00+0:00

Score of Stockfish 17 vs MOCHI v.20241203_1800: 81 - 108 - 11  [0.432] 200
...      Stockfish 17 playing White: 41 - 53 - 6  [0.440] 100
...      Stockfish 17 playing Black: 40 - 55 - 5  [0.425] 100
...      White vs Black: 96 - 93 - 11  [0.507] 200
Elo difference: -47.2 +/- 47.5, LOS: 2.5 %, DrawRatio: 5.5 %
SPRT: llr 0 (0.0%), lbound -inf, ubound inf

Player: Stockfish 17
   "Draw by 3-fold repetition": 6
   "Draw by fifty moves rule": 4
   "Draw by insufficient mating material": 1
   "Loss: Black mates": 53
   "Loss: White mates": 55
   "Win: Black mates": 40
   "Win: White mates": 41
Player: MOCHI v.20241203_1800
   "Draw by 3-fold repetition": 6
   "Draw by fifty moves rule": 4
   "Draw by insufficient mating material": 1
   "Loss: Black mates": 40
   "Loss: White mates": 41
   "Win: Black mates": 53
   "Win: White mates": 55

This would put MOCHI at 1447.2 +/- 47.5 Elo.
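For reference, the Elo difference in the report can be reproduced from the win/loss/draw counts with the standard logistic Elo model. The sketch below is my own reconstruction, not cutechess-cli's source, and the function name EloDiff is made up for illustration. It does not reproduce the +/- error margin.

```go
package main

import (
	"fmt"
	"math"
)

// EloDiff converts a match result into an Elo difference from the point of
// view of the first player, using the logistic Elo model:
//   score = (wins + draws/2) / games
//   diff  = -400 * log10(1/score - 1)
// A score of exactly 0 or 1 yields an infinite difference.
func EloDiff(wins, losses, draws int) float64 {
	games := float64(wins + losses + draws)
	score := (float64(wins) + 0.5*float64(draws)) / games
	return -400 * math.Log10(1/score-1)
}

func main() {
	// Stockfish 17 (limited to 1400) vs MOCHI: 81 wins, 108 losses, 11 draws.
	diff := EloDiff(81, 108, 11)
	fmt.Printf("Stockfish vs MOCHI: %+.1f Elo\n", diff)
	fmt.Printf("MOCHI estimate:     %.1f Elo\n", 1400-diff)
}
```

With the numbers from the match above, the score is 86.5/200 = 0.4325, which gives about -47.2 Elo from Stockfish's side, hence 1400 + 47.2 = 1447.2 for MOCHI.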

And another one against Stockfish 17 at 1500 Elo:

./cutechess-cli/cutechess-cli \
    -rounds 200 \
    -each proto=uci \
    -ratinginterval 1 \
    -outcomeinterval 1 \
    -engine cmd=./stockfish-17-avx512 option.UCI_LimitStrength=true option.UCI_Elo=1500 tc=2:00+0:01 \
    -engine cmd=mochi-20241203_1800 restart=on option.Threads=16 tc=5:00+0:00

Score of Stockfish 17 vs MOCHI v.20241203_1800: 112 - 76 - 12  [0.590] 200
...      Stockfish 17 playing White: 51 - 45 - 4  [0.530] 100
...      Stockfish 17 playing Black: 61 - 31 - 8  [0.650] 100
...      White vs Black: 82 - 106 - 12  [0.440] 200
Elo difference: 63.2 +/- 47.7, LOS: 99.6 %, DrawRatio: 6.0 %
SPRT: llr 0 (0.0%), lbound -inf, ubound inf

Player: Stockfish 17
   "Draw by 3-fold repetition": 8
   "Draw by fifty moves rule": 3
   "Draw by insufficient mating material": 1
   "Loss: Black mates": 45
   "Loss: White mates": 31
   "Win: Black mates": 61
   "Win: White mates": 51
Player: MOCHI v.20241203_1800
   "Draw by 3-fold repetition": 8
   "Draw by fifty moves rule": 3
   "Draw by insufficient mating material": 1
   "Loss: Black mates": 61
   "Loss: White mates": 51
   "Win: Black mates": 45
   "Win: White mates": 31

And this would put it at 1436.8 +/- 47.7.
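The LOS (likelihood of superiority) values in the two reports, 2.5 % and 99.6 %, say how likely it is that the first engine (Stockfish here) is the stronger one given the decisive games. A commonly used approximation ignores draws and applies the error function to wins and losses. I assume cutechess-cli computes something equivalent. The function name LOS below is only illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// LOS approximates the likelihood of superiority of the first player from
// decisive games only (draws carry no information about who is stronger):
//   LOS = 0.5 * (1 + erf((wins - losses) / sqrt(2 * (wins + losses))))
// The result is a probability between 0 and 1.
func LOS(wins, losses int) float64 {
	x := float64(wins-losses) / math.Sqrt(2*float64(wins+losses))
	return 0.5 * (1 + math.Erf(x))
}

func main() {
	// First player is Stockfish 17 in both matches.
	fmt.Printf("vs Stockfish at 1400: LOS = %.1f %%\n", 100*LOS(81, 108))
	fmt.Printf("vs Stockfish at 1500: LOS = %.1f %%\n", 100*LOS(112, 76))
}
```

Read this way, the first match says Stockfish at 1400 is very unlikely to be stronger than MOCHI, and the second says Stockfish at 1500 almost certainly is, which brackets MOCHI between the two settings.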

So roughly 1450 +/- 50.

Curiously, that’s quite in line with BOT mochi_bot’s Lichess rating after playing 53 rated blitz games there: 1413.0 +/- 57.5.

To double-check, I had MOCHI play against Hiarcs, another engine that can be instructed to play at a given rating (1400 in this case). The outcome was:

MOCHI v.20241203_1800-HIARCS 15.3.1 match (20 games, 2 minutes per game plus 2 seconds per move) (20), 2024.12.04, 120+2
MOCHI v.20241203_1800 - HIARCS 15.3.1 10:10 (+8-8=4)

That’s 8 wins, 8 losses and 4 draws, an even 10:10 score. So, yes, it looks like 1400 Elo is a good estimate of MOCHI’s current playing level.

That’s not very good. In blitz, that’s a lower-rated local club-level player or dedicated online player, if we want to compare it to a human. For longer time controls, MOCHI would be worse than a 1400 Elo rated human player, though, as MOCHI always plays fast. It doesn’t yet know how to calculate better when given more time.

But that’s also a great opportunity for improvement. Thanks to cutechess-cli, any patch that might improve play can be tested objectively.

My plan is to improve to at least 1600 Elo as measured against Stockfish and then release version 1.0 :)

Update

On December 5, 2024, I gave a talk about this topic as part of the NOI Software Developer’s Thursday series. Thanks to everyone who joined the event “How to create a chess engine in Go” in person. We had a lot of fun! NOI has uploaded a video to their YouTube channel, enjoy me blundering a mate in one.
