11 December 2017

Houdini, Komodo, Stockfish, and AlphaZero

Last week's report, 'TCEC Season 10, Engine-to-engine, Head-to-head', finished with a prediction:-

The engines are slugging it out as I write this and have finished 80 games out of the 100 scheduled for the event. Houdini leads [Komodo] with a score of +13-7=60, meaning that we can project a final score of something like +16-9=75. I'll come back to the event when it's over.
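That projection was nothing more than proportional scaling of the 80-game score to 100 games; a quick check of the arithmetic in Python:-

    # Scale the 80-game score (+13-7=60) linearly to the full 100 games.
    wins, losses, draws = 13, 7, 60
    factor = 100 / (wins + losses + draws)    # 100 / 80 = 1.25
    print(f'+{round(wins * factor)}'          # +16
          f'-{round(losses * factor)}'        # -9  (8.75 rounds up)
          f'={round(draws * factor)}')        # =75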

With the score at +14-9=73 after 96 games -- a five-point lead that Komodo could no longer overcome in the four games remaining -- Houdini was declared the winner in 'Houdini is TCEC Season 10 champion!' (chessdom.com):-

With its gold medal Houdini becomes the engine with most titles in TCEC history. Robert Houdart shared, "I’ve worked non-stop for the past two years to bring Houdini back to the top level, and I’m really happy that this has resulted in a new TCEC title, which is the equivalent of World Champion status."

Houdini won one more game to finish with a final score of +15-9=76. The match was punctuated by two events. The first was a technical problem, reported with Houdini holding a six-point lead near the halfway point of the match:-

Komodo Team Reports Compiler Glitch – Version Update Rejected [...] At the beginning of the Superfinal many noticed that Komodo’s speed (in nodes per second) was lower than previously seen. Tournament director Anton Mihailov, with the help of the server administrator Martin Thoresen, double- and triple-checked that the engine was installed correctly. The details were sent to the Komodo team and everyone agreed there was no problem during the Superfinal setup.

Statement by team Komodo [...] In summary, there is indeed a slowdown in the version now running in TCEC, which appears to be due to a compiler bug.
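Node speed is easy to verify from the outside, because UCI engines report it continuously in their search output. A minimal sketch of such a check (the binary path './engine' is a placeholder):-

    import subprocess

    # Measure an engine's speed in nodes per second (nps) from its
    # UCI search output. The path './engine' is a placeholder.
    engine = subprocess.Popen(['./engine'], stdin=subprocess.PIPE,
                              stdout=subprocess.PIPE, text=True)

    def send(cmd):
        engine.stdin.write(cmd + '\n')
        engine.stdin.flush()

    send('uci')
    send('isready')
    send('position startpos')
    send('go movetime 10000')            # search for ten seconds

    nps = 0
    for line in engine.stdout:
        words = line.split()
        if 'nps' in words:               # 'info' lines report current speed
            nps = int(words[words.index('nps') + 1])
        if line.startswith('bestmove'):  # search finished
            break
    print('nodes per second:', nps)
    send('quit')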

The Komodo team asked to submit a substitute executable, but the request was rejected by Houdart and Mihailov: the openings for all games of the Superfinal had been announced in advance, and any new version of an engine introduced mid-match might conceivably take advantage of this knowledge. As for the bug itself, in a software development toolchain the compiler is the tool that translates the high-level code that the developer writes into the low-level code that the processor executes, so a compiler defect can change an engine's speed without a single line of its source changing.
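The translation step can be seen in miniature with Python's own bytecode compiler, a far simpler cousin of the native-code compilers used to build engines like Komodo. A small illustration (the function is a made-up material counter, not anything from Komodo):-

    import dis

    # A trivial high-level evaluation function: count material using
    # the classical piece values 1, 3, 3, 5, 9.
    def material(pawns, knights, bishops, rooks, queens):
        return pawns + 3 * (knights + bishops) + 5 * rooks + 9 * queens

    # Print the low-level instructions that Python's compiler
    # generated from the source above.
    dis.dis(material)

A bug at this stage can produce slower low-level code from exactly the same high-level source, which is the kind of defect the Komodo team described.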

The second event, external to the TCEC, was the publication of a paper by Google's DeepMind titled 'Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm'. The paper appeared a few days before the TCEC's close, when the winner was already nearly certain. Its abstract said:-

The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement learning from games of self-play.

In this paper, we generalise this approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.

The phrase 'tabula rasa' is Latin for 'blank slate'. In 'Tabula rasa', Wikipedia says:-

Tabula rasa refers to the epistemological idea that individuals are born without built-in mental content and that therefore all knowledge comes from experience or perception.
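In programming terms, tabula rasa means the only built-in knowledge is the rules -- a move generator and a test for the end of the game -- while everything about what makes a move good is learned from the program's own games. A toy sketch of the idea, using tic-tac-toe and a plain value table in place of AlphaZero's neural network and tree search (all names here are my own, not DeepMind's):-

    import random
    from collections import defaultdict

    # Tabula rasa self-play on tic-tac-toe: the learner starts with no
    # knowledge beyond the rules and updates a value table from the
    # outcomes of its own games.
    LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

    def winner(board):
        for a, b, c in LINES:
            if board[a] != '.' and board[a] == board[b] == board[c]:
                return board[a]
        return None

    value = defaultdict(float)   # board string -> learned value for 'X'

    def choose(board, player):
        moves = [i for i, sq in enumerate(board) if sq == '.']
        if random.random() < 0.1:              # keep exploring random moves
            return random.choice(moves)
        best = max if player == 'X' else min   # 'X' maximizes, 'O' minimizes
        return best(moves, key=lambda m: value[board[:m] + player + board[m+1:]])

    def self_play():
        board, player, visited = '.' * 9, 'X', []
        while winner(board) is None and '.' in board:
            m = choose(board, player)
            board = board[:m] + player + board[m+1:]
            visited.append(board)
            player = 'O' if player == 'X' else 'X'
        result = {'X': 1.0, 'O': -1.0, None: 0.0}[winner(board)]
        for pos in visited:                    # nudge values toward the outcome
            value[pos] += 0.1 * (result - value[pos])

    for _ in range(20000):                     # learn purely from self-play
        self_play()
    print('positions evaluated:', len(value))

Twenty thousand games are enough for the table to steer this toy player toward sensible tic-tac-toe; AlphaZero does the analogous thing at incomparably larger scale, with a deep network generalizing across positions instead of a table memorizing them.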

In other words, starting with only the rules of chess, AlphaZero progressed in a few hours of computation to a level where it 'convincingly defeated a world-champion program'. For chess, its opponent was the '2016 TCEC world-champion program Stockfish' (TCEC Season 9). In the Season 10 semifinal, Stockfish finished a half point behind the two eventual finalists, and informed observers consider the three engines to be of approximately equal strength, comfortably ahead of the rest of the competition. The DeepMind paper continued:-

We evaluated the fully trained instances of AlphaZero against Stockfish, Elmo and the previous version of AlphaGo Zero (trained for 3 days) in chess, shogi and Go respectively, playing 100 game matches at tournament time controls of one minute per move. AlphaZero and the previous AlphaGo Zero used a single machine with 4 TPUs. Stockfish and Elmo played at their strongest skill level using 64 threads and a hash size of 1GB. AlphaZero convincingly defeated all opponents, losing zero games to Stockfish and eight games to Elmo (see Supplementary Material for several example games), as well as defeating the previous version of AlphaGo Zero.
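For the chess match, those Stockfish settings map directly onto standard UCI options. A minimal sketch of the configuration (the binary name 'stockfish' is a placeholder for a local build):-

    import subprocess

    # Configure Stockfish over the UCI protocol with the settings
    # quoted from the paper: 64 threads and a 1GB transposition
    # table ('Hash' is specified in MB).
    engine = subprocess.Popen(['stockfish'], stdin=subprocess.PIPE,
                              stdout=subprocess.PIPE, text=True)

    def send(cmd):
        engine.stdin.write(cmd + '\n')
        engine.stdin.flush()

    send('uci')
    send('setoption name Threads value 64')
    send('setoption name Hash value 1024')
    send('isready')
    send('position startpos')
    send('go movetime 60000')    # one minute per move, as in the paper

The 'Hash' option sets the size of the transposition table, the cache where the engine stores positions it has already searched. At roughly ten to sixteen bytes per entry, 1GB holds on the order of 100 million positions -- only a few seconds' worth of search for Stockfish running on 64 threads -- which is why the setting draws criticism.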

We can quibble about whether the AlphaZero - Stockfish match was indeed a fair fight -- a 1 GB hash size is a severe restriction for an engine searching on 64 threads -- but the final score of +28-0=72 in AlphaZero's favor was more than convincing to all but the most vehement skeptics. The new TCEC champion expressed his thoughts just after the TCEC finished in 'Interview with Robert Houdart, author of the champion engine Houdini' (chessdom.com):-

Q: AlphaZero just defeated last year’s champion Stockfish 8. Your opinion on the paper published and the match that took place?

A: It’s fascinating and amazing, and at the same time very much expected! We even discussed this during the interview with Nelson and the Komodo authors. It opens entirely new, astonishing possibilities for chess engines! I do hope Google will publish more details about their approach, so that the chess world in general and the computer chess world in particular can benefit from their achievement.

Q: Now that Houdini is the reigning champion, would you issue a challenge for AlphaZero? Under what conditions?

A: It’s normally up to the challenger to issue a challenge, not the reigning champion :) A big discussion point about a possible match between a "normal" engine and AlphaZero would be the hardware to use -- how can you make sure that hardware is comparable? If I can run Houdini on 2000 cores it will be a lot stronger than when running on 64 cores. That said, I’m not sure how Google is viewing their project -- is it a research/marketing project (like Deep Blue was for IBM), or do they intend to use AlphaZero competitively or as an analysis engine available to the general public?

The 'interview with Nelson [Hernandez] and the Komodo authors', which appeared two weeks before the end of Season 10, was 'Interview with Robert Houdart, Mark Lefler and GM Larry Kaufman' (chessdom.com). Now that the TCEC event has finished, I would like to look a little more closely at the technology behind AlphaZero. I started with yesterday's post, 'Giraffe and AlphaZero', which included a link to the DeepMind paper, and will spend the next few Mondays following up.
