Google DeepMind is making the front page of Nature (again) with a new AI for Go, named AlphaGo (see also this Nature YouTube video). Computer Go is a notoriously difficult problem, and until now AI programs fared very badly against strong human players. In their paper the DeepMind team reports that AlphaGo won 5-0 against the European champion Fan Hui! This is truly a jump in performance: the previous best AI, Crazy Stone, needed several handicap stones to compete with pro players. Congratulations to the DeepMind team for this breakthrough!
How did they do it? From a very high-level point of view, they combined the previous state of the art (Monte Carlo Tree Search) with recent deep learning techniques. Recall that MCTS is a technique inspired by multi-armed bandits to efficiently explore the tree of possible action sequences in a game; for more details see this very nice survey by my PhD advisor Remi Munos: From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning. In MCTS there are two key elements beyond the bandit part (i.e., how to deal with exploration vs. exploitation). First, one needs a way to combine all the information collected to produce a value estimate for each state (this value is the key term in the upper confidence bound used by the bandit algorithm). Second, one needs a reasonable randomized policy to carry out the rollouts once the bandit strategy has gone deep enough in the tree. In AlphaGo the policy is initially learned via supervised deep learning on a dataset of human expert games, and the value function is learned from self-play games and used alongside rollouts to evaluate positions during the tree search. Both are represented by convolutional neural networks (in some sense this corresponds to a very natural inductive bias for this game).
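To make the bandit part concrete, here is a minimal sketch of how a tree node could pick which move to explore next using a plain UCB1 rule. This is only an illustration of the optimistic principle, not the selection rule actually used in AlphaGo; the function names, the `(total_reward, visits)` representation, and the exploration constant `c=1.4` are all assumptions made for this example.

```python
import math

def ucb_score(mean_value, visits, parent_visits, c=1.4):
    """UCB1 score: empirical mean value plus an exploration bonus.

    The constant c is a hypothetical choice for this sketch.
    """
    if visits == 0:
        return math.inf  # unvisited moves are always tried first
    return mean_value + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children, c=1.4):
    """Return the index of the child with the highest UCB1 score.

    `children` is a list of (total_reward, visits) pairs, one per move.
    """
    parent_visits = sum(v for _, v in children)
    scores = [
        ucb_score(r / v if v else 0.0, v, parent_visits, c)
        for r, v in children
    ]
    return max(range(len(children)), key=scores.__getitem__)
```

With equal visit counts the move with the higher mean value is selected, while a move that has been tried only a few times can still win thanks to its larger exploration bonus. The mean value is exactly where a learned value function would plug in.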
Of course there is much more to AlphaGo than what I described above and you are invited to take a look at the paper (see this reddit thread to find the paper)!