A self-taught computer, without any input from human experts, has become a world-class player of the fiendishly complex board game Go.
DeepMind, Google’s artificial intelligence subsidiary in London, announced the milestone in AI less than two years after the highly publicised unveiling of AlphaGo, the first machine to beat human champions at the ancient Asian game. Details are published in the scientific journal Nature.
Previous versions of AlphaGo learned initially by analysing thousands of games between excellent human players to discover winning moves. The new development, called AlphaGo Zero, dispenses with this human expertise and starts just by knowing the rules and objective of the game.
“It learns to play simply by playing games against itself, starting from completely random play,” said Demis Hassabis, DeepMind chief executive. “In doing so, it quickly surpassed human level of play and defeated the previously published version of AlphaGo by 100 games to zero.”
His colleague David Silver, AlphaGo project leader, added: “By not using human data in any fashion, we can create knowledge by itself from a blank slate.” Within a few days, the computer had not only learned Go from scratch but surpassed thousands of years of accumulated human wisdom about the game.
The team developed a new form of “reinforcement learning” to create AlphaGo Zero, combining search-based simulations of future moves with a neural network that decides which moves give the highest probability of winning. The network is constantly updated over millions of training games, producing a slightly superior system each time.
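To make that loop concrete, here is a minimal sketch of a self-play setup in the spirit described above. It is written purely for illustration: the toy game, the network and every name in it (ToyGame, PolicyValueNet, search_policy and so on) are hypothetical stand-ins, not DeepMind's code, and the search and training steps are placeholders.

```python
# Hypothetical sketch: self-play games in which a search over future moves is
# guided by a network estimating move probabilities and the chance of winning,
# with the network refreshed after each batch of games. The "game" is a toy
# territory-claiming stand-in, not Go.
import random

class ToyGame:
    """Players alternately claim cells; whoever claims more cells wins."""
    def __init__(self, cells=None, player=1):
        self.cells, self.player = cells or [0] * 9, player
    def legal_moves(self):
        return [i for i, c in enumerate(self.cells) if c == 0]
    def apply(self, move):
        cells = self.cells[:]
        cells[move] = self.player
        return ToyGame(cells, -self.player)
    def is_terminal(self):
        return not self.legal_moves()
    def winner(self):                      # +1, -1 or 0 for a draw
        s = sum(self.cells)
        return (s > 0) - (s < 0)

class PolicyValueNet:
    """Stand-in for a neural network mapping a state to move probabilities
    and an estimated win probability; it starts out effectively random."""
    def predict(self, state):
        moves = state.legal_moves()
        return {m: 1.0 / len(moves) for m in moves}, 0.0
    def train(self, examples):
        pass  # a real net would fit (state -> search policy, game outcome) here

def search_policy(state, net):
    """Placeholder for the search step: a real version would simulate future
    moves and return a sharpened distribution; here it returns the priors."""
    priors, _value = net.predict(state)
    return priors

def self_play_game(net):
    """One game against itself, starting from completely random play."""
    state, history = ToyGame(), []
    while not state.is_terminal():
        pi = search_policy(state, net)
        history.append((state, pi, state.player))
        moves, weights = zip(*pi.items())
        state = state.apply(random.choices(moves, weights=weights)[0])
    z = state.winner()
    # Label each recorded position with the outcome from that player's side.
    return [(s, pi, z * player) for s, pi, player in history]

def training_loop(net, iterations=3, games_per_iter=10):
    for _ in range(iterations):            # millions of games in the real system
        examples = []
        for _ in range(games_per_iter):
            examples.extend(self_play_game(net))
        net.train(examples)                # in the real system, each update yields a slightly stronger player

training_loop(PolicyValueNet())
```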
Although Go is in one sense extremely complex, with far more potential moves than there are atoms in the universe, in another sense it is simple because it is a “game of perfect information” — chance plays no part, as with cards or dice, and the state of play is defined entirely by the position of stones on the board.
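For a sense of that scale (a commonly cited back-of-envelope estimate, not a figure from the article): with roughly 250 legal moves available per turn and games running to about 150 moves, the number of possible lines of play is on the order of 250^150, or about 10^360, against an estimated 10^80 atoms in the observable universe. A quick check of the arithmetic:

```python
from math import log10
branching, depth = 250, 150             # rough per-move choices and game length for Go
print(round(depth * log10(branching)))  # ~360 orders of magnitude of possible games
```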
The game involves surrounding more territory than your opponent. This aspect of Go makes it particularly susceptible to the computer simulations on which AlphaGo depends. DeepMind is now examining real-life problems that can be structured in a similar way, to apply the technology.
Mr Hassabis identified predicting the shape of protein molecules — an important issue in drug discovery — as a promising candidate. Other likely scientific applications include designing new materials and climate modelling.