Hassabis: We are here to find AlphaGo’s weaknesses; no target winning margin was set
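On May 23, after the first game between Ke Jie and AlphaGo at the Go summit, a press conference was held at which the AlphaGo team answered many questions. Below is the full transcript and translation posted on Weibo by user @JacquelineHan.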
Q: Do you think AlphaGo has any flaws?
Demis Hassabis: You know, so… you know, that’s why we are here for the summit: we want to discover if a great player like Ke Jie can find some weaknesses that we don’t know, and even AlphaGo doesn’t know from playing against itself. So of course when we played our match against Lee Sedol, in Game 4, Lee Sedol, with his brilliant creativity, found a weakness and managed to win this game, and it was very interesting for us to see this gap in the knowledge of AlphaGo. And so in the last year, we went back to try and improve the architecture and the system and for it to learn more against itself and to see if we could solve this knowledge gap. So we believe we have fixed that knowledge gap, but of course there could be many other new areas that it doesn’t know, and we don’t know either. And that’s why we are here, to see if it can be discovered.
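This is why we came to this summit: we want to find out whether a top player like Ke Jie can uncover weaknesses in AlphaGo that we do not know about, and that even AlphaGo itself does not know about. When we challenged Lee Sedol, in the fourth game he used his supreme creativity to find a weakness and won the game, and it was interesting for us to see this crack in AlphaGo’s knowledge framework. So over the past year we went back and tried to improve its architecture and system, let it learn more from self-play, and tried to close that crack. I believe we have fixed it, but of course there may be more new areas that it does not understand and that we do not understand either, so we came here to see whether new weaknesses can be found.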
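Q: Ke Jie lost this game to AlphaGo by a narrow margin. One rather imaginative theory is that AlphaGo is no longer satisfied with merely winning and instead wants to control the exact margin of victory. Has AlphaGo really reached that level? If not, how long until it can?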
Demis Hassabis: So AlphaGo always tries to maximize its probability of winning rather than to maximize the size of the winning margin. So whenever we see it has a decision to make, it will always try to pick the more certain path… that it thinks is a more certain path to victory with less risk. So often in positions that’s what we see: the tradeoff that AlphaGo is making is to decide about how certain it is about the margin of victory and how likely the probability of victory. David, if you want to add anything to that.
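AlphaGo always tries to maximize its chance of winning rather than the number of points it wins by. Every time it faces a decision, we see it choose the path it considers safer and less risky. In its moves we can see the tradeoff AlphaGo makes between how secure its winning margin is and how likely it is to win.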
David Silver: So… it’s a very interesting question. The way AlphaGo works is as Demis said, it maximizes the probability of winning the game. This means that we program into AlphaGo a goal. That goal matches what we really want it to do, which is to try and win games of Go. You could imagine other objectives being applied, such as maximizing the gap, the margin of victory, but this is not the objective that we chose for AlphaGo to play in the game of Go. So if you really focus on victory, then it leads to these behaviors where AlphaGo will try to win, and in doing so, it may give up a number of points in favor of actually just reducing any risks it may perceive, even if that risk seems to be very small.
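A very interesting question. AlphaGo’s decision process is as Demis described: it maximizes the probability of winning. That means we built a goal into AlphaGo, and that goal is what we actually want it to achieve in a match, namely winning the game. You can imagine other goals being set, such as maximizing the winning margin in points, but that is not the goal we chose for AlphaGo. When winning is the central aim, it leads to certain behaviors as AlphaGo tries to win: it may give up some points to reduce a risk it perceives, even if that risk is very small.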
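Editor's illustration: the difference between the two objectives described in the answers above can be shown with a toy example. This is only a sketch under stated assumptions, not AlphaGo's actual code: the `Candidate` type, the move names, and the outcome numbers are all hypothetical, and each candidate move is simply given a small made-up distribution over final margins in points (positive means a win).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """A hypothetical candidate move with made-up outcome estimates."""
    name: str
    # List of (probability, final margin in points); margin > 0 means a win.
    outcomes: List[Tuple[float, float]]


def win_probability(c: Candidate) -> float:
    """Total probability of the outcomes in which the margin is positive."""
    return sum(p for p, margin in c.outcomes if margin > 0)


def expected_margin(c: Candidate) -> float:
    """Probability-weighted average final margin."""
    return sum(p * margin for p, margin in c.outcomes)


def pick_by_win_probability(cands: List[Candidate]) -> Candidate:
    """The objective described in the interview: maximize the chance of winning."""
    return max(cands, key=win_probability)


def pick_by_expected_margin(cands: List[Candidate]) -> Candidate:
    """The alternative objective mentioned in the interview: maximize the margin."""
    return max(cands, key=expected_margin)


# Illustrative numbers only: a safe move that wins narrowly but almost always,
# and a greedy move that wins big but loses more often.
safe = Candidate("safe", [(0.95, 0.5), (0.05, -0.5)])
greedy = Candidate("greedy", [(0.80, 10.5), (0.20, -5.5)])

if __name__ == "__main__":
    moves = [safe, greedy]
    print("win-probability objective picks:", pick_by_win_probability(moves).name)
    print("expected-margin objective picks:", pick_by_expected_margin(moves).name)
```

With these invented numbers, the win-probability objective prefers the narrow but near-certain win, while the margin objective prefers the riskier large win, which is consistent with the behavior of giving up points to reduce risk described above.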
Q: Could I put it this way: in the future, will AlphaGo probe some of the limits of humans?
Demis Hassabis: I think the way to think about this is that Go is this amazing subject that is… got almost limitless possibilities. You know… as I said in my opening talk, I see AlphaGo as a tool for Go players and the Go community to use, to explore these mysteries and truth of Go, and find out more. And I hope that the Go players are enjoying the last year, including these matches and the matches online, the Master series. And I hope that it has contributed to improving our understanding of this amazing game. So I see it as a tool we can use, for great players like Ke Jie and Lee Sedol to discover more about the game that we all love.
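I think the way to look at this is that Go is an astonishing thing with limitless possibilities. As I said at the opening ceremony, I see AlphaGo as a tool for players and the Go community to use to explore the mysteries and truths of Go and to seek out more possibilities. I hope players have enjoyed the past year, including last year’s match and Master’s online games. I hope it has contributed to improving humanity’s understanding of Go. I see it as a tool we can use, so that great players like Ke Jie and Lee Sedol can explore more of this game that we love.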
Q: Is this AlphaGo a “pure” version of AlphaGo? That is, does it learn entirely by itself, without relying on the game records of human masters?
Demis Hassabis: I’m not sure if I understand the question correctly, but… You know… obviously the version… AlphaGo initially learns from human games, and then… most of its learning now is from its own play against itself. So… but of course to truly test what it knows, we have to play against human experts, because playing the game against itself is not going to expose its weaknesses, because it will obviously fix those during the self-play. So we really have to test it against the world’s best players.
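I’m not sure I understood the question correctly. Of course, in its initial version AlphaGo learned from human game records, and by now most of its learning material comes from the records of its own self-play. But to truly test what it has learned, we must play against strong human players, because we do not know whether it will reveal its weaknesses during self-play, since it evidently avoids its shortcomings when playing itself. So we have to test it against the best players in the world.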
David Silver: Perhaps I could just add to that. One of the innovations of AlphaGo-Master is that it actually relies much more on learning from itself. So in this version, AlphaGo has actually become its own teacher, learning from moves which are taken from examples of its own searches, and it relies much less on human data than previous versions. And one of our goals in doing so is to make it more and more general so that its principles can be applied to other domains beyond Go.
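Let me add to that. One major innovation of AlphaGo-Master is that it relies more on learning from itself. In this version AlphaGo has effectively become its own teacher, learning from moves obtained from its own searches, and compared with the previous version it depends far less on human game records. One of our goals in doing this is to make it more general, so that it can be applied to domains beyond Go.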
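Q: As I understand it, the Master version is V25, so is the AlphaGo now playing Ke Jie a newer version? Also, is this the last time we will see AlphaGo? Will AlphaGo become a tool that helps professional players keep improving their skills, or will it say goodbye to us after this?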
David Silver: So maybe I can answer the first part of that question, regarding the technology inside AlphaGo. So AlphaGo-Master is a new version of AlphaGo, and we worked very hard to improve the fundamental algorithm that is used in AlphaGo. In fact, it turns out that the algorithm often matters more than the amount of data, or the amount of compute that actually goes into it. And if you get the algorithms right to make them general and powerful enough, then they can really progress very rapidly. So in fact AlphaGo-Master actually uses 10 times less computation, and is trained in a matter of weeks rather than months, compared to the version that played against Lee Sedol last year. So it is a different version, and is at least in self-play performance considerably stronger. And we are here to find out if indeed it’s as strong as it seems in self-play, or if it has weaknesses that can be exposed.
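I can answer the first part of the question, about the technology inside AlphaGo. AlphaGo-Master is a brand-new version of AlphaGo, and we worked very hard to improve AlphaGo’s underlying algorithm. It turns out that the algorithm often matters more than the amount of data or computing power. When you get the algorithms right and make them general and powerful enough, they can advance very quickly. So in fact AlphaGo-Master uses one tenth of the computing power of the version that challenged Lee Sedol last year, and was trained in weeks rather than months. So this is a different version, and at least in self-play it performs more strongly. We are here to see whether it is really as strong as it appears in self-play, or whether it still has weaknesses that can be exposed.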
Demis Hassabis: And as far as the second part of the question, I’ll just answer that. And later on in the event we will be announcing the next steps for AlphaGo. So I don’t want to say anything in advance of that, but we will be talking about that later in the week. But one thing I want to say is that, just like with the last version of AlphaGo where we published all the technical details and results of the AlphaGo program in the Nature article, in the scientific journal Nature. And we published all the details and that allowed other companies, you know… Tencent and Japanese companies, to make their own versions of AlphaGo, and some of them are very strong now as well, I’m sure you all know, playing online, probably 9 Dan level. And we plan to publish more details of the new version of AlphaGo in the next few months. So we will review those technical details, and then again other teams and academic labs will be able to implement their versions of this AlphaGo-Master architecture.
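As for the second part of the question, I will answer that. Later in this summit we will announce the next steps for AlphaGo, so I don’t want to say much before then; we will talk about it later this week. But one thing I do want to say is that we published the technical details and results of the previous version of AlphaGo in the journal Nature, which allowed other companies, such as Tencent and some Japanese companies, to develop their own versions of AlphaGo. Some of these programs are already very strong; as I’m sure you all know, they play online at roughly 9-dan level. We also plan to publish more technical details about the new version of AlphaGo within a few months. We will review those technical details, and then other teams and labs will again be able to build their own versions of the AlphaGo-Master architecture.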
Q: How many GPUs did AlphaGo use today? Did Ke Jie’s play today cause the machines behind AlphaGo to overheat or even run short of computing power?
David Silver: So the answer to the technical question is that AlphaGo actually in this match is playing on a single machine on the Google cloud. So this is quite different to the computer that was used last year, where we were using a distributed implementation that used many machines within the Google cloud. Now because we have a much more powerful, efficient algorithm that works in a much better, simpler way, it is actually able to use about a tenth of the computation to achieve stronger and even better results. So AlphaGo is just playing on a single machine that will be available on the Google cloud to someone who has access to that. And that machine is based on TPUs, which are Tensor Processing Units; they were announced by Google recently.
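The answer to the technical question is that in this match AlphaGo is actually running on a single machine in the Google cloud. This is very different from last year’s distributed setup, which used many machines in the Google cloud. Because we now have a more powerful and efficient algorithm that works in a better, simpler way, it can use one tenth of the computing power to achieve stronger and even better results. So AlphaGo runs on one machine in the Google cloud that anyone with access can use. This machine is based on TPUs, the Tensor Processing Units that Google announced recently.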
Demis Hassabis: So just to be clear, we are using 10 times less computation power roughly than for the Lee Sedol match.
To be clear, compared with the version that played Lee Sedol, we are now using roughly one tenth of the computing power.
Q: So this is a single-machine version?
Demis Hassabis: Yes.
Yes.
Q: As more and more top players become unwilling to play against AlphaGo, will you consider having AlphaGo play against AlphaGo?
Demis Hassabis: We want to use AlphaGo, as I said, as a tool for the Go community to improve their knowledge about the game. We hope to, you know, release some details about the architecture we are using, maybe also some of the games that AlphaGo plays against itself. So we maybe will make some announcement about this later in the week. But don’t forget, the reason, ultimately, we are developing these technologies is also to use them more widely in areas of science and medicine, and to try and help human experts in those areas. So we have a lot of work ahead of us in the coming years.
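As I said, we hope AlphaGo will be a tool for the Go community to improve its understanding of the game. We will release details of the program architecture we are using, and possibly also some records of AlphaGo’s self-play games; this will be formally announced later this week. But don’t forget that our ultimate purpose in developing these technologies is to apply them more widely in science and medicine, and to help human experts. So we have a lot of work to do over the next few years.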