Are Humans the Bootloader for Superintelligence? A ChatGPT Translation of The Bitter Lesson

First, here are the summaries of Rich Sutton's The Bitter Lesson produced by ChatGPT (GPT-3.5 and GPT-4):

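(GPT-4) The essay, titled "The Bitter Lesson", stresses the key role of general methods and large-scale computation in AI research. Drawing on examples from computer chess, speech recognition, and computer vision, author Rich Sutton argues that we should focus on general methods that keep scaling as computing power grows, rather than relying too heavily on human knowledge. The essay draws two lessons: first, the great power of general methods, meaning those that continue to scale with increased computation, such as search and learning; second, the actual contents of minds are extremely complex, so we should stop looking for simple ways to think about them and instead build meta-methods that can capture that complexity.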

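(GPT-3.5) The essay explains that, in artificial intelligence, approaches that build in models of human thinking tend to stall, while search and learning are the two most important methods for exploiting large-scale computation. The author points out that over-reliance on prior human knowledge and ways of thinking only limits progress in AI. Instead, we should adopt more general methods and keep using search and learning to discover and capture the world's complexity. The author argues that we should not try to simplify the complexity of mental contents such as space, objects, multiple agents, and symmetries, because they are far too complex and belong to an arbitrary, intrinsically complex outside world. Finally, the author calls for AI that has a capacity for discovery like our own, rather than systems that merely hard-code what humans have already discovered.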

The full English text follows, together with the ChatGPT (GPT-4) Chinese translation.

The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore’s law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation. There were many examples of AI researchers’ belated learning of this bitter lesson, and it is instructive to review some of the most prominent.

从70年的人工智能研究中可以得出的最大教训是：利用计算的通用方法最终是最有效的，而且优势十分明显。其根本原因是摩尔定律，或者更确切地说，是它的推广形式，即单位计算成本持续呈指数级下降。大多数人工智能研究都是在假设智能体可用的计算量恒定的前提下进行的（在这种情况下，利用人类知识将是提高性能的少数途径之一），但是，只要时间略长于一个典型研究项目的周期，可用的计算量必然会大幅增加。为了在短期内取得有意义的改进，研究人员试图利用他们对所在领域的人类知识，但从长远来看，唯一重要的是对计算的利用。这两者不一定相互对立，但在实践中往往如此。花在其中一个上的时间，就是没有花在另一个上的时间。研究者在心理上也会执着于自己所投入的那一种方法。而且，基于人类知识的做法往往会使方法变得复杂，使其不太适合发挥利用计算的通用方法的优势。人工智能研究人员迟迟才吸取这一痛苦教训的例子有很多，回顾其中最突出的几个颇有启发。
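
A back-of-the-envelope sketch of the compute growth this paragraph refers to (an aside, not part of Sutton's essay). The two-year doubling period is an assumed, commonly quoted figure rather than a number the essay gives, and `compute_multiplier` is a hypothetical helper introduced only for illustration.

```python
def compute_multiplier(years: float, doubling_period_years: float = 2.0) -> float:
    """Factor by which compute per unit cost grows after `years`.

    Assumes cost per unit of computation halves every
    `doubling_period_years` (an illustrative assumption, not a measured value).
    """
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    for horizon in (2, 10, 20):
        print(f"after {horizon:2d} years: {compute_multiplier(horizon):.0f}x compute per unit cost")
```

Under that assumption, a method tuned for today's budget would face roughly 32 times more compute per unit cost a decade later, which is the gap the essay argues hand-built knowledge cannot keep up with.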

In computer chess, the methods that defeated the world champion, Kasparov, in 1997, were based on massive, deep search. At the time, this was looked upon with dismay by the majority of computer-chess researchers who had pursued methods that leveraged human understanding of the special structure of chess. When a simpler, search-based approach with special hardware and software proved vastly more effective, these human-knowledge-based chess researchers were not good losers. They said that “brute force” search may have won this time, but it was not a general strategy, and anyway it was not how people played chess. These researchers wanted methods based on human input to win and were disappointed when they did not.

在计算机国际象棋领域，1997年击败世界冠军卡斯帕罗夫的方法基于大规模深度搜索。当时，大多数计算机国际象棋研究人员一直致力于利用人类对国际象棋特殊结构的理解，因此对这种方法感到沮丧。当一种更简单的、基于搜索并结合专用硬件和软件的方法被证明有效得多时，这些基于人类知识的国际象棋研究人员并没有输得起。他们说，“蛮力”搜索这次也许赢了，但这不是一种通用策略，而且无论如何这也不是人类下棋的方式。这些研究人员希望基于人类输入的方法获胜，当结果并非如此时，他们感到失望。

A similar pattern of research progress was seen in computer Go, only delayed by a further 20 years. Enormous initial efforts went into avoiding search by taking advantage of human knowledge, or of the special features of the game, but all those efforts proved irrelevant, or worse, once search was applied effectively at scale. Also important was the use of learning by self play to learn a value function (as it was in many other games and even in chess, although learning did not play a big role in the 1997 program that first beat a world champion). Learning by self play, and learning in general, is like search in that it enables massive computation to be brought to bear. Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research. In computer Go, as in computer chess, researchers’ initial effort was directed towards utilizing human understanding (so that less search was needed) and only much later was much greater success had by embracing search and learning.

在计算机围棋领域，也出现了类似的研究进展模式，只是又晚了20年。最初，人们投入了巨大的努力，试图利用人类知识或围棋的特殊性质来避免搜索，但一旦搜索被有效地大规模应用，所有这些努力都被证明无关紧要，甚至适得其反。同样重要的是通过自我对弈来学习价值函数（在许多其他游戏乃至国际象棋中也是如此，尽管在1997年首次击败世界冠军的程序中，学习并没有发挥重要作用）。自我对弈学习以及一般意义上的学习，与搜索一样，都能让大规模计算发挥作用。搜索和学习是人工智能研究中利用大规模计算的两类最重要的技术。在计算机围棋中，与计算机国际象棋一样，研究人员最初的努力方向是利用人类的理解（从而减少所需的搜索），直到很久以后，才通过拥抱搜索和学习取得了大得多的成功。
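
As a small illustration of what "brute-force" search means in the passages above (an aside, not part of Sutton's essay), here is a minimal sketch of depth-limited negamax search. The toy game (a single pile of stones, take 1 to 3 per turn, whoever takes the last stone wins), the function name `negamax`, and the use of search depth as a stand-in for the compute budget are all assumptions chosen for brevity. The search knows only the rules of the game; when its depth budget runs out it simply reports "unknown".

```python
def negamax(stones: int, depth: int) -> int:
    """Evaluate a position for the player to move.

    Returns +1 if the player to move can force a win within `depth` plies,
    -1 if they are proven to lose, and 0 if the depth budget ran out
    before the outcome could be determined.
    """
    if stones == 0:
        # The opponent just took the last stone, so the player to move has lost.
        return -1
    if depth == 0:
        # Out of budget: no hand-written heuristic, just "unknown".
        return 0
    best = -1
    for take in (1, 2, 3):
        if take <= stones:
            # The opponent's best outcome, negated, is our outcome for this move.
            best = max(best, -negamax(stones - take, depth - 1))
    return best


if __name__ == "__main__":
    for budget in (1, 2, 3):
        print(f"pile=5, depth budget={budget}: value={negamax(5, budget)}")
```

With a budget of one or two plies the search cannot evaluate a pile of 5 stones and returns 0; with three plies it proves the position is a win and returns 1. Nothing about the code changes except how much computation it is allowed to spend.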

In speech recognition, there was an early competition, sponsored by DARPA, in the 1970s. Entrants included a host of special methods that took advantage of human knowledge—knowledge of words, of phonemes, of the human vocal tract, etc. On the other side were newer methods that were more statistical in nature and did much more computation, based on hidden Markov models (HMMs). Again, the statistical methods won out over the human-knowledge-based methods. This led to a major change in all of natural language processing, gradually over decades, where statistics and computation came to dominate the field. The recent rise of deep learning in speech recognition is the most recent step in this consistent direction. Deep learning methods rely even less on human knowledge, and use even more computation, together with learning on huge training sets, to produce dramatically better speech recognition systems. As in the games, researchers always tried to make systems that worked the way the researchers thought their own minds worked—they tried to put that knowledge in their systems—but it proved ultimately counterproductive, and a colossal waste of researcher’s time, when, through Moore’s law, massive computation became available and a means was found to put it to good use.

在语音识别领域，20世纪70年代有一场由DARPA赞助的早期竞赛。参赛者包括一系列利用人类知识的特殊方法，这些知识涉及词汇、音素、人类声道等。另一方则是基于隐马尔可夫模型（HMM）的较新方法，它们更偏统计性质，计算量也大得多。同样，统计方法战胜了基于人类知识的方法。这导致整个自然语言处理领域在几十年间逐步发生了重大变革，统计和计算开始主导这个领域。近年来深度学习在语音识别中的兴起，是沿着这一一贯方向迈出的最新一步。深度学习方法对人类知识的依赖更少，使用的计算更多，再加上在庞大训练集上的学习，产生了效果显著更好的语音识别系统。与棋类游戏中的情况一样，研究人员总是试图让系统按照他们所认为的自己大脑的运作方式来工作，并试图把这些知识放进系统中；但当摩尔定律带来了大规模计算，并且人们找到了有效利用它的方法时，这种做法最终被证明适得其反，是对研究人员时间的巨大浪费。

In computer vision, there has been a similar pattern. Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of SIFT features. But today all this is discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better.

This is a big lesson. As a field, we still have not thoroughly learned it, as we are continuing to make the same kind of mistakes. To see this, and to effectively resist it, we have to understand the appeal of these mistakes. We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.

在计算机视觉领域，也出现了类似的模式。早期的方法把视觉理解为寻找边缘、广义圆柱体，或者用SIFT特征来描述。但如今这些都被抛弃了。现代深度学习神经网络只使用卷积和某些类型的不变性这些概念，表现却好得多。

这是一个重大的教训。作为一个领域，我们仍然没有彻底吸取它，因为我们还在继续犯同类的错误。要认识到这一点并有效地抵制它，我们必须理解这些错误的吸引力所在。我们必须吸取这个痛苦的教训：把我们自以为的思维方式内置到系统中，从长远来看是行不通的。这个痛苦的教训基于以下历史观察：1）AI研究人员经常试图将知识内置到他们的智能体中；2）这在短期内总是有帮助，并且让研究人员个人感到满足；3）但从长远来看，它会进入平台期，甚至阻碍进一步的进展；4）突破性的进展最终来自一种相反的方法，即通过搜索和学习来扩展计算。最终的成功带着苦涩，而且往往没有被完全消化，因为这是战胜了一种备受青睐的、以人为中心的方法而取得的成功。

One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.

The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.

从这个痛苦的教训中应该学到的一点是通用方法的巨大力量，也就是那些即使可用计算量变得非常庞大、仍能随着计算量的增加而持续扩展的方法。看起来能够以这种方式任意扩展的两种方法是搜索和学习。

从这个痛苦的教训中应该学到的第二个一般性要点是，心智的实际内容极其复杂，而且这种复杂性无可救药；我们应该停止寻找思考心智内容的简单方法，例如思考空间、物体、多智能体或对称性的简单方法。所有这些都属于任意的、本质上复杂的外部世界。它们不应该被内置到系统中，因为它们的复杂性是无穷无尽的；相反，我们应该只内置那些能够发现并捕捉这种任意复杂性的元方法。这些方法的关键在于它们能够找到良好的近似，但寻找近似的工作应该由我们的方法来完成，而不是由我们自己来完成。我们希望AI智能体能够像我们一样去发现，而不是只包含我们已经发现的东西。把我们的发现内置进去，只会让我们更难看清发现的过程是如何实现的。