Google | Extending the Transformer Architecture: Memorizing Transformers

Trending on Twitter

https://twitter.com/nearcyan/status/1637891562385317897

https://twitter.com/amasad/status/1637911012774141953

Title: Memorizing Transformers

Yuhuai Wu, Markus N. Rabe, DeLesley Hutchins, Christian Szegedy

[Google]

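This paper proposes a simple extension to the Transformer architecture, called kNN-augmented attention, which greatly increases the context length a language model can attend to by using a k-nearest-neighbor lookup into a large external memory.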
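To make the idea concrete, here is a minimal sketch of attention restricted to the k nearest memory entries. It is an illustration under stated assumptions, not the paper's implementation: the function name `knn_attention`, the dot-product scoring, and the exact (brute-force) search are all choices made here for clarity, whereas a real system would use an approximate nearest-neighbor index.

```python
import math


def knn_attention(query, memory_keys, memory_values, k=2):
    """Attend over only the k memory entries whose keys best match the query.

    Hypothetical helper for illustration; exact search stands in for the
    approximate kNN lookup a large memory would need.
    """
    # Score every stored key by its dot product with the query.
    scores = [sum(q * x for q, x in zip(query, key)) for key in memory_keys]
    # Indices of the k highest-scoring keys.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the retrieved scores only (max subtracted for stability).
    peak = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - peak) for i in top]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of the retrieved values.
    dim = len(memory_values[0])
    output = [
        sum(w * memory_values[i][d] for w, i in zip(weights, top))
        for d in range(dim)
    ]
    return output, top


keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [8.0, 2.0], [-5.0, -5.0]]
output, retrieved = knn_attention([1.0, 0.0], keys, values, k=2)
print(retrieved)  # indices of the two best-matching memory entries
print(output)     # blend of the two retrieved value vectors
```

Because the softmax runs over only k retrieved entries, the cost of the attention step itself does not grow with the size of the memory; only the lookup does.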

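The effectiveness of the external memory is demonstrated in a series of language modeling experiments on a variety of long-document datasets, including LaTeX documents, source code, formal proofs, and books. Across all datasets and architectures studied, the Memorizing Transformer shows large improvements over the baselines; it is comparable to a vanilla Transformer with 5 times as many parameters.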

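Although there is a point of diminishing returns, perplexity keeps improving as the memory size grows. Moreover, the external memory continues to provide benefits even as the Transformer is scaled from 200M to 8B parameters. Perhaps most intriguingly, a Memorizing Transformer does not need to be pre-trained from scratch; adding memory to an existing pre-trained model and then fine-tuning it can yield large gains. Unlike other forms of attention, kNN retrieval scales easily to very large memory sizes, so it could potentially draw on vast knowledge bases or code repositories.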

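Language models typically need to be trained or fine-tuned to acquire new knowledge, which involves updating their weights. The paper instead envisions language models that can simply read and memorize new data at inference time, thereby acquiring new knowledge immediately.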

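In this work, the authors extend language models with the ability to memorize the internal representations of past inputs. They show that an approximate kNN lookup into a non-differentiable memory of recent (key, value) pairs improves language modeling across a range of benchmarks and tasks, including generic web text (C4), math papers (arXiv), books (PG-19), code (Github), and formal theorems (Isabelle).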
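The memory of recent (key, value) pairs can be pictured as a bounded store that is written to as the model reads and is never trained through. The sketch below is an assumption-laden illustration: the class name `ExternalMemory`, the drop-the-oldest policy when the store is full, and the exact dot-product search are choices made here, not details confirmed by this summary.

```python
from collections import deque


class ExternalMemory:
    """Bounded store of (key, value) pairs; hypothetical, for illustration."""

    def __init__(self, capacity):
        # A deque with maxlen silently discards the oldest entry when full.
        self.entries = deque(maxlen=capacity)

    def add(self, keys, values):
        # Store plain copies: nothing here is differentiated or trained.
        for key, value in zip(keys, values):
            self.entries.append((list(key), list(value)))

    def lookup(self, query, k):
        # Exact search by dot product; a large memory would instead use
        # an approximate nearest-neighbor index.
        def score(entry):
            return sum(q * x for q, x in zip(query, entry[0]))

        return sorted(self.entries, key=score, reverse=True)[:k]

    def __len__(self):
        return len(self.entries)


memory = ExternalMemory(capacity=3)
# First chunk of a long document is read and memorized.
memory.add(keys=[[1.0, 0.0], [0.0, 1.0]], values=[[1.0], [2.0]])
# A later chunk arrives; the oldest pair is dropped to stay within capacity.
memory.add(keys=[[0.5, 0.5], [-1.0, 0.0]], values=[[3.0], [4.0]])
print(len(memory))
print(memory.lookup([1.0, 0.0], k=1))
```

Writing to the store requires no weight update, which is what lets material seen only at inference time, such as a function defined earlier in the same file, become retrievable later.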

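Experiments show that performance improves steadily as the memory size is increased up to 262K tokens. On benchmarks that include code and mathematics, the model is found to be able to make use of newly defined functions and theorems at test time.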

Paper: https://arxiv.org/pdf/2203.08913.pdf
