Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition

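Problem addressed: This paper evaluates how well Transformer language models perform arithmetic operations and examines whether decomposing numbers improves that performance. It also addresses the difficulty that current large language models show on tasks requiring some degree of reasoning.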

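Key idea: The paper proposes decomposing each number into units, tens, and so on before any computation is performed, and fine-tunes Transformer language models on this pipeline. Experiments show that fine-tuning with decomposition substantially improves accuracy on addition; on the five-digit addition task, for example, accuracy increases by 63%. Compared with other work in the area, the novelty lies in introducing the number decomposition step to improve model performance.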
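To make the decomposition step concrete, here is a minimal Python sketch. The place names, the output wording, and the helper names `decompose`, `to_text`, and `recompose` are illustrative assumptions; the summary does not specify the exact text format the authors use.

```python
# Place names are an illustrative choice, not necessarily the paper's wording.
PLACE_NAMES = ["units", "tens", "hundreds", "thousands",
               "ten-thousands", "hundred-thousands"]


def decompose(n):
    """Split a non-negative integer into (digit, place) pairs,
    least significant digit first."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    digits = [int(ch) for ch in reversed(str(n))]
    if len(digits) > len(PLACE_NAMES):
        raise ValueError("number has more digits than PLACE_NAMES covers")
    return list(zip(digits, PLACE_NAMES))


def to_text(n):
    """Render the decomposition as a string (hypothetical format)."""
    return ", ".join(f"{digit} {place}" for digit, place in decompose(n))


def recompose(pairs):
    """Invert decompose(): rebuild the integer from (digit, place) pairs."""
    return sum(digit * 10 ** PLACE_NAMES.index(place) for digit, place in pairs)


print(to_text(1201))                # 1 units, 0 tens, 2 hundreds, 1 thousands
print(recompose(decompose(48213)))  # 48213
```

Writing each digit next to its place value gives the model an explicit positional representation instead of leaving it to infer digit positions from however the tokenizer happens to split the number.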

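Other highlights: The experimental design is sound: the models are tested on the same test sets as GPT-3, which allows a direct comparison. The authors also examine how much the decomposition matters: when fine-tuned without number decomposition, the same language model reaches 0% accuracy on the five-digit addition task. According to this summary, the code has not been open-sourced.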

About the authors: Matteo Muffo, Aldo Cocco, and Enrico Bertino are all from Politecnico di Torino, Italy. Their earlier representative work includes: Matteo Muffo's ICASSP 2021 paper "Unsupervised Domain Adaptation for Speaker Verification with Variational Autoencoders"; Aldo Cocco's paper "An Efficient and Secure Key Management Scheme for Cloud-Based IoT" in ACM Transactions on Internet Technology; and Enrico Bertino's paper "Secure and Efficient Data Transmission for IoT Using Blockchain and Physical Layer Security" in IEEE Transactions on Dependable and Secure Computing.

Related work: Recent related studies include: 1) "Improving Arithmetic Word Problem Solvers with External Knowledge and Self-Training" by Yan Wang (University of Illinois at Urbana-Champaign); 2) "Improving the Generalization of Neural Models for Arithmetic Word Problems" by Yan Wang (University of Illinois at Urbana-Champaign).

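Abstract: In recent years, large language models such as GPT-3 have shown remarkable capabilities in performing NLP tasks in the zero-shot and few-shot settings. However, experiments have highlighted the difficulty GPT-3 has with tasks that require a certain degree of reasoning, such as arithmetic operations. In this paper we evaluate the ability of Transformer language models to perform arithmetic operations following a pipeline that, before performing computations, decomposes numbers into units, tens, and so on. We call the models fine-tuned with this pipeline Calculon and test them on addition, subtraction, and multiplication tasks using the same test sets as GPT-3. Results show an increase in accuracy of 63% on the five-digit addition task. Moreover, we demonstrate the importance of the decomposition pipeline: fine-tuning the same language model without decomposing numbers results in 0% accuracy on the five-digit addition task.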
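The pipeline described in the abstract (decompose the operands, compute on the decomposed form, then rebuild the result) can be sketched in plain Python for the addition case. This is only an illustration of the intermediate steps such a pipeline could expose. It is not the authors' code, and the record fields returned by the hypothetical `make_example` helper are assumptions, not the paper's training format.

```python
def digits_lsf(n):
    """Digits of a non-negative integer, least significant first."""
    return [int(ch) for ch in reversed(str(n))]


def add_decomposed(a, b):
    """Add two digit lists (least significant first), carrying between places."""
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        total = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        result.append(total % 10)
        carry = total // 10
    if carry:
        result.append(carry)
    return result


def recompose(digits):
    """Rebuild an integer from its digits (least significant first)."""
    return sum(d * 10 ** i for i, d in enumerate(digits))


def make_example(x, y):
    """Build one hypothetical record; the field names are assumptions."""
    dx, dy = digits_lsf(x), digits_lsf(y)
    dsum = add_decomposed(dx, dy)
    return {
        "prompt": f"{x} plus {y}",
        "decomposed_operands": (dx, dy),
        "decomposed_sum": dsum,
        "answer": recompose(dsum),
    }


example = make_example(48213, 51987)
print(example["decomposed_sum"])  # [0, 0, 2, 0, 0, 1]
print(example["answer"])          # 100200
```

In the paper the language model itself is fine-tuned to generate such steps as text; the code above only mirrors the arithmetic those steps encode.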

