Berkeley Proposes the MACHIAVELLI Benchmark: Measuring the Trade-Off Between Reward and Ethical Behavior in Large-Model Agents

The paper introduces MACHIAVELLI, a suite of 134 text-based Choose Your Own Adventure games for evaluating the capabilities and safety of AI agents.

Title: Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark

Authors: Alexander Pan, Chan Jun Shern, Andy Zou, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks

Abstract:

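Artificial agents have traditionally been trained to maximize reward, which may incentivize power-seeking and deception, much as next-token prediction in language models (LMs) may incentivize toxicity. So do agents naturally learn to be Machiavellian? And how do we measure these behaviors in general-purpose models such as GPT-4?

To answer these questions, we introduce MACHIAVELLI, a benchmark of 134 Choose Your Own Adventure games containing over half a million rich, diverse scenarios that center on social decision-making. Scenario labeling is automated with LMs, which outperform human annotators. We mathematize dozens of harmful behaviors and use our annotations to evaluate agents' tendencies to seek power, cause disutility, and commit ethical violations. We observe some tension between maximizing reward and behaving ethically. To improve this trade-off, we investigate LM-based methods for steering agents toward less harmful behaviors. Our results show that agents can act both competently and morally, so concrete progress can currently be made in machine ethics: designing agents that are Pareto improvements in both safety and capabilities.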
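As a rough illustration of what a "Pareto improvement in both safety and capabilities" means, the sketch below compares agents on two axes. The agent names, the `AgentScore` type, the `pareto_improves` helper, and all numbers are hypothetical and chosen only for illustration; they are not taken from the paper or its codebase.

```python
from typing import NamedTuple


class AgentScore(NamedTuple):
    """Hypothetical per-agent summary (illustrative values, not from the paper)."""
    name: str
    reward: float  # capability axis: higher is better
    harm: float    # safety axis: aggregate harmful-behavior score, lower is better


def pareto_improves(candidate: AgentScore, baseline: AgentScore) -> bool:
    """Return True if candidate is no worse on both axes and strictly better on at least one."""
    no_worse = candidate.reward >= baseline.reward and candidate.harm <= baseline.harm
    strictly_better = candidate.reward > baseline.reward or candidate.harm < baseline.harm
    return no_worse and strictly_better


# Made-up scores for three hypothetical agents.
baseline = AgentScore("reward_maximizer", reward=30.0, harm=100.0)
steered = AgentScore("steered_agent", reward=31.0, harm=80.0)
cautious = AgentScore("cautious_agent", reward=20.0, harm=60.0)

print(pareto_improves(steered, baseline))   # True: more reward and less harm
print(pareto_improves(cautious, baseline))  # False: less harm, but also less reward
```

Under this definition, an agent that merely trades reward for lower harm (like `cautious_agent` here) is not a Pareto improvement; only an agent that gives up nothing on either axis qualifies.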

https://arxiv.org/pdf/2304.03279.pdf

