ACT-1: Transformer for Actions


AI has moved at an incredible pace in the last few years. Scaling up Transformers has led to remarkable capabilities in language (e.g., GPT-3, PaLM, Chinchilla), code (e.g., Codex, AlphaCode), and image generation (e.g., DALL-E, Imagen).

At Adept, we are building the next frontier of models that can take actions in the digital world—that’s why we’re excited to introduce our first large model, Action Transformer (ACT-1).

Why are we so excited about this?

First, we believe the clearest framing of general intelligence is a system that can do anything a human can do in front of a computer. A foundation model for actions, trained to use every software tool, API, and webapp that exists, is a practical path to this ambitious goal, and ACT-1 is our first step in this direction.

Second, the next era of computing will be defined by natural language interfaces that allow us to tell our computers what we want directly, rather than doing it by hand. We hope these snippets of ACT-1 will give you a window into the next frontier of computing as we see it!

Sign up here to join the waitlist for the upcoming alpha release of our first product built around ACT-1.

Capability preview

ACT-1 is a large-scale Transformer trained to use digital tools; among other things, we recently taught it how to use a web browser. Right now, it’s hooked up to a Chrome extension that allows ACT-1 to observe what’s happening in the browser and take certain actions, such as clicking, typing, and scrolling. The observation is a custom “rendering” of the browser viewport that’s meant to generalize across websites, and the action space is the UI elements available on the page.

There’s a lot of room to make it faster, both on the modeling side and on the software side, so we expect future systems to have latency that’s largely imperceptible to humans. The videos below have been sped up to make them easier to view. An upcoming technical post will go into much more detail on all of these topics.
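To make the observe-and-act setup described above concrete, here is a minimal sketch of such a loop in Python. It is not Adept's implementation: the `Element`, `Action`, `ToyBrowser`, `scripted_policy`, and `run_episode` names are all hypothetical, the "browser" is a two-element toy, and a fixed rule stands in for the model. It only illustrates the shape of the interaction: observe a rendering of the page, pick an action over the available UI elements, apply it, and repeat.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Element:
    """One UI element in the rendered viewport (hypothetical schema)."""
    element_id: int
    kind: str   # e.g. "button", "textbox"
    label: str


@dataclass
class Action:
    """An action targets one UI element; the verb 'done' ends the episode."""
    verb: str                          # "click", "type", "scroll", or "done"
    element_id: Optional[int] = None
    text: str = ""


# Assumed policy signature: (goal, observation, history) -> next action.
Policy = Callable[[str, List[Element], List[Action]], Action]


class ToyBrowser:
    """Stand-in for the browser extension: one search box and one button."""

    def __init__(self) -> None:
        self.query = ""
        self.submitted = False

    def observe(self) -> List[Element]:
        # The "rendering" here is just a flat list of visible elements.
        return [
            Element(0, "textbox", f"search: {self.query!r}"),
            Element(1, "button", "Search"),
        ]

    def apply(self, action: Action) -> None:
        if action.verb == "type" and action.element_id == 0:
            self.query = action.text
        elif action.verb == "click" and action.element_id == 1:
            self.submitted = True


def scripted_policy(goal: str, observation: List[Element],
                    history: List[Action]) -> Action:
    """Placeholder for the model: a fixed rule, not a learned policy."""
    if not history:
        return Action("type", 0, goal)
    if len(history) == 1:
        return Action("click", 1)
    return Action("done")


def run_episode(goal: str, browser: ToyBrowser, policy: Policy,
                max_steps: int = 10) -> List[Action]:
    """Alternate observation and action until the policy signals 'done'."""
    history: List[Action] = []
    for _ in range(max_steps):
        observation = browser.observe()
        action = policy(goal, observation, history)
        if action.verb == "done":
            break
        browser.apply(action)
        history.append(action)
    return history


if __name__ == "__main__":
    browser = ToyBrowser()
    for step in run_episode("houses in Houston", browser, scripted_policy):
        print(step)
```

In this sketch the step cap (`max_steps`) is an assumed safeguard so that a policy that never emits `done` cannot loop forever; a real system would need its own stopping and error-handling rules.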

Here are some cool things ACT-1 can do!

ACT-1 can take a high-level user request and execute it. The user simply types a command into the text box and ACT-1 does the rest. In this example, fulfilling a single goal requires repeatedly taking actions and making observations over a long time horizon.

https://player.vimeo.com/video/749413832?h=15f094bbb9&title=0&byline=0&portrait=0


This can be especially powerful for manual tasks and complex tools. In this example, what might ordinarily take 10+ clicks in Salesforce can now be done with just a sentence.

https://player.vimeo.com/video/749413804?h=15f094bbb9&title=0&byline=0&portrait=0

Working in-depth in tools like spreadsheets, ACT-1 demonstrates real-world knowledge, infers what we mean from context, and can help us do things we may not even know how to do.

https://player.vimeo.com/video/749413815?h=15f094bbb9&title=0&byline=0&portrait=0

The model can also complete tasks that require composing multiple tools together; most things we do on a computer span multiple programs. In the future, we expect ACT-1 to be even more helpful by asking for clarifications about what we want.

https://player.vimeo.com/video/749413825?h=15f094bbb9&title=0&byline=0&portrait=0

The internet contains a lot of knowledge about the world! When the model doesn’t know something, it knows how to just look up the information online (seen here in voice mode).

https://player.vimeo.com/video/749413798?h=15f094bbb9&title=0&byline=0&portrait=0

ACT-1 doesn’t know how to do everything, but it’s highly coachable. With a single piece of human feedback, it can correct mistakes, becoming more useful with each interaction.

https://player.vimeo.com/video/749597375?h=15f094bbb9&title=0&byline=0&portrait=0

Looking ahead

Natural language interfaces, powered by action transformers like ACT-1, will dramatically expand what people can do in front of a computer/phone/internet-connected device. A few years from now, we believe:

  • Most interaction with computers will be done using natural language, not GUIs. We’ll tell our computer what to do, and it’ll do it. Today’s user interfaces will soon seem as archaic as landline phones do to smartphone users.
  • Beginners will become power users, no training required. Anyone who can articulate their ideas in language can implement them, regardless of expertise. Software will become even more powerful as advanced features become accessible to everyone and are no longer constrained by the length of a drop-down menu.
  • Documentation, manuals, and FAQs will be for models, not for people. No longer will we need to learn the quirky language of every individual software tool in order to be effective at a task. We will never search through forums for “how to do X in Salesforce or Unity or Figma” — the model will do that work, allowing us to focus on the higher-order task at hand.
  • Breakthroughs across all fields will be accelerated with AI as our teammate. Action transformers will work with us to bring about advances in drug design, engineering, and more. Collaborating with these models will make us more efficient, energized, and creative.

While we’re excited that these systems can transform what people can do on a computer, we clearly see that they have the potential to cause harm if misused or misaligned with user preferences. Our goal is to build a company with large-scale human feedback at the center — models will be evaluated on how well they satisfy user preferences, and we will iteratively evaluate how well this is working as our product becomes more sophisticated and load-bearing. To combat misuse, we plan to use a combination of machine learning techniques and careful, staged deployment.

What we’ve shown above is only scratching the surface — we’re making great progress towards Adept being able to do arbitrary things on a computer. We have ambitious goals in both the short and long term, and we’re hiring visionary and talented people across roles to make it happen — you can apply here.
