Qingyuan TALK No. 116: WonderJourney: Create Your Own Open-Ended 3D World


Have you ever wondered what Alice saw on her adventure in Wonderland, but struggled to imagine it solely from the text or illustrations? In this talk, I will introduce our recent work, “WonderJourney: Going from Anywhere to Everywhere”. Starting from a single image or a piece of text, WonderJourney synthesizes a long series of diverse yet naturally connected 3D scenes, giving the user a unique experience of seeing a “wonderland”. WonderJourney is a modularized framework for perpetual 3D scene generation. Unlike prior work on view generation that focuses on a single type of scene, we start at any user-provided location (given as a text description or an image) and generate a journey through a long sequence of diverse yet coherently connected 3D scenes. We leverage an LLM to generate textual descriptions of the scenes in this journey, a text-driven point cloud generation pipeline to make a compelling and coherent sequence of 3D scenes, and a large VLM to verify the generated scenes. We show compelling, diverse visual results across various scene types and styles, forming imaginary “wonderjourneys”.

Results can be viewed on the project website: https://kovenyu.com/wonderjourney/
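To make the modular structure described above more concrete, here is a minimal sketch of how the three roles (LLM, point cloud pipeline, VLM) could be wired into a loop. It is not the authors' implementation: every name (`Scene`, `generate_journey`, `describe_next`, `build_scene`, `verify`) is hypothetical, the three components are passed in as plain callables, and retrying when the verifier rejects a scene is an assumption that the abstract does not state.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Scene:
    description: str      # text the scene was generated from
    points: List[tuple]   # stand-in for a 3D point cloud


def generate_journey(
    start: str,
    describe_next: Callable[[List[str]], str],             # plays the LLM role
    build_scene: Callable[[str, Optional[Scene]], Scene],  # plays the point cloud pipeline role
    verify: Callable[[Scene], bool],                       # plays the VLM role
    num_scenes: int,
    max_retries: int = 3,  # assumption: regenerate when verification fails
) -> List[Scene]:
    history = [start]
    journey: List[Scene] = []
    for _ in range(num_scenes):
        for _attempt in range(max_retries):
            description = describe_next(history)
            previous = journey[-1] if journey else None
            candidate = build_scene(description, previous)
            if verify(candidate):
                journey.append(candidate)
                history.append(description)
                break
        else:
            # No candidate passed verification; end the journey early.
            break
    return journey


# Toy stand-ins so the sketch runs without any model.
def toy_describe(history: List[str]) -> str:
    return f"scene {len(history)}"


def toy_build(description: str, previous: Optional[Scene]) -> Scene:
    return Scene(description, [(0.0, 0.0, 0.0)])


journey = generate_journey(
    "a quiet village", toy_describe, toy_build, lambda scene: True, num_scenes=3
)
print([scene.description for scene in journey])
```

Passing the components in as callables mirrors the "modularized" framing: each role can be swapped out without touching the loop itself.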


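Hong-Xing “Koven” Yu is a fourth-year Ph.D. student at Stanford University, advised by Prof. Jiajun Wu. His research interest is machine perception, mainly physical scene understanding, dynamics models, and visual generative models. He has received China's National Scholarship multiple times, the Stanford SoE Fellowship, and the Qualcomm Fellowship, as well as two Nvidia Fellowship nominations, a Meta Fellowship nomination, and a SIGGRAPH Asia Best Paper Award.

Follow Hong-Xing Yu: https://kovenyu.com/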
 
