BAAI LIVE No. 59: When Software Engineering Meets NLP, a Survey of Code Large Language Models

In recent years, Transformer-based language models have achieved great success in natural language processing, and programming languages, treated as a special kind of natural language, have likewise been widely modeled with language models. Our work systematically surveys code processing and generation with language models, covering more than 50 models, 30 downstream tasks, 170 datasets, and 700 related works.

We review the history of applying AI to code processing, from n-gram models to RNNs and most recently to Transformers, which mirrors the historical course of NLP itself. We offer a unified view of NLP and software engineering and discuss the recent convergence of the two fields: advanced NLP techniques, including instruction tuning, reinforcement learning, data engineering, and architectural improvements, have been widely adopted in code processing, while downstream tasks in software engineering in turn pose new challenges and application opportunities for LLMs and drive them toward production. Seamlessly integrating code-specific features, such as abstract syntax trees, data flow, control flow, and compiler intermediate representations, into LLMs remains a key challenge.

https://github.com/codefuse-ai/Awesome-Code-LLM

https://simg.baai.ac.cn/paperfile/e3d3c624-dc64-4fa1-9bdd-f67f2964781b.pdf

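Ziyin Zhang (张子殷) completed undergraduate studies in the Department of Computer Science at Shanghai Jiao Tong University and is now a master's student there. Zhang's research focuses on natural language processing, and Zhang is currently a research intern at Ant Group.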
