Speeding up model loading in llama.cpp

I saw "Llama.cpp 30B runs with only 6GB of RAM now (github.com/ggerganov)" on Hacker News; the original pull request is "Make loading weights 10-100x faster #613".

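The PR's author, Justine Tunney, explains in the PR that the model file format was changed so the weights can be loaded with mmap(). This greatly reduces the time spent reading the file up front (loading becomes lazy), and it also lets the system use the page cache directly, which avoids the double-buffering problem: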

This was accomplished by changing the file format so we can mmap() weights directly into memory without having to read() or copy them thereby ensuring the kernel can make its file cache pages directly accessible to our inference processes; and secondly, that the file cache pages are much less likely to get evicted (which would force loads to hit disk) because they’re no longer competing with memory pages that were needlessly created by gigabytes of standard i/o.
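As a rough illustration of the idea in the quote, here is a minimal Python sketch of reading one value through mmap() instead of read()-ing the whole file into a private buffer. The file layout (a flat array of little-endian float32 values) and the helper names write_demo_weights and read_weight_mmap are made up for this example; they are not llama.cpp's actual file format or API.

```python
import mmap
import os
import struct
import tempfile


def write_demo_weights(path, values):
    """Write values as raw little-endian float32 (a made-up layout for this demo)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}f", *values))


def read_weight_mmap(path, index):
    """Return the float32 at `index` by mapping the file read-only."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; nothing is copied at this point.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Only the page(s) backing these 4 bytes have to be faulted in,
            # and they are served straight from the kernel's file cache.
            (value,) = struct.unpack_from("<f", mapped, index * 4)
            return value


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "weights.bin")
        write_demo_weights(demo_path, [0.5, 1.5, -2.0])
        print(read_weight_mmap(demo_path, 1))  # prints 1.5
```

With a multi-gigabyte file the same pattern applies: the mapping is set up almost instantly, and the cost of touching the disk is paid only for the pages that are actually accessed.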

This reminds me of the database world, where PostgreSQL also makes use of mmap(), which is a somewhat similar idea.

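Also, in a comment here, Justine Tunney mentions an unexpected observation: surprisingly little of the model file is actually touched during computation. In a test with a simple prompt on an Intel machine, only about 1.6GB of the 20GB 30B model file was actually used: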

If I run 30B on my Intel machine:

[…]

As we can see, 400k page faults happen, which means only 1.6 gigabytes ((411522 * 4096) / (1024 * 1024)) of the 20 gigabyte weights file actually needed to be used.
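The arithmetic in the quote can be checked with a few lines of Python. This is only a sketch: the function name touched_mib is made up here, the 4096-byte page size is the one used in the quote, and the 20GB file size is the approximate figure mentioned above. A page-fault count is also only a rough proxy for how much data was touched, since the kernel can map more than one page per fault.

```python
def touched_mib(page_faults, page_size=4096):
    """Convert a page-fault count into MiB, assuming one page per fault."""
    return page_faults * page_size / (1024 * 1024)


if __name__ == "__main__":
    mib = touched_mib(411522)
    print(round(mib, 1))  # prints 1607.5, i.e. about 1.6 GiB

    # Share of an (approximately) 20 GiB weights file that this represents.
    share = mib / (20 * 1024)
    print(f"{share:.1%}")  # prints 7.8%
```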

Tunney still suspects this could be a bug in the change, but for now thinks that is unlikely and has not been able to spot one:

Now, since my change is so new, it’s possible my theory is wrong and this is just a bug. I don’t actually understand the inner workings of LLaMA 30B well enough to know why it’s sparse. Maybe we made some kind of rare mistake where llama.cpp is somehow evaluating 30B as though it were the 7B model. Anything’s possible, however I don’t think it’s likely. I was pretty careful in writing this change, to compare the deterministic output of the LLaMA model, before and after the Git commit occurred. I haven’t however actually found the time to reconcile the output of LLaMA C++ with something like PyTorch. It’d be great if someone could help with that, and possibly help us know why, from more a data science (rather than systems engineering perspective) why 30B is sparse.

If it is not a bug, this is a rather interesting signal: does it mean these models could be slimmed down further?
