青源TALK No. 114: Invariant Low-Dimensional Subspaces in Gradient Descent for Learning Neural Networks


Over the past few years, an extensively studied phenomenon in training deep networks is the implicit bias of gradient descent towards parsimonious solutions. In this work, we first investigate this phenomenon by narrowing our focus to deep linear networks. Through our analysis, we reveal a surprising "law of parsimony" in the learning dynamics when the data possesses low-dimensional structures.

Specifically, we show that the evolution of gradient descent starting from orthogonal initialization only affects a minimal portion of singular vector spaces across all weight matrices. In other words, the learning process happens only within a small invariant subspace of each weight matrix, even though all weight parameters are updated throughout training. This simplicity in learning dynamics could have significant implications for both efficient training and a better understanding of deep networks. First, the analysis enables us to considerably improve training efficiency by taking advantage of the low-dimensional structure in the learning dynamics: we can construct smaller, equivalent deep linear networks without sacrificing the benefits associated with their wider counterparts. Moreover, we demonstrate the potential implications for efficiently training deep nonlinear networks.

Second, it allows us to better understand deep representation learning by elucidating the progressive feature compression and discrimination from shallow to deep layers. This study lays the foundation for understanding hierarchical representations in deep nonlinear networks.
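To make the "law of parsimony" above concrete, here is a minimal NumPy sketch written from the abstract alone; it is not the speaker's code, and the network sizes, initialization scale, step size, iteration count, and the diagnostic at the end are illustrative assumptions. It fits a rank-r target with a three-layer deep linear network (deep matrix factorization, which corresponds to a deep linear network on whitened data) using gradient descent from a scaled orthogonal initialization, then counts how many singular values of each weight matrix have moved away from the shared "bulk" value. If learning is confined to a small invariant subspace of each layer, this count should be on the order of the target rank r rather than the width d, even though every entry of every weight matrix changes during training.

import numpy as np
rng = np.random.default_rng(0)
d, r, L = 32, 3, 3               # width, target rank, depth (illustrative sizes)
eps, lr, steps = 0.5, 0.1, 2000  # init scale, step size, iterations (assumptions)
# Rank-r target with unit singular values: Phi = U V^T with orthonormal U, V.
U, _ = np.linalg.qr(rng.standard_normal((d, r)))
V, _ = np.linalg.qr(rng.standard_normal((d, r)))
Phi = U @ V.T
def random_orthogonal(n):
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return Q
# eps-scaled orthogonal initialization for every layer: W_l(0) = eps * Q_l.
W = [eps * random_orthogonal(d) for _ in range(L)]
for _ in range(steps):
    # Partial products P[l] = W_l ... W_1 (with P[0] = I).
    P = [np.eye(d)]
    for w in W:
        P.append(w @ P[-1])
    E = P[-1] - Phi  # residual of the loss 0.5 * ||W_L ... W_1 - Phi||_F^2
    # Gradients: dL/dW_l = (W_L ... W_{l+1})^T E (W_{l-1} ... W_1)^T.
    G, grads = E, [None] * L
    for l in reversed(range(L)):
        grads[l] = G @ P[l].T
        G = W[l].T @ G
    for l in range(L):
        W[l] -= lr * grads[l]
loss = 0.5 * np.linalg.norm(W[2] @ W[1] @ W[0] - Phi) ** 2
print(f"final loss: {loss:.2e}")
# Diagnostic: per layer, count singular values that left the shared bulk value.
for l, w in enumerate(W, start=1):
    s = np.linalg.svd(w, compute_uv=False)
    moved = int(np.sum(np.abs(s - np.median(s)) > 0.05 * s.max()))
    print(f"layer {l}: {moved} of {d} singular values moved away from the bulk")

With these sizes, the printout should report only a handful of displaced singular values per layer out of d = 32; increasing the width while keeping the target rank fixed should leave that count essentially unchanged, which is the kind of low-dimensional structure the abstract proposes to exploit when constructing smaller, equivalent networks.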

This talk is based on three recent works:

https://arxiv.org/abs/2306.01154
https://arxiv.org/abs/2311.02960
https://arxiv.org/abs/2311.05061

Qing Qu is an assistant professor in the EECS Department at the University of Michigan. Prior to that, he was a Moore-Sloan data science fellow at the Center for Data Science, New York University, from 2018 to 2020. He received his Ph.D. in Electrical Engineering from Columbia University in Oct. 2018. He received his B.Eng. from Tsinghua University in Jul. 2011 and an M.Sc. from Johns Hopkins University in Dec. 2012, both in Electrical and Computer Engineering. He interned at the U.S. Army Research Laboratory in 2012 and at Microsoft Research in 2016. His research interest lies at the intersection of the foundations of data science, machine learning, numerical optimization, and signal/image processing, with a focus on developing efficient nonconvex methods and global optimality guarantees for solving representation learning and nonlinear inverse problems in engineering and imaging sciences. He is the recipient of the Best Student Paper Award at SPARS'15 (with Ju Sun and John Wright), the Microsoft PhD Fellowship in machine learning in 2016, and best paper awards at the NeurIPS Diffusion Model Workshop in 2023. He received the NSF CAREER Award in 2022 and an Amazon Research Award (AWS AI) in 2023.

 
