
# Random Matrix Theory

“Entropy is the price of structure.” - Ilya Prigogine

**Aliases**

*This identifies the pattern and should be representative of the concept that it describes. The name should be a noun that can be used naturally within a sentence, so that the pattern is easy to reference in conversation between practitioners.*

**Intent**

The hidden structure of randomness.

**Motivation**

*This section describes why this pattern is needed in practice. Other pattern languages refer to this as the Problem. In our pattern language, we express it as one or more questions and then provide further explanation behind each question.*

**Sketch**

*This section provides alternative descriptions of the pattern in the form of an illustration or an alternative formal expression. By looking at the sketch, a reader may quickly grasp the essence of the pattern.*

**Discussion**

*This is the main section of the pattern and explains it in greater detail. We leverage the vocabulary described in the theory section of this book. Rather than providing detailed proofs, we reference their sources. This section expounds on how the motivation is addressed, and includes additional questions that may be interesting topics for future research.*

**Known Uses**

*Here we review several projects or papers that have used this pattern.*

**Related Patterns**
*In this section we use a diagram to describe how this pattern is conceptually related to other patterns. The relationships may be precise or fuzzy, so we provide further explanation of their nature. We also describe other patterns that may not be conceptually related but work well in combination with this pattern.*

*Relationship to Canonical Patterns*

*Relationship to other Patterns*

**Further Reading**

*We provide here some additional external material that will help in exploring this pattern in more detail.*

**References**

http://arxiv.org/pdf/1607.06011v1.pdf On the Modeling of Error Functions as High Dimensional Landscapes for Weight Initialization in Learning Networks

http://arxiv.org/abs/1603.04733v5 Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors

We introduce a scalable variational Bayesian neural network where the parameters are governed by a probability distribution over random matrices: the matrix variate Gaussian. By utilizing properties of this distribution we can see that our model can be considered as a composition of Gaussian Processes with nonlinear kernels of a specific form. This kernel is formed from the Kronecker product of two separate kernels: a global output kernel and an input-specific kernel, where the latter is composed from fixed-dimension nonlinear basis functions (the inputs to each layer) weighted by their covariance. We continue by exploiting this duality and introduce pseudo input-output pairs for each layer in the network, which in turn better maintain the Gaussian Process properties of our model, thus increasing the flexibility of the posterior distribution.
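The Kronecker structure described above makes the matrix variate Gaussian cheap to sample from: instead of factoring the full covariance of vec(W), one factors the much smaller row and column covariances. A minimal NumPy sketch (all shapes, names, and covariance choices here are illustrative, not from the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Matrix variate Gaussian MN(M, U, V): vec(W) ~ N(vec(M), V kron U),
# with U the row (output) covariance and V the column (input) covariance.
# Sample via W = M + A @ Z @ B.T, where U = A A^T, V = B B^T, Z ~ N(0, I).
n, p = 4, 3                      # illustrative layer shape
M = np.zeros((n, p))             # mean matrix
U = 2.0 * np.eye(n)              # row covariance (assumed diagonal here)
V = 0.5 * np.eye(p)              # column covariance (assumed diagonal here)

A = np.linalg.cholesky(U)
B = np.linalg.cholesky(V)
Z = rng.standard_normal((n, p))
W = M + A @ Z @ B.T              # one draw from MN(M, U, V)

# The full covariance of vec(W) is the (n*p x n*p) Kronecker product V kron U,
# but it never needs to be formed explicitly to draw samples.
cov_vec = np.kron(V, U)
print(W.shape, cov_vec.shape)    # (4, 3) (12, 12)
```

The point of the sketch is the cost: Cholesky factors of the n×n and p×p covariances replace a factorization of the (np)×(np) covariance, which is what makes the posterior scalable.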

http://bookstore.ams.org/gsm-132 Topics in Random Matrix Theory

https://arxiv.org/pdf/0909.3912.pdf A Random Matrix Approach to Language Acquisition

Our model of linguistic interaction is analytically studied using methods of statistical physics and simulated by Monte Carlo techniques. The analysis reveals an intricate relationship between the innate propensity for language acquisition β and the lexicon size N, N ∼ exp(β).

http://openreview.net/pdf?id=B186cP9gx Singularity of the Hessian in Deep Learning

We look at the eigenvalues of the Hessian of a loss function before and after training. The eigenvalue distribution is seen to be composed of two parts, the bulk which is concentrated around zero, and the edges which are scattered away from zero. We present empirical evidence for the bulk indicating how over parametrized the system is, and for the edges indicating the complexity of the input data.
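The bulk-at-zero phenomenon already appears in the simplest overparametrized model. For least squares with p parameters and only n < p samples, the Hessian X^T X / n is rank-deficient, so at least p − n eigenvalues are exactly zero, while the remaining "edge" eigenvalues reflect the data. A minimal sketch of this (a toy proxy, not the deep-network experiment of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Least-squares loss L(w) = ||X w - y||^2 / (2n) has Hessian H = X^T X / n.
# With p parameters and n < p samples, H has at least p - n zero eigenvalues:
# the "bulk" at zero measures how overparametrized the model is.
n, p = 50, 200                   # n samples, p parameters (p >> n)
X = rng.standard_normal((n, p))
H = X.T @ X / n

eigs = np.linalg.eigvalsh(H)
bulk = int(np.sum(np.isclose(eigs, 0.0, atol=1e-8)))
print(f"{bulk} of {p} eigenvalues lie in the zero bulk")  # at least p - n = 150
```

In a deep network the bulk is concentrated near zero rather than exactly zero, but the reading is the same: bulk size tracks overparametrization, edge eigenvalues track data complexity.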

https://www.quora.com/In-what-ways-is-Randomness-important-to-Deep-Learning

https://www.quantamagazine.org/20160802-unified_theory_of_randomness/

https://arxiv.org/abs/1507.00719 https://arxiv.org/abs/1605.03563 https://arxiv.org/abs/1608.05391

https://arxiv.org/pdf/1711.04735.pdf Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice

https://openreview.net/pdf?id=SJeFNoRcFQ Traditional and Heavy-Tailed Self-Regularization in Neural Network Models

https://arxiv.org/abs/1810.01075v1 Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning

https://arxiv.org/abs/1805.11917v2 The Dynamics of Learning: A Random Matrix Approach

We introduce a random matrix-based framework to analyze the learning dynamics of a single-layer linear network on a binary classification problem, for data of simultaneously large dimension and size, trained by gradient descent. Our results provide rich insights into common questions in neural nets, such as overfitting, early stopping and the initialization of training, thereby opening the door for future studies of more elaborate structures and models appearing in today's neural networks.
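The dynamics such analyses exploit can be seen in a few lines: under gradient descent on a linear model, the error along each eigendirection of the Hessian decays as (1 − η λ_i)^t, so directions with small eigenvalues are fit last, which is the mechanism behind early stopping. A minimal sketch under illustrative sizes (not the paper's setup, which uses a classification problem in the large-dimension limit):

```python
import numpy as np

rng = np.random.default_rng(0)

# Gradient descent on L(w) = ||X w - y||^2 / (2n). The iterate error along
# each eigendirection of H = X^T X / n shrinks as (1 - eta * lambda_i)^t.
n, p = 100, 20
X = rng.standard_normal((n, p))
w_true = rng.standard_normal(p)
y = X @ w_true                   # noiseless targets for simplicity

H = X.T @ X / n
eta = 0.9 / np.linalg.eigvalsh(H).max()   # step size below 2 / lambda_max

w = np.zeros(p)
for t in range(500):
    w -= eta * (X.T @ (X @ w - y)) / n    # gradient of L at w

print(np.linalg.norm(w - w_true))         # error driven by slowest eigendirection
```

With noisy targets, the slow directions are exactly where overfitting accumulates, which is why stopping before they are fully fit acts as a regularizer in this framework.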

https://papers.nips.cc/paper/6857-nonlinear-random-matrix-theory-for-deep-learning.pdf Nonlinear random matrix theory for deep learning