Generative modeling : statistical physics of Restricted Boltzmann Machines, learning with missing information and scalable training of Linear Flows

Giancarlo Fissore

Résumé

Neural network models able to approximate and sample high-dimensional probability distributions are known as generative models. In recent years this class of models has received tremendous attention due to their potential in automatically learning meaningful representations of the vast amount of data that we produce and consume daily. This thesis presents theoretical and algorithmic results pertaining to generative models and it is divided in two parts. In the first part, we focus our attention on the Restricted Boltzmann Machine (RBM) and its statistical physics formulation. Historically, statistical physics has played a central role in studying the theoretical foundations and providing inspiration for neural network models. The first neural implementation of an associative memory (Hopfield, 1982) is a seminal work in this context. The RBM can be regarded to as a development of the Hopfield model, and it is of particular interest due to its role at the forefront of the deep learning revolution (Hinton et al. 2006).Exploiting its statistical physics formulation, we derive a mean-field theory of the RBM that let us characterize both its functioning as a generative model and the dynamics of its training procedure. This analysis proves useful in deriving a robust mean-field imputation strategy that makes it possible to use the RBM to learn empirical distributions in the challenging case in which the dataset to model is only partially observed and presents high percentages of missing information. In the second part we consider a class of generative models known as Normalizing Flows (NF), whose distinguishing feature is the ability to model complex high-dimensional distributions by employing invertible transformations of a simple tractable distribution. The invertibility of the transformation allows to express the probability density through a change of variables whose optimization by Maximum Likelihood (ML) is rather straightforward but computationally expensive. The common practice is to impose architectural constraints on the class of transformations used for NF, in order to make the ML optimization efficient. Proceeding from geometrical considerations, we propose a stochastic gradient descent optimization algorithm that exploits the matrix structure of fully connected neural networks without imposing any constraints on their structure other then the fixed dimensionality required by invertibility. This algorithm is computationally efficient and can scale to very high dimensional datasets. We demonstrate its effectiveness in training a multylayer nonlinear architecture employing fully connected layers.

Les modèles de réseaux neuronaux capables d'approximer et d'échantillonner des distributions de probabilité à haute dimension sont connus sous le nom de modèles génératifs. Ces dernières années, cette classe de modèles a fait l'objet d'une attention particulière en raison de son potentiel à apprendre automatiquement des représentations significatives de la grande quantité de données que nous produisons et consommons quotidiennement. Cette thèse présente des résultats théoriques et algorithmiques relatifs aux modèles génératifs et elle est divisée en deux parties. Dans la première partie, nous concentrons notre attention sur la Machine de Boltzmann Restreinte (RBM) et sa formulation en physique statistique. Historiquement, la physique statistique a joué un rôle central dans l'étude des fondements théoriques et dans le développement de modèles de réseaux neuronaux. La première implémentation neuronale d'une mémoire associative (Hopfield, 1982) est un travail séminal dans ce contexte. La RBM peut être considérée comme un développement du modèle de Hopfield, et elle est particulièrement intéressante en raison de son rôle à l'avant-garde de la révolution de l'apprentissage profond (Hinton et al. 2006). En exploitant sa formulation de physique statistique, nous dérivons une théorie de champ moyen de la RBM qui nous permet de caractériser à la fois son fonctionnement en tant que modèle génératif et la dynamique de sa procédure d'apprentissage. Cette analyse s'avère utile pour dériver une stratégie d'imputation robuste de type champ moyen qui permet d'utiliser la RBM pour apprendre des distributions empiriques dans le cas difficile où l'ensemble de données à modéliser n'est que partiellement observé et présente des pourcentages élevés d'informations manquantes. Dans la deuxième partie, nous considérons une classe de modèles génératifs connus sous le nom de Normalizing Flows (NF), dont la caractéristique distinctive est la capacité de modéliser des distributions complexes à haute dimension en employant des transformations inversibles d'une distribution simple et traitable. L'inversibilité de la transformation permet d'exprimer la densité de probabilité par un changement de variables dont l'optimisation par Maximum de Vraisemblance (ML) est assez simple mais coûteuse en calcul. La pratique courante est d'imposer des contraintes architecturales sur la classe de transformations utilisées pour les NF, afin de rendre l'optimisation par ML efficace. En partant de considérations géométriques, nous proposons un algorithme d'optimisation stochastique par descente de gradient qui exploite la structure matricielle des réseaux de neurones entièrement connectés sans imposer de contraintes sur leur structure autre que la dimensionnalité fixe requise par l'inversibilité. Cet algorithme est efficace en termes de calcul et peut s'adapter à des ensembles de données de très haute dimension. Nous démontrons son efficacité dans l'apprentissage d'une architecture non linéaire multicouche utilisant des couches entièrement connectées.

Generative modeling : statistical physics of Restricted Boltzmann Machines, learning with missing information and scalable training of Linear Flows

Modélisation générative : physique statistique des Machines de Boltzmann Restreintes, apprentissage avec informations manquantes et apprentissage scalable des flux linéaires

Résumé

Mots clés

Domaines

Dates et versions

Identifiants

Citer

Exporter

Collections

Partager