Conference Papers, 2018

Learning with Noise-Contrastive Estimation: Easing training by learning to scale

Abstract

Noise-Contrastive Estimation (NCE) is a learning criterion that is regularly used to train neural language models in place of Maximum Likelihood Estimation, since it avoids the computational bottleneck caused by the output softmax. In this paper, we analyse and explain some of the weaknesses of this objective function, linked to the mechanism of self-normalization, by closely monitoring comparative experiments. We then explore several remedies and modifications to propose tractable and efficient NCE training strategies. In particular, we propose to make the scaling factor a trainable parameter of the model, and to use the noise distribution to initialize the output bias. While simple, these solutions yield stable and competitive performance on both small and large scale language modelling tasks.
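The sketch below illustrates the two modifications described in the abstract: a trainable scaling factor and an output bias initialized from the noise distribution. It is a minimal PyTorch illustration under assumptions of our own (the class name NCEOutputLayer, the shared log-scale parameter, the number of noise samples k, and the sampling scheme are hypothetical), not the authors' actual implementation.

import torch
import torch.nn as nn

class NCEOutputLayer(nn.Module):
    """Minimal NCE output layer sketch: trainable scaling factor and
    output bias initialized with log Pn(w) (hypothetical design choices)."""
    def __init__(self, hidden_dim, vocab_size, noise_probs, k=25):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(vocab_size, hidden_dim) * 0.01)
        # Initialize the output bias from the noise distribution (assumption:
        # bias_w = log Pn(w), as suggested in the abstract).
        self.bias = nn.Parameter(torch.log(noise_probs))
        # Trainable scaling factor, shared across contexts (assumption).
        self.log_scale = nn.Parameter(torch.zeros(1))
        self.register_buffer("noise_probs", noise_probs)
        self.k = k

    def forward(self, hidden, targets):
        # hidden: (batch, hidden_dim); targets: (batch,) of word indices.
        batch = hidden.size(0)
        noise = torch.multinomial(self.noise_probs, self.k * batch,
                                  replacement=True).view(batch, self.k)
        cand = torch.cat([targets.unsqueeze(1), noise], dim=1)  # (batch, 1+k)
        # Unnormalised log-score s(w, h) = h . w_w + b_w - log_scale.
        scores = torch.einsum('bd,bkd->bk', hidden, self.weight[cand]) \
                 + self.bias[cand] - self.log_scale
        # NCE posterior that a candidate was drawn from the data rather
        # than from the noise distribution: sigmoid(s(w, h) - log(k * Pn(w))).
        logits = scores - torch.log(self.k * self.noise_probs[cand])
        labels = torch.zeros_like(logits)
        labels[:, 0] = 1.0  # first column holds the true target word
        return nn.functional.binary_cross_entropy_with_logits(logits, labels)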
Main file: C18-1261.pdf (573.48 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-02912385, version 1 (05-08-2020)

Identifiers

  • HAL Id: hal-02912385, version 1

Cite

Matthieu Labeau, Alexandre Allauzen. Learning with Noise-Contrastive Estimation: Easing training by learning to scale. 27th International Conference on Computational Linguistics (COLING 2018), Aug 2018, Santa Fe, NM, United States. pp.3090-3101. ⟨hal-02912385⟩