French CrowS-Pairs: Extending a challenge dataset for measuring social bias in masked language models to a language other than English

Aurélie Névéol; Yoann Dupont; Julien Bezançon; Karën Fort

Conference paper, Year: 2022



Aurélie Névéol

Role: Author
PersonId: 1131630

Laboratoire Interdisciplinaire des Sciences du Numérique

Information, Langue Ecrite et Signée

Yoann Dupont

Role: Author
PersonId: 169798
IdHAL: yoann-dupont
IdRef: 224619225

Sorbonne Université

Julien Bezançon

Role: Author

Sorbonne Université

Karën Fort

Role: Author
PersonId: 2215
IdHAL: karen-fort
ORCID: 0000-0002-0723-8850
IdRef: 176299548

Semantic Analysis of Natural Language

Sorbonne Université

Abstract

Warning: This paper contains explicit statements of offensive stereotypes which may be upsetting. Much work on biases in natural language processing has addressed biases linked to the social and cultural experience of English-speaking individuals in the United States. We seek to widen the scope of bias studies by creating material to measure social bias in language models (LMs) against specific demographic groups in France. We build on the US-centered CrowS-pairs dataset to create a multilingual stereotypes dataset that allows for comparability across languages while also characterizing biases that are specific to each country and language. We introduce 1,677 sentence pairs in French that cover stereotypes in ten types of bias like gender and age. 1,467 sentence pairs are translated from CrowS-pairs and 210 are newly crowdsourced and translated back into English. The sentence pairs contrast stereotypes concerning underadvantaged groups with the same sentence concerning advantaged groups. We find that four widely used language models (three French, one multilingual) favor sentences that express stereotypes in most bias categories. We report on the translation process, which led to a characterization of stereotypes in CrowS-pairs including the identification of US-centric cultural traits. We offer guidelines to further extend the dataset to other languages and cultural environments.

Domains

Document and Text Processing

Main file

ACLFinal.pdf (185.36 KB)

Origin: Files produced by the author(s)

Contributor: Karën Fort

https://inria.hal.science/hal-03629677

Submitted on: Monday, April 4, 2022, 15:00:07

Last modified on: Tuesday, February 6, 2024, 14:40:07

Long-term archiving on: Tuesday, July 5, 2022, 18:46:20

Dates and versions

hal-03629677, version 1 (04-04-2022)

Identifiers

HAL Id: hal-03629677, version 1

Cite

Aurélie Névéol, Yoann Dupont, Julien Bezançon, Karën Fort. French CrowS-Pairs: Extending a challenge dataset for measuring social bias in masked language models to a language other than English. ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, May 2022, Dublin, Ireland. ⟨hal-03629677⟩


Collections

CNRS INRIA CENTRALESUPELEC UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD UNIV-PARIS-SACLAY SORBONNE-UNIVERSITE ANR LISN GS-ENGINEERING GS-COMPUTER-SCIENCE LISN-ILES

475 views

702 downloads
