Languages Worldwide and the World Wide Web: Crowdsourcing on the Internet to Explore Linguistic Theories - Laboratoire Interdisciplinaire des Sciences du Numérique
Conference paper, Year: 2022

Abstract

The number of spoken languages in the world is estimated at approximately 6,000, yet only a handful of them are well-resourced. This limits typological investigations, i.e., language-comparison studies that aim to understand universal trends in language. Crowd-sourced data could help create homogeneous multilingual corpora and thus provide a revolutionary tool giving researchers access to large amounts of data in rare or remote languages. Yet crowd-sourced data are usually recorded with non-professional equipment in noisy environments, which poses a challenge for anyone wishing to use them for phonetic research. In this paper, we show how crowd-sourced data can contribute to academic research by using audio files from Lingua Libre, Wikimedia France’s open-access linguistic library, to test the Inventory Size Hypothesis. This hypothesis posits that the more phonological vowel categories a language has, the less internal phonetic variation its vowels will display. The platform allows us to investigate acoustic measurements of the three cardinal vowels /a/, /i/ and /u/ in seven less-resourced languages with varying numbers of vowel categories. Our results replicate those of previous studies, which suggests that our methodology is promising. Lingua Libre thus makes it possible to investigate a scientific question with theoretical implications for broader models of communication, and to bridge the gap between well-resourced and less-resourced languages in an inclusive, homogeneous data set of the world’s languages.
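The abstract does not specify how within-category variation is measured. As one hedged illustration of how the Inventory Size Hypothesis could be operationalized, the sketch below assumes that each recorded vowel token has already been reduced to its first and second formant values (F1, F2) in Hz. It measures the variation of each cardinal vowel as the mean Euclidean distance of its tokens from their centroid, averages this over /a/, /i/ and /u/ per language, and correlates the result with vowel-inventory size; the hypothesis predicts a negative correlation. All names (`VowelToken`, `dispersion`, `dispersion_by_language`, `inventory_size_test`) are hypothetical, this is not the authors' pipeline, and a real analysis would at least normalize formants by speaker.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean


@dataclass
class VowelToken:
    language: str
    vowel: str  # one of the cardinal vowels "a", "i", "u"
    f1: float   # first formant in Hz (assumed raw, not speaker-normalized)
    f2: float   # second formant in Hz


def dispersion(tokens: list[VowelToken]) -> float:
    """Mean Euclidean distance of tokens from their (F1, F2) centroid."""
    c1 = mean(t.f1 for t in tokens)
    c2 = mean(t.f2 for t in tokens)
    return mean(sqrt((t.f1 - c1) ** 2 + (t.f2 - c2) ** 2) for t in tokens)


def dispersion_by_language(tokens: list[VowelToken]) -> dict[str, float]:
    """Average the per-vowel dispersion over /a/, /i/, /u/ for each language."""
    groups: dict[tuple[str, str], list[VowelToken]] = {}
    for t in tokens:
        groups.setdefault((t.language, t.vowel), []).append(t)
    per_language: dict[str, list[float]] = {}
    for (language, _vowel), group in groups.items():
        per_language.setdefault(language, []).append(dispersion(group))
    return {lang: mean(values) for lang, values in per_language.items()}


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def inventory_size_test(tokens: list[VowelToken],
                        inventory_sizes: dict[str, int]) -> float:
    """Correlate vowel-inventory size with mean within-category dispersion.

    Under the Inventory Size Hypothesis the coefficient should be negative:
    languages with more vowel categories show less dispersion per category.
    """
    disp = dispersion_by_language(tokens)
    languages = sorted(disp)
    return pearson([float(inventory_sizes[lang]) for lang in languages],
                   [disp[lang] for lang in languages])
```

Pearson's coefficient is used only to keep the sketch free of dependencies; with as few languages as in this study, a rank-based coefficient or a mixed-effects model fitted on individual tokens may be a more robust choice.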

Domains

Linguistics
No file deposited

Dates and versions

hal-03887378, version 1 (6 December 2022)

Identifiers

  • HAL Id: hal-03887378, version 1

Cite

Mathilde Hutin, Marc Allassonnière-Tang. Languages Worldwide and the World Wide Web: Crowdsourcing on the Internet to Explore Linguistic Theories. Digital Research Data and Human Sciences (DRDHum Conference 2022), Dec 2022, Jyväskylä, Finland. ⟨hal-03887378⟩