Proceedings, Year: 2022

Languages Worldwide and the World Wide Web: Crowdsourcing on the Internet to Explore Linguistic Theories

Abstract

The world's vocal languages number approximately 6,000, yet only a handful of them are well-resourced, which limits typological investigations, i.e., language-comparison studies that aim to understand universal trends in language. Crowd-sourced data could contribute to creating homogeneous multilingual corpora and thus give researchers access to large amounts of data in rare or remote languages. Yet crowd-sourced data are usually recorded with non-professional tools in non-silent environments, which poses a challenge to anyone wishing to use them for phonetic research. In this paper, we show how crowd-sourced data can contribute to academic research by using audio files from Lingua Libre, Wikimedia France’s open-access linguistic library, to test the Inventory Size Hypothesis. This hypothesis holds that the more phonological vowel categories a language has, the less internal phonetic variation its vowels will display. The platform allows us to investigate acoustic measurements of the three cardinal vowels /a/, /i/ and /u/ in 7 less-resourced languages with varying numbers of vowel categories. Our results replicate those of previous literature, which suggests that our methodology is promising. Lingua Libre thus makes it possible to investigate a scientific question with theoretical implications for larger models of communication, and to bridge the gap between well-resourced and less-resourced languages in an inclusive, homogeneous data set of the world’s languages.
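The abstract does not specify how phonetic variation was quantified, so the following is only a minimal sketch of one way the Inventory Size Hypothesis could be checked. It assumes that variation is measured as the mean Euclidean distance of (F1, F2) formant tokens from their category centroid, and that the link with inventory size is assessed with a Spearman rank correlation. The function names and the toy values are hypothetical placeholders, not data or methods from the paper.

```python
import math
from statistics import mean


def vowel_dispersion(tokens):
    """Mean Euclidean distance (Hz) of (F1, F2) tokens from their centroid."""
    c1 = mean(f1 for f1, _ in tokens)
    c2 = mean(f2 for _, f2 in tokens)
    return mean(math.hypot(f1 - c1, f2 - c2) for f1, f2 in tokens)


def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation, i.e. the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical toy input: language label -> (number of vowel categories,
# (F1, F2) tokens of /a/ in Hz). These numbers are invented for illustration.
TOY = {
    "lang_A": (3, [(700, 1300), (780, 1400), (620, 1200)]),
    "lang_B": (5, [(700, 1300), (740, 1350), (660, 1250)]),
    "lang_C": (7, [(700, 1300), (720, 1325), (680, 1275)]),
}

sizes = [size for size, _ in TOY.values()]
dispersions = [vowel_dispersion(tokens) for _, tokens in TOY.values()]
rho = spearman(sizes, dispersions)
print(f"Spearman correlation, inventory size vs. dispersion: {rho:.2f}")
```

Under these assumptions, a negative correlation would be in line with the hypothesis. A real analysis would also need speaker normalization and far more tokens per vowel and language than this toy input.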

Dates and versions

hal-03887378, version 1 (06-12-2022)

Identifiers

  • HAL Id: hal-03887378, version 1

Cite

Mathilde Hutin, Marc Allassonnière-Tang. Languages Worldwide and the World Wide Web: Crowdsourcing on the Internet to Explore Linguistic Theories. 2022, 978-951-39-9450-1. ⟨hal-03887378⟩