Assessing Authenticity and Anonymity of Synthetic User-generated Content in the Medical Domain - Laboratoire Interdisciplinaire des Sciences du Numérique
Conference paper, Year: 2024

Assessing Authenticity and Anonymity of Synthetic User-generated Content in the Medical Domain

Abstract

Since medical text cannot be shared easily due to privacy concerns, synthetic data bears much potential for natural language processing applications. In the context of social media and user-generated messages about drug intake and adverse drug effects, this work presents different methods to examine the authenticity of synthetic text. We conclude that the generated tweets are untraceable and show enough authenticity from the medical point of view to be used as a replacement for a real Twitter corpus. However, original data might still be the preferred choice as they contain much more diversity.
Main file
Nishiyama_CALDPSEUDO2024.pdf (658.25 KB)
Origin: Publisher files allowed on an open archive
License: CC BY - Attribution

Dates and versions

hal-04528240, version 1 (01-04-2024)

License

Attribution

Identifiers

  • HAL Id: hal-04528240, version 1

Cite

Tomohiro Nishiyama, Lisa Raithel, Roland Roller, Pierre Zweigenbaum, Eiji Aramaki. Assessing Authenticity and Anonymity of Synthetic User-generated Content in the Medical Domain. Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo), Mar 2024, St. Julian’s, Malta. pp.8-17. ⟨hal-04528240⟩