Can Synthetic Text Help Clinical Named Entity Recognition? A Study of Electronic Health Records in French - Laboratoire Interdisciplinaire des Sciences du Numérique
Conference paper, Year: 2023

Abstract

In sensitive domains, the sharing of corpora is restricted due to confidentiality, copyright, or trade secrets. Automatic text generation can help alleviate these issues by producing synthetic texts that mimic the linguistic properties of real documents while preserving confidentiality. In this study, we assess the usability of a synthetic corpus as a substitute training corpus for clinical information extraction. Our goal is to automatically produce a corpus of clinical cases annotated with clinical entities and to evaluate it on a named entity recognition (NER) task. We use two auto-regressive neural models, partially or fully trained on generic French texts and fine-tuned on clinical cases, to produce a corpus of synthetic clinical cases. We study variants of the generation process: (i) fine-tuning on annotated vs. plain text (in the latter case, annotations are obtained a posteriori) and (ii) selection of generated texts based on model parameters and filtering criteria. We then train NER models on the resulting synthetic text and evaluate them on a gold standard clinical corpus. Our experiments suggest that synthetic text is useful for clinical NER.
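
The evaluation loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a hypothetical inline markup of the form `<label>mention</label>` for generated annotated text, and strict span-level precision, recall, and F1 as the metric; the abstract specifies neither. The label names, the example sentence, and the function names `parse_inline` and `strict_prf` are invented for illustration.

```python
import re
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end, label), end exclusive

# Hypothetical inline markup (an assumption): <label>entity text</label>
TAG = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def parse_inline(annotated: str) -> Tuple[str, List[Span]]:
    """Strip inline tags; return the plain text and character-offset spans."""
    parts: List[str] = []
    spans: List[Span] = []
    cursor = 0  # read position in the annotated string
    offset = 0  # length of the plain text built so far
    for match in TAG.finditer(annotated):
        before = annotated[cursor:match.start()]
        parts.append(before)
        offset += len(before)
        label, mention = match.group(1), match.group(2)
        spans.append((offset, offset + len(mention), label))
        parts.append(mention)
        offset += len(mention)
        cursor = match.end()
    parts.append(annotated[cursor:])
    return "".join(parts), spans


def strict_prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Strict matching: a prediction counts only if boundaries and label agree."""
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Invented example sentence, not taken from any corpus.
    sample = ("Patient with <problem>fever</problem> treated by "
              "<treatment>paracetamol</treatment>.")
    text, gold_spans = parse_inline(sample)
    # A hypothetical system output with one boundary error.
    predicted = {(13, 18, "problem"), (30, 40, "treatment")}
    print(text)
    print(gold_spans)
    print(strict_prf(set(gold_spans), predicted))
```

Character offsets are computed against the tag-free text, so the same spans can be compared with the output of an NER model that only sees plain text.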

Main file
Exp_rience_reconnaissance_EN.pdf (382.61 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04018935, version 1 (8 March 2023)
hal-04018935, version 2 (30 November 2023)

Identifiers

  • HAL Id: hal-04018935, version 1

Cite

Nicolas Hiebel, Olivier Ferret, Karën Fort, Aurélie Névéol. Can Synthetic Text Help Clinical Named Entity Recognition? A Study of Electronic Health Records in French. EACL: The 17th Conference of the European Chapter of the Association for Computational Linguistics, May 2023, Dubrovnik, Croatia. ⟨hal-04018935v1⟩