Natural language processing of radiology reports for the detection of thromboembolic diseases and clinically relevant incidental findings. - Laboratoire Interdisciplinaire des Sciences du Numérique
Journal article, BMC Bioinformatics, Year: 2013

Natural language processing of radiology reports for the detection of thromboembolic diseases and clinically relevant incidental findings.

Abstract

Background: Natural Language Processing (NLP) has been shown to be effective for analyzing the content of radiology reports and identifying diagnoses or patient characteristics. We evaluate the combination of NLP and machine learning to detect thromboembolic disease diagnosis and incidental clinically relevant findings from angiography and venography reports written in French. We model thromboembolic diagnosis and incidental findings as a set of concepts, modalities and relations between concepts that can be used as features by a supervised machine learning algorithm. A corpus of 573 radiology reports was de-identified and manually annotated by a physician, with the support of NLP tools, for relevant concepts, modalities and relations. A machine learning classifier was trained on the dataset interpreted by a physician for diagnosis of deep-vein thrombosis, pulmonary embolism and clinically relevant incidental findings. Decision models accounted for the imbalanced nature of the data and exploited the structure of the reports.

Results: The best model achieved an F measure of 0.98 for pulmonary embolism identification, 1.00 for deep vein thrombosis, and 0.80 for incidental clinically relevant findings. The use of concepts, modalities and relations improved performance in all cases.

Conclusions: This study demonstrates the benefits of developing an automated method to identify medical concepts, modalities and relations from radiology reports in French. An end-to-end automatic system for annotation and classification that could be applied to other radiology report databases would be valuable for epidemiological surveillance, performance monitoring, and accreditation in French hospitals.
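For readers unfamiliar with the metric reported in the Results, the following is a minimal sketch of how an F measure is usually computed from classification counts. It assumes the balanced form (beta = 1), which the abstract does not state explicitly. The function name `f_measure` and the counts in the usage line are hypothetical and are not taken from the study.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F measure from true positives, false positives and false negatives.

    beta = 1.0 gives the balanced F1 score (an assumption; the abstract
    does not say which beta was used).
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration only, not data from the paper.
score = f_measure(tp=8, fp=2, fn=2)
```

Because the F measure ignores true negatives, it is often preferred over plain accuracy when positive cases are rare, which is consistent with the abstract's remark that the data are imbalanced.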
Main file
1471-2105-15-266.pdf (581.86 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

inserm-01094167, version 1 (11-12-2014)

Identifiers

  • HAL Id: inserm-01094167, version 1
  • PUBMED: 25099227

Cite

Anne-Dominique Pham, Aurélie Névéol, Thomas Lavergne, Daisuke Yasunaga, Olivier Clément, et al. Natural language processing of radiology reports for the detection of thromboembolic diseases and clinically relevant incidental findings. BMC Bioinformatics, 2013, pp.266. ⟨inserm-01094167⟩