Conference poster, Year: 2023

Towards improving workflows reproducibility : Extracting information on workflows from text and code repositories

Abstract

Scientific workflows give bioinformaticians a means to represent, exchange, and ensure the reproducibility of their analysis pipelines. Workflows are described in the literature (text) and/or stored in workflow repositories (code). A major challenge in fostering workflow reuse is rebuilding the link between the documentation and the workflow code. Based on workflow descriptions found in the full text of English-language articles, we propose a method for representing and extracting information about workflow components. We also use a symbolic approach that relies on code structure to analyze the contents of GitHub repositories containing workflows, and we compare the information available from both sources. We present a corpus of articles annotated with a workflow representation comprising 16 entities and 10 relations. We use this corpus to train and evaluate statistical models for extracting information about workflows, in particular the tools used in a workflow. We then link the information extracted from the text to the information extracted from the source code. The results show both the feasibility of the extraction tasks and the complementarity of articles and code repositories in terms of the information they provide. This work is a first step towards integrating workflow information from the literature and from workflow repositories.
No file deposited

Dates and versions

hal-04363577, version 1 (25-12-2023)

Identifiers

  • HAL Id: hal-04363577, version 1

Cite

Clémence Sebe, Frédéric Lemoine, Alban Gaignard, Olivier Ferret, Sarah Cohen-Boulakia, et al. Towards improving workflows reproducibility : Extracting information on workflows from text and code repositories. 31st Annual Intelligent Systems For Molecular Biology and 22nd Annual European Conference on Computational Biology (ISMB/ECCB 2023), Jul 2023, Lyon, France. ⟨hal-04363577⟩