Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking for Everyone

Zhen Xu; Huan Zhao; Wei-Wei Tu; Magali Richard; Sergio Escalera; Isabelle Guyon

Preprint, Working Paper. Year: 2021

Zhen Xu

Role: Author
PersonId : 1092603

4Paradigm

Huan Zhao

Role: Author

4Paradigm

Wei-Wei Tu

Role: Author

4Paradigm

Chalearn

Magali Richard

Role: Author
PersonId : 738986
IdHAL : magali-richard
ORCID : 0000-0003-3165-3218
IdRef : 168149184

Translational Innovation in Medicine and Complexity / Recherche Translationnelle et Innovation en Médecine et Complexité - UMR 5525

Sergio Escalera

Role: Author

Universitat de Barcelona

Isabelle Guyon

Role: Author
PersonId : 1087533

Chalearn

Laboratoire Interdisciplinaire des Sciences du Numérique

Abstract

Obtaining standardized, crowdsourced benchmarks of computational methods is a major issue in scientific communities. Dedicated frameworks enabling fair, continuous benchmarking in a unified environment are yet to be developed. Here we introduce Codabench, an open-source, community-driven platform for benchmarking algorithms or software agents against datasets or tasks. A public instance of Codabench is open to everyone, free of charge, and allows benchmark organizers to compare submissions fairly, under the same setting (software, hardware, data, algorithms), with custom protocols and data formats. Codabench has unique features that make organizing benchmarks flexible, easy and reproducible. First, it supports both code submission and data submission for testing on dedicated compute workers, which can be supplied by the benchmark organizers. This makes the system scalable, at low cost for the platform providers. Second, Codabench benchmarks are created from self-contained "bundles": zip files containing a full description of the benchmark in a configuration file (following a well-defined schema), documentation pages, data, and ingestion and scoring programs. This makes benchmarks reusable and portable. The Codabench documentation includes many examples of bundles that can serve as templates. Third, Codabench uses Docker for each task's running environment to make results reproducible. Codabench has been used internally and externally in more than 10 applications during the past 6 months. As illustrative use cases, we introduce 4 diverse benchmarks covering Graph Machine Learning, Cancer Heterogeneity, Clinical Diagnosis and Reinforcement Learning.
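
The "bundle" idea described in the abstract (one zip file holding a configuration file, documentation pages, data, and ingestion and scoring programs) can be illustrated with a minimal sketch. Every file name and configuration key below is a hypothetical placeholder chosen for illustration; this is not the actual Codabench bundle schema, which is defined in the Codabench documentation.

```python
import io
import zipfile

# Hypothetical layout: these file names and config keys are illustrative
# placeholders, NOT the actual Codabench bundle schema.
BUNDLE_FILES = {
    "config.yaml": (
        "title: Toy benchmark\n"
        "docker_image: python:3.11\n"
        "pages:\n"
        "  - overview.md\n"
        "ingestion_program: ingestion_program/ingest.py\n"
        "scoring_program: scoring_program/score.py\n"
    ),
    "overview.md": "# Toy benchmark\nDescribe the task here.\n",
    "data/labels.csv": "id,label\n1,0\n2,1\n",
    "ingestion_program/ingest.py": "# feeds data to a submission\n",
    "scoring_program/score.py": "# compares predictions with labels\n",
}


def build_bundle(files):
    """Pack a mapping of archive path -> text into an in-memory zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, text in sorted(files.items()):
            archive.writestr(path, text)
    return buffer.getvalue()


def list_bundle(bundle_bytes):
    """Return the sorted member names of a zipped bundle."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as archive:
        return sorted(archive.namelist())


if __name__ == "__main__":
    bundle = build_bundle(BUNDLE_FILES)
    for name in list_bundle(bundle):
        print(name)
```

Because everything the benchmark needs travels in a single archive, the same bundle could in principle be re-uploaded or shared unchanged, which is the portability property the abstract attributes to bundles.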

Domains

Machine Learning [cs.LG]

Main file

Codabench_arxiv.pdf (1.31 MB)

Origin: Files produced by the author(s)

Contributor: Zhen Xu

https://hal.science/hal-03374222

Submitted on: Thursday, January 6, 2022, 05:05:53

Last modified on: Friday, March 24, 2023, 14:53:25

Dates and versions

hal-03374222 , version 1 (12-10-2021)

hal-03374222 , version 2 (06-01-2022)

hal-03374222 , version 3 (25-02-2022)

hal-03374222 , version 4 (27-06-2022)

Identifiers

HAL Id : hal-03374222 , version 2

Cite

Zhen Xu, Huan Zhao, Wei-Wei Tu, Magali Richard, Sergio Escalera, et al. Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking for Everyone. 2022. ⟨hal-03374222v2⟩

1063 views

265 downloads
