High-Quality Fault Resiliency in Fat Trees - Université Paris-Saclay
Journal article, IEEE Micro, Year: 2020

High-Quality Fault Resiliency in Fat Trees

Abstract

Coupling regular topologies with optimised routing algorithms is key in pushing the performance of interconnection networks of supercomputers. In this paper we present Dmodc, a fast deterministic routing algorithm for Parallel Generalised Fat-Trees (PGFTs) which minimises congestion risk even under massive network degradation caused by equipment failure. Dmodc computes forwarding tables with a closed-form arithmetic formula by relying on a fast preprocessing phase. This allows complete re-routing of networks with tens of thousands of nodes in less than a second. In turn, this greatly helps centralised fabric management react to faults with high-quality routing tables and no impact on running applications in current and future very large-scale HPC clusters.
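The abstract does not state Dmodc's formula, so the following is only a minimal sketch of the general idea of computing forwarding-table entries with closed-form arithmetic. It uses a generic modulo-based up-port rule in the spirit of the classical D-mod-k scheme for fat trees, not Dmodc itself. The names `up_port`, `forwarding_table`, `arities_below` and `num_up_ports` are hypothetical, and the sketch assumes a fault-free network with no preprocessing phase.

```python
from math import prod
from typing import List, Sequence


def up_port(dest: int, arities_below: Sequence[int], num_up_ports: int) -> int:
    """Pick an up port for destination `dest` with a closed-form expression.

    Hypothetical rule (an assumption, not Dmodc's formula): divide the
    destination index by the product of the arities of the levels below
    this switch, then take the result modulo the number of up ports.
    """
    divisor = prod(arities_below)  # prod of an empty sequence is 1
    return (dest // divisor) % num_up_ports


def forwarding_table(
    num_dests: int, arities_below: Sequence[int], num_up_ports: int
) -> List[int]:
    """Build one switch's up-port table: one arithmetic step per destination."""
    return [up_port(d, arities_below, num_up_ports) for d in range(num_dests)]


# Example: a switch with 2 up ports sitting above one level of arity 2.
table = forwarding_table(8, [2], 2)
```

Because each entry depends only on the destination index and a few per-switch constants, the whole table can be recomputed in a single pass, which is the property that makes fast complete re-routing plausible.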
Main file
article.pdf (268.31 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03861715, version 1 (21-11-2022)

Licence

Attribution

Identifiers

Cite

John Gliksberg, Antoine Capra, Alexandre Louvet, Pedro Javier Garcia, Devan Sohier. High-Quality Fault Resiliency in Fat Trees. IEEE Micro, 2020, 40 (1), pp.44-49. ⟨10.1109/MM.2019.2949978⟩. ⟨hal-03861715⟩
