PROCLAIM: An Unsupervised Approach to Discover Domain-Specific Attribute Matchings from Heterogeneous Sources - Laboratoire Interdisciplinaire des Sciences du Numérique
Conference paper. Year: 2020


Abstract

Schema matching is a critical problem in many applications where the main goal is to match attributes coming from heterogeneous sources. In this paper, we propose PROCLAIM (PROfile-based Cluster-Labeling for AttrIbute Matching), an automatic, unsupervised, clustering-based approach to match the attributes of a large number of heterogeneous sources. We define the concept of an attribute profile to characterize the main properties of an attribute using (i) the statistical distribution and the dimension of the attribute's values, and (ii) the name and textual descriptions related to the attribute. Thanks to the cluster-labeling function we define, the attribute matchings produced by PROCLAIM give the best representation of the heterogeneous sources. We evaluate PROCLAIM on 45,000 different data sources from an oil and gas authority's open data website. The results we obtain are promising and validate our approach.
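The abstract does not specify PROCLAIM's profile features, similarity measure, or labeling function. Purely as an illustration of the general idea (profile, cluster, label), the sketch below assumes a profile made of name and description tokens, the mean and standard deviation of numeric values, and a hypothetical dimension tag such as "length". Profiles are compared with an equally weighted score, grouped by threshold-based single-linkage clustering, and each cluster is labeled with the most frequent token in its members' names. Every function name, the weights, the 0.5 threshold, and the sample attributes are assumptions made for this example and should not be read as the authors' method.

```python
import re
from collections import Counter
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class AttributeProfile:
    """Hypothetical attribute profile: text tokens, value statistics, dimension tag."""
    name: str
    tokens: frozenset
    value_mean: float
    value_std: float
    dimension: str  # assumed physical-dimension label, e.g. "length" or "volume"


def tokenize(text):
    """Lower-case the text and split it on any non-alphanumeric character."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_profile(name, description, values, dimension):
    """Combine name/description tokens with simple statistics of numeric values."""
    return AttributeProfile(
        name=name,
        tokens=frozenset(tokenize(name + " " + description)),
        value_mean=mean(values),
        value_std=pstdev(values),
        dimension=dimension,
    )


def closeness(x, y):
    """1.0 for equal numbers, decreasing toward 0 as their relative difference grows."""
    scale = max(abs(x), abs(y), 1e-9)
    return 1.0 / (1.0 + abs(x - y) / scale)


def similarity(a, b):
    """Equal-weight mix (an assumption) of token Jaccard, value statistics, dimension match."""
    union = a.tokens | b.tokens
    jaccard = len(a.tokens & b.tokens) / len(union) if union else 0.0
    stats = (closeness(a.value_mean, b.value_mean) + closeness(a.value_std, b.value_std)) / 2
    same_dimension = 1.0 if a.dimension == b.dimension else 0.0
    return (jaccard + stats + same_dimension) / 3


def cluster(profiles, threshold=0.5):
    """Single-linkage clustering: link any pair whose similarity reaches the threshold."""
    parent = list(range(len(profiles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if similarity(profiles[i], profiles[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, profile in enumerate(profiles):
        groups.setdefault(find(i), []).append(profile)
    return list(groups.values())


def label(members):
    """Label a cluster with the most frequent token in its members' names."""
    counts = Counter(t for p in members for t in tokenize(p.name))
    return counts.most_common(1)[0][0]


# Made-up sample attributes, loosely themed on well data (not from the paper's dataset).
profiles = [
    build_profile("well_depth", "total depth of the well", [1200, 1500, 1800], "length"),
    build_profile("depth", "well depth in metres", [1100, 1600, 1750], "length"),
    build_profile("production", "monthly oil production", [50, 70, 65], "volume"),
]

for members in cluster(profiles):
    print(label(members), [p.name for p in members])
# depth ['well_depth', 'depth']
# production ['production']
```

Single-linkage with a fixed threshold keeps the sketch short, but it can chain loosely related attributes into one cluster; the weights and threshold here are arbitrary and would need tuning in any real setting.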
Main file
PROCLAIM__An_Unsupervised_Approach_to_Discover_a_Global_Schema_for_Heterogeneous_Sources.pdf (255.42 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02863603, version 1 (10-06-2020)

Identifiers

  • HAL Id: hal-02863603, version 1

Cite

Molood Arman, Sylvain Wlodarczyk, Nacéra Bennacer Seghouani, Francesca Bugiotti. PROCLAIM: An Unsupervised Approach to Discover Domain-Specific Attribute Matchings from Heterogeneous Sources. CAiSE'20, 32nd International Conference on Advanced Information Systems Engineering, Jun 2020, Grenoble, France. ⟨hal-02863603⟩