Person instance graphs for mono-, cross- and multi-modal person recognition in multimedia data: application to speaker identification in TV broadcast - Laboratoire Interdisciplinaire des Sciences du Numérique
Journal article, International Journal of Multimedia Information Retrieval, Year: 2014


Abstract

This work introduces a unified framework for mono-, cross- and multi-modal person recognition in multimedia data. Dubbed Person Instance Graph, it models the person recognition task as a graph mining problem, i.e. finding the best mapping between person instance vertices and identity vertices. Practically, we describe how the approach can be applied to speaker identification in TV broadcast. Then, a solution to the above-mentioned mapping problem is proposed. It relies on Integer Linear Programming to model the problem of clustering person instances based on their identity. We provide an in-depth theoretical definition of the optimization problem. Moreover, we improve two fundamental aspects of our previous related work: the problem constraints and the optimized objective function. Finally, a thorough experimental evaluation of the proposed framework is performed on a publicly available benchmark database. Depending on the graph configuration (i.e. the choice of its vertices and edges), we show that multiple tasks can be addressed interchangeably (e.g. speaker diarization, supervised or unsupervised speaker identification), significantly outperforming state-of-the-art mono-modal approaches.
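To make the mapping problem concrete, the following is a minimal sketch of clustering person instances and identities as a 0/1 program. Everything in it is an assumption made for illustration: the vertex names, the edge probabilities, and the objective (reward agreeing with each edge probability) are hypothetical and are not taken from the paper, whose exact constraints and objective are given in the full text. To stay dependency-free, the sketch enumerates all binary assignments instead of calling an Integer Linear Programming solver; a real system would pass the same variables and constraints to one.

```python
from itertools import combinations, permutations, product

# Toy graph; all names and numbers are hypothetical.
instances = ["turn1", "turn2", "turn3"]   # person instance vertices (e.g. speech turns)
identities = ["Alice", "Bob"]             # identity vertices
vertices = instances + identities

# Edge weights: assumed probability that both endpoints are the same person.
# Pairs without an edge do not contribute to the objective.
prob = {
    ("turn1", "turn2"): 0.9,
    ("turn2", "turn3"): 0.2,
    ("turn1", "turn3"): 0.1,
    ("turn1", "Alice"): 0.8,
    ("turn3", "Bob"): 0.7,
}

pairs = list(combinations(vertices, 2))


def is_feasible(x):
    """Check the constraints on binary variables x[{u, v}] (1 = same cluster)."""
    # Two distinct identities can never share a cluster.
    for a, b in combinations(identities, 2):
        if x[frozenset((a, b))] == 1:
            return False
    # Transitivity: u~v and v~w imply u~w.
    for u, v, w in permutations(vertices, 3):
        if x[frozenset((u, v))] + x[frozenset((v, w))] - x[frozenset((u, w))] > 1:
            return False
    return True


def objective(x):
    """Assumed objective: agree with edge probabilities as much as possible."""
    total = 0.0
    for (u, v), p in prob.items():
        same = x[frozenset((u, v))]
        total += same * p + (1 - same) * (1 - p)
    return total


def solve():
    """Exhaustive search over all 0/1 assignments (stand-in for an ILP solver)."""
    best_x, best_score = None, float("-inf")
    for bits in product((0, 1), repeat=len(pairs)):
        x = {frozenset(pair): bit for pair, bit in zip(pairs, bits)}
        if is_feasible(x):
            score = objective(x)
            if score > best_score:
                best_x, best_score = x, score
    return best_x, best_score


def mapping(x):
    """Read the instance-to-identity mapping off the clustering."""
    result = {}
    for inst in instances:
        names = [name for name in identities if x[frozenset((inst, name))] == 1]
        result[inst] = names[0] if names else None
    return result


best_x, best_score = solve()
print(mapping(best_x))
```

On this toy graph the search assigns turn1 and turn2 to Alice and turn3 to Bob. Note that turn2 has no direct edge to any identity; it inherits Alice through its strong link to turn1, which is the kind of propagation the graph formulation is meant to allow. Removing the identity vertices from the same graph leaves a pure clustering of instances, loosely analogous to the diarization configuration mentioned in the abstract.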
Main file
paper.pdf (1.93 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01690350, version 1 (22-01-2018)

Cite

Hervé Bredin, Anindya Roy, Viet-Bac Le, Claude Barras. Person instance graphs for mono-, cross- and multi-modal person recognition in multimedia data: application to speaker identification in TV broadcast. International Journal of Multimedia Information Retrieval, 2014, 3 (3), pp. 161-175. ⟨10.1007/s13735-014-0055-y⟩. ⟨hal-01690350⟩