A Sketch-Based Naive Bayes Algorithms for Evolving Data Streams - Laboratoire Interdisciplinaire des Sciences du Numérique
Conference paper. Year: 2018

A Sketch-Based Naive Bayes Algorithms for Evolving Data Streams

Abstract

Classification is a well-known learning task in big data stream mining. It has been studied extensively in the offline setting, but it remains a challenge in the streaming setting, where data evolve and may be unbounded. Offline training stores all the data in memory for the learning task; in the streaming setting this is impossible because of the massive amount of data generated in real time. To cope with these resource constraints, this paper proposes and analyzes several evolving naive Bayes classification algorithms based on the well-known count-min sketch, in order to minimize the space needed to store the training data. The proposed algorithms also adapt concept drift approaches, such as ADWIN, to deal with the fact that streaming data may evolve and change over time. However, handling sparse, very high-dimensional data in such a framework is highly challenging. Therefore, we include the hashing trick, a technique for dimensionality reduction, to compress the data down to a lower-dimensional space, which leads to a large memory saving. We give a theoretical analysis which demonstrates that our proposed algorithms provide accuracy similar to that of classical big data stream mining algorithms while using a reasonable amount of resources. We validate these theoretical results with an extensive evaluation on both synthetic and real-world datasets.
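To make the combination of ideas in the abstract concrete, here is a minimal Python sketch of a naive Bayes classifier whose per-class feature counts live in a count-min sketch, with the hashing trick applied to feature names first. It is an illustration under stated assumptions, not the authors' algorithms: the class names (`CountMinSketch`, `SketchNaiveBayes`), the default sizes, the salted CRC32 hashing, and the presence-only Bernoulli scoring with Laplace smoothing are all choices made for this example, and drift detection (such as ADWIN) is left out entirely.

```python
import math
import random
import zlib
from collections import defaultdict


class CountMinSketch:
    """Approximate counter: estimates never undercount, but may overcount on collisions."""

    def __init__(self, width, depth, seed=0):
        self.width = width
        self.depth = depth
        rng = random.Random(seed)
        # One salt per row stands in for a family of independent hash functions
        # (an assumption of this example; salted CRC32 only approximates that).
        self.salts = [rng.getrandbits(32) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        data = f"{self.salts[row]}:{key}".encode("utf-8")
        return zlib.crc32(data) % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key):
        # The minimum over rows is the estimate least inflated by collisions.
        return min(self.table[row][self._index(key, row)]
                   for row in range(self.depth))


class SketchNaiveBayes:
    """Hypothetical presence-only naive Bayes over hashed features."""

    def __init__(self, width=2048, depth=4, n_buckets=1024):
        self.sketch = CountMinSketch(width, depth)
        self.n_buckets = n_buckets
        self.class_counts = defaultdict(int)
        self.total = 0

    def _buckets(self, features):
        # Hashing trick: map arbitrary feature names into a fixed number of buckets.
        return {zlib.crc32(str(f).encode("utf-8")) % self.n_buckets
                for f in features}

    def learn(self, features, label):
        self.class_counts[label] += 1
        self.total += 1
        for bucket in self._buckets(features):
            self.sketch.add((bucket, label))

    def predict(self, features):
        buckets = self._buckets(features)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / self.total)  # log prior
            for bucket in buckets:
                count = self.sketch.estimate((bucket, label))
                # Laplace smoothing over the two outcomes present/absent.
                score += math.log((count + 1) / (n + 2))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = SketchNaiveBayes()
for _ in range(50):
    model.learn(["cheap", "pills", "offer"], "spam")
    model.learn(["meeting", "agenda", "notes"], "ham")
print(model.predict(["cheap", "offer"]))
```

Memory use is fixed by `width`, `depth`, and the number of classes, regardless of how many instances or distinct features the stream contains. Widening the sketch reduces the overcounting caused by collisions, and adding rows makes a large overcount less likely, at the cost of more work per update.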
Main file
bahri2018sketch.pdf (833.89 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04507533, version 1 (16-03-2024)

Cite

Maroua Bahri, Silviu Maniu, Albert Bifet. A Sketch-Based Naive Bayes Algorithms for Evolving Data Streams. 2018 IEEE International Conference on Big Data (Big Data), Dec 2018, Seattle, United States. pp.604-613, ⟨10.1109/BigData.2018.8622178⟩. ⟨hal-04507533⟩