For details, see the paper: IEEE Internet of Things Journal - Federated Learning on Multilabel Evolving Data Streams
Multilabel classification in distributed, evolving data stream environments presents significant challenges, including distributed concept drift and label dependencies. In this study, we introduce two novel solutions that employ federated learning (FL) problem transformation techniques to tackle these challenges. Our first approach is an error-driven, micro-cluster-based learning strategy that adapts micro-clusters to evolving data distributions, enabling it to handle concept drift arising from different client sources. Our second approach is a graph-based method that uses graph centrality to capture label dependency and correlation in distributed multilabel data streams. Experimental evaluations show that the proposed solutions outperform state-of-the-art methods on multilabel classification metrics. This study highlights the potential of FL for overcoming the challenges of distributed multilabel data stream classification.
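To make the idea of graph centrality over labels more concrete, here is a minimal sketch. It is not the method from the paper: it assumes a simple label co-occurrence graph and uses weighted degree centrality, and the function name `label_centrality` is hypothetical.

```python
import numpy as np


def label_centrality(Y):
    """Weighted degree centrality of each label in a co-occurrence graph.

    Y is a binary (n_instances, n_labels) indicator matrix.
    Assumption: edges are weighted by raw co-occurrence counts.
    """
    Y = np.asarray(Y, dtype=float)
    # Edge weight (i, j) = number of instances in which labels i and j co-occur.
    cooc = Y.T @ Y
    np.fill_diagonal(cooc, 0.0)  # drop self-loops
    degree = cooc.sum(axis=1)
    total = degree.sum()
    # Normalise so the scores sum to 1; a graph with no edges yields all zeros.
    return degree / total if total > 0 else degree


# Toy label matrix: label 0 co-occurs with both other labels.
Y = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
])
scores = label_centrality(Y)
print(scores)
```

In this toy example, label 0 receives the highest score because it co-occurs with both other labels. Other centrality measures (for example eigenvector or betweenness centrality) could be substituted in the same place.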
We recommend using conda to set up the environment.
conda create --name venv python=3.11 -y
conda activate venv
pip install -r requirement.txt

python tests/test_utils.py

| Parameter | Default | Description |
|---|---|---|
| `--dataset` | `yelp` | Dataset name |
| `--clients` | `5` | Number of federated clients |
| `--features` | `671` | Number of features |
| `--labels` | `5` | Number of labels |
| `--max_mc` | `500` | Max micro-clusters per client |
| `--global_mc` | `500` | Max global micro-clusters |
| `--percent_init` | `0.15` | Initial data percentage |
| `--run_type` | `fed` | Run mode: `fed` (recommended) |

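
As a rough illustration of what a cap such as `--max_mc` could mean for an error-driven micro-cluster learner, the sketch below keeps a bounded set of micro-clusters on one client. It is a toy under stated assumptions and not the repository's implementation: the class names, the 0.5 label threshold, the exact-match error test, and the "evict the smallest micro-cluster" rule are all hypothetical.

```python
import numpy as np


class MicroCluster:
    """Centroid plus per-label counts of the instances absorbed so far."""

    def __init__(self, x, y):
        self.n = 1
        self.centroid = np.asarray(x, dtype=float).copy()
        self.label_counts = np.asarray(y, dtype=float).copy()

    def absorb(self, x, y):
        self.n += 1
        # Incremental running mean of the absorbed instances.
        self.centroid += (np.asarray(x, dtype=float) - self.centroid) / self.n
        self.label_counts += np.asarray(y, dtype=float)

    def predict(self):
        # Assumption: a label is predicted if at least half the members carry it.
        return (self.label_counts / self.n >= 0.5).astype(int)


class MicroClusterModel:
    def __init__(self, max_mc=500):
        self.max_mc = max_mc
        self.clusters = []

    def _nearest(self, x):
        x = np.asarray(x, dtype=float)
        dists = [np.linalg.norm(mc.centroid - x) for mc in self.clusters]
        return self.clusters[int(np.argmin(dists))]

    def predict(self, x):
        return self._nearest(x).predict()

    def learn_one(self, x, y):
        y = np.asarray(y, dtype=int)
        if not self.clusters:
            self.clusters.append(MicroCluster(x, y))
            return
        nearest = self._nearest(x)
        if np.array_equal(nearest.predict(), y):
            nearest.absorb(x, y)  # correct prediction: reinforce the cluster
            return
        # Prediction error: spawn a new micro-cluster, evicting the smallest
        # one first if the cap has been reached (hypothetical eviction rule).
        if len(self.clusters) >= self.max_mc:
            smallest = min(range(len(self.clusters)),
                           key=lambda i: self.clusters[i].n)
            self.clusters.pop(smallest)
        self.clusters.append(MicroCluster(x, y))


model = MicroClusterModel(max_mc=2)
model.learn_one([0.0, 0.0], [1, 0])
model.learn_one([0.1, 0.0], [1, 0])  # predicted correctly, so absorbed
model.learn_one([5.0, 5.0], [0, 1])  # mispredicted, so a new cluster is created
print(len(model.clusters), model.predict([4.9, 5.0]))
```

The point of the sketch is only the control flow: the model changes structurally when it makes a mistake, and the cap bounds memory per client.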
python main.py --dataset yelp --clients 5 --run_type fed
# Scale up with more clients
python main.py --dataset yelp --clients 10 --run_type fed
python main.py --dataset yelp --clients 20 --run_type fed
# Different datasets
python main.py --dataset scene --clients 3 --run_type fed

The `data_preprocessed/` folder contains:

- `yelp.npy`: Yelp multi-label dataset
- `scene.npy`: Scene multi-label dataset
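
The internal layout of these `.npy` files is not documented here. The sketch below assumes, purely for illustration, a 2-D array whose columns are the features followed by the labels (so 671 + 5 columns for Yelp with the default `--features` and `--labels`). The helper `split_features_labels` is hypothetical and checks that assumption before slicing.

```python
import numpy as np


def split_features_labels(data, n_features, n_labels):
    """Split a 2-D array into (X, Y), assuming features come first.

    Raises ValueError if the column count does not match the assumed layout.
    """
    data = np.asarray(data)
    if data.ndim != 2 or data.shape[1] != n_features + n_labels:
        raise ValueError(
            f"expected a 2-D array with {n_features + n_labels} columns, "
            f"got shape {data.shape}"
        )
    return data[:, :n_features], data[:, n_features:]


# Hypothetical usage, valid only if the file follows the layout assumed above:
# data = np.load("data_preprocessed/yelp.npy")
# X, Y = split_features_labels(data, n_features=671, n_labels=5)

# Self-contained demo on a tiny array: 3 feature columns, 1 label column.
demo = np.arange(12).reshape(3, 4)
X, Y = split_features_labels(demo, n_features=3, n_labels=1)
print(X.shape, Y.shape)
```

If the shape check fails on a real file, inspect `data.shape` and `data.dtype` first rather than adjusting the slice by guesswork.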
| Dataset | Instances | Features | Feature Type | Labels | Cardinality | Link |
|---|---|---|---|---|---|---|
| Emotions | 593 | 72 | numeric | 6 | 1.868 | Emotions |
| Birds | 645 | 260 | numeric | 19 | 1.014 | Birds |
| Enron | 1,702 | 1,001 | nominal | 53 | 3.378 | Enron |
| Image | 2,000 | 294 | numeric | 5 | 1.236 | Image |
| Yeast | 2,417 | 103 | numeric | 14 | 4.237 | Yeast |
| Scene | 2,407 | 294 | nominal | 6 | 1.074 | Scene |
| Slashdot | 3,782 | 1,079 | nominal | 22 | 1.181 | Slashdot |
| Tmc2007-500 | 28,600 | 500 | nominal | 22 | 2.220 | Tmc2007-500 |
| Yelp | 10,810 | 671 | nominal | 5 | 1.638 | Yelp |
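
The Cardinality column follows the standard definition of label cardinality, the average number of labels per instance. A minimal sketch of how it can be computed from a binary label matrix is shown below; the function names are illustrative and not taken from the repository.

```python
import numpy as np


def label_cardinality(Y):
    """Average number of active labels per instance."""
    Y = np.asarray(Y)
    return float(Y.sum(axis=1).mean())


def label_density(Y):
    """Cardinality divided by the number of labels."""
    Y = np.asarray(Y)
    return label_cardinality(Y) / Y.shape[1]


# Toy label matrix: 4 instances, 3 labels, with 2, 1, 3, and 1 active labels.
Y = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
])
print(label_cardinality(Y), label_density(Y))
```

Applied to the table, the Yelp row (cardinality 1.638 over 5 labels) corresponds to a label density of 1.638 / 5 ≈ 0.328.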
If you find this code useful, please consider giving the repository a star ⭐ and citing the paper:
@ARTICLE{11098479,
author={Lamptey, Khalid Odartey and Ayekai, Browne Judith and Ud Din, Salah},
journal={IEEE Internet of Things Journal},
title={Federated Learning on Multilabel Evolving Data Streams},
year={2025},
volume={12},
number={20},
pages={42103-42115},
keywords={Streams;Federated learning;Multi label classification;Concept drift;Distributed databases;Accuracy;Training;Machine learning algorithms;Decision trees;Data models;Concept drift;data streams;federated learning (FL);multilabel classification;prototype-learning},
doi={10.1109/JIOT.2025.3592954}}
