Detailed code implementation and experimental settings for our paper: Federated Learning on Multilabel Evolving Data Streams

kholam/FedMuL


Federated Learning on Multilabel Evolving Data Streams

For details, see the paper: IEEE Internet of Things Journal - Federated Learning on Multilabel Evolving Data Streams

Abstract

Multilabel classification in distributed evolving data stream environments presents significant challenges, including distributed concept drifts and label dependencies. In this study, we introduce two novel solutions that employ federated learning (FL) problem transformation techniques to tackle these challenges effectively. Our first approach is an error-driven micro-cluster-based learning strategy that adapts micro-clusters to evolving data distributions, enabling it to handle concept drifts originating from different client sources. Our second approach is a graph-based method that leverages graph centrality to capture label dependency and correlation in distributed multilabel data streams. Experimental evaluations reveal that our proposed solutions outperform state-of-the-art methods on multilabel classification metrics. This study highlights the potential of FL in overcoming the challenges associated with distributed multilabel data stream classification.

[Overview figure]
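
As a rough illustration of the graph-based idea, the sketch below builds a label co-occurrence graph and scores each label by its weighted degree. This is only a sketch: the graph construction and the centrality measure used in FedMuL may differ, and the toy label matrix is made up.

```python
import numpy as np
import networkx as nx

def label_centrality(Y):
    """Rank labels by weighted-degree centrality in their co-occurrence graph.

    Y: (n_instances, n_labels) binary label matrix.
    """
    co = (Y.T @ Y).astype(float)    # pairwise label co-occurrence counts
    np.fill_diagonal(co, 0.0)       # drop self-loops
    G = nx.from_numpy_array(co)     # weighted, undirected label graph
    return dict(G.degree(weight="weight"))  # node strength as a simple centrality

# Toy example with 5 labels
Y = np.array([[1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 1, 1, 0, 1],
              [1, 0, 1, 0, 1]])
print(label_centrality(Y))  # label index -> co-occurrence strength
```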

Quick Start

1. Install Dependencies

We recommend using conda to set up the environment.

conda create --name venv python=3.11 -y
conda activate venv

pip install -r requirement.txt

2. Run Tests (Optional)

python tests/test_utils.py

3. Run the Code

Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| --dataset | yelp | Dataset name |
| --clients | 5 | Number of federated clients |
| --features | 671 | Number of features |
| --labels | 5 | Number of labels |
| --max_mc | 500 | Max micro-clusters per client |
| --global_mc | 500 | Max global micro-clusters |
| --percent_init | 0.15 | Initial data percentage |
| --run_type | fed | Run mode: fed (recommended) |
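
As a rough guide, the sketch below shows how these flags could be declared with argparse. The names and defaults mirror the table above; the actual parser in main.py may differ.

```python
import argparse

def build_parser():
    # Sketch of a parser matching the documented flags (not necessarily main.py's).
    p = argparse.ArgumentParser(description="FedMuL experiment runner (sketch)")
    p.add_argument("--dataset", default="yelp", help="Dataset name")
    p.add_argument("--clients", type=int, default=5, help="Number of federated clients")
    p.add_argument("--features", type=int, default=671, help="Number of features")
    p.add_argument("--labels", type=int, default=5, help="Number of labels")
    p.add_argument("--max_mc", type=int, default=500, help="Max micro-clusters per client")
    p.add_argument("--global_mc", type=int, default=500, help="Max global micro-clusters")
    p.add_argument("--percent_init", type=float, default=0.15, help="Initial data percentage")
    p.add_argument("--run_type", default="fed", help="Run mode: fed (recommended)")
    return p

if __name__ == "__main__":
    print(build_parser().parse_args([]))  # prints the default configuration
```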

Usage Examples

python main.py --dataset yelp --clients 5 --run_type fed

# Scale up with more clients  
python main.py --dataset yelp --clients 10 --run_type fed
python main.py --dataset yelp --clients 20 --run_type fed

# Different datasets
python main.py --dataset scene --clients 3 --run_type fed

The data_preprocessed/ folder contains:

  • yelp.npy - Yelp multi-label dataset
  • scene.npy - Scene multi-label dataset
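
A quick way to peek at one of these files is sketched below. It assumes each .npy file holds a 2-D array with the feature columns first and the binary label columns last; if the preprocessing stores a different layout, adjust the split accordingly.

```python
import numpy as np

n_labels = 5  # yelp has 5 labels (see the statistics table below)
data = np.load("data_preprocessed/yelp.npy", allow_pickle=True)
X, Y = data[:, :-n_labels], data[:, -n_labels:]  # assumed feature/label split
print(X.shape, Y.shape)
```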

Dataset Statistics

| Dataset | Instances | Features | Feature Type | Labels | Cardinality | Link |
| --- | --- | --- | --- | --- | --- | --- |
| Emotions | 593 | 72 | numeric | 6 | 1.868 | Emotions |
| Birds | 645 | 260 | numeric | 19 | 1.014 | Birds |
| Enron | 1,702 | 1,001 | nominal | 53 | 3.378 | Enron |
| Image | 2,000 | 294 | numeric | 5 | 1.236 | Image |
| Yeast | 2,417 | 103 | numeric | 14 | 4.237 | Yeast |
| Scene | 2,407 | 294 | nominal | 6 | 1.074 | Scene |
| Slashdot | 3,782 | 1,079 | nominal | 22 | 1.181 | Slashdot |
| Tmc2007-500 | 28,600 | 500 | nominal | 22 | 2.220 | Tmc2007-500 |
| Yelp | 10,810 | 671 | nominal | 5 | 1.638 | Yelp |
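
Cardinality here is label cardinality, i.e. the average number of active labels per instance. Computed from a binary label matrix Y it is simply:

```python
import numpy as np

Y = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 1]])
cardinality = Y.sum(axis=1).mean()   # mean number of labels per instance
print(round(float(cardinality), 3))  # 2.0 for this toy example
```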

Citation

If you find this code useful, please consider giving it a star ⭐ and citing our paper:

@ARTICLE{11098479,
  author={Lamptey, Khalid Odartey and Ayekai, Browne Judith and Ud Din, Salah},
  journal={IEEE Internet of Things Journal}, 
  title={Federated Learning on Multilabel Evolving Data Streams}, 
  year={2025},
  volume={12},
  number={20},
  pages={42103-42115},
  keywords={Streams;Federated learning;Multi label classification;Concept drift;Distributed databases;Accuracy;Training;Machine learning algorithms;Decision trees;Data models;Concept drift;data streams;federated learning (FL);multilabel classification;prototype-learning},
  doi={10.1109/JIOT.2025.3592954}}
