FLTK is a research-oriented toolkit for designing and running Federated Learning (FL) experiments. It is built on top of PyTorch Distributed and is designed to support truly distributed federated systems. The toolkit has been tested on Ubuntu 20.04 with Python 3.7 and 3.8.
This repository contains the code and experiments for the paper:
Aergia: Leveraging Heterogeneity in Federated Learning Systems. Bart Cox, Lydia Y. Chen, Jérémie Decouchant. Proceedings of the 23rd ACM/IFIP International Middleware Conference (Middleware 2022). https://doi.org/10.1145/3528535.3565238
If you use this code in your research, please consider citing:
```
@inproceedings{10.1145/3528535.3565238,
author = {Cox, Bart and Chen, Lydia Y. and Decouchant, J\'{e}r\'{e}mie},
title = {Aergia: Leveraging Heterogeneity in Federated Learning Systems},
year = {2022},
isbn = {9781450393409},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3528535.3565238},
doi = {10.1145/3528535.3565238},
booktitle = {Proceedings of the 23rd ACM/IFIP International Middleware Conference},
pages = {107--120},
numpages = {14},
keywords = {task offloading, federated learning, stragglers},
location = {Quebec, QC, Canada},
series = {Middleware '22}
}
```

PyTorch Distributed operates using a world size and process ranks, where ranks range from `0` to `world_size - 1`.
In FLTK (see the sketch after this list):
- Rank `0` is typically assigned to the federator (server)
- Ranks `1` to `world_size - 1` correspond to clients
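For illustration, here is a minimal sketch of this rank convention using plain `torch.distributed` rather than FLTK's own entry points; it assumes the launcher (e.g. `torchrun --nproc_per_node=3 demo.py`) sets `RANK`, `WORLD_SIZE`, `MASTER_ADDR`, and `MASTER_PORT`:

```python
# Sketch of the federator/client rank convention (not FLTK's actual code).
import torch.distributed as dist

def main():
    dist.init_process_group(backend="gloo")  # CPU-friendly backend
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    if rank == 0:
        print(f"Federator coordinating {world_size - 1} clients")
    else:
        print(f"Client {rank} of {world_size - 1}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```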
A typical FL round proceeds as follows (a FedAvg-style sketch follows this list):
1. The federator selects a subset of clients
2. Selected clients download the current global model
3. Clients train locally for a fixed number of epochs
4. Clients send model updates (weights or gradients) to the federator
5. The federator aggregates the updates into a new global model
6. The updated model is redistributed to the clients

Steps 1–6 are repeated until convergence.
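The sketch below shows one such round in FedAvg style; the `train` method and its `(state_dict, n_samples)` return value are hypothetical stand-ins for a client API, not FLTK's actual interface:

```python
import copy
import random

def fedavg_round(global_model, clients, fraction=0.5, epochs=1):
    """One hypothetical FedAvg round over a list of client objects.

    Each client is assumed to expose train(model, epochs) -> (state_dict, n_samples).
    """
    k = max(1, int(fraction * len(clients)))
    selected = random.sample(clients, k)            # 1. client selection
    updates, counts = [], []
    for client in selected:
        local = copy.deepcopy(global_model)         # 2. download the global model
        state, n = client.train(local, epochs)      # 3.-4. local training and upload
        updates.append(state)
        counts.append(n)
    total = sum(counts)
    averaged = {                                    # 5. sample-weighted aggregation
        name: sum((n / total) * u[name].float() for u, n in zip(updates, counts))
        for name in updates[0]
    }
    global_model.load_state_dict(averaged)          # 6. new global model to redistribute
    return global_model
```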
Key system properties that FL experiments must account for (a partitioning sketch follows this list):
- Client data is never shared
- Client data distributions are non-IID
- Client hardware can be heterogeneous
- Device location affects communication latency and bandwidth
- Communication overhead can be significant
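FLTK's non-IID samplers live under `fltk/samplers`. As a generic illustration of label skew (not FLTK's exact implementation), a Dirichlet-based partitioner could look like this:

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha=0.5, seed=42):
    """Assign sample indices to clients with label skew.

    Smaller alpha produces more concentrated, i.e. more non-IID,
    label distributions per client.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        cut_points = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, shard in enumerate(np.split(idx, cut_points)):
            client_indices[client].extend(shard.tolist())
    return client_indices
```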
Overview of the main directories and files:
```
project
├── experiments
├── deploy # Templates for automatic deployment
│ └── docker # Docker-based system deployment
│ ├── stub_default.yml
│ └── system_stub.yml # Defines the federator and network
├── fltk # Source code
│ ├── core # Core abstractions
│ ├── datasets # Dataset definitions
│ │ ├── data_distribution # Distributed datasets and samplers
│ │ └── distributed # Centralized dataset variants
│ ├── nets # Model architectures
│ ├── samplers # Non-IID data samplers
│ ├── schedulers # Learning rate schedulers
│ ├── strategy # Client selection and aggregation
│ ├── util # Utility functions
│ └── __main__.py # Package entry point
├── Dockerfile # Container definition
├── LICENSE
├── README.md
└── setup.py
```
FLTK supports multiple execution modes depending on experimental requirements.
**Simulation.** All nodes (federator and clients) run sequentially on a single machine. This mode is convenient for debugging and supports GPU acceleration, but does not capture real-time interaction effects.

**Docker-based emulation.** Each node runs in its own Docker container, with configurable CPU, memory, and network constraints. This mode enables real-time experiments where client execution timing and resource contention matter, while remaining reproducible on a single machine.

**Real distributed deployment.** Nodes are deployed natively across multiple physical or virtual machines. This mode most closely resembles real-world FL deployments, but requires substantial infrastructure and makes resource control more difficult.

**Hybrid.** Docker-based emulation and native deployment can be combined. For example, multiple servers may each run several Docker containers that participate in a single federated system.
Supported model architectures:
- CIFAR-10 CNN
- CIFAR-10 ResNet
- CIFAR-100 ResNet
- CIFAR-100 VGG
- Fashion-MNIST CNN
- Fashion-MNIST ResNet
- Reddit LSTM
- Shakespeare LSTM
Supported datasets (a minimal loading example follows the list):
- CIFAR-10
- CIFAR-100
- Fashion-MNIST
- MNIST
- Shakespeare
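For reference, here is one supported pairing (CIFAR-10 with a ResNet) wired up in plain torchvision; FLTK's own dataset and model wrappers live under `fltk/datasets` and `fltk/nets`:

```python
import torch
import torchvision
from torchvision import transforms

# Standard CIFAR-10 normalization constants.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
model = torchvision.models.resnet18(num_classes=10)  # stand-in for the CIFAR-10 ResNet
```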
For Docker-based execution:
- Docker
- Docker Compose
Install the requirements and generate the default models:
```
python3 -m pip install -r requirements.txt
python3 -m fltk.util.default_models
```
Note: Ensure that docker and docker compose are installed.
Generate the Docker configuration:
```
python3 -m fltk util-generate experiments/example_docker/
```
Run an example experiment:
```
python3 -m fltk util-run experiments/example_docker/
```
Alternatively, run nodes directly in single mode by launching the federator (rank 0) and a client (rank 1) on the same machine:
```
python3 -m fltk single configs/experiment.yaml --rank=0
python3 -m fltk single configs/experiment.yaml --rank=1
```
Known limitations:
- GPU support is currently unavailable in Docker and Docker Compose
- The first training epoch can be significantly slower (6×–8×)
