Parmesan is an efficient design flow that maps the training of a large DNN onto a system with a general device topology to maximize training throughput. It works end to end and solves the overall optimization problem in two phases: the first phase produces well-balanced partitions, and the second phase places the DNN on devices connected by an arbitrary-topology network, accounting for the heterogeneity of the interconnection bandwidth.
The repository is organized as follows:

- `comp_graph`: PyTorch computational graph extractor and profiler.
- `device_topo`: device topology data structures; currently used to compute communication time.
- `experiments`: experiment scripts.
- `optimizer`: optimization algorithms (incl. model partitioning and device mapping).
- `runtime`: pipeline scheduler.
- `simulator`: simulator.
- `test_networks`: PyTorch module definitions for development use.
- `tools`: example code for communication profiling and a tool script for checking GPU utilization.
- `misc`: directory for saving generated data files (e.g., TensorBoard visualizations, images, model checkpoints).
Entry scripts:

- `run_dry_optimization.py`: example code for performing partitioning and mapping with dry-run optimization. Sample input and output are given in `./misc/text_graph_in/ResNet` and `./misc/text_graph_out/ResNet`, respectively.
- `run_experiments.py`: script for generating model definition files.
- `run_new_runtime.py`: script for pipeline training. Note that `load_partition` must always be given. Alternatively, you can run pipeline training in batch mode using `python3 experiments/batch_runtime_latency.py --work_dir experiments/batch_xxx`.
`Optimizer.optimization(opt_method, **kwargs)`

- `opt_method` (str): type of optimization algorithm. For our algorithm, use `"parmesan"`.
For `opt_method="parmesan"`, the following keyword arguments can be specified (a usage sketch follows the list):

- `k` (int): number of partitions after the coarsening stage.
- `comm_coef` (float): communication coefficient in coarsening.
- `new_flow` (bool): if True, the optimization flow is coarsening + DP + uncoarsening; otherwise, it is coarsening + uncoarsening + DP.
- `search_k`, `search_comm_coef` (bool): if True, search the corresponding parameter and determine the best setting according to simulation time. When search is enabled, the manually set value for that parameter is ignored. See `optimizer/search_params.py` for more details.
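Below is a minimal sketch of a `parmesan` call. The import path and the `Optimizer` constructor arguments (`graph`, `topo`) are assumptions for illustration; only the keyword arguments follow the list above.

```python
# Hedged sketch: the import path and constructor arguments are assumptions,
# not part of the documented interface.
from optimizer import Optimizer

opt = Optimizer(graph, topo)  # hypothetical: a computational graph and a device topology
opt.optimization(
    opt_method="parmesan",
    k=16,                    # number of partitions after coarsening
    comm_coef=0.1,           # communication coefficient in coarsening
    new_flow=True,           # flow: coarsening + DP + uncoarsening
    search_k=True,           # search k; the manual k above is then ignored
    search_comm_coef=False,  # keep the manually set comm_coef
)
```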
For `opt_method="ensemble"`, the following keyword arguments must be specified (sketched below):

- `sim` (Simulator): the simulator object.
- `opt_method_to_params` (dict): for each item, the key is an `opt_method` string and the value is the dict of the corresponding `kwargs`.
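The shape of these arguments might look as follows; how the `Simulator` is constructed is not documented here, so that part is only a placeholder.

```python
# Hedged sketch: Simulator construction is a placeholder; each value dict
# holds the kwargs of its opt_method.
sim = Simulator(...)  # hypothetical construction
opt.optimization(
    opt_method="ensemble",
    sim=sim,
    opt_method_to_params={
        "parmesan": {"k": 16, "comm_coef": 0.1, "new_flow": True},
        # further opt_method entries can be added here
    },
)
```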
The `sim_params` (dict) can be specified to configure the simulation (example below):

- `concurrent_copy_comp` (bool): if True, concurrent copy and computation will be adopted in the simulation.
- `debug` (bool): if True, a simulation tracing file will be written to `./misc/coarsen_graph/sim_tracing.json`. You can open it with `chrome://tracing`.
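For illustration, a `sim_params` dict enabling both options could look like this; how it is passed to the flow (presumably through `Optimizer.optimization`) is an assumption.

```python
# Hedged sketch of sim_params; keys follow the list above.
sim_params = {
    "concurrent_copy_comp": True,  # overlap copy and computation in simulation
    "debug": True,                 # write ./misc/coarsen_graph/sim_tracing.json
}
```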
The `mapping_params` (dict) can be specified to configure the device mapping (example below):

- `incl_allr` (bool): if True, mapping will consider all-reduce.
- `incl_p2p` (bool): if True, mapping will consider p2p communication.
- `parallel_same_t` (int): the number of threads for the same target `t`.
- `parallel_degree` (int): the number of threads for all targets. The number of different targets is determined by `parallel_degree / parallel_same_t`.
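A `mapping_params` sketch with illustrative values: with `parallel_degree=8` and `parallel_same_t=2`, 8 / 2 = 4 different targets are explored in parallel.

```python
# Hedged sketch of mapping_params; the values are illustrative.
mapping_params = {
    "incl_allr": True,     # consider all-reduce in mapping
    "incl_p2p": True,      # consider p2p communication in mapping
    "parallel_same_t": 2,  # threads per target t
    "parallel_degree": 8,  # total threads; 8 / 2 = 4 distinct targets
}
```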
- PyTorch == 1.10.1
- CUDA == 11.3
- Numba == 0.54.1
@article{liu2024parmesan,
  author={Liu, Lixin and Liu, Tianji and Jiang, Bentian and Young, Evangeline F.Y.},
  journal={IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)},
  title={Parmesan: Efficient Partitioning and Mapping Flow for DNN Training on General Device Topology},
  year={2024},
}

Parmesan is an open-source project licensed under the BSD 3-Clause License, which can be found in the LICENSE file.

