Parmesan: Efficient Partitioning and Mapping Flow for DNN Training on General Device Topology

About

(Figure: Parmesan overview)

Parmesan is an efficient design flow that maps the training of a large DNN onto a system with a general device topology to maximize throughput. It works end to end and solves the whole optimization problem in two phases: the first phase produces well-balanced partitions, and the second phase places the DNN on devices connected by a network with arbitrary topology, taking into account the heterogeneity of the interconnection bandwidth.

(Figure: System design)

Code Structure

  • comp_graph: PyTorch computational graph extractor and profiler.

  • device_topo: device topology data structures; currently used to compute communication time.

  • experiments: experiment scripts.

  • optimizer: optimization algorithms (incl. model partitioning and device mapping).

  • runtime: pipeline scheduler.

  • simulator: simulator.

  • test_networks: PyTorch module definitions for development use.

  • tools: example code for communication profiling and a tool script for checking GPU utilization.

  • misc: directory for saving generated data files (e.g., TensorBoard visualizations, images, model checkpoints).

Main Entry Points

  1. run_dry_optimization.py: example code for performing partitioning and mapping with dry-run optimization. Sample input and output are given in ./misc/text_graph_in/ResNet and ./misc/text_graph_out/ResNet, respectively.

  2. run_experiments.py: script for generating model definition files.

  3. run_new_runtime.py: script for pipeline training. Note that load_partition must always be given. Alternatively, you can run pipeline training in batch mode using: python3 experiments/batch_runtime_latency.py --work_dir experiments/batch_xxx. Example invocations are shown below.
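
For example (the batch-mode command is quoted verbatim from above; the first two scripts are shown without flags here, so check each script for its actual arguments):

    # dry-run partitioning + mapping (sample input in ./misc/text_graph_in/ResNet)
    python3 run_dry_optimization.py

    # generate model definition files
    python3 run_experiments.py

    # pipeline training in batch mode
    python3 experiments/batch_runtime_latency.py --work_dir experiments/batch_xxx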

Optimizer API

Optimizer.optimization(opt_method, **kwargs)

opt_method (str): type of optimization algorithm. For our algorithm, use "parmesan". A usage sketch covering the options below is given after this list.

  1. For opt_method="parmesan", the following keyword arguments can be specified:

    k (int): number of partitions after coarsening stage.

    comm_coef (float): communication coefficient in coarsening.

    new_flow (bool): if True, the optimization flow is coarsening + DP + uncoarsening. Otherwise, the optimization flow is coarsening + uncoarsening + DP.

    search_k, search_comm_coef (bool): if True, search over the corresponding parameter and pick the best value according to simulated time. When search is enabled, the manually set value for that parameter is ignored. See optimizer/search_params.py for more details.

  2. For opt_method="ensemble", the following keyword arguments must be specified:

    sim (Simulator): the simulator object.

    opt_method_to_params (dict): maps each opt_method string to the dict of keyword arguments (kwargs) for that method.

  3. The sim_params (dict) can be specified to configure the simulation:

    concurrent_copy_comp (bool): if True, concurrent copy and computation will be adopted in the simulation.

    debug (bool): if True, a simulation tracing file will be written to ./misc/coarsen_graph/sim_tracing.json. You can open it with chrome://tracing.

  4. The mapping_params (dict) can be specified to configure the device mapping:

    incl_allr (bool): if True, mapping will consider all-reduce.

    incl_p2p (bool): if True, mapping will consider p2p communication.

    parallel_same_t (int): the number of threads for the same target t.

    parallel_degree (int): the total number of threads across all targets. The number of distinct targets is determined by parallel_degree / parallel_same_t.
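
The sketch below strings these options together. The import paths and the Optimizer/Simulator construction are assumptions made for illustration (the repository defines its own constructors and inputs); the opt_method names and keyword arguments follow the documentation above.

    # Hypothetical usage sketch. Import paths and object construction are
    # assumptions; opt_method names and kwargs follow the documentation above.
    from optimizer import Optimizer    # assumed module layout
    from simulator import Simulator    # assumed module layout

    opt = Optimizer(...)               # construct from your graph/topology (repo-specific)

    result = opt.optimization(
        "parmesan",
        k=8,                           # partitions after coarsening (ignored when search_k=True)
        comm_coef=0.5,                 # communication coefficient in coarsening
        new_flow=True,                 # flow: coarsening + DP + uncoarsening
        search_k=True,                 # search k according to simulated time
        search_comm_coef=False,
        sim_params={
            "concurrent_copy_comp": True,  # overlap copy and computation in simulation
            "debug": True,                 # dump ./misc/coarsen_graph/sim_tracing.json
        },
        mapping_params={
            "incl_allr": True,             # consider all-reduce in mapping
            "incl_p2p": True,              # consider p2p communication
            "parallel_same_t": 2,          # threads per target t
            "parallel_degree": 8,          # total threads; 8 / 2 = 4 distinct targets
        },
    )

    # Ensemble mode: evaluate several methods and keep the best by simulated time.
    sim = Simulator(...)                   # repo-specific construction
    best = opt.optimization(
        "ensemble",
        sim=sim,
        opt_method_to_params={
            "parmesan": {"k": 8, "comm_coef": 0.5, "new_flow": True},
        },
    )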

Requirements

  • PyTorch == 1.10.1
  • CUDA == 11.3
  • Numba == 0.54.1
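
For reference, matching versions can be installed along these lines (a sketch: the wheel index below is PyTorch's standard archive for CUDA 11.3 builds; adapt it to your environment):

    pip install torch==1.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
    pip install numba==0.54.1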

Citation

@article{liu2024parmesan,
  author={Liu, Lixin and Liu, Tianji and Jiang, Bentian and Young, Evangeline F.Y.},
  journal={IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)},
  title={Parmesan: Efficient Partitioning and Mapping Flow for DNN Training on General Device Topology},
  year={2024},
}

License

Parmesan is an open-source project licensed under the BSD 3-Clause License, which can be found in the LICENSE file.
