Releases: MooreThreads/SimuMax

v1.1

22 Dec 08:49
6f4cb1f

This release expands SimuMax from a pure estimator into a more complete, workflow-friendly platform. It introduces a new end-user application, adds strategy search capabilities, and provides a new system-config generation pipeline with compute/communication efficiency modeling. In addition, it improves compatibility with Megatron-LM 0.14 (notably for MoE) and enhances communication modeling for hybrid parallel setups.

Highlights

  • NEW! SimuMax App (User Application):

    • Added a user-facing application to SimuMax to improve usability and streamline common workflows.
  • NEW! Strategy Search:

    • Introduced strategy search support to help users explore and identify better parallelization and execution strategies automatically.
  • NEW! System Config Pipeline:

    • Added a pipeline to generate system configuration files, including computing efficiency and communication efficiency characterization, enabling more realistic system-level modeling.
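As a rough illustration of what a strategy search explores, the sketch below enumerates (TP, PP, DP) degrees whose product equals the GPU count and ranks them with a placeholder cost. The function names, the constraints, and the toy cost are all assumptions made for illustration; they are not SimuMax's API or its search algorithm.

```python
from itertools import product


def enumerate_strategies(world_size, num_layers, max_tp=8):
    """Yield (tp, pp, dp) triples whose product equals world_size.

    Hypothetical constraints: tp <= max_tp (e.g. GPUs per node) and
    pp must divide the layer count evenly.
    """
    divisors = [d for d in range(1, world_size + 1) if world_size % d == 0]
    for tp, pp in product(divisors, divisors):
        if tp > max_tp or world_size % (tp * pp) != 0:
            continue
        if num_layers % pp != 0:
            continue
        yield tp, pp, world_size // (tp * pp)


def toy_cost(tp, pp, dp):
    """Placeholder cost, NOT a real performance model: penalize TP and PP degree."""
    return 0.05 * (tp - 1) + 0.10 * (pp - 1)


def search(world_size, num_layers, cost_fn=toy_cost):
    """Return all candidate strategies sorted by ascending cost."""
    candidates = enumerate_strategies(world_size, num_layers)
    return sorted(candidates, key=lambda s: cost_fn(*s))


strategies = search(world_size=16, num_layers=32)
best = strategies[0]
print(best, len(strategies))
```

In a real search, the placeholder cost would be replaced by an estimated iteration time, and candidates that exceed the memory budget would be filtered out first.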

Compatibility & Modeling Improvements

  • Megatron-LM 0.14 Support (MoE Updates):

    • Added support for Megatron-LM v0.14.

    • Updated MoE communication behavior: router probabilities are transferred via a separate all-to-all, which:

      • introduces a small additional communication cost,
      • but reduces GPU memory usage.
  • Improved Bandwidth Contention Modeling (Hybrid Parallelism):

    • When EP and/or TP is used together with DP, SimuMax now models the inter-node bandwidth contention caused by multiple DP groups competing for network bandwidth.
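The two modeling changes above can be made concrete with some back-of-the-envelope arithmetic. This is a minimal sketch under stated assumptions: one fp32 probability per routed (token, expert) pair, bf16 hidden states, and an even fair-share split of the inter-node link among concurrent DP groups. The helper names and all numbers are hypothetical and do not reflect SimuMax's internal model.

```python
def all_to_all_payload_bytes(num_tokens, top_k, elems_per_token, bytes_per_elem):
    """Bytes one rank sends when each token is dispatched to top_k experts."""
    return num_tokens * top_k * elems_per_token * bytes_per_elem


def contended_bandwidth(link_bandwidth, concurrent_dp_groups):
    """Fair-share assumption: concurrent DP groups split the link evenly."""
    if concurrent_dp_groups < 1:
        raise ValueError("concurrent_dp_groups must be >= 1")
    return link_bandwidth / concurrent_dp_groups


# Illustrative values only (not measurements).
hidden_bytes = all_to_all_payload_bytes(
    num_tokens=4096, top_k=8, elems_per_token=7168, bytes_per_elem=2  # bf16 hidden states
)
probs_bytes = all_to_all_payload_bytes(
    num_tokens=4096, top_k=8, elems_per_token=1, bytes_per_elem=4  # one fp32 prob per route
)
overhead = probs_bytes / hidden_bytes  # extra traffic relative to the token dispatch

# Eight DP groups sharing one 200 GB/s inter-node link.
per_group_bw = contended_bandwidth(link_bandwidth=200.0, concurrent_dp_groups=8)
print(f"probs overhead: {overhead:.4%}, per-group bandwidth: {per_group_bw} GB/s")
```

Under these assumptions the probability payload is a tiny fraction of the hidden-state payload, which matches the "small additional communication cost" described above, while the bandwidth available to each DP group shrinks in proportion to the number of groups sharing the link.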

v1.0

26 Aug 02:37
a613115

This release significantly improves the accuracy of memory and performance estimation for large models. It also introduces several major features that improve model compatibility, flexibility, and user experience.

Highlights

  • Dramatically Improved Estimation Accuracy:
    • Memory Estimation: Expanded test coverage for both Dense and MoE models. Memory estimation error is now consistently within 1%.
    • Performance Estimation:
      • On NVIDIA A100-PCIE, performance estimation error is consistently below 3%.
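The release notes do not define the error metric. A common choice, assumed here, is the relative error between the estimate and the measured value. The numbers below are hypothetical and only show how such a check could be expressed.

```python
def relative_error(estimated, measured):
    """Relative error |estimated - measured| / |measured| (assumed metric)."""
    if measured == 0:
        raise ValueError("measured value must be non-zero")
    return abs(estimated - measured) / abs(measured)


# Hypothetical values, not taken from the release notes.
mem_err = relative_error(estimated=39.7, measured=40.0)    # peak memory in GiB
perf_err = relative_error(estimated=1.02, measured=1.00)   # iteration time in seconds
print(f"memory error: {mem_err:.2%}, performance error: {perf_err:.2%}")
```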

New Features & Enhancements

  • MLA Support:
    • Introduced support for the MLA model architecture.
  • Enhanced Layer Specification:
    • Added granular control for defining first-stage and last-stage layers in pipeline parallelism, allowing for more optimized model partitioning.
  • Advanced MoE Customization:
    • Support for customizable dense layers in Mixture-of-Experts (MoE) models, providing greater flexibility in model design.
  • Megatron Compatibility Layer:
    • Launched a simplified model migration pipeline for effortless conversion and analysis of models built with NVIDIA's Megatron framework.
  • Optimized Recomputation Strategy:
    • Implemented finer-grained selective recompute, enabling more precise control over the memory-for-computation trade-off to optimize for larger model sizes or higher throughput.
  • Comprehensive Efficiency Analysis:
    • New capability to measure and analyze efficiency and utilization across various tensor shapes and memory layouts.
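To illustrate the first-stage and last-stage layer control mentioned above, the sketch below splits a layer count across pipeline stages, with optional overrides for the first and last stage. The function name and its parameters are hypothetical and are not SimuMax's configuration interface.

```python
def partition_layers(num_layers, pp_size, first_stage_layers=None, last_stage_layers=None):
    """Return per-stage layer counts, honoring optional first/last overrides.

    Stages without an override share the remaining layers evenly.
    """
    if pp_size < 1:
        raise ValueError("pp_size must be >= 1")
    has_override = first_stage_layers is not None or last_stage_layers is not None
    if pp_size == 1 and has_override:
        raise ValueError("first/last overrides need at least two stages")

    counts = [None] * pp_size
    if first_stage_layers is not None:
        counts[0] = first_stage_layers
    if last_stage_layers is not None:
        counts[-1] = last_stage_layers

    fixed = sum(c for c in counts if c is not None)
    free = [i for i, c in enumerate(counts) if c is None]
    remaining = num_layers - fixed
    if remaining < 0:
        raise ValueError("overrides exceed the total layer count")
    if not free:
        if remaining != 0:
            raise ValueError("overrides do not add up to num_layers")
        return counts
    if remaining % len(free) != 0:
        raise ValueError("remaining layers do not divide evenly across stages")
    for i in free:
        counts[i] = remaining // len(free)
    return counts


# 61 layers over 8 stages, with lighter first and last stages.
layout = partition_layers(61, 8, first_stage_layers=7, last_stage_layers=6)
print(layout)
```

Giving the first and last stages fewer layers is a common way to offset the extra work those stages do for the embedding and the output head, which is the kind of partitioning this option makes expressible.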

Bug Fixes

  • Fixed an incorrect token-count calculation when etp > 1.
  • Corrected the FLOPs and memory-access (e.g., HBM access volume) calculations for several operators.
  • Resolved inaccuracies in the estimated communication volume and associated data types.