# Releases: MooreThreads/SimuMax
## v1.1
This release expands SimuMax from a pure estimator into a more complete, workflow-friendly platform. It introduces a new end-user application, adds strategy search capabilities, and provides a new system-config generation pipeline with compute/communication efficiency modeling. In addition, it improves compatibility with Megatron-LM 0.14 (notably for MoE) and enhances communication modeling for hybrid parallel setups.
### Highlights
- NEW! SimuMax App (User Application):
  - Added a user-facing application to SimuMax to improve usability and streamline common workflows.
- NEW! Strategy Search:
  - Introduced strategy search support to help users automatically explore and identify better parallelization and execution strategies (see the search sketch after this list).
- NEW! System Config Pipeline:
  - Added a pipeline to generate system configuration files, including compute-efficiency and communication-efficiency characterization, enabling more realistic system-level modeling (an example config is sketched below).
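As a rough illustration of what strategy search does, the sketch below enumerates candidate (TP, PP, DP) layouts for a fixed world size and keeps the one with the lowest estimated step time. The toy cost model and all names here are hypothetical stand-ins, not SimuMax's actual API or cost formulas:

```python
from itertools import product

def estimate_step_time(tp: int, pp: int, dp: int,
                       base_compute_ms: float = 1000.0) -> float:
    """Toy analytical cost model (hypothetical, NOT SimuMax's real one):
    compute time shrinks with total parallelism, while TP adds all-reduce
    cost, PP adds bubble overhead, and DP adds gradient all-reduce cost."""
    compute = base_compute_ms / (tp * pp * dp)
    tp_comm = 0.2 * base_compute_ms * (tp - 1) / tp    # tensor-parallel all-reduces
    pp_bubble = compute * (pp - 1) / 8                 # pipeline bubble, 8 microbatches
    dp_comm = 0.05 * base_compute_ms * (dp - 1) / dp   # gradient all-reduce
    return compute + tp_comm + pp_bubble + dp_comm

def search_strategies(world_size: int):
    """Exhaustively score every (tp, pp, dp) factorization of world_size."""
    best, best_time = None, float("inf")
    for tp, pp in product((1, 2, 4, 8), repeat=2):
        if world_size % (tp * pp):
            continue
        dp = world_size // (tp * pp)
        t = estimate_step_time(tp, pp, dp)
        if t < best_time:
            best, best_time = (tp, pp, dp), t
    return best, best_time

print(search_strategies(64))   # -> ((tp, pp, dp), estimated_ms)
```

A real search would score memory feasibility as well as time; this sketch only shows the enumerate-and-rank shape of the problem.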
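For the system-config pipeline, the dictionary below shows the kind of information such a file captures; the field names and numbers are illustrative (A100-class figures), not SimuMax's actual schema:

```python
# Illustrative system config (hypothetical field names). Efficiency
# factors scale theoretical peak numbers down to what the hardware
# achieves in practice, which is what the characterization measures.
system_config = {
    "accelerator": {
        "peak_tflops_bf16": 312.0,      # theoretical peak
        "compute_efficiency": 0.85,     # measured fraction of peak for GEMMs
        "hbm_bandwidth_gbps": 2039.0,
        "memory_efficiency": 0.90,
    },
    "interconnect": {
        "intra_node_bandwidth_gbps": 600.0,   # e.g. NVLink-class links
        "inter_node_bandwidth_gbps": 100.0,   # e.g. per-NIC network link
        "comm_efficiency": 0.80,              # measured fraction of peak for collectives
    },
}
```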
### Compatibility & Modeling Improvements
- Megatron-LM 0.14 Support (MoE Updates):
  - Added support for Megatron-LM v0.14.
  - Updated MoE communication behavior: router probabilities are now transferred via a separate all-to-all (see the volume sketch after this list), which:
    - introduces a small additional communication cost,
    - but reduces GPU memory usage.
- Improved Bandwidth Contention Modeling (Hybrid Parallelism):
  - For cases using EP/TP and DP simultaneously, added modeling of the inter-node bandwidth contention caused by multiple DP groups competing for network bandwidth (sketched below).
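To see why the separate router-probability all-to-all is cheap relative to the main token dispatch, compare the two per-layer payloads. This is back-of-the-envelope arithmetic under assumed shapes, not SimuMax's exact accounting:

```python
def a2a_volumes_bytes(num_tokens: int, hidden: int, top_k: int,
                      act_bytes: int = 2, prob_bytes: int = 4):
    """Per-layer all-to-all payloads for MoE dispatch (rough sketch).
    Each token is routed to top_k experts; the bf16 hidden states
    dominate, while fp32 router probabilities add a tiny extra transfer."""
    token_dispatch = num_tokens * top_k * hidden * act_bytes
    router_probs = num_tokens * top_k * prob_bytes
    return token_dispatch, router_probs

dispatch, probs = a2a_volumes_bytes(num_tokens=4096, hidden=4096, top_k=8)
print(probs / dispatch)   # ~5e-4: negligible extra volume per layer
```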
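The contention effect itself can be pictured minimally as below: when EP/TP spans the ranks inside a node, each GPU on that node belongs to a different DP group, so several DP all-reduces cross the node boundary at once and share the link. The even-split assumption is an illustration, not SimuMax's actual contention model:

```python
def effective_dp_bandwidth_gbps(inter_node_bw_gbps: float,
                                dp_groups_per_node: int) -> float:
    """Simplest contention model (illustrative): concurrent DP groups
    on one node split the inter-node link evenly."""
    return inter_node_bw_gbps / dp_groups_per_node

# e.g. TP=8 on an 8-GPU node -> 8 DP groups compete for one 100 Gbps NIC
print(effective_dp_bandwidth_gbps(100.0, 8))   # 12.5 Gbps per DP group
```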
## v1.0
This release substantially improves the accuracy of memory and performance estimation for large models. It also introduces several major features that enhance model compatibility, flexibility, and user experience.
### Highlights
- Dramatically Improved Estimation Accuracy:
  - Memory estimation: Expanded test coverage for both dense and MoE models; memory estimation error is now consistently within 1%.
  - Performance estimation: On the NVIDIA A100 PCIe, performance estimation error is consistently below 3%.
### New Features & Enhancements
- MLA Support:
  - Introduced support for the MLA (Multi-head Latent Attention) model architecture.
- Enhanced Layer Specification:
  - Added granular control for defining first-stage and last-stage layers in pipeline parallelism, allowing for more optimized model partitioning (see the config sketch after this list).
- Advanced MoE Customization:
  - Support for customizable dense layers in Mixture-of-Experts (MoE) models, providing greater flexibility in model design.
- Megatron Compatibility Layer:
  - Launched a simplified model migration pipeline for effortless conversion and analysis of models built with NVIDIA's Megatron framework.
- Optimized Recomputation Strategy:
  - Implemented finer-grained selective recompute, enabling more precise control over the memory-for-computation trade-off to optimize for larger model sizes or higher throughput (see the trade-off sketch after this list).
- Comprehensive Efficiency Analysis:
  - New capability to measure and analyze efficiency and utilization across various tensor shapes and memory layouts (sketched below).
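For the first/last-stage layer control, a spec along the following lines conveys the idea: uneven stages leave headroom for the embedding and the loss/LM head. The field names are hypothetical, not SimuMax's actual configuration keys:

```python
# Hypothetical pipeline layer spec (illustrative field names).
# With 32 transformer layers and pp=4, giving the boundary stages
# fewer layers balances out the embedding and LM-head cost.
pipeline_spec = {
    "num_layers": 32,
    "pipeline_parallel_size": 4,
    "num_layers_in_first_stage": 7,   # + input embedding
    "num_layers_in_last_stage": 7,    # + final norm and LM head
    # the remaining 18 layers split across the 2 middle stages (9 each)
}
```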
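The memory-for-computation trade-off that selective recompute exposes can be summarized as follows; the numbers and per-layer granularity are illustrative assumptions, not SimuMax's internals:

```python
def recompute_tradeoff(act_gb_per_layer: float, num_layers: int,
                       recompute_layers: int, fwd_ms_per_layer: float):
    """Selective recompute: discard activations for some layers and redo
    their forward pass during backward. Memory saved and extra compute
    both scale with the number of recomputed layers."""
    mem_saved_gb = act_gb_per_layer * recompute_layers
    extra_time_ms = fwd_ms_per_layer * recompute_layers
    kept_gb = act_gb_per_layer * (num_layers - recompute_layers)
    return kept_gb, mem_saved_gb, extra_time_ms

# Recompute 8 of 32 layers: free 4 GB of activations for ~25.6 ms of
# extra forward work per step (with the assumed per-layer figures).
print(recompute_tradeoff(0.5, 32, 8, 3.2))
```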
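A standalone flavor of the efficiency analysis measures achieved GEMM throughput against a stated theoretical peak across shapes. The micro-benchmark below uses plain PyTorch purely for illustration and may differ from SimuMax's own characterization code:

```python
import time
import torch

def matmul_efficiency(m: int, n: int, k: int, peak_tflops: float,
                      iters: int = 50) -> float:
    """Achieved GEMM throughput as a fraction of theoretical peak for
    one (m, n, k) shape in bf16 (illustrative micro-benchmark)."""
    a = torch.randn(m, k, device="cuda", dtype=torch.bfloat16)
    b = torch.randn(k, n, device="cuda", dtype=torch.bfloat16)
    for _ in range(5):          # warm-up
        a @ b
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    elapsed = (time.perf_counter() - t0) / iters
    achieved = 2 * m * n * k / elapsed / 1e12   # TFLOP/s
    return achieved / peak_tflops

# Efficiency typically rises with problem size before plateauing.
for size in (512, 2048, 8192):
    print(size, matmul_efficiency(size, size, size, peak_tflops=312.0))
```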
### Bug Fixes
- Fixed an incorrect token-count calculation when `etp > 1`.
- Corrected the FLOPs and memory-access (e.g., HBM access volume) calculations for several operators.
- Resolved inaccuracies in estimated communication volumes and the associated data types.