# Releases: MooreThreads/SimuMax
## v1.1
This release expands SimuMax from a pure estimator into a more complete, workflow-friendly platform. It introduces a new end-user application, adds strategy search capabilities, and provides a new system-config generation pipeline with compute/communication efficiency modeling. In addition, it improves compatibility with Megatron-LM 0.14 (notably for MoE) and enhances communication modeling for hybrid parallel setups.
### Highlights
- NEW! SimuMax App (User Application):
  - Added a user-facing application to SimuMax to improve usability and streamline common workflows.
- NEW! Strategy Search:
  - Introduced strategy search support to help users automatically explore and identify better parallelization and execution strategies (see the search sketch after this list).
- NEW! System Config Pipeline:
  - Added a pipeline to generate system configuration files, including compute-efficiency and communication-efficiency characterization, enabling more realistic system-level modeling (an example config is sketched below).
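As a rough illustration of what strategy search does, the sketch below enumerates candidate (TP, PP, DP) layouts for a fixed world size and keeps the one with the lowest estimated step time. The toy cost model and all names here are hypothetical stand-ins, not SimuMax's actual API or cost formulas:

```python
from itertools import product

def estimate_step_time(tp: int, pp: int, dp: int,
                       base_compute_ms: float = 1000.0) -> float:
    """Toy analytical cost model (hypothetical, NOT SimuMax's real one):
    compute time shrinks with total parallelism, while TP adds all-reduce
    cost, PP adds bubble overhead, and DP adds gradient all-reduce cost."""
    compute = base_compute_ms / (tp * pp * dp)
    tp_comm = 0.2 * base_compute_ms * (tp - 1) / tp    # tensor-parallel all-reduces
    pp_bubble = compute * (pp - 1) / 8                 # pipeline bubble, 8 microbatches
    dp_comm = 0.05 * base_compute_ms * (dp - 1) / dp   # gradient all-reduce
    return compute + tp_comm + pp_bubble + dp_comm

def search_strategies(world_size: int):
    """Exhaustively score every (tp, pp, dp) factorization of world_size."""
    best, best_time = None, float("inf")
    for tp, pp in product((1, 2, 4, 8), repeat=2):
        if world_size % (tp * pp):
            continue
        dp = world_size // (tp * pp)
        t = estimate_step_time(tp, pp, dp)
        if t < best_time:
            best, best_time = (tp, pp, dp), t
    return best, best_time

print(search_strategies(64))   # -> ((tp, pp, dp), estimated_ms)
```

A real search would score memory feasibility as well as time; this sketch only shows the enumerate-and-rank shape of the problem.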
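For the system-config pipeline, the dictionary below shows the kind of information such a file captures; the field names and numbers are illustrative (A100-class figures), not SimuMax's actual schema:

```python
# Illustrative system config (hypothetical field names). Efficiency
# factors scale theoretical peak numbers down to what the hardware
# achieves in practice, which is what the characterization measures.
system_config = {
    "accelerator": {
        "peak_tflops_bf16": 312.0,      # theoretical peak
        "compute_efficiency": 0.85,     # measured fraction of peak for GEMMs
        "hbm_bandwidth_gbps": 2039.0,
        "memory_efficiency": 0.90,
    },
    "interconnect": {
        "intra_node_bandwidth_gbps": 600.0,   # e.g. NVLink-class links
        "inter_node_bandwidth_gbps": 100.0,   # e.g. per-NIC network link
        "comm_efficiency": 0.80,              # measured fraction of peak for collectives
    },
}
```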
### Compatibility & Modeling Improvements
- Megatron-LM 0.14 Support (MoE Updates):
  - Added support for Megatron-LM v0.14.
  - Updated MoE communication behavior: router probabilities are now transferred via a separate all-to-all (see the volume sketch after this list), which:
    - introduces a small additional communication cost,
    - but reduces GPU memory usage.
- Improved Bandwidth Contention Modeling (Hybrid Parallelism):
  - For cases using EP/TP and DP simultaneously, added modeling of the inter-node bandwidth contention caused by multiple DP groups competing for network bandwidth (sketched below).
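To see why the separate router-probability all-to-all is cheap relative to the main token dispatch, compare the two per-layer payloads. This is back-of-the-envelope arithmetic under assumed shapes, not SimuMax's exact accounting:

```python
def a2a_volumes_bytes(num_tokens: int, hidden: int, top_k: int,
                      act_bytes: int = 2, prob_bytes: int = 4):
    """Per-layer all-to-all payloads for MoE dispatch (rough sketch).
    Each token is routed to top_k experts; the bf16 hidden states
    dominate, while fp32 router probabilities add a tiny extra transfer."""
    token_dispatch = num_tokens * top_k * hidden * act_bytes
    router_probs = num_tokens * top_k * prob_bytes
    return token_dispatch, router_probs

dispatch, probs = a2a_volumes_bytes(num_tokens=4096, hidden=4096, top_k=8)
print(probs / dispatch)   # ~5e-4: negligible extra volume per layer
```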
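The contention effect itself can be pictured minimally as below: when EP/TP spans the ranks inside a node, each GPU on that node belongs to a different DP group, so several DP all-reduces cross the node boundary at once and share the link. The even-split assumption is an illustration, not SimuMax's actual contention model:

```python
def effective_dp_bandwidth_gbps(inter_node_bw_gbps: float,
                                dp_groups_per_node: int) -> float:
    """Simplest contention model (illustrative): concurrent DP groups
    on one node split the inter-node link evenly."""
    return inter_node_bw_gbps / dp_groups_per_node

# e.g. TP=8 on an 8-GPU node -> 8 DP groups compete for one 100 Gbps NIC
print(effective_dp_bandwidth_gbps(100.0, 8))   # 12.5 Gbps per DP group
```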
## v1.0
This release substantially improves the accuracy of memory and performance estimation for large models. It also introduces several major features that enhance model compatibility, flexibility, and user experience.
### Highlights
- Dramatically Improved Estimation Accuracy:
  - Memory estimation: Expanded test coverage for both dense and MoE models; memory estimation error is now consistently within 1%.
  - Performance estimation: On the NVIDIA A100 PCIe, performance estimation error is consistently below 3%.
### New Features & Enhancements
- MLA Support:
  - Introduced support for the MLA (Multi-head Latent Attention) model architecture.
- Enhanced Layer Specification:
  - Added granular control for defining first-stage and last-stage layers in pipeline parallelism, allowing for more optimized model partitioning (see the config sketch after this list).
- Advanced MoE Customization:
  - Support for customizable dense layers in Mixture-of-Experts (MoE) models, providing greater flexibility in model design.
- Megatron Compatibility Layer:
  - Launched a simplified model migration pipeline for effortless conversion and analysis of models built with NVIDIA's Megatron framework.
- Optimized Recomputation Strategy:
  - Implemented finer-grained selective recompute, enabling more precise control over the memory-for-computation trade-off to optimize for larger model sizes or higher throughput (see the trade-off sketch after this list).
- Comprehensive Efficiency Analysis:
  - New capability to measure and analyze efficiency and utilization across various tensor shapes and memory layouts (sketched below).
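For the first/last-stage layer control, a spec along the following lines conveys the idea: uneven stages leave headroom for the embedding and the loss/LM head. The field names are hypothetical, not SimuMax's actual configuration keys:

```python
# Hypothetical pipeline layer spec (illustrative field names).
# With 32 transformer layers and pp=4, giving the boundary stages
# fewer layers balances out the embedding and LM-head cost.
pipeline_spec = {
    "num_layers": 32,
    "pipeline_parallel_size": 4,
    "num_layers_in_first_stage": 7,   # + input embedding
    "num_layers_in_last_stage": 7,    # + final norm and LM head
    # the remaining 18 layers split across the 2 middle stages (9 each)
}
```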
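The memory-for-computation trade-off that selective recompute exposes can be summarized as follows; the numbers and per-layer granularity are illustrative assumptions, not SimuMax's internals:

```python
def recompute_tradeoff(act_gb_per_layer: float, num_layers: int,
                       recompute_layers: int, fwd_ms_per_layer: float):
    """Selective recompute: discard activations for some layers and redo
    their forward pass during backward. Memory saved and extra compute
    both scale with the number of recomputed layers."""
    mem_saved_gb = act_gb_per_layer * recompute_layers
    extra_time_ms = fwd_ms_per_layer * recompute_layers
    kept_gb = act_gb_per_layer * (num_layers - recompute_layers)
    return kept_gb, mem_saved_gb, extra_time_ms

# Recompute 8 of 32 layers: free 4 GB of activations for ~25.6 ms of
# extra forward work per step (with the assumed per-layer figures).
print(recompute_tradeoff(0.5, 32, 8, 3.2))
```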
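A standalone flavor of the efficiency analysis measures achieved GEMM throughput against a stated theoretical peak across shapes. The micro-benchmark below uses plain PyTorch purely for illustration and may differ from SimuMax's own characterization code:

```python
import time
import torch

def matmul_efficiency(m: int, n: int, k: int, peak_tflops: float,
                      iters: int = 50) -> float:
    """Achieved GEMM throughput as a fraction of theoretical peak for
    one (m, n, k) shape in bf16 (illustrative micro-benchmark)."""
    a = torch.randn(m, k, device="cuda", dtype=torch.bfloat16)
    b = torch.randn(k, n, device="cuda", dtype=torch.bfloat16)
    for _ in range(5):          # warm-up
        a @ b
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    elapsed = (time.perf_counter() - t0) / iters
    achieved = 2 * m * n * k / elapsed / 1e12   # TFLOP/s
    return achieved / peak_tflops

# Efficiency typically rises with problem size before plateauing.
for size in (512, 2048, 8192):
    print(size, matmul_efficiency(size, size, size, peak_tflops=312.0))
```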
### Bug Fixes
- Fixed an incorrect token-count calculation when `etp > 1`.
- Corrected the FLOPs and memory-access (e.g., HBM access volume) calculations for several operators.
- Resolved inaccuracies in estimated communication volumes and the associated data types.