MicroMix is a mixed-precision quantization method for large language models built on the MXFP8/MXFP6/MXFP4 microscaling formats ([paper on arXiv](https://arxiv.org/abs/2508.02343)).
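For intuition: in the OCP microscaling (MX) formats, values are stored in a narrow floating-point element type (FP8/FP6/FP4) while each 32-element block shares a single power-of-two scale. Below is a minimal, self-contained sketch of MXFP4-style (E2M1) fake quantization; it illustrates the format only and is not the MicroMix kernel:

```python
import numpy as np

# Representable E2M1 (FP4) magnitudes; the largest is 6 = 1.5 * 2^2.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quant_mxfp4(x: np.ndarray, block: int = 32) -> np.ndarray:
    """Quantize-dequantize x to an MXFP4-like grid, one shared scale per block."""
    x = x.reshape(-1, block)
    amax = np.abs(x).max(axis=1, keepdims=True)
    # Shared power-of-two (E8M0-style) scale per block: align the block's
    # largest exponent with the largest FP4 element exponent (2).
    scale = 2.0 ** (np.floor(np.log2(np.maximum(amax, 1e-12))) - 2)
    scaled = x / scale
    # Round every element to the nearest representable FP4 magnitude, keep sign.
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    return (np.sign(scaled) * FP4_GRID[idx] * scale).reshape(-1)

x = np.random.randn(64).astype(np.float32)
print(f"max abs quantization error: {np.abs(x - fake_quant_mxfp4(x)).max():.4f}")
```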

```bash
conda create -n micromix python=3.10 -y
conda activate micromix
```

Please make sure that CUDA 12.8 is in your environment.
```bash
git clone --recurse-submodules https://github.com/lwy2020/MicroMix.git
cd MicroMix
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
```

The reorder indices and the `p6_num`, `p8_num` values are needed for quantization:

```bash
python reorder_indices.py --model /PATH/TO/YOUR/MODEL/ --samples 32 --seqlen 2048 --act_sort_metric mean
```

Results are saved in `saved/`.
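For intuition, the reordering step ranks channels by a calibration statistic (here the mean absolute activation, matching `--act_sort_metric mean`) so that a contiguous precision split can be applied afterwards. A rough sketch of that idea; the shapes, sort direction, and precision assignment below are assumptions for illustration, not the repo's exact logic:

```python
import numpy as np

# Illustrative calibration activations: [num_tokens, num_channels].
acts = np.random.randn(4096, 1024).astype(np.float32)

metric = np.abs(acts).mean(axis=0)            # per-channel mean |activation|
reorder_indices = np.argsort(metric)[::-1]    # assumed: descending magnitude

# Assumed split: the largest-magnitude channels get the widest format.
p8_num, p6_num = 128, 256                     # example values, not the repo's
mxfp8_ch = reorder_indices[:p8_num]
mxfp6_ch = reorder_indices[p8_num:p8_num + p6_num]
mxfp4_ch = reorder_indices[p8_num + p6_num:]
print(len(mxfp8_ch), len(mxfp6_ch), len(mxfp4_ch))
```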
Please refer to `mgemm/README.md`.

```bash
cd mgemm/
bash test.sh /PATH/TO/YOUR/MODEL/
bash eval_plus/test.sh Qwen/Qwen2.5-Coder-32B-Instruct '32B'
```

If you want to use the MicroMix kernel but not our algorithm, you can directly set `p4_num`, `p6_num`, and `p8_num` (lines 41-43 in `/model/qLinearLayer.py`) to the numbers you want 😄
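As a hypothetical illustration of such a manual override (the concrete numbers are arbitrary; check `qLinearLayer.py` for the real attribute layout):

```python
# Hypothetical illustration only: hard-coding the per-layer precision split
# instead of using the calibrated values (cf. lines 41-43 of /model/qLinearLayer.py).
# The numbers are arbitrary but should presumably sum to the layer's channel count.
p8_num = 128                      # highest-precision channels -> MXFP8
p6_num = 256                      # mid-precision channels     -> MXFP6
p4_num = 4096 - p8_num - p6_num   # the rest -> MXFP4 (assuming 4096 channels)
```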
MicroMix efficiency:

```bash
python benchmarks/benchmark_e2e_micromix.py --model 'llama-3.1-8b' --batch_size 8 --prefill_seq_len 2048
```

FP16 efficiency:

```bash
python benchmarks/benchmark_e2e_fp16.py --model /PATH/TO/YOUR_MODEL --batch_size 8 --prefill_seq_len 2048
```

INT8 efficiency:

```bash
pip install bitsandbytes==0.47.0
python benchmarks/benchmark_e2e_int8.py --model /PATH/TO/YOUR_MODEL --batch_size 12 --prefill_seq_len 2048
```
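For reference, an end-to-end prefill benchmark of this kind boils down to timing one forward pass over a `[batch_size, prefill_seq_len]` batch. A self-contained FP16 sketch with HuggingFace Transformers; the warmup count and timing loop are illustrative, not the repo's script:

```python
import time
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "/PATH/TO/YOUR_MODEL", torch_dtype=torch.float16, device_map="cuda")
model.eval()

batch_size, prefill_seq_len = 8, 2048
input_ids = torch.randint(
    0, model.config.vocab_size, (batch_size, prefill_seq_len), device="cuda")

with torch.no_grad():
    for _ in range(3):           # warmup iterations
        model(input_ids)
    torch.cuda.synchronize()
    start = time.perf_counter()
    model(input_ids)             # one timed prefill forward pass
    torch.cuda.synchronize()

print(f"prefill latency: {time.perf_counter() - start:.3f} s")
```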
If you find MicroMix useful, please cite:

```bibtex
@misc{liu2025micromixefficientmixedprecisionquantization,
      title={MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models},
      author={Wenyuan Liu and Haoqian Meng and Yilun Luo and Peng Zhang and Xindian Ma},
      year={2025},
      eprint={2508.02343},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2508.02343},
}
```
Our code is built on the following repos; thank you for your contributions to the community 👍: