This repository was archived by the owner on Nov 19, 2025. It is now read-only.

Conversation


@yfw yfw commented Mar 6, 2025

What does this PR do?

Add DeepSeek entry in Changelog and bump Nemo commit.

Changelog

  • Please update the CHANGELOG.md under next version with high level changes in this PR.

Usage

  • You can potentially add a usage example below
/usr/bin/python /opt/NeMo-Aligner/examples/nlp/gpt/train_gpt_sft.py \
   trainer.precision=bf16 \
   trainer.num_nodes=64 \
   trainer.devices=8 \
   trainer.sft.max_steps=${MAX_STEPS} \
   trainer.sft.limit_val_batches=null \
   trainer.sft.val_check_interval=100 \
   model.megatron_amp_O2=True \
   model.restore_from_path=${MODEL} \
   model.optim.lr=5e-6 \
   model.optim.name=mcore_distributed_optim \
   model.optim.sched.min_lr=5e-7 \
   model.optim.sched.warmup_steps=50 \
   model.optim.sched.constant_steps=0 \
   ++model.optim.eps=1e-5 \
   ++model.optim.sched.max_steps=${MAX_STEPS} \
   model.tensor_model_parallel_size=${TP_SIZE} \
   model.pipeline_model_parallel_size=${PP_SIZE} \
   model.data.chat=False \
   model.data.num_workers=0 \
   model.data.train_ds.micro_batch_size=1 \
   model.data.train_ds.global_batch_size=128 \
   model.data.train_ds.max_seq_length=2048 \
   model.data.train_ds.file_path=${TRAIN_DS} \
   model.data.validation_ds.micro_batch_size=1 \
   model.data.validation_ds.global_batch_size=128 \
   model.data.validation_ds.file_path=${VALID_DS} \
   model.data.validation_ds.max_seq_length=2048 \
   exp_manager.create_wandb_logger=True \
   exp_manager.explicit_log_dir=${OUTPUT_DIR} \
   exp_manager.wandb_logger_kwargs.project=debug-nemo2 \
   exp_manager.wandb_logger_kwargs.name=sft_deepseekv3-dfw-adam_eps1e-5 \
   exp_manager.checkpoint_callback_params.save_nemo_on_train_end=False \
   exp_manager.resume_if_exists=False \
   exp_manager.resume_ignore_no_checkpoint=True \
   exp_manager.create_checkpoint_callback=True \
   exp_manager.checkpoint_callback_params.monitor=val_loss \
   ++model.num_layers_in_first_pipeline_stage=${PP_FIRST} \
   ++model.num_layers_in_last_pipeline_stage=${PP_LAST} \
   ++model.expert_model_parallel_size=${EP_SIZE} \
   ++model.transformer_engine=True \
   ++model.dist_ckpt_load_strictness=log_all \
   ++model.name=decoder_block_gpt \
   ++model.moe_layer_freq=[0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1] \
   ++model.seq_length=2048
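The command above assumes several environment variables are exported beforehand. A minimal sketch of what that setup might look like (the specific paths and values here are illustrative placeholders, not taken from the PR):

```shell
# Illustrative values only -- adjust for your cluster, checkpoint, and data.
export MAX_STEPS=1000                         # total SFT optimizer steps
export MODEL=/checkpoints/deepseek-v3.nemo    # .nemo checkpoint to fine-tune (hypothetical path)
export TRAIN_DS=/data/train.jsonl             # SFT training set (hypothetical path)
export VALID_DS=/data/val.jsonl               # SFT validation set (hypothetical path)
export OUTPUT_DIR=/results/sft_deepseek       # experiment log/checkpoint dir (hypothetical path)
export TP_SIZE=8                              # tensor model parallel size
export PP_SIZE=16                             # pipeline model parallel size
export EP_SIZE=64                             # expert model parallel size (MoE layers)
export PP_FIRST=3                             # layers in the first pipeline stage
export PP_LAST=4                              # layers in the last pipeline stage
```

Note that the parallelism sizes must be compatible with the world size: with `trainer.num_nodes=64` and `trainer.devices=8` the command runs on 512 GPUs, and TP × PP must divide that total.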

Before your PR is "Ready for review"

Pre-checks:

Checklist when contributing a new algorithm

  • Does the trainer resume and restore all model states?
  • Does the trainer support all parallelism techniques (PP, TP, DP)?
  • Does the trainer support max_steps=-1 and validation?
  • Does the trainer only call APIs defined in alignable_interface.py?
  • Does the trainer have proper logging?

Additional Information

  • Related to # (issue)

@yfw yfw requested a review from terrykong March 6, 2025 20:10
@yfw yfw changed the title Add DeepSeek entry in Changelog and bump Nemo commit feat: Add DeepSeek entry in Changelog and bump Nemo commit Mar 6, 2025
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
@yfw yfw force-pushed the yifu/deepseek_doc branch from 2c3c245 to 374253e Compare March 6, 2025 20:29
@yfw yfw changed the title feat: Add DeepSeek entry in Changelog and bump Nemo commit docs: Add DeepSeek entry in Changelog and bump Nemo commit Mar 7, 2025
@yfw yfw requested a review from jgerh March 7, 2025 02:07

@jgerh jgerh left a comment


Reviewed the revisions in CHANGELOG.md. No copyedits needed. Approved.
