This repository was archived by the owner on Nov 19, 2025. It is now read-only.

Conversation


@yfw yfw commented Mar 6, 2025

What does this PR do?

Add DeepSeek entry in Changelog and bump Nemo commit.

Changelog

  • Please update the CHANGELOG.md under next version with high level changes in this PR.

Usage

  • You can potentially add a usage example below
/usr/bin/python /opt/NeMo-Aligner/examples/nlp/gpt/train_gpt_sft.py \
   trainer.precision=bf16 \
   trainer.num_nodes=64 \
   trainer.devices=8 \
   trainer.sft.max_steps=${MAX_STEPS} \
   trainer.sft.limit_val_batches=null \
   trainer.sft.val_check_interval=100 \
   model.megatron_amp_O2=True \
   model.restore_from_path=${MODEL} \
   model.optim.lr=5e-6 \
   model.optim.name=mcore_distributed_optim \
   model.optim.sched.min_lr=5e-7 \
   model.optim.sched.warmup_steps=50 \
   model.optim.sched.constant_steps=0 \
   ++model.optim.eps=1e-5 \
   ++model.optim.sched.max_steps=${MAX_STEPS} \
   model.tensor_model_parallel_size=${TP_SIZE} \
   model.pipeline_model_parallel_size=${PP_SIZE} \
   model.data.chat=False \
   model.data.num_workers=0 \
   model.data.train_ds.micro_batch_size=1 \
   model.data.train_ds.global_batch_size=128 \
   model.data.train_ds.max_seq_length=2048 \
   model.data.train_ds.file_path=${TRAIN_DS} \
   model.data.validation_ds.micro_batch_size=1 \
   model.data.validation_ds.global_batch_size=128 \
   model.data.validation_ds.file_path=${VALID_DS} \
   model.data.validation_ds.max_seq_length=2048 \
   exp_manager.create_wandb_logger=True \
   exp_manager.explicit_log_dir=${OUTPUT_DIR} \
   exp_manager.wandb_logger_kwargs.project=debug-nemo2 \
   exp_manager.wandb_logger_kwargs.name=sft_deepseekv3-dfw-adam_eps1e-5 \
   exp_manager.checkpoint_callback_params.save_nemo_on_train_end=False \
   exp_manager.resume_if_exists=False \
   exp_manager.resume_ignore_no_checkpoint=True \
   exp_manager.create_checkpoint_callback=True \
   exp_manager.checkpoint_callback_params.monitor=val_loss \
   ++model.num_layers_in_first_pipeline_stage=${PP_FIRST} \
   ++model.num_layers_in_last_pipeline_stage=${PP_LAST} \
   ++model.expert_model_parallel_size=${EP_SIZE} \
   ++model.transformer_engine=True \
   ++model.dist_ckpt_load_strictness=log_all \
   ++model.name=decoder_block_gpt \
   ++model.moe_layer_freq=[0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1] \
   ++model.seq_length=2048
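The command above assumes several environment variables are exported beforehand. A minimal sketch of what that setup might look like (the specific paths and values here are illustrative placeholders, not taken from the PR):

```shell
# Illustrative values only -- adjust for your cluster, checkpoint, and data.
export MAX_STEPS=1000                         # total SFT optimizer steps
export MODEL=/checkpoints/deepseek-v3.nemo    # .nemo checkpoint to fine-tune (hypothetical path)
export TRAIN_DS=/data/train.jsonl             # SFT training set (hypothetical path)
export VALID_DS=/data/val.jsonl               # SFT validation set (hypothetical path)
export OUTPUT_DIR=/results/sft_deepseek       # experiment log/checkpoint dir (hypothetical path)
export TP_SIZE=8                              # tensor model parallel size
export PP_SIZE=16                             # pipeline model parallel size
export EP_SIZE=64                             # expert model parallel size (MoE layers)
export PP_FIRST=3                             # layers in the first pipeline stage
export PP_LAST=4                              # layers in the last pipeline stage
```

Note that the parallelism sizes must be compatible with the world size: with `trainer.num_nodes=64` and `trainer.devices=8` the command runs on 512 GPUs, and TP × PP must divide that total.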

Before your PR is "Ready for review"

Pre-checks:

Checklist when contributing a new algorithm

  • Does the trainer resume and restore all model states?
  • Does the trainer support all parallelism techniques (PP, TP, DP)?
  • Does the trainer support max_steps=-1 and validation?
  • Does the trainer only call APIs defined in alignable_interface.py?
  • Does the trainer have proper logging?

Additional Information

  • Related to # (issue)

@yfw yfw requested a review from terrykong March 6, 2025 20:10
@yfw yfw changed the title Add DeepSeek entry in Changelog and bump Nemo commit feat: Add DeepSeek entry in Changelog and bump Nemo commit Mar 6, 2025
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
@yfw yfw force-pushed the yifu/deepseek_doc branch from 2c3c245 to 374253e Compare March 6, 2025 20:29
@yfw yfw changed the title feat: Add DeepSeek entry in Changelog and bump Nemo commit docs: Add DeepSeek entry in Changelog and bump Nemo commit Mar 7, 2025
@yfw yfw requested a review from jgerh March 7, 2025 02:07

@jgerh jgerh left a comment


Reviewed the revisions in CHANGELOG.md. No copyedits needed. Approved.
