Adithyare/mamba dpo #374

arendu · 2024-11-01T17:58:55Z

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Changelog

Please update the CHANGELOG.md under next version with high level changes in this PR.

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

Make sure you have read and followed the Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation? Make sure to also update the NeMo Framework User Guide, which contains the tutorials

Checklist when contributing a new algorithm

Does the trainer resume and restore all model states?
Does the trainer support all parallelism techniques (PP, TP, DP)?
Does the trainer support max_steps=-1 and validation?
Does the trainer only call APIs defined in alignable_interface.py?
Does the trainer have proper logging?

Additional Information

Related to # (issue)

Signed-off-by: arendu <adithya.r@gmail.com>

Signed-off-by: root <root@cw-dfw-h100-001-129-026.cm.cluster>

Signed-off-by: arendu <adithya.r@gmail.com>

nemo_aligner/models/nlp/gpt/megatron_gpt_dpo_model.py

Signed-off-by: adithyare <adithyare@nvidia.com>

Signed-off-by: arendu <adithya.r@gmail.com>

arendu added 2 commits October 30, 2024 21:00

wip

3ed1cb1

Signed-off-by: arendu <adithya.r@gmail.com>

Merge branch 'main' into adithyare/mamba_dpo

6db2b64

github-actions bot added the Utils label Nov 1, 2024

arendu and others added 3 commits November 1, 2024 18:00

dpo and sft

bc96c95

Signed-off-by: arendu <adithya.r@gmail.com>

dpo support

b8049cd

Signed-off-by: root <root@cw-dfw-h100-001-129-026.cm.cluster>

mamba padding

050e767

Signed-off-by: arendu <adithya.r@gmail.com>

github-actions bot added the Algorithms label Nov 5, 2024

arendu requested review from gshennvm, terrykong and trias702 November 5, 2024 01:23

trias702 reviewed Nov 5, 2024

nemo_aligner/models/nlp/gpt/megatron_gpt_dpo_model.py

convenience script to remove old format of DPO data

1a4acc9

Signed-off-by: adithyare <adithyare@nvidia.com>

arendu requested a review from trias702 November 14, 2024 01:46

arendu added 2 commits November 14, 2024 04:38

pad to mult 256

93eea80

Signed-off-by: arendu <adithya.r@gmail.com>

copy dpo style cfg overrides

5721741

Signed-off-by: arendu <adithya.r@gmail.com>
