[MAF-18899] Apply Gradient Checkpoint Config #42
Merged
PR description
Similar to pipeline_assign, torch.moreh.checkpoint_assign() lets you specify the point at which gradient checkpointing is applied. This PR applies it to the GPT2 and Mistral code.
For reference, see the Moreh framework commit https://github.com/moreh-dev/framework/commit/744f3476de06509ddcd7382928b971134c00d9d2#diff-fd7e20039cc03d4c907ed4ec098d41a9e74fe6db7423e80aca8cd54888a3a8fa.
Related Jira issue: https://moreh.atlassian.net/browse/MAF-18899
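
The idea of marking checkpoint positions inside a model's forward pass can be sketched roughly as below. This is only an illustration in plain Python, not the code from this PR: `forward_with_checkpoints`, `every`, and `mark` are hypothetical names, and `mark` is a stand-in for wherever `torch.moreh.checkpoint_assign()` would be called, since its actual signature is not shown in this description.

```python
from typing import Callable, List, Sequence

def forward_with_checkpoints(
    blocks: Sequence[Callable[[int], int]],
    x: int,
    every: int,
    mark: Callable[[int], None],
) -> int:
    """Run blocks in order, calling mark(i) after every `every`-th block.

    `mark` is a hypothetical stand-in for the checkpoint marker call;
    the real API and its arguments may differ.
    """
    if every <= 0:
        raise ValueError("every must be positive")
    for i, block in enumerate(blocks):
        x = block(x)
        # Mark a checkpoint boundary after blocks every-1, 2*every-1, ...
        if (i + 1) % every == 0:
            mark(i)
    return x

# Toy usage: four "layers" that each add 1, with a marker after every 2nd layer.
marked: List[int] = []
blocks = [lambda v: v + 1 for _ in range(4)]
result = forward_with_checkpoints(blocks, 0, every=2, mark=marked.append)
```

In a transformer such as GPT2 or Mistral, the blocks would be the decoder layers, and the interval would presumably be a configuration value; both are assumptions here rather than details taken from the PR.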