Conversation

@shjwudp shjwudp commented Nov 26, 2025

Description

Recent modifications to FusedAdam have made it incompatible with DTensor. Specifically, during optimizer state initialization, the state is now created as a regular tensor from the DTensor's global shape instead of as a DTensor with the same sharding as the parameter.

To maintain compatibility with DTensor, the state tensors should be initialized using zeros_like(param) or empty_like(param) instead of zeros(param.shape) or empty(param.shape).

Fixes #2424
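
For reference, a minimal sketch of the intended initialization pattern (illustrative only; the actual _initialize_state() in fused_adam.py handles additional cases such as FP8 states and parameter remainders):

```python
# Illustrative sketch only, not the exact TE implementation.
import torch

def init_optimizer_state(param: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    # Before the fix (breaks DTensor): a plain tensor built from the *global* shape.
    #   return torch.zeros(param.shape, dtype=dtype, device=param.device)
    # After the fix: *_like() preserves the parameter's DTensor mesh and placement.
    return torch.zeros_like(param, dtype=dtype)
```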

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Initialize FusedAdam optimizer states with zeros_like(param)/empty_like(param) so DTensor parameters get DTensor states
  • Parameterize the FSDP2 tests over the optimizer type and add an --adam option to run_fsdp2_model.py (skipping the known-broken fused_adam + mx_fp8_block_scaling + fp8_init combination)

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@shjwudp shjwudp force-pushed the fused_adam_dtensor_issue branch from 0b1d2db to 629c786 Compare November 26, 2025 03:11

greptile-apps bot commented Nov 26, 2025

Greptile Summary

This PR fixes a DTensor compatibility regression in FusedAdam introduced by commit d52ed47. The core issue was that optimizer states were initialized with torch.zeros(param.shape) and torch.empty(param.shape), which create regular tensors from the DTensor's global shape instead of DTensor optimizer states with the same distributed properties as the parameters.

Key Changes:

  • Modified _initialize_state() method in fused_adam.py to use zeros_like(param) and empty_like(param) instead of zeros(param.shape) and empty(param.shape) (lines 376, 378)
  • Added comprehensive test coverage for FusedAdam with FSDP2 by parameterizing the optimizer type in tests
  • Includes a skip condition for the known incompatible combination of fused_adam + mx_fp8_block_scaling + fp8_init, due to a remaining DTensor issue with FP8 quantization at line 388

Note: Line 388 still uses param.shape in the FP8 quantization path (quantizer.make_empty(param.shape)), which remains incompatible with DTensor. This is why the test explicitly skips the mx_fp8_block_scaling + fp8_init combination when using FusedAdam.

Confidence Score: 4/5

  • This PR is safe to merge with one remaining edge case limitation documented in tests
  • The fix correctly addresses the DTensor compatibility issue for the common case by using *_like() functions. However, the FP8 quantization path (line 388) still has a DTensor incompatibility that requires a test skip. This is a known limitation rather than a bug in this PR, as fixing it would require changes to the Float8Quantizer.make_empty() API.
  • Pay attention to transformer_engine/pytorch/optimizers/fused_adam.py line 388 - the FP8 quantization path still uses param.shape and remains incompatible with DTensor

Important Files Changed

  • transformer_engine/pytorch/optimizers/fused_adam.py: Fixed DTensor compatibility by replacing zeros(param.shape) and empty(param.shape) with zeros_like(param) and empty_like(param) in state initialization (lines 376, 378). This ensures optimizer states are created as DTensors when params are DTensors.
  • tests/pytorch/distributed/test_torch_fsdp2.py: Added an optim_type parameter to test both FusedAdam and torch.Adam with FSDP2. Includes a skip condition for the known incompatible combination fused_adam + mx_fp8_block_scaling + fp8_init (lines 58-59).
  • tests/pytorch/distributed/run_fsdp2_model.py: Added an --adam command-line argument for choosing between FusedAdam and torch.Adam (lines 88-90, 329-332); the default remains torch.Adam, with the flag making the optimizer configurable in tests.
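
As a rough illustration of what the optimizer switch in run_fsdp2_model.py could look like (flag name taken from the summary above; the exact wiring and defaults in the script may differ):

```python
# Illustrative only; not the actual run_fsdp2_model.py code.
import argparse
import torch
from transformer_engine.pytorch.optimizers import FusedAdam

def build_optimizer(params, impl: str, lr: float = 1e-3):
    # "torch" keeps the previous default; "fused" exercises the TE DTensor path.
    if impl == "fused":
        return FusedAdam(params, lr=lr)
    return torch.optim.Adam(params, lr=lr)

parser = argparse.ArgumentParser()
parser.add_argument("--adam", choices=["torch", "fused"], default="torch",
                    help="Which Adam implementation to train with.")
args, _ = parser.parse_known_args()
```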

Sequence Diagram

sequenceDiagram
    participant User as User/Training Script
    participant FSDP2 as FSDP2 (PyTorch)
    participant Model as TE Model with DTensor params
    participant FusedAdam as FusedAdam Optimizer
    participant StateInit as _initialize_state()
    
    User->>FSDP2: Apply fully_shard() to model
    FSDP2->>Model: Convert params to DTensor
    Note over Model: Parameters are now DTensor<br/>with distributed properties
    
    User->>FusedAdam: Create optimizer(model.parameters())
    User->>FusedAdam: optimizer.step()
    
    FusedAdam->>FusedAdam: Check if state exists for param
    alt State not initialized
        FusedAdam->>StateInit: _initialize_state(param, "exp_avg")
        
        alt Before PR (d52ed47)
            StateInit->>StateInit: torch.zeros(param.shape)
            Note over StateInit: Creates regular tensor<br/>using global shape!<br/>❌ Loses DTensor properties
        else After PR (this fix)
            StateInit->>StateInit: torch.zeros_like(param)
            Note over StateInit: Creates DTensor state<br/>preserving distribution!<br/>✅ Maintains DTensor properties
        end
        
        StateInit->>FusedAdam: Return initialized state
        
        alt FP8 quantization path (dtype==uint8)
            Note over StateInit: quantizer.make_empty(param.shape)<br/>⚠️ Still uses param.shape<br/>Incompatible with DTensor
        end
    end
    
    FusedAdam->>Model: Update parameters with optimizer step
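
A rough end-to-end sketch of the flow in the diagram (illustrative, not the test script; assumes torchrun and a recent PyTorch that exposes fully_shard under torch.distributed.fsdp):

```python
# Illustrative sketch; run under torchrun --nproc_per_node=<N>.
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import fully_shard
import transformer_engine.pytorch as te
from transformer_engine.pytorch.optimizers import FusedAdam

dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = te.Linear(1024, 1024, params_dtype=torch.bfloat16).cuda()
fully_shard(model)                    # parameters become DTensors
opt = FusedAdam(model.parameters(), lr=1e-3)

out = model(torch.randn(8, 1024, device="cuda", dtype=torch.bfloat16))
out.sum().backward()
opt.step()                            # states are created lazily; with this fix they are DTensors
dist.destroy_process_group()
```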

@greptile-apps greptile-apps bot left a comment

1 file reviewed, no comments

"""
dtype = self.name_to_dtype_map[state_name]
if store_param_remainders:
data = torch.zeros(param.shape, dtype=torch.int16, device=param.device)
Collaborator

@vthumbe1503 vthumbe1503 Nov 26, 2025

Could we also change run_fsdp2_model.py to use the TE FusedAdam optimizer instead of torch Adam so we don't break this again in the future?

Contributor Author

Hi, sorry for the late reply. I've added FusedAdam in run_fsdp2_model.py.

Collaborator

Thank you! Looks good. LGTM!

Collaborator

Seems like some other FusedAdam tests are failing.

Contributor Author

@shjwudp shjwudp Dec 25, 2025

The combination of fused_adam + mxfp8 + fp8_init is problematic.
I've temporarily skipped this test case, but I believe it is a bug that needs to be fixed.

Contributor Author

Could you trigger the test again?

@shjwudp shjwudp force-pushed the fused_adam_dtensor_issue branch from 572c176 to d655670 Compare December 25, 2025 03:22
Contributor

@greptile-apps greptile-apps bot left a comment

6 files reviewed, 1 comment

@shjwudp shjwudp force-pushed the fused_adam_dtensor_issue branch from 5717328 to d372019 Compare December 25, 2025 03:29
@shjwudp shjwudp requested a review from vthumbe1503 December 25, 2025 03:32
Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (1)

  1. transformer_engine/pytorch/optimizers/fused_adam.py, line 388 (link)

    style: This line still uses param.shape which may cause DTensor incompatibility when dtype == torch.uint8. When param is a DTensor, param.shape returns the global shape, not the local shape. Consider whether FP8 quantized states need similar treatment as the fix on lines 376-378. Does the FP8 quantization path handle DTensor parameters, or are FP8 states only used with regular tensors?
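
To make the global-vs-local shape distinction concrete, a small hypothetical repro (not TE code; assumes two ranks under torchrun and a PyTorch version with the public torch.distributed.tensor API):

```python
# Hypothetical repro; run with: torchrun --nproc_per_node=2 repro.py
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import Shard, distribute_tensor

dist.init_process_group("gloo")
mesh = init_device_mesh("cpu", (2,))
param = distribute_tensor(torch.randn(8, 4), mesh, [Shard(0)])

print(param.shape)             # torch.Size([8, 4])  -> global shape
print(param.to_local().shape)  # torch.Size([4, 4])  -> per-rank shard

plain = torch.zeros(param.shape)   # regular tensor with the global shape
state = torch.zeros_like(param)    # DTensor with the same mesh/placement as param
print(type(plain), type(state))
dist.destroy_process_group()
```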

3 files reviewed, 1 comment

@vthumbe1503
Collaborator

/te-ci L1 pytorch

…ike(param)/empty_like(param) to support DTensor

Signed-off-by: jianbinc <shjwudp@gmail.com>
Signed-off-by: jianbinc <shjwudp@gmail.com>
…his combination test is problematic.

2. set run_fsdp2_model.py default use torch adam

Signed-off-by: jianbinc <shjwudp@gmail.com>
@shjwudp shjwudp force-pushed the fused_adam_dtensor_issue branch from 925c97c to 1b08d23 Compare December 25, 2025 07:15

greptile-apps bot commented Dec 25, 2025

Greptile found no issues!

From now on, if a review finishes and we haven't found any issues, we will not post anything, but you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".

Successfully merging this pull request may close these issues.

Compatibility issues between FusedAdam and DTensor
