Description
When running the provided training code on 8*4090, we observed significantly higher memory usage on one GPU compared to the others.

We also observed that at the beginning of training, all GPUs show relatively low memory usage, but memory usage on some GPUs grows noticeably higher after a few training batches. This imbalance impedes multi-batch training on 8*4090. Are there any known issues in the code where unexpected memory allocation could happen during training?
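
In case it helps narrow this down, logging per-rank memory after each batch could show which step triggers the extra allocation. Below is a minimal sketch, assuming the training code uses PyTorch (the `log_cuda_memory` helper and the loop variables `step`, `batch`, `model`, `optimizer` are placeholders, not part of the repo):

```python
import torch
import torch.distributed as dist

def log_cuda_memory(step: int) -> None:
    """Print current and peak allocated CUDA memory for this rank."""
    rank = dist.get_rank() if dist.is_initialized() else 0
    device = torch.cuda.current_device()
    allocated = torch.cuda.memory_allocated(device) / 1024**2   # MiB currently allocated
    peak = torch.cuda.max_memory_allocated(device) / 1024**2    # MiB peak since start/reset
    print(f"[rank {rank}] step {step}: allocated={allocated:.0f} MiB, peak={peak:.0f} MiB")

# Example placement inside the training loop:
# for step, batch in enumerate(dataloader):
#     loss = model(batch).mean()
#     loss.backward()
#     optimizer.step()
#     optimizer.zero_grad(set_to_none=True)
#     log_cuda_memory(step)
```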