
Abnormal Memory Allocation on GPU During Training #14

@RicePasteM

Description

When running the provided training code on 8×RTX 4090 GPUs, we observed significantly higher memory usage on a single GPU compared to the others.

[Image: per-GPU memory usage during training, with one GPU well above the others]

We also observed that at the beginning of training all GPUs show relatively low memory usage, but usage on some GPUs grows after a few training batches. This impedes multi-batch training on the 8×4090 setup. Are there any known issues in the code where unexpected memory allocation may happen during training?
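For reference, this is the per-rank logging we are using to trace where the growth appears. It is a minimal sketch assuming a standard PyTorch DDP setup where each process has called torch.cuda.set_device(local_rank); the helper name log_cuda_memory and the checkpoint path are ours, not from the repository's code:

```python
import torch
import torch.distributed as dist

def log_cuda_memory(step: int, tag: str = "") -> None:
    """Print allocated/reserved CUDA memory (MiB) for this rank's device."""
    rank = dist.get_rank() if dist.is_initialized() else 0
    allocated = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"[rank {rank}] step {step} {tag}: "
          f"allocated={allocated:.0f} MiB, reserved={reserved:.0f} MiB")

# One common cause of this pattern: torch.load() without map_location
# restores tensors onto the device recorded in the checkpoint (often
# cuda:0), so every rank allocates extra memory on GPU 0. Loading to CPU
# first avoids that:
# state = torch.load("ckpt.pt", map_location="cpu")
```

Calling log_cuda_memory(step) before and after the forward/backward pass on each rank should show which step and which rank the extra allocation comes from.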
