
Memory issues on MPS #290

@AnAndroNerd

Description


Small notes

I'm running this on the latest versions of PyTorch and Lightning, which requires one small change to the code in training_utils.py: removing line 385, `accelerator_connector._register_external_accelerators_and_strategies()`. However, this change has no effect on this issue.

MPS and Apple Silicon

MPS (which stands for Metal Performance Shaders) is more or less Apple's version of CUDA; here's the official documentation. On Apple Silicon there is no distinction between VRAM and regular RAM; it's all one shared pool, so I'll just call it RAM here.

The issue

While training an acoustic model, DiffSinger uses an abnormally high amount of RAM. Worse, the usage keeps scaling and doesn't seem to stop, which would imply a memory leak somewhere in the code. I don't believe it's my dataset or my config, as I have trained the same setup on CUDA without issue. There are some known PyTorch memory leaks on MPS, but as I don't know anything about the coding side of AI or about Python as a language, I can't confirm that these are the cause here: pytorch/pytorch#154329, pytorch/pytorch#145374

config.yaml

If someone could add PyTorch memory profiling, or some other way of seeing where this issue may be, that would be greatly appreciated. I know that most of the dev team doesn't have Macs and that I'm the first person to really try training on one, so I want to help as much as I can.
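Until proper profiling is wired in, a rough, dependency-free way to spot a leak is to log the process's peak resident set size once per training step and watch whether it climbs without bound. A minimal sketch using only the standard library (the training loop here is a hypothetical placeholder, not DiffSinger's actual loop):

```python
import resource
import sys

def peak_rss_mib() -> float:
    """Peak resident set size of this process in MiB.

    Note: ru_maxrss is reported in bytes on macOS but in KiB on Linux,
    so the scale factor depends on the platform.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    scale = 2**20 if sys.platform == "darwin" else 2**10
    return peak / scale

# Hypothetical training loop: if this number rises steadily step
# after step, something is retaining memory across iterations.
for step in range(3):
    # ... run one real training step here ...
    print(f"step {step}: peak RSS {peak_rss_mib():.1f} MiB")
```

Because this measures the whole process, it catches leaks regardless of whether they live in Python objects, the MPS caching allocator, or the Metal driver, at the cost of not telling you which one is responsible.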
