Description
Small notes
I'm running this on the latest versions of PyTorch and Lightning, which requires one small change to the code in training_utils.py: removing line 385, `accelerator_connector._register_external_accelerators_and_strategies()`. This change has no effect on the issue described below.
MPS and Apple Silicon
MPS (Metal Performance Shaders) is, roughly speaking, Apple's equivalent of CUDA; see the official documentation. On Apple Silicon there is no distinction between VRAM and regular RAM, since memory is unified and shared, so I will simply call it RAM here.
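For context, selecting the MPS backend in PyTorch looks like the following minimal sketch (standard PyTorch API, with a CPU fallback so it also runs on non-Apple machines):

```python
import torch

# Use the MPS (Metal) device when available, otherwise fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

x = torch.ones(2, 2, device=device)
print(device, x.sum().item())
```

On Apple Silicon, tensors placed on this device live in the same unified RAM pool as everything else, which is why a leak shows up as overall system memory pressure rather than as exhausted VRAM.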
The issue
While training an acoustic model, DiffSinger uses an abnormally high amount of RAM, and worse, usage keeps growing without leveling off, which suggests a memory leak somewhere in the code. I don't believe the cause is my dataset or my config, since I have trained with both on CUDA without issue. There are some known PyTorch memory-leak issues on MPS, but since I don't know the AI side of the code or Python as a language, I can't confirm whether they apply here: pytorch/pytorch#154329, pytorch/pytorch#145374
If someone could add PyTorch memory profiling, or some other way of seeing where this issue may be, that would be greatly appreciated. I know that most of the dev team doesn't have Macs, and that I'm the first person to really try training on one, so I want to help as much as I can.
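In case it helps whoever picks this up: recent PyTorch versions expose a couple of MPS allocator counters that could be logged once per training step (for example from a Lightning callback). The helper below is my own rough sketch, not anything that exists in the DiffSinger codebase; a steadily climbing `allocated` value across steps would point at tensors PyTorch itself is retaining, while a climbing `driver` value with flat `allocated` would point at the Metal driver or a lower-level leak:

```python
import torch

def mps_memory_snapshot(tag: str = "") -> int:
    """Print MPS allocator stats and return bytes currently allocated.

    Returns 0 when the MPS backend is unavailable (e.g. non-Apple hardware),
    so the call is safe to leave in cross-platform training code.
    """
    if not (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()):
        return 0
    allocated = torch.mps.current_allocated_memory()  # bytes held by PyTorch tensors
    driver = torch.mps.driver_allocated_memory()      # total bytes held by the Metal driver
    print(f"[{tag}] allocated={allocated / 2**20:.1f} MiB  "
          f"driver={driver / 2**20:.1f} MiB")
    return allocated

# Hypothetical usage: call once per step, e.g. from a Lightning
# on_train_batch_end hook, and watch whether the numbers keep growing.
snap = mps_memory_snapshot("after-step")
```

`torch.mps.empty_cache()` could also be called periodically to rule out simple cache growth versus a true leak.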
