@provos commented Nov 30, 2025

Adds MPS device support for both image and video predictors on Apple Silicon.

Changes:

  • Add get_default_device() utility that detects MPS availability
  • Fix device mismatches (coords cache, freqs_cis cache)
  • Add MPS workaround for complex tensor repeat() in RoPE
  • Make torch._assert_async conditional on CUDA
  • Fix MPS memory leak in video predictor via synchronization points
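Two of the smaller items above can be sketched together. This is a minimal illustration under assumptions, not the code in this PR: `RotaryCache` and `check()` are hypothetical names. The cache builds `freqs_cis` once on CPU and moves it lazily to the input's device, so a CPU cache never meets MPS activations. The assert only uses `torch._assert_async` for CUDA tensors and falls back to an ordinary, synchronizing check on MPS and CPU.

```python
import torch


class RotaryCache:
    """Hypothetical cache of precomputed RoPE frequencies (freqs_cis)."""

    def __init__(self, dim: int, max_len: int, theta: float = 10000.0):
        freqs = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))
        positions = torch.arange(max_len).float()
        angles = torch.outer(positions, freqs)  # (max_len, dim // 2)
        # Unit-magnitude complex numbers exp(i * angle), built on CPU.
        self.freqs_cis = torch.polar(torch.ones_like(angles), angles)

    def get(self, x: torch.Tensor) -> torch.Tensor:
        # Move the cache lazily so a CPU-built cache never meets MPS/CUDA inputs.
        if self.freqs_cis.device != x.device:
            self.freqs_cis = self.freqs_cis.to(x.device)
        # x is assumed to be (..., seq_len, dim): return one row per position.
        return self.freqs_cis[: x.shape[-2]]


def check(cond: torch.Tensor, msg: str) -> None:
    """Device-aware assert on a single-element boolean tensor."""
    if cond.is_cuda:
        # Asynchronous device-side assert: no host synchronization on CUDA.
        torch._assert_async(cond)
    elif not bool(cond):
        # MPS/CPU: ordinary check (reading the value synchronizes with the device).
        raise AssertionError(msg)
```

Moving the cache on first use keeps construction device-agnostic; the cost is one copy the first time the model runs on a new device.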

Video predictor performance:

  • ~3x faster than CPU
  • Runs with ~38 GB peak memory, due to the way MPS caches graphs. Before the synchronization points were added, running the video predictor would consume all available memory.
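The synchronization points can be pictured as a helper that is called periodically inside the per-frame loop. This is a sketch under assumptions rather than the PR's implementation: `flush_device()`, `propagate()`, `step_fn` and `flush_every` are hypothetical names. `torch.mps.synchronize()` waits for queued MPS work to finish, and `torch.mps.empty_cache()` releases unoccupied memory held by the MPS caching allocator.

```python
import torch


def flush_device(device: torch.device) -> None:
    """Wait for pending kernels, then release unoccupied cached memory."""
    if device.type == "mps":
        torch.mps.synchronize()
        torch.mps.empty_cache()
    elif device.type == "cuda":
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
    # CPU: nothing to flush.


def propagate(frames, step_fn, device: torch.device, flush_every: int = 1):
    """Hypothetical per-frame loop with periodic synchronization points."""
    outputs = []
    for i, frame in enumerate(frames):
        with torch.inference_mode():
            out = step_fn(frame.to(device))
        outputs.append(out.cpu())  # keep accumulated results off the accelerator
        if (i + 1) % flush_every == 0:
            flush_device(device)
    return outputs
```

Flushing after every frame bounds memory growth at some cost in throughput; a larger `flush_every` trades the other way.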

This has prebuilt wheels for Apple Silicon. Bump numpy from 1.26 to 1.26.4 to meet the dependency requirements of decord2.
Allows systems without CUDA to fall back to CPU.
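One way a `get_default_device()` utility could look. The PR only says the utility detects MPS availability; preferring CUDA over MPS when both exist is an assumption of this sketch.

```python
import torch


def get_default_device() -> torch.device:
    """Prefer CUDA, then Apple Silicon MPS, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```

Callers can then move models and inputs with `.to(get_default_device())` instead of hard-coding `"cuda"`.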
The pin_memory() optimization is only available for CUDA backends.
CUDA handles this internally, but we need to handle it directly for CPU.
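A hedged sketch of gating `pin_memory()` on the backend; `to_device()` is a hypothetical helper name, not one taken from this PR.

```python
import torch


def to_device(t: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Copy a CPU tensor to `device`, pinning host memory only for CUDA."""
    if device.type == "cuda":
        # Page-locked host memory lets the host-to-device copy run asynchronously.
        return t.pin_memory().to(device, non_blocking=True)
    # MPS and CPU: plain copy, since pin_memory() is a CUDA-only optimization here.
    return t.to(device)
```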
…or cpu

Introduce workarounds for torch operations that are not available on MPS, such as repeat() on complex tensors.
Forcefully flush pending operations with synchronize and empty the cache.
@meta-cla bot added the CLA Signed label Nov 30, 2025

@mattiagaggi left a comment

This seems to overlap with your other pull request :)

Also, I'd add some tests.

@provos (author) commented Dec 3, 2025

This is a continuation of the other PR. I just didn't want to clutter things before I knew how hard MPS support was going to be. I can add the end-to-end test for the video predictor to the PR, but I didn't want to pollute your repo. Is there a place you would like the tests to go?


@drduhe left a comment

This looks awesome and does a better job than what I was getting at in my PR, #326, in addressing Mac deployments. Now that I see this is in the pipeline, I am going to remove that logic from my PR.

@provos (author) commented Dec 11, 2025

@drduhe, do you know why the import is stuck? I noticed the same thing with my other PR, which was approved a week(?) ago.

@drduhe commented Dec 11, 2025

@provos, I am not a maintainer on the project, so it probably needs approval from someone with more pull than I have =). But FWIW, I did create a tests directory as part of my PR as a starting point for where we can put unit tests. @mattiagaggi might have more information on the process, SLA, and workflow requirements.

@nico1996it

It would be great to have this merged...


Labels: CLA Signed

Projects: None yet

4 participants