After reading the paper, I believe the approach here uses Open MPI? If so, would CPU parallelism be an option here, or would network-bound IPC become the limiting factor? I am wondering whether Dask-MPI integration would make sense as a first step toward supporting it: https://mpi.dask.org/en/latest/