Description
- Context: This observation is based on running a simulation of 120 nodes using gRPC for `traditional_fl`. The problem would be less severe when each node only interacts with ~10-20 nodes in any given round.
- Issue: Right now the `broadcast` function is implemented by looping over a `send` function, which is a unicast function. This makes broadcast effectively serial: each `send` has to finish before the next one starts, so total broadcast latency grows linearly with the number of recipients.
- Solution: While this could be improved by making the `send` function multi-threaded, I believe a better approach would be to have nodes pull the model updates instead of the super-node pushing them to each node. Even with the pull approach, multithreading would be needed so that nodes that request early wait until the freshest copy of the model weights is available. Furthermore, the server may not respond to a request if too many nodes are already in the request queue, so we will have to implement retry logic. Retry logic is already implemented for the `register` function in https://github.com/aidecentralized/sonar/blob/main/src/utils/communication/grpc/main.py
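For comparison, here is a minimal sketch of the multi-threaded `send` option. It assumes a blocking unicast `send(dest, data)` callable; that signature is an assumption and not necessarily sonar's actual API. Each send is submitted to a `ThreadPoolExecutor`, so a broadcast to n nodes takes roughly ceil(n / max_workers) send latencies instead of n.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, List


def broadcast(
    send: Callable[[int, Any], None],
    data: Any,
    dest_ids: List[int],
    max_workers: int = 16,
) -> Dict[int, BaseException]:
    """Run one unicast `send` per destination concurrently on a thread pool.

    Returns {dest_id: exception} for failed sends, so one slow or failing
    peer neither blocks nor aborts delivery to the others.
    """
    failures: Dict[int, BaseException] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the destination it is sending to.
        futures = {pool.submit(send, dest, data): dest for dest in dest_ids}
        for fut in as_completed(futures):
            exc = fut.exception()
            if exc is not None:
                failures[futures[fut]] = exc
    return failures


if __name__ == "__main__":
    received: List[int] = []
    lock = threading.Lock()

    def fake_send(dest: int, data: Any) -> None:
        time.sleep(0.05)  # stand-in for per-node network latency
        with lock:
            received.append(dest)

    start = time.perf_counter()
    failed = broadcast(fake_send, {"round": 1}, list(range(120)), max_workers=32)
    elapsed = time.perf_counter() - start
    # A serial loop would take about 120 * 0.05 = 6 s; 32 workers need roughly 4 waves.
    print(f"{len(received)} delivered, {len(failed)} failed, {elapsed:.2f}s")
```

Collecting failures instead of raising on the first one keeps a single unreachable node from cancelling the rest of the round; the caller can decide whether to resend or drop those nodes.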
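The pull approach could look roughly like the sketch below, assuming the super-node keeps the latest aggregated weights in memory. `ModelStore`, `ServerBusy`, and `pull_with_retry` are hypothetical names, and this is not how sonar's existing `register` retry is written. `threading.Condition.wait_for` makes nodes that pull early block until the requested round is published, and the client retries rejected requests with capped exponential backoff plus random jitter. In a real gRPC client, a full queue would presumably surface as a `grpc.RpcError` (for example with `StatusCode.RESOURCE_EXHAUSTED` or `StatusCode.UNAVAILABLE`) that gets mapped to `ServerBusy`; that mapping is an assumption.

```python
import random
import threading
import time
from typing import Any, Callable, Optional, Tuple


class ServerBusy(Exception):
    """Hypothetical "request queue full" error; a gRPC client would map a status code to it."""


class ModelStore:
    """Hypothetical super-node state: pull requests block until the asked-for round exists."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._round = -1
        self._weights: Any = None

    def publish(self, round_no: int, weights: Any) -> None:
        with self._cond:
            self._round, self._weights = round_no, weights
            self._cond.notify_all()  # wake every node waiting on this round

    def get(self, round_no: int, timeout: float) -> Optional[Tuple[int, Any]]:
        with self._cond:
            # Early pullers wait here instead of receiving stale weights.
            if not self._cond.wait_for(lambda: self._round >= round_no, timeout=timeout):
                return None  # timed out; the caller may try again later
            return self._round, self._weights


def pull_with_retry(
    request: Callable[[], Any],
    max_attempts: int = 5,
    base_delay: float = 0.05,
    max_delay: float = 2.0,
) -> Any:
    """Call `request`, retrying on ServerBusy with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return request()
        except ServerBusy:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many rejected nodes from retrying in lockstep.
            time.sleep(random.uniform(0, delay))
    raise ValueError("max_attempts must be at least 1")


if __name__ == "__main__":
    store = ModelStore()
    attempts = {"n": 0}

    def flaky_pull() -> Optional[Tuple[int, Any]]:
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise ServerBusy()  # simulate a full request queue twice
        return store.get(round_no=1, timeout=2.0)

    # The super-node publishes round 1 shortly after the node starts pulling.
    threading.Timer(0.1, store.publish, args=(1, [0.1, 0.2])).start()
    print(pull_with_retry(flaky_pull), "after", attempts["n"], "attempts")
```

Full jitter (sleeping a random time up to the backoff cap) matters here because 120 nodes rejected at the same moment would otherwise all retry at the same moment and refill the queue.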