Improve GRPC broadcast implementation #65

@tremblerz

Description

  • Context - This observation is based on running a simulation of 120 nodes using GRPC for traditional_fl. The problem is less severe when each node interacts with only ~10-20 nodes in any given round.

  • Issue - Right now the broadcast function is implemented by looping over the send function, which is a unicast call. This makes broadcast effectively a serially executed function, which limits its scalability (see the threaded sketch after this list).

  • Solution - While this can be improved by making the send function multi-threaded, I believe a better approach would be to have nodes pull the model updates instead of the super-node pushing them to each node. Even with the pull approach, multithreading would be needed to make sure early nodes wait until the freshest copy of the model weights is available. Furthermore, the server may not respond to a request if too many nodes are already in the request queue, so we will have to implement retry logic (see the pull/retry sketch below). Retry logic is already implemented for the register function in https://github.com/aidecentralized/sonar/blob/main/src/utils/communication/grpc/main.py
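
A minimal sketch of the multi-threaded push option, assuming a `send_fn(node_id, model)` unicast callable; the actual name and signature of the send function in the repo may differ.

```python
# Sketch: parallelize the current push-based broadcast with a thread pool
# instead of a serial loop. `send_fn` stands in for the existing unicast send;
# its exact signature is an assumption, not the repo's actual API.
from concurrent.futures import ThreadPoolExecutor

def broadcast(send_fn, model, node_ids, max_workers=32):
    """Push `model` to every node concurrently rather than one node at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(send_fn, node_id, model): node_id for node_id in node_ids}
        for future, node_id in futures.items():
            try:
                future.result()  # surface per-node errors instead of silently dropping them
            except Exception as exc:
                print(f"broadcast to node {node_id} failed: {exc}")
```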
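And a minimal sketch of the client-side pull with retry/backoff, mirroring the retry idea already used for register; the stub method `GetModel` and its request object are hypothetical names, not the repo's actual proto definitions.

```python
# Sketch: pull the latest weights from the super-node, backing off and
# retrying if the server is overloaded or not yet ready to respond.
import time
import grpc

def pull_model(stub, request, max_retries=10, base_delay=1.0):
    """Ask the super-node for the latest weights, backing off if it is busy."""
    for attempt in range(max_retries):
        try:
            return stub.GetModel(request, timeout=30)
        except grpc.RpcError as err:
            # The server may be swamped by other nodes' requests; wait and retry.
            if err.code() in (grpc.StatusCode.UNAVAILABLE, grpc.StatusCode.RESOURCE_EXHAUSTED):
                time.sleep(base_delay * (2 ** attempt))
            else:
                raise
    raise RuntimeError("super-node did not respond after retries")
```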
