Labels: bug
Description
When aborting an NVFlare job that uses Flower for the training logic, the abort does not interrupt the running training loop; it only prevents the next round or evaluation from executing.
To Reproduce
Steps to reproduce the behavior:
- Clone the hello-flower example (branch 2.5).
- Start POC mode.
- Inspect which Python processes are running before submitting the job.
- Submit the `flower_pt` job.
- Optional: Inspect which Python processes are running while the job executes.
- Abort the job using `abort_job`.
- Inspect which Python processes are running after aborting the job. The CPU/GPU load does not drop after the abort; training continues in the background.
Expected behavior
The abort signal should stop the training loop.
Environment:
- OS: macOS 13.4.1, Red Hat Enterprise Linux 9.6 (Plow)
- Python Version: 3.11
- NVFlare Version: 2.5.2 ([this commit])
- Flower Version: 1.11.0rc0
- Tested on CPU
Is there some kind of interrupt signal that I can listen for in client.py and pass to my train() function to stop it when the job is aborted?
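One common way to make a training loop stoppable is cooperative cancellation: `train()` receives a flag and checks it between batches. The sketch below assumes, without confirmation, that the abort reaches the training process as `SIGTERM`; whether NVFlare or Flower delivers any such signal to client.py is exactly the open question here and must be verified. `stop_event`, `install_stop_handler`, and this `train()` are hypothetical stand-ins, not NVFlare or Flower APIs.

```python
import signal
import threading
import time

# Hypothetical process-wide flag that train() polls between batches.
stop_event = threading.Event()


def _request_stop(signum, frame):
    # ASSUMPTION: the job abort arrives here as SIGTERM. Not confirmed for
    # NVFlare/Flower; verify before relying on it.
    stop_event.set()


def install_stop_handler() -> None:
    # Python only allows installing signal handlers from the main thread.
    signal.signal(signal.SIGTERM, _request_stop)


def train(num_epochs: int, batches_per_epoch: int, stop: threading.Event) -> int:
    """Hypothetical training loop that checks `stop` before every batch.

    Returns the number of batches actually processed.
    """
    done = 0
    for _epoch in range(num_epochs):
        for _batch in range(batches_per_epoch):
            if stop.is_set():
                return done  # leave early instead of finishing the round
            time.sleep(0.01)  # stand-in for forward/backward/optimizer step
            done += 1
    return done


if __name__ == "__main__":
    install_stop_handler()
    # Simulate an abort arriving after ~0.1 s from another thread.
    threading.Timer(0.1, stop_event.set).start()
    processed = train(num_epochs=10, batches_per_epoch=100, stop=stop_event)
    print(f"stopped after {processed} of 1000 batches")
```

The flag is polled rather than the loop being killed from outside because Python offers no safe way to terminate a running thread; the loop has to notice the request and return on its own. The granularity of the check (per batch here) bounds how long the loop keeps running after the flag is set.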