[BUG][2.5] Aborting jobs that use Flower #3830

@SlamanigG

Description

When aborting an NVFlare job that uses Flower for the training logic, the abort does not interrupt the training loop that is already running. It only prevents the next round or evaluation from starting.

To Reproduce
Steps to reproduce the behavior:

  1. Clone the hello-flower example (branch 2.5).
  2. Start POC mode.
  3. Inspect which Python processes are running before submitting the job.
  4. Submit the flower_pt job.
  5. Optional: Inspect which Python processes are running while the job runs.
  6. Abort the job using abort_job.
  7. Inspect which Python processes are running after aborting the job. The CPU/GPU load does not drop after the abort, and the training continues in the background.
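
For the process checks in steps 3, 5 and 7, a small helper along these lines can be used to compare snapshots. This is only a sketch: it assumes a Unix-like system where `ps -eo pid,pcpu,command` is available, and the function names `parse_ps` and `python_processes` are made up for illustration, not part of NVFlare or Flower.

```python
import subprocess


def parse_ps(output, needle="python"):
    """Parse `ps -eo pid,pcpu,command` output; keep rows whose command mentions `needle`."""
    rows = []
    for line in output.splitlines()[1:]:  # first line is the column header
        parts = line.split(None, 2)       # pid, %cpu, full command line
        if len(parts) < 3:
            continue
        pid, cpu, command = parts
        if needle in command.lower():
            # Some locales print the CPU share with a decimal comma.
            rows.append((int(pid), float(cpu.replace(",", ".")), command))
    return rows


def python_processes():
    """Return (pid, cpu_percent, command) for every running Python process."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,command"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout)


if __name__ == "__main__":
    for pid, cpu, command in python_processes():
        print(f"{pid:>7} {cpu:>6.1f}% {command}")
```

Running it before submitting, during the job, and after the abort makes it easy to see whether the training process is still present and still using CPU.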

Expected behavior
The training loop should be stopped by the abort signal.

Environment:

  • OS: macOS 13.4.1, Red Hat Enterprise Linux 9.6 (Plow)
  • Python Version: 3.11
  • NVFlare Version: 2.5.2 ([this commit])
  • Flower Version: 1.11.0rc0
  • Tested on CPU

Is there some kind of interrupt signal that I can listen for in client.py and pass to my train() function, so that training stops when the job is aborted?
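
To make the question concrete, the kind of hook I have in mind would look roughly like the sketch below. It is purely illustrative: `stop_event` and `step_fn` are hypothetical names, and how (or whether) NVFlare's abort can set such a flag inside the Flower client is exactly what I don't know. A `threading.Event` also only works within one process, so this assumes the abort is delivered to the process that runs the loop.

```python
import threading


def train(num_epochs, batches, stop_event, step_fn):
    """Training loop that checks a stop flag before every batch.

    Returns (steps_done, aborted).
    """
    steps_done = 0
    for _ in range(num_epochs):
        for batch in batches:
            if stop_event.is_set():
                # Abort requested: leave the loop instead of finishing the round.
                return steps_done, True
            step_fn(batch)  # one optimisation step (hypothetical callback)
            steps_done += 1
    return steps_done, False


if __name__ == "__main__":
    stop_event = threading.Event()
    # Stand-in for the abort: set the flag after 0.1 seconds.
    timer = threading.Timer(0.1, stop_event.set)
    timer.start()
    done, aborted = train(10**9, [0], stop_event, lambda batch: None)
    timer.cancel()
    print(f"stopped after {done} steps, aborted={aborted}")
```

With a flag like this, the loop would exit within one batch of the abort rather than running to the end of the round.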

Labels: bug
