Skip to content

Conversation

@nvkevlu
Copy link
Collaborator

@nvkevlu nvkevlu commented Dec 15, 2025

Add recipe for hello cross site eval.

Description

Add recipe for hello cross site eval.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Quick tests passed locally by running ./runtest.sh.
  • In-line docstrings updated.
  • Documentation updated.

Copilot AI review requested due to automatic review settings December 15, 2025 22:25
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Dec 15, 2025

Greptile Summary

Adds Recipe API support for cross-site evaluation (CSE) with NumPy models. This PR introduces two new recipe classes that simplify CSE setup and replaces the old low-level FedJob API examples with cleaner Recipe API implementations.

Key Changes

  • New NumpyCrossSiteEvalRecipe: Standalone cross-site evaluation recipe supporting three modes:

    • Pre-trained models loaded from disk via model_locator_config
    • Initial model evaluation via initial_model parameter
    • Client-only evaluation (no server models)
  • New FedAvgWithCrossSiteEvalRecipe: Combined training and evaluation recipe that runs FedAvg training followed by automatic cross-site evaluation

  • Unified example: Replaced job_cse.py and job_train_and_cse.py with single job.py supporting both modes via --mode flag

  • Improved documentation: README completely rewritten with comprehensive usage examples, parameter explanations, and architectural details

  • Comprehensive tests: Unit tests covering all recipe configurations and parameters

Architecture

The recipes abstract away the complexity of manually configuring controllers, persistors, formatters, model locators, trainers, and validators. They provide a high-level declarative API that automatically sets up the proper job configuration.

For the training+CSE workflow, two controllers run sequentially:

  1. ScatterAndGather for FedAvg training
  2. CrossSiteModelEval for evaluation (uses NPModelLocator to find persisted models)

Code Quality

  • Clean separation of concerns with Pydantic validation
  • Comprehensive docstrings with usage examples
  • Consistent naming (terminology updated from "cross-validation" to "cross-site evaluation")
  • Good test coverage

Confidence Score: 5/5

  • This PR is safe to merge with no issues found
  • Well-structured implementation following established patterns in the codebase. New recipes properly extend the Recipe base class, use Pydantic for validation, and include comprehensive documentation and tests. The example refactoring simplifies usage while maintaining backward compatibility (old files removed, new cleaner API introduced). No logical errors, security issues, or breaking changes detected.
  • No files require special attention

Important Files Changed

Filename Overview
nvflare/app_common/np/recipes/cross_site_eval.py New recipe class for standalone cross-site evaluation. Clean implementation with three server model modes (initial model, pre-trained models, client-only). Good docstrings and Pydantic validation.
nvflare/app_common/np/recipes/fedavg_with_cse.py New recipe combining FedAvg training with cross-site evaluation. Uses two sequential controllers (ScatterAndGather, CrossSiteModelEval). Comprehensive parameter validation and documentation.
examples/hello-world/hello-numpy-cross-val/job.py Replaces old FedJob API scripts with unified Recipe API example. Supports two modes: standalone CSE with pre-trained models and training+CSE. Clean command-line interface.
examples/hello-world/hello-numpy-cross-val/README.md Comprehensive rewrite with detailed documentation for both CSE modes (standalone and training+CSE). Includes usage examples, parameter explanations, and architecture details. Much improved from original.
tests/unit_test/app_common/np/test_numpy_cross_site_eval_recipe.py Comprehensive unit tests for NumpyCrossSiteEvalRecipe covering minimal initialization, all parameters, model locator config, timeouts, and various configurations. Good test coverage.

Sequence Diagram

sequenceDiagram
    participant User
    participant Recipe as NumpyCrossSiteEvalRecipe/FedAvgWithCrossSiteEvalRecipe
    participant Server as Server Controller
    participant ModelLocator as Model Locator
    participant Client1 as Client 1
    participant Client2 as Client 2
    
    Note over User,Client2: Mode 1: Standalone CSE
    User->>Recipe: NumpyCrossSiteEvalRecipe(model_locator_config)
    Recipe->>Server: CrossSiteModelEval Controller
    Recipe->>Server: NPModelLocator (pre-trained models)
    Recipe->>Client1: NPTrainer + NPValidator
    Recipe->>Client2: NPTrainer + NPValidator
    
    Server->>ModelLocator: Locate server models
    ModelLocator-->>Server: server_model_1, server_model_2
    
    Server->>Client1: REQUEST_SUBMIT_MODEL
    Server->>Client2: REQUEST_SUBMIT_MODEL
    Client1->>Server: Submit client_1 model
    Client2->>Server: Submit client_2 model
    
    Server->>Client1: VALIDATE [server_model_1, server_model_2, client_2]
    Server->>Client2: VALIDATE [server_model_1, server_model_2, client_1]
    
    Client1->>Client1: Evaluate each model on local data
    Client2->>Client2: Evaluate each model on local data
    
    Client1-->>Server: Validation results
    Client2-->>Server: Validation results
    
    Server->>Server: Generate cross_val_results.json (all-to-all matrix)
    
    Note over User,Client2: Mode 2: Training + CSE
    User->>Recipe: FedAvgWithCrossSiteEvalRecipe(train_script)
    Recipe->>Server: ScatterAndGather Controller (training)
    Recipe->>Server: CrossSiteModelEval Controller (evaluation)
    Recipe->>Server: NPModelPersistor
    Recipe->>Client1: ScriptRunner + NPTrainer + NPValidator
    Recipe->>Client2: ScriptRunner + NPTrainer + NPValidator
    
    Note over Server,Client2: Phase 1: FedAvg Training
    loop num_rounds
        Server->>Client1: TRAIN (global model)
        Server->>Client2: TRAIN (global model)
        Client1->>Client1: Execute training script
        Client2->>Client2: Execute training script
        Client1-->>Server: Updated weights
        Client2-->>Server: Updated weights
        Server->>Server: Aggregate weights (FedAvg)
    end
    
    Server->>Server: Persist final model
    
    Note over Server,Client2: Phase 2: Cross-Site Evaluation
    Server->>ModelLocator: Locate trained models
    ModelLocator-->>Server: Global model from persistor
    
    Server->>Client1: REQUEST_SUBMIT_MODEL
    Server->>Client2: REQUEST_SUBMIT_MODEL
    Client1->>Server: Submit trained client_1 model
    Client2->>Server: Submit trained client_2 model
    
    Server->>Client1: VALIDATE [global_model, client_2]
    Server->>Client2: VALIDATE [global_model, client_1]
    
    Client1->>Client1: Evaluate each model
    Client2->>Client2: Evaluate each model
    
    Client1-->>Server: Validation results
    Client2-->>Server: Validation results
    
    Server->>Server: Generate cross_val_results.json
    Server-->>User: Training and CSE complete
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (2)

  1. nvflare/app_common/np/recipes/cross_site_eval.py, line 172 (link)

    logic: model_locator_id is assigned but never used. The CrossSiteEval controller (unlike CrossSiteModelEval) doesn't accept a model_locator_id parameter - it only uses a persistor or fobs.loadf() to load models. The NPModelLocator component is registered on the server but will never be invoked by the controller.

  2. nvflare/app_common/np/recipes/cross_site_eval.py, line 190 (link)

    style: formatter_id is assigned but never passed to any controller or used elsewhere. The NPFormatter component is registered but unused.

6 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a new NumpyCrossSiteEvalRecipe class to simplify cross-site evaluation setup using the Recipe API. Cross-site evaluation enables clients to evaluate models from other clients and the server, creating an all-to-all performance matrix without sharing data. The recipe provides a high-level abstraction over the lower-level FedJob API, making it easier to configure and run cross-site evaluation workflows with NumPy models.

Key changes:

  • Introduces NumpyCrossSiteEvalRecipe class that encapsulates cross-site evaluation workflow configuration
  • Adds comprehensive unit tests covering various recipe initialization scenarios
  • Provides an example demonstrating standalone cross-site evaluation with pre-trained models using the new Recipe API

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
nvflare/app_common/np/recipes/cross_site_eval.py New recipe implementation for NumPy-based cross-site evaluation, providing a high-level API to configure CrossSiteEval workflow with support for initial models or pre-trained model locations
nvflare/app_common/np/recipes/init.py Exports the new NumpyCrossSiteEvalRecipe class
tests/unit_test/app_common/np/test_numpy_cross_site_eval_recipe.py Comprehensive unit tests covering recipe initialization with various parameter combinations including minimal configs, custom timeouts, model locators, and participating clients
tests/unit_test/app_common/np/init.py New test module initialization file with standard copyright header
examples/hello-world/hello-numpy-cross-val/job.py New example demonstrating standalone cross-site evaluation with pre-trained models using the Recipe API
examples/hello-world/hello-numpy-cross-val/README.md Updated documentation explaining the new Recipe API approach, including setup steps and comparison with legacy FedJob API
Comments suppressed due to low confidence (3)

nvflare/app_common/np/recipes/cross_site_eval.py:180

  • Variable model_locator_id is not used.
            model_locator_id = job.to_server(model_locator, id="model_locator")

nvflare/app_common/np/recipes/cross_site_eval.py:187

  • Variable model_locator_id is not used.
            model_locator_id = job.to_server(model_locator, id="model_locator")

nvflare/app_common/np/recipes/cross_site_eval.py:190

  • Variable formatter_id is not used.
        formatter_id = job.to_server(NPFormatter(), id="formatter")

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@chesterxgchen
Copy link
Collaborator

please handle the comments

@nvkevlu
Copy link
Collaborator Author

nvkevlu commented Dec 16, 2025

/build

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@nvkevlu
Copy link
Collaborator Author

nvkevlu commented Dec 17, 2025

/build

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Dec 23, 2025

Greptile's behavior is changing!

From now on, if a review finishes with no comments, we will not post an additional "statistics" comment to confirm that our review found nothing to comment on. However, you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (4)

  1. nvflare/app_common/np/recipes/__init__.py, line 17 (link)

    logic: Missing implementation file fedavg_with_cse.py - this import will fail at runtime

  2. nvflare/app_common/np/recipes/__init__.py, line 19 (link)

    logic: Remove FedAvgWithCrossSiteEvalRecipe from exports - class doesn't exist

  3. examples/hello-world/hello-numpy-cross-val/job.py, line 34 (link)

    logic: Import fails - FedAvgWithCrossSiteEvalRecipe doesn't exist. Either remove training mode or implement the missing recipe class in nvflare/app_common/np/recipes/fedavg_with_cse.py. Is the FedAvgWithCrossSiteEvalRecipe intended to be added in a follow-up PR, or should the training mode be removed from this example?

  4. examples/hello-world/hello-numpy-cross-val/job.py, line 84-95 (link)

    logic: run_training_and_cse() function references undefined FedAvgWithCrossSiteEvalRecipe - will crash when training mode is used

10 files reviewed, 4 comments

Edit Code Review Agent Settings | Greptile

@nvkevlu
Copy link
Collaborator Author

nvkevlu commented Dec 28, 2025

/build

@nvkevlu nvkevlu enabled auto-merge (squash) December 30, 2025 15:25
@nvkevlu
Copy link
Collaborator Author

nvkevlu commented Dec 30, 2025

/build

@nvkevlu
Copy link
Collaborator Author

nvkevlu commented Dec 30, 2025

replaced with #3923

@nvkevlu nvkevlu closed this Dec 30, 2025
auto-merge was automatically disabled December 30, 2025 16:58

Pull request was closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants