Add recipe for hello cross site eval #3895

nvkevlu · 2025-12-15T22:25:24Z

Add recipe for hello cross site eval.

Description

Add recipe for hello cross site eval.

Types of changes

Non-breaking change (fix or new feature that would not break existing functionality).
Breaking change (fix or new feature that would cause existing functionality to change).
New tests added to cover the changes.
Quick tests passed locally by running ./runtest.sh.
In-line docstrings updated.
Documentation updated.

greptile-apps · 2025-12-15T22:28:43Z

Greptile Summary

Adds Recipe API support for cross-site evaluation (CSE) with NumPy models. This PR introduces two new recipe classes that simplify CSE setup and replaces the old low-level FedJob API examples with cleaner Recipe API implementations.

Key Changes

New NumpyCrossSiteEvalRecipe: Standalone cross-site evaluation recipe supporting three modes:
- Pre-trained models loaded from disk via model_locator_config
- Initial model evaluation via initial_model parameter
- Client-only evaluation (no server models)
New FedAvgWithCrossSiteEvalRecipe: Combined training and evaluation recipe that runs FedAvg training followed by automatic cross-site evaluation
Unified example: Replaced job_cse.py and job_train_and_cse.py with single job.py supporting both modes via --mode flag
Improved documentation: README completely rewritten with comprehensive usage examples, parameter explanations, and architectural details
Comprehensive tests: Unit tests covering all recipe configurations and parameters

Architecture

The recipes abstract away the complexity of manually configuring controllers, persistors, formatters, model locators, trainers, and validators. They provide a high-level declarative API that automatically sets up the proper job configuration.

For the training+CSE workflow, two controllers run sequentially:

ScatterAndGather for FedAvg training
CrossSiteModelEval for evaluation (uses NPModelLocator to find persisted models)

Code Quality

Clean separation of concerns with Pydantic validation
Comprehensive docstrings with usage examples
Consistent naming (terminology updated from "cross-validation" to "cross-site evaluation")
Good test coverage

Confidence Score: 5/5

This PR is safe to merge with no issues found
Well-structured implementation following established patterns in the codebase. New recipes properly extend the Recipe base class, use Pydantic for validation, and include comprehensive documentation and tests. The example refactoring simplifies usage while maintaining backward compatibility (old files removed, new cleaner API introduced). No logical errors, security issues, or breaking changes detected.
No files require special attention

Important Files Changed

Filename	Overview
nvflare/app_common/np/recipes/cross_site_eval.py	New recipe class for standalone cross-site evaluation. Clean implementation with three server model modes (initial model, pre-trained models, client-only). Good docstrings and Pydantic validation.
nvflare/app_common/np/recipes/fedavg_with_cse.py	New recipe combining FedAvg training with cross-site evaluation. Uses two sequential controllers (ScatterAndGather, CrossSiteModelEval). Comprehensive parameter validation and documentation.
examples/hello-world/hello-numpy-cross-val/job.py	Replaces old FedJob API scripts with unified Recipe API example. Supports two modes: standalone CSE with pre-trained models and training+CSE. Clean command-line interface.
examples/hello-world/hello-numpy-cross-val/README.md	Comprehensive rewrite with detailed documentation for both CSE modes (standalone and training+CSE). Includes usage examples, parameter explanations, and architecture details. Much improved from original.
tests/unit_test/app_common/np/test_numpy_cross_site_eval_recipe.py	Comprehensive unit tests for NumpyCrossSiteEvalRecipe covering minimal initialization, all parameters, model locator config, timeouts, and various configurations. Good test coverage.

Sequence Diagram

sequenceDiagram
    participant User
    participant Recipe as NumpyCrossSiteEvalRecipe/FedAvgWithCrossSiteEvalRecipe
    participant Server as Server Controller
    participant ModelLocator as Model Locator
    participant Client1 as Client 1
    participant Client2 as Client 2
    
    Note over User,Client2: Mode 1: Standalone CSE
    User->>Recipe: NumpyCrossSiteEvalRecipe(model_locator_config)
    Recipe->>Server: CrossSiteModelEval Controller
    Recipe->>Server: NPModelLocator (pre-trained models)
    Recipe->>Client1: NPTrainer + NPValidator
    Recipe->>Client2: NPTrainer + NPValidator
    
    Server->>ModelLocator: Locate server models
    ModelLocator-->>Server: server_model_1, server_model_2
    
    Server->>Client1: REQUEST_SUBMIT_MODEL
    Server->>Client2: REQUEST_SUBMIT_MODEL
    Client1->>Server: Submit client_1 model
    Client2->>Server: Submit client_2 model
    
    Server->>Client1: VALIDATE [server_model_1, server_model_2, client_2]
    Server->>Client2: VALIDATE [server_model_1, server_model_2, client_1]
    
    Client1->>Client1: Evaluate each model on local data
    Client2->>Client2: Evaluate each model on local data
    
    Client1-->>Server: Validation results
    Client2-->>Server: Validation results
    
    Server->>Server: Generate cross_val_results.json (all-to-all matrix)
    
    Note over User,Client2: Mode 2: Training + CSE
    User->>Recipe: FedAvgWithCrossSiteEvalRecipe(train_script)
    Recipe->>Server: ScatterAndGather Controller (training)
    Recipe->>Server: CrossSiteModelEval Controller (evaluation)
    Recipe->>Server: NPModelPersistor
    Recipe->>Client1: ScriptRunner + NPTrainer + NPValidator
    Recipe->>Client2: ScriptRunner + NPTrainer + NPValidator
    
    Note over Server,Client2: Phase 1: FedAvg Training
    loop num_rounds
        Server->>Client1: TRAIN (global model)
        Server->>Client2: TRAIN (global model)
        Client1->>Client1: Execute training script
        Client2->>Client2: Execute training script
        Client1-->>Server: Updated weights
        Client2-->>Server: Updated weights
        Server->>Server: Aggregate weights (FedAvg)
    end
    
    Server->>Server: Persist final model
    
    Note over Server,Client2: Phase 2: Cross-Site Evaluation
    Server->>ModelLocator: Locate trained models
    ModelLocator-->>Server: Global model from persistor
    
    Server->>Client1: REQUEST_SUBMIT_MODEL
    Server->>Client2: REQUEST_SUBMIT_MODEL
    Client1->>Server: Submit trained client_1 model
    Client2->>Server: Submit trained client_2 model
    
    Server->>Client1: VALIDATE [global_model, client_2]
    Server->>Client2: VALIDATE [global_model, client_1]
    
    Client1->>Client1: Evaluate each model
    Client2->>Client2: Evaluate each model
    
    Client1-->>Server: Validation results
    Client2-->>Server: Validation results
    
    Server->>Server: Generate cross_val_results.json
    Server-->>User: Training and CSE complete

greptile-apps

Additional Comments (2)

nvflare/app_common/np/recipes/cross_site_eval.py, line 172 (link)

logic: model_locator_id is assigned but never used. The CrossSiteEval controller (unlike CrossSiteModelEval) doesn't accept a model_locator_id parameter - it only uses a persistor or fobs.loadf() to load models. The NPModelLocator component is registered on the server but will never be invoked by the controller.
nvflare/app_common/np/recipes/cross_site_eval.py, line 190 (link)

style: formatter_id is assigned but never passed to any controller or used elsewhere. The NPFormatter component is registered but unused.

_{6 files reviewed, 2 comments}

_{Edit Code Review Agent Settings | Greptile}

Copilot

Pull request overview

This PR adds a new NumpyCrossSiteEvalRecipe class to simplify cross-site evaluation setup using the Recipe API. Cross-site evaluation enables clients to evaluate models from other clients and the server, creating an all-to-all performance matrix without sharing data. The recipe provides a high-level abstraction over the lower-level FedJob API, making it easier to configure and run cross-site evaluation workflows with NumPy models.

Key changes:

Introduces NumpyCrossSiteEvalRecipe class that encapsulates cross-site evaluation workflow configuration
Adds comprehensive unit tests covering various recipe initialization scenarios
Provides an example demonstrating standalone cross-site evaluation with pre-trained models using the new Recipe API

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
nvflare/app_common/np/recipes/cross_site_eval.py	New recipe implementation for NumPy-based cross-site evaluation, providing a high-level API to configure CrossSiteEval workflow with support for initial models or pre-trained model locations
nvflare/app_common/np/recipes/init.py	Exports the new NumpyCrossSiteEvalRecipe class
tests/unit_test/app_common/np/test_numpy_cross_site_eval_recipe.py	Comprehensive unit tests covering recipe initialization with various parameter combinations including minimal configs, custom timeouts, model locators, and participating clients
tests/unit_test/app_common/np/init.py	New test module initialization file with standard copyright header
examples/hello-world/hello-numpy-cross-val/job.py	New example demonstrating standalone cross-site evaluation with pre-trained models using the Recipe API
examples/hello-world/hello-numpy-cross-val/README.md	Updated documentation explaining the new Recipe API approach, including setup steps and comparison with legacy FedJob API

Comments suppressed due to low confidence (3)

nvflare/app_common/np/recipes/cross_site_eval.py:180

Variable model_locator_id is not used.

            model_locator_id = job.to_server(model_locator, id="model_locator")

nvflare/app_common/np/recipes/cross_site_eval.py:187

Variable model_locator_id is not used.

            model_locator_id = job.to_server(model_locator, id="model_locator")

nvflare/app_common/np/recipes/cross_site_eval.py:190

Variable formatter_id is not used.

        formatter_id = job.to_server(NPFormatter(), id="formatter")

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

nvflare/app_common/np/recipes/cross_site_eval.py

chesterxgchen · 2025-12-16T04:45:11Z

please handle the comments

nvkevlu · 2025-12-16T14:36:07Z

/build

greptile-apps

_{6 files reviewed, no comments}

_{Edit Code Review Agent Settings | Greptile}

nvkevlu · 2025-12-17T19:30:10Z

/build

examples/hello-world/hello-numpy-cross-val/README.md

nvflare/app_common/np/recipes/cross_site_eval.py

greptile-apps · 2025-12-23T16:39:37Z

Greptile's behavior is changing!

From now on, if a review finishes with no comments, we will not post an additional "statistics" comment to confirm that our review found nothing to comment on. However, you can confirm that we reviewed your changes in the status check section.

_{This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".}

examples/hello-world/hello-numpy-cross-val/job.py

greptile-apps

Additional Comments (4)

nvflare/app_common/np/recipes/__init__.py, line 17 (link)

logic: Missing implementation file fedavg_with_cse.py - this import will fail at runtime
nvflare/app_common/np/recipes/__init__.py, line 19 (link)

logic: Remove FedAvgWithCrossSiteEvalRecipe from exports - class doesn't exist
examples/hello-world/hello-numpy-cross-val/job.py, line 34 (link)

logic: Import fails - FedAvgWithCrossSiteEvalRecipe doesn't exist. Either remove training mode or implement the missing recipe class in nvflare/app_common/np/recipes/fedavg_with_cse.py. Is the FedAvgWithCrossSiteEvalRecipe intended to be added in a follow-up PR, or should the training mode be removed from this example?
examples/hello-world/hello-numpy-cross-val/job.py, line 84-95 (link)

logic: run_training_and_cse() function references undefined FedAvgWithCrossSiteEvalRecipe - will crash when training mode is used

_{10 files reviewed, 4 comments}

_{Edit Code Review Agent Settings | Greptile}

nvkevlu · 2025-12-28T13:38:42Z

/build

nvkevlu · 2025-12-30T15:25:33Z

/build

nvkevlu · 2025-12-30T16:58:23Z

replaced with #3923

add recipe for hello cross site eval

24bca6f

Copilot AI review requested due to automatic review settings December 15, 2025 22:25

Copilot started reviewing on behalf of nvkevlu December 15, 2025 22:25 View session

greptile-apps bot reviewed Dec 15, 2025

View reviewed changes

Copilot AI reviewed Dec 15, 2025

View reviewed changes

address PR comments

900e83e

greptile-apps bot reviewed Dec 16, 2025

View reviewed changes

Merge branch 'main' into hello_cross_site_eval_recipe

2d17ba0

YuanTingHsieh reviewed Dec 19, 2025

View reviewed changes

Merge branch 'main' into hello_cross_site_eval_recipe

8a36c75

update and fix PR comments

ce98214

YuanTingHsieh reviewed Dec 23, 2025

View reviewed changes

examples/hello-world/hello-numpy-cross-val/job.py Outdated Show resolved Hide resolved

update and address PR comments

c32389c

greptile-apps bot reviewed Dec 23, 2025

View reviewed changes

nvkevlu and others added 3 commits December 23, 2025 14:30

add missing new file

df2af09

Merge branch 'main' into hello_cross_site_eval_recipe

7227208

Merge branch 'main' into hello_cross_site_eval_recipe

cb9d8bb

Merge branch 'main' into hello_cross_site_eval_recipe

f0db539

nvkevlu enabled auto-merge (squash) December 30, 2025 15:25

nvkevlu mentioned this pull request Dec 30, 2025

Add cross-site evaluation utility and examples #3923

Open

6 tasks

nvkevlu closed this Dec 30, 2025

auto-merge was automatically disabled December 30, 2025 16:58
Pull request was closed

Add recipe for hello cross site eval #3895

Add recipe for hello cross site eval #3895

Uh oh!

Conversation

nvkevlu commented Dec 15, 2025

Description

Types of changes

Uh oh!

greptile-apps bot commented Dec 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Key Changes

Architecture

Code Quality

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Additional Comments (2)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chesterxgchen commented Dec 16, 2025

Uh oh!

nvkevlu commented Dec 16, 2025

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

nvkevlu commented Dec 17, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

greptile-apps bot commented Dec 23, 2025

Greptile's behavior is changing!

Uh oh!

Uh oh!

greptile-apps bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Additional Comments (4)

Uh oh!

nvkevlu commented Dec 28, 2025

Uh oh!

nvkevlu commented Dec 30, 2025

Uh oh!

nvkevlu commented Dec 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

greptile-apps bot commented Dec 15, 2025 •

edited

Loading

greptile-apps bot left a comment •

edited

Loading

greptile-apps bot left a comment •

edited

Loading