

@nvkevlu
Collaborator

@nvkevlu nvkevlu commented Dec 30, 2025

Add cross-site evaluation utility and examples.

Description

Instead of #3895, this takes into account #3895 for adding cross-site evaluation utility and examples.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Quick tests passed locally by running ./runtest.sh.
  • In-line docstrings updated.
  • Documentation updated.

Copilot AI review requested due to automatic review settings December 30, 2025 16:24
@greptile-apps
Contributor

greptile-apps bot commented Dec 30, 2025

Greptile Summary

This PR adds a cross-site evaluation (CSE) utility and examples to NVFlare, enabling users to evaluate federated learning models across all client sites without sharing data. The implementation follows the same pattern as the existing add_experiment_tracking() utility.

Key Changes:

  • Added add_cross_site_evaluation() utility function in nvflare/recipe/utils.py that integrates CSE into any recipe with a single function call
  • Created unified job.py example supporting two modes: standalone CSE with pre-trained models, and training+CSE workflow
  • Added new client.py training script for NumPy FedAvg demonstration
  • Replaced two separate job scripts (job_cse.py and job_train_and_cse.py) with one unified script using command-line arguments
  • Updated hello-pt example to demonstrate optional CSE integration with PyTorch models
  • Comprehensive README documentation explaining both operational modes and customization options
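The unified job.py selects its mode from the command line (python job.py --mode pretrained or --mode training, per the sequence diagram). A rough sketch of that dispatch, assuming plain argparse: only --mode, its two values, and the name run_pretrained_cse come from this PR. The --n_clients and --num_rounds flags, their defaults, and run_training_and_cse are hypothetical, and both run functions are placeholders rather than the real job-building code.

```python
import argparse


def run_pretrained_cse(n_clients: int) -> str:
    # Placeholder for Mode 1 (CSE only); the real function builds and runs a job.
    return f"pretrained CSE with {n_clients} clients"


def run_training_and_cse(n_clients: int, num_rounds: int) -> str:
    # Hypothetical placeholder for Mode 2 (FedAvg training followed by CSE).
    return f"{num_rounds} rounds of FedAvg + CSE with {n_clients} clients"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hello NumPy cross-site evaluation")
    # argparse rejects any value outside these two choices.
    parser.add_argument("--mode", choices=["pretrained", "training"], default="pretrained")
    parser.add_argument("--n_clients", type=int, default=2)  # assumed flag
    parser.add_argument("--num_rounds", type=int, default=3)  # assumed flag
    return parser


def main(argv=None) -> str:
    args = build_parser().parse_args(argv)
    if args.mode == "pretrained":
        return run_pretrained_cse(args.n_clients)
    return run_training_and_cse(args.n_clients, args.num_rounds)


if __name__ == "__main__":
    print(main())
```

One script with a mode switch keeps the shared setup in one place, which is the motivation the PR gives for replacing job_cse.py and job_train_and_cse.py.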

Minor Issues:

  • Unused constant CLIENT_MODEL_DIR in both utils.py and job.py (low priority cleanup item)

Confidence Score: 4/5

  • This PR is safe to merge with minor cleanup recommended
  • The implementation is well-structured, follows existing patterns in the codebase, and includes comprehensive documentation. The utility function properly handles both PyTorch and NumPy model types with appropriate registry-based configuration. The code is consistent with the existing add_experiment_tracking() utility pattern. The only minor issue is the unused constant definitions, which don't affect functionality.
  • No files require special attention - the unused constants are minor style issues that don't affect functionality

Important Files Changed

  • nvflare/recipe/utils.py: Added add_cross_site_evaluation() utility function with model locator registry supporting PyTorch and NumPy, includes detailed documentation about validator requirements
  • examples/hello-world/hello-numpy-cross-val/job.py: Unified job script supporting both pretrained CSE and training+CSE modes using Recipe API, replaces old job_cse.py and job_train_and_cse.py files
  • examples/hello-world/hello-numpy-cross-val/client.py: New NumPy training script with mock training and evaluation functions for demonstration, includes support for full and diff parameter updates
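The new client.py supports both full and diff parameter updates. A minimal sketch of what the two options mean for FedAvg, written with plain Python lists instead of NumPy so it runs standalone: make_update and fedavg are hypothetical names, and weighting by local sample count is an assumption about the aggregation, not taken from the PR.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def make_update(old: Weights, new: Weights, send_diff: bool) -> Weights:
    """Client side: send either the full new weights or the difference new - old."""
    if not send_diff:
        return {k: list(v) for k, v in new.items()}
    return {k: [n - o for n, o in zip(new[k], old[k])] for k in new}


def fedavg(global_w: Weights, updates: List[Weights], n_samples: List[int], is_diff: bool) -> Weights:
    """Server side: sample-weighted average; averaged diffs are added onto the global model."""
    total = sum(n_samples)
    avg = {
        k: [sum(u[k][i] * n for u, n in zip(updates, n_samples)) / total for i in range(len(global_w[k]))]
        for k in global_w
    }
    if is_diff:
        return {k: [g + d for g, d in zip(global_w[k], avg[k])] for k in global_w}
    return avg
```

Both modes land on the same new global model when every client starts from the same global weights; diffs simply let the server reconstruct full weights from smaller deltas.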

Sequence Diagram

sequenceDiagram
    participant User
    participant Recipe/Job
    participant Server
    participant Client1
    participant Client2

    alt Mode 1: Standalone CSE with Pre-trained Models
        User->>Recipe/Job: python job.py --mode pretrained
        Recipe/Job->>Server: Configure NPModelLocator with pre-trained models
        Recipe/Job->>Server: Add ValidationJsonGenerator
        Recipe/Job->>Server: Add CrossSiteModelEval controller
        Recipe/Job->>Client1: Deploy NPValidator
        Recipe/Job->>Client2: Deploy NPValidator
        Server->>Server: Load pre-trained models (server_model_1, server_model_2)
        Server->>Client1: Distribute all models for validation
        Server->>Client2: Distribute all models for validation
        Client1->>Client1: Validate each model on local data
        Client2->>Client2: Validate each model on local data
        Client1->>Server: Return validation metrics
        Client2->>Server: Return validation metrics
        Server->>Server: Generate cross-site validation matrix
        Server->>User: Save results to cross_val_results.json
    else Mode 2: Training + CSE
        User->>Recipe/Job: python job.py --mode training
        Recipe/Job->>Recipe/Job: Create NumpyFedAvgRecipe
        Recipe/Job->>Recipe/Job: Add NPValidator to clients
        Recipe/Job->>Recipe/Job: Call add_cross_site_evaluation()
        Recipe/Job->>Server: Deploy ScatterAndGather controller
        Recipe/Job->>Server: Deploy CrossSiteModelEval controller
        Recipe/Job->>Client1: Deploy ScriptRunner + NPValidator
        Recipe/Job->>Client2: Deploy ScriptRunner + NPValidator
        
        loop Training Rounds
            Server->>Client1: Send global model
            Server->>Client2: Send global model
            Client1->>Client1: Train on local data (client.py)
            Client2->>Client2: Train on local data (client.py)
            Client1->>Server: Send updated model weights
            Client2->>Server: Send updated model weights
            Server->>Server: Aggregate weights (FedAvg)
        end
        
        Server->>Server: Training complete, start CSE
        Server->>Client1: Distribute trained models
        Server->>Client2: Distribute trained models
        Client1->>Client1: Validate all models
        Client2->>Client2: Validate all models
        Client1->>Server: Return validation metrics
        Client2->>Server: Return validation metrics
        Server->>Server: Generate cross-site validation matrix
        Server->>User: Save results to cross_val_results.json
    end
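In both modes the end product is a cross-site validation matrix: every client scores every model on its own data, and only the metrics go back to the server. A minimal sketch of that shape, assuming a nested {client: {model: {metric: value}}} layout; the real schema of cross_val_results.json, the "accuracy" key, and toy_validate are illustrative assumptions, not the ValidationJsonGenerator format.

```python
import json
from typing import Callable, Dict, List


def cross_site_matrix(
    clients: List[str],
    models: Dict[str, List[float]],
    validate: Callable[[str, List[float]], float],
) -> Dict[str, Dict[str, Dict[str, float]]]:
    """Every client validates every model locally; only the metric leaves the site."""
    return {
        client: {name: {"accuracy": validate(client, weights)} for name, weights in models.items()}
        for client in clients
    }


def toy_validate(client: str, weights: List[float]) -> float:
    # Hypothetical metric: each site's "local data" shifts the score by a fixed offset.
    offset = {"site-1": 0.0, "site-2": 0.5}[client]
    return round(sum(weights) / len(weights) + offset, 3)


if __name__ == "__main__":
    models = {"server_model_1": [0.1, 0.3], "server_model_2": [0.5, 0.7]}
    matrix = cross_site_matrix(["site-1", "site-2"], models, toy_validate)
    print(json.dumps(matrix, indent=2))
```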

Contributor

@greptile-apps greptile-apps bot left a comment


Additional Comments (3)

  1. examples/hello-world/hello-numpy-cross-val/job.py, line 90

    logic: ValidationJsonGenerator is NOT added automatically when using plain FedJob. It's only added automatically by BaseFedJob. You need to explicitly add it here.

  2. examples/hello-world/hello-pt/README.md, line 32

    syntax: Typo: "traiing" should be "training"

  3. examples/hello-world/hello-pt/README.md, line 48

    syntax: Misplaced # character after comma - should be a space
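Comment 1 says a plain FedJob does not add ValidationJsonGenerator on its own, while BaseFedJob does. One way to make the call safe for either job type is an idempotent "add if missing" check. A minimal sketch using hypothetical stand-in classes (PlainJob, BaseJob, ensure_validation_json are illustrative; they are not NVFlare's FedJob or BaseFedJob):

```python
from typing import List


class ValidationJsonGenerator:
    """Stand-in for the component that writes cross_val_results.json."""


class PlainJob:
    """Hypothetical minimal job: holds only the server components it is given."""

    def __init__(self) -> None:
        self.server_components: List[object] = []

    def to_server(self, component: object) -> None:
        self.server_components.append(component)


class BaseJob(PlainJob):
    """Hypothetical job that, like BaseFedJob per the review, adds the generator itself."""

    def __init__(self) -> None:
        super().__init__()
        self.to_server(ValidationJsonGenerator())


def ensure_validation_json(job: PlainJob) -> None:
    # Add the generator only when it is missing, so neither job type ends up with two.
    if not any(isinstance(c, ValidationJsonGenerator) for c in job.server_components):
        job.to_server(ValidationJsonGenerator())
```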

9 files reviewed, 3 comments


Contributor

Copilot AI left a comment


Pull request overview

This PR adds a cross-site evaluation utility function and examples to NVFlare's Recipe API, enabling users to easily evaluate models across different client sites without sharing data.

Key Changes:

  • Added add_cross_site_evaluation() utility function to nvflare/recipe/utils.py for programmatically enabling cross-site model evaluation
  • Added cross-site evaluation support to the PyTorch hello-world example with a --cross_site_eval command-line flag
  • Unified the NumPy cross-validation example into a single job.py with two modes: standalone CSE and training+CSE

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 24 comments.

  • nvflare/recipe/utils.py: Adds add_cross_site_evaluation() utility and MODEL_LOCATOR_REGISTRY for PyTorch and NumPy
  • examples/hello-world/hello-pt/job.py: Adds cross-site evaluation support via --cross_site_eval flag and --train_script parameter
  • examples/hello-world/hello-pt/README.md: Updates documentation with cross-site evaluation instructions and usage examples
  • examples/hello-world/hello-numpy-cross-val/job.py: Replaces multiple job scripts with unified implementation supporting pretrained and training modes
  • examples/hello-world/hello-numpy-cross-val/client.py: Adds NumPy training client script with training and evaluation functions
  • examples/hello-world/hello-numpy-cross-val/generate_pretrain_models.py: Updates comment to clarify cross-site evaluation terminology
  • examples/hello-world/hello-numpy-cross-val/README.md: Comprehensive rewrite documenting both CSE modes and Recipe API usage patterns
  • examples/hello-world/hello-numpy-cross-val/job_train_and_cse.py: Deleted; functionality consolidated into unified job.py
  • examples/hello-world/hello-numpy-cross-val/job_cse.py: Deleted; functionality consolidated into unified job.py
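MODEL_LOCATOR_REGISTRY lets add_cross_site_evaluation(recipe, model_locator_type=...) pick a framework-specific model locator from a string key. A minimal sketch of that lookup, assuming the keys are "pytorch" and "numpy" as in the summaries; the entry values below are plain strings for illustration, and resolve_locator is a hypothetical helper, not the function in nvflare/recipe/utils.py.

```python
from typing import Dict

# Hypothetical registry contents: the keys follow the PR summary, the values are
# illustrative class names only.
MODEL_LOCATOR_REGISTRY: Dict[str, str] = {
    "pytorch": "PTFileModelLocator",
    "numpy": "NPModelLocator",
}


def resolve_locator(model_locator_type: str) -> str:
    """Map a user-facing type string to a locator, failing with a helpful message."""
    try:
        return MODEL_LOCATOR_REGISTRY[model_locator_type]
    except KeyError:
        supported = ", ".join(sorted(MODEL_LOCATOR_REGISTRY))
        raise ValueError(
            f"Unsupported model_locator_type {model_locator_type!r}; expected one of: {supported}"
        ) from None
```

With a registry, supporting another framework means adding one entry instead of another if/elif branch in the utility.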


Contributor

@greptile-apps greptile-apps bot left a comment


Additional Comments (1)

  1. nvflare/recipe/utils.py, lines 107-108

    logic: comp_ids doesn't exist on plain FedJob objects - only on BaseFedJob. NumpyFedAvgRecipe uses plain FedJob which will cause AttributeError at runtime when calling add_cross_site_evaluation with persistor_id=None.
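A common fix for this kind of AttributeError is to read the optional attribute with getattr and a default, and let an explicit argument take precedence. A minimal sketch with hypothetical stub classes; the "persistor_id" key inside comp_ids is an assumption made for illustration.

```python
from typing import Dict, Optional


class PlainFedJobStub:
    """Hypothetical stand-in for a job object without a comp_ids attribute."""


class BaseFedJobStub:
    """Hypothetical stand-in for a job that records its component ids."""

    def __init__(self) -> None:
        self.comp_ids: Dict[str, str] = {"persistor_id": "persistor"}


def find_persistor_id(job: object, persistor_id: Optional[str] = None) -> Optional[str]:
    # An explicitly passed id always wins.
    if persistor_id is not None:
        return persistor_id
    # getattr with a default avoids AttributeError on jobs that lack comp_ids.
    comp_ids = getattr(job, "comp_ids", None) or {}
    return comp_ids.get("persistor_id")
```

A None result can then be turned into a clear error message ("pass persistor_id explicitly") instead of an AttributeError deep inside the utility.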

9 files reviewed, 1 comment


@nvkevlu
Collaborator Author

nvkevlu commented Dec 30, 2025

/build

Contributor

@greptile-apps greptile-apps bot left a comment


Additional Comments (2)

  1. nvflare/recipe/utils.py, lines 39-50

    style: CLIENT_MODEL_DIR constant is defined but never used in the PR. The run_pretrained_cse() function in job.py references CLIENT_MODEL_DIR directly without using this registry value.

  2. examples/hello-world/hello-numpy-cross-val/job.py, lines 38-39

    style: CLIENT_MODEL_DIR is defined but never used anywhere in the code.
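"Defined but never used" findings like these can be checked mechanically with Python's standard ast module. A minimal sketch (unused_module_constants is a hypothetical helper): it only looks inside a single module, so a constant in utils.py that another file imports would be falsely reported, and annotated assignments are ignored.

```python
import ast
from typing import List


def unused_module_constants(source: str) -> List[str]:
    """Return UPPER_CASE module-level names that are assigned but never read in the module."""
    tree = ast.parse(source)
    assigned = []
    for node in tree.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id.isupper():
                    assigned.append(target.id)
    # Any Name node in Load context anywhere in the module counts as a use.
    loaded = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }
    return [name for name in assigned if name not in loaded]
```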

9 files reviewed, 2 comments


@@ -1,87 +1,246 @@
-# Hello Numpy Cross-Site Validation
+# Hello NumPy Cross-Site Validation
Collaborator


BTW, what's the correct term: cross-site validation or cross-site evaluation?

@holgerroth @ZiyueXu77

add_experiment_tracking(recipe, tracking_type="tensorboard")

if args.cross_site_eval:
    add_cross_site_evaluation(recipe, model_locator_type="pytorch")
Collaborator


this is good, much cleaner than before.
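The snippet above makes CSE opt-in: add_cross_site_evaluation() runs only when args.cross_site_eval is set. A minimal sketch of how such a flag could be wired with argparse; the recipe calls are replaced by strings so it runs without NVFlare, and the --train_script default and the configure helper are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="hello-pt job")
    # Opt-in flag: cross-site evaluation is skipped unless --cross_site_eval is passed.
    parser.add_argument("--cross_site_eval", action="store_true")
    # Hypothetical default; the actual default of --train_script is not shown here.
    parser.add_argument("--train_script", type=str, default="client.py")
    return parser


def configure(argv=None) -> list:
    """Return, in order, the add-ons a job script would apply to its recipe."""
    args = build_parser().parse_args(argv)
    steps = ["add_experiment_tracking(tensorboard)"]
    if args.cross_site_eval:
        steps.append("add_cross_site_evaluation(pytorch)")
    return steps
```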

Comment on lines +56 to +57
def run_pretrained_cse(n_clients: int):
    """Run standalone cross-site evaluation with pre-trained models.
Collaborator


Suggest renaming to "run_cse_only" or "run_cse_with_pretrained".

Comment on lines +66 to +86
    # Create a minimal FedJob for CSE-only workflow
    job = FedJob(name="hello-numpy-cse", min_clients=n_clients)

    # Configure model locator with pre-trained model locations
    model_locator_id = job.to_server(
        NPModelLocator(
            model_dir=SERVER_MODEL_DIR,
            model_name={"server_model_1": "server_1.npy", "server_model_2": "server_2.npy"},
        )
    )

    # Add validation JSON generator to save results
    job.to_server(ValidationJsonGenerator())

    # Add cross-site evaluation controller
    eval_controller = CrossSiteModelEval(
        model_locator_id=model_locator_id,
        submit_model_timeout=600,
        validation_timeout=6000,
    )
    job.to_server(eval_controller)
Collaborator


For this part, I think we still need a CrossSiteEvalRecipe to do CSE alone, without training.
Then we won't expose FedJob to the users.
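To make the suggestion concrete, here is a rough sketch of what a CSE-only recipe might collect so users never touch FedJob directly. CrossSiteEvalRecipe does not exist in NVFlare at this point; the class below is purely hypothetical and only gathers the settings the FedJob-based snippet wires up by hand (model directory, model names, min_clients, and the 600/6000 timeouts), plus basic validation.

```python
import os
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CrossSiteEvalRecipe:
    """Hypothetical CSE-only recipe; NOT an existing NVFlare class."""

    name: str
    min_clients: int
    model_dir: str
    model_names: Dict[str, str] = field(default_factory=dict)
    submit_model_timeout: int = 600
    validation_timeout: int = 6000

    def __post_init__(self) -> None:
        # Fail early instead of at job runtime.
        if self.min_clients < 1:
            raise ValueError("min_clients must be >= 1")
        if not self.model_names:
            raise ValueError("at least one pre-trained model is required")

    def model_paths(self) -> Dict[str, str]:
        # Resolve each logical model name to the file a locator would load.
        return {k: os.path.join(self.model_dir, v) for k, v in self.model_names.items()}

    def to_config(self) -> Dict[str, object]:
        # Settings the FedJob-based snippet configures component by component.
        return {
            "name": self.name,
            "min_clients": self.min_clients,
            "models": self.model_paths(),
            "submit_model_timeout": self.submit_model_timeout,
            "validation_timeout": self.validation_timeout,
        }
```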

Comment on lines +236 to +237
env = ProdEnv()
recipe.export(env, output_path="/tmp/nvflare/prod/job_config")
Collaborator


You can also execute the job in ProdEnv; you just need to provide the startup kit path.
