Add cross-site evaluation utility and examples #3923

nvkevlu · 2025-12-30T16:24:04Z

Add cross-site evaluation utility and examples.

Description

This PR replaces #3895 and incorporates its changes to add a cross-site evaluation utility and examples.

Types of changes

Non-breaking change (fix or new feature that would not break existing functionality).
Breaking change (fix or new feature that would cause existing functionality to change).
New tests added to cover the changes.
Quick tests passed locally by running ./runtest.sh.
In-line docstrings updated.
Documentation updated.

greptile-apps · 2025-12-30T16:30:36Z

Greptile Summary

This PR adds cross-site evaluation (CSE) utility and examples to NVFlare, enabling users to evaluate federated learning models across all client sites without sharing data. The implementation follows a consistent pattern with the existing add_experiment_tracking() utility.

Key Changes:

Added add_cross_site_evaluation() utility function in nvflare/recipe/utils.py that integrates CSE into any recipe with a single function call
Created unified job.py example supporting two modes: standalone CSE with pre-trained models, and training+CSE workflow
Added new client.py training script for NumPy FedAvg demonstration
Replaced two separate job scripts (job_cse.py and job_train_and_cse.py) with one unified script using command-line arguments
Updated hello-pt example to demonstrate optional CSE integration with PyTorch models
Comprehensive README documentation explaining both operational modes and customization options
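The unified job.py picks its workflow from the command line (the sequence diagram shows `python job.py --mode pretrained` and `python job.py --mode training`). Below is a minimal sketch of that kind of dispatch using `argparse`. `run_pretrained_cse` is the name from the PR diff, but both functions here are stubs; `run_training_and_cse` and the `--n_clients` flag are hypothetical placeholders, not the PR's actual code.

```python
import argparse


def run_pretrained_cse(n_clients: int) -> str:
    # Stub: the real function builds a CSE-only job from pre-trained models.
    return f"pretrained CSE with {n_clients} clients"


def run_training_and_cse(n_clients: int) -> str:
    # Hypothetical stub: the real script builds a FedAvg recipe and adds CSE.
    return f"training + CSE with {n_clients} clients"


def main(argv=None) -> str:
    parser = argparse.ArgumentParser(description="Hello NumPy cross-site evaluation")
    # Mode values taken from the sequence diagram; required so no default is assumed.
    parser.add_argument("--mode", choices=["pretrained", "training"], required=True)
    # Hypothetical flag; the PR's actual argument names may differ.
    parser.add_argument("--n_clients", type=int, default=2)
    args = parser.parse_args(argv)

    # Dispatch to exactly one workflow based on --mode.
    if args.mode == "pretrained":
        return run_pretrained_cse(args.n_clients)
    return run_training_and_cse(args.n_clients)
```

A script would typically call `main()` under an `if __name__ == "__main__":` guard; an unknown `--mode` value makes `argparse` exit with a usage error instead of silently falling through to one of the workflows.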

Minor Issues:

Unused constant CLIENT_MODEL_DIR in both utils.py and job.py (low priority cleanup item)

Confidence Score: 4/5

This PR is safe to merge with minor cleanup recommended
The implementation is well-structured, follows existing patterns in the codebase, and includes comprehensive documentation. The utility function properly handles both PyTorch and NumPy model types with appropriate registry-based configuration. The code is consistent with the existing add_experiment_tracking() utility pattern. The only minor issue is the unused constant definitions, which don't affect functionality.
No files require special attention - the unused constants are minor style issues that don't affect functionality

Important Files Changed

Filename	Overview
nvflare/recipe/utils.py	Added `add_cross_site_evaluation()` utility function with model locator registry supporting PyTorch and NumPy, includes detailed documentation about validator requirements
examples/hello-world/hello-numpy-cross-val/job.py	Unified job script supporting both pretrained CSE and training+CSE modes using Recipe API, replaces old job_cse.py and job_train_and_cse.py files
examples/hello-world/hello-numpy-cross-val/client.py	New NumPy training script with mock training and evaluation functions for demonstration, includes support for full and diff parameter updates
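The client.py row mentions support for "full and diff parameter updates". A minimal sketch of the difference, assuming weights are a dict of plain float lists (the real script uses NumPy arrays, and `make_update`/`apply_update` plus the `update_type` strings are hypothetical names): a full update sends the new weights, while a diff update sends only the change, which the server adds back onto the weights it sent.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def make_update(old: Weights, new: Weights, update_type: str) -> Weights:
    """Client side: return full weights or an element-wise difference."""
    if update_type == "full":
        return {k: list(v) for k, v in new.items()}
    if update_type == "diff":
        return {k: [n - o for n, o in zip(new[k], old[k])] for k in new}
    raise ValueError(f"unknown update_type: {update_type!r}")


def apply_update(old: Weights, update: Weights, update_type: str) -> Weights:
    """Server side: recover the client's new weights from either update type."""
    if update_type == "full":
        return {k: list(v) for k, v in update.items()}
    if update_type == "diff":
        return {k: [o + d for o, d in zip(old[k], update[k])] for k in update}
    raise ValueError(f"unknown update_type: {update_type!r}")
```

With `old = {"w": [1.0, 2.0]}` and `new = {"w": [1.5, 2.5]}`, the diff update is `{"w": [0.5, 0.5]}`, and applying it to `old` reproduces `new`.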

Sequence Diagram

sequenceDiagram
    participant User
    participant Recipe/Job
    participant Server
    participant Client1
    participant Client2

    alt Mode 1: Standalone CSE with Pre-trained Models
        User->>Recipe/Job: python job.py --mode pretrained
        Recipe/Job->>Server: Configure NPModelLocator with pre-trained models
        Recipe/Job->>Server: Add ValidationJsonGenerator
        Recipe/Job->>Server: Add CrossSiteModelEval controller
        Recipe/Job->>Client1: Deploy NPValidator
        Recipe/Job->>Client2: Deploy NPValidator
        Server->>Server: Load pre-trained models (server_model_1, server_model_2)
        Server->>Client1: Distribute all models for validation
        Server->>Client2: Distribute all models for validation
        Client1->>Client1: Validate each model on local data
        Client2->>Client2: Validate each model on local data
        Client1->>Server: Return validation metrics
        Client2->>Server: Return validation metrics
        Server->>Server: Generate cross-site validation matrix
        Server->>User: Save results to cross_val_results.json
    else Mode 2: Training + CSE
        User->>Recipe/Job: python job.py --mode training
        Recipe/Job->>Recipe/Job: Create NumpyFedAvgRecipe
        Recipe/Job->>Recipe/Job: Add NPValidator to clients
        Recipe/Job->>Recipe/Job: Call add_cross_site_evaluation()
        Recipe/Job->>Server: Deploy ScatterAndGather controller
        Recipe/Job->>Server: Deploy CrossSiteModelEval controller
        Recipe/Job->>Client1: Deploy ScriptRunner + NPValidator
        Recipe/Job->>Client2: Deploy ScriptRunner + NPValidator
        
        loop Training Rounds
            Server->>Client1: Send global model
            Server->>Client2: Send global model
            Client1->>Client1: Train on local data (client.py)
            Client2->>Client2: Train on local data (client.py)
            Client1->>Server: Send updated model weights
            Client2->>Server: Send updated model weights
            Server->>Server: Aggregate weights (FedAvg)
        end
        
        Server->>Server: Training complete, start CSE
        Server->>Client1: Distribute trained models
        Server->>Client2: Distribute trained models
        Client1->>Client1: Validate all models
        Client2->>Client2: Validate all models
        Client1->>Server: Return validation metrics
        Client2->>Server: Return validation metrics
        Server->>Server: Generate cross-site validation matrix
        Server->>User: Save results to cross_val_results.json
    end
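Both modes end with the server assembling a cross-site validation matrix: every site scores every model on its local data. A minimal sketch of turning per-site reports into a models-by-sites table, assuming a simple `results[site][model] = score` layout. That layout, the site names, and the scores are toy placeholders and are not claimed to match the actual `cross_val_results.json` format.

```python
from typing import Dict, List

# Hypothetical input: results[site][model] = score that `site` measured for `model`.
Results = Dict[str, Dict[str, float]]


def to_matrix(results: Results) -> List[List[str]]:
    """Render results as a table: one row per model, one column per site."""
    sites = sorted(results)
    models = sorted({m for per_site in results.values() for m in per_site})
    rows = [["model"] + sites]
    for model in models:
        # Missing entries (e.g. a validation that failed or timed out) are shown as "-".
        row = [model] + [
            f"{results[s][model]:.2f}" if model in results[s] else "-" for s in sites
        ]
        rows.append(row)
    return rows


# Toy numbers for illustration only, not results from this PR.
results = {
    "site-1": {"server_model_1": 0.91, "server_model_2": 0.88},
    "site-2": {"server_model_1": 0.85},
}
for row in to_matrix(results):
    print("\t".join(row))
```

Reading down a column shows how one site's data rates every model; reading across a row shows how well one model generalizes to every site.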

greptile-apps

Additional Comments (3)

examples/hello-world/hello-numpy-cross-val/job.py, line 90

logic: ValidationJsonGenerator is NOT added automatically when using plain FedJob. It's only added automatically by BaseFedJob. You need to explicitly add it here.
examples/hello-world/hello-pt/README.md, line 32

syntax: Typo: "traiing" should be "training"
examples/hello-world/hello-pt/README.md, line 48

syntax: Misplaced # character after comma - should be a space
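The first comment above points out that a plain FedJob, unlike BaseFedJob, does not add ValidationJsonGenerator on its own. One defensive pattern is an idempotent "add if missing" check, so the same code works for both job types without registering the writer twice. A minimal sketch with hypothetical stand-in classes (`PlainJob`, `BaseJob`, `ValidationJsonGeneratorStub`, and the `server_components` list are illustrative, not NVFlare's API):

```python
class ValidationJsonGeneratorStub:
    """Stand-in for the real results writer; only its type matters here."""


class PlainJob:
    """Hypothetical job that, like plain FedJob, adds no components by default."""

    def __init__(self):
        self.server_components = []

    def to_server(self, obj):
        self.server_components.append(obj)


class BaseJob(PlainJob):
    """Hypothetical job that, like BaseFedJob, pre-registers the JSON generator."""

    def __init__(self):
        super().__init__()
        self.to_server(ValidationJsonGeneratorStub())


def ensure_json_generator(job) -> None:
    # Add the results writer only if the job did not already register one.
    if not any(isinstance(c, ValidationJsonGeneratorStub) for c in job.server_components):
        job.to_server(ValidationJsonGeneratorStub())
```

Calling `ensure_json_generator` on either job type leaves exactly one writer registered.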

9 files reviewed, 3 comments

Copilot

Pull request overview

This PR adds a cross-site evaluation utility function and examples to NVFlare's Recipe API, enabling users to easily evaluate models across different client sites without sharing data.

Key Changes:

Added add_cross_site_evaluation() utility function to nvflare/recipe/utils.py for programmatically enabling cross-site model evaluation
Added cross-site evaluation support to the PyTorch hello-world example with a --cross_site_eval command-line flag
Unified the NumPy cross-validation example into a single job.py with two modes: standalone CSE and training+CSE

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 24 comments.

File	Description
nvflare/recipe/utils.py	Adds `add_cross_site_evaluation()` utility and MODEL_LOCATOR_REGISTRY for PyTorch and NumPy
examples/hello-world/hello-pt/job.py	Adds cross-site evaluation support via `--cross_site_eval` flag and `--train_script` parameter
examples/hello-world/hello-pt/README.md	Updates documentation with cross-site evaluation instructions and usage examples
examples/hello-world/hello-numpy-cross-val/job.py	Replaces multiple job scripts with unified implementation supporting pretrained and training modes
examples/hello-world/hello-numpy-cross-val/client.py	Adds NumPy training client script with training and evaluation functions
examples/hello-world/hello-numpy-cross-val/generate_pretrain_models.py	Updates comment to clarify cross-site evaluation terminology
examples/hello-world/hello-numpy-cross-val/README.md	Comprehensive rewrite documenting both CSE modes and Recipe API usage patterns
examples/hello-world/hello-numpy-cross-val/job_train_and_cse.py	Deleted - functionality consolidated into unified job.py
examples/hello-world/hello-numpy-cross-val/job_cse.py	Deleted - functionality consolidated into unified job.py
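The utils.py row describes a MODEL_LOCATOR_REGISTRY covering PyTorch and NumPy, and the hello-pt diff selects an entry with `model_locator_type="pytorch"`. A minimal sketch of that registry-lookup pattern, assuming the registry maps a type string to a locator factory. The stub classes, their fields, and `make_model_locator` are hypothetical; the registry's real contents are not shown in this PR page.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class PTModelLocatorStub:
    """Stand-in for a PyTorch model locator."""
    persistor_id: str = "persistor"


@dataclass
class NPModelLocatorStub:
    """Stand-in for a NumPy model locator."""
    model_dir: str = "models"


# Hypothetical registry: maps model_locator_type to a locator factory.
MODEL_LOCATOR_REGISTRY: Dict[str, Callable[..., Any]] = {
    "pytorch": PTModelLocatorStub,
    "numpy": NPModelLocatorStub,
}


def make_model_locator(model_locator_type: str, **kwargs) -> Any:
    """Build the locator for a framework, failing loudly on unknown types."""
    try:
        factory = MODEL_LOCATOR_REGISTRY[model_locator_type]
    except KeyError:
        supported = ", ".join(sorted(MODEL_LOCATOR_REGISTRY))
        raise ValueError(
            f"Unsupported model_locator_type {model_locator_type!r}; "
            f"expected one of: {supported}"
        ) from None
    return factory(**kwargs)
```

Keeping the framework-specific choice in one table means supporting another framework is a one-line registry entry rather than another branch inside the utility.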

greptile-apps

Additional Comments (1)

nvflare/recipe/utils.py, line 107-108

logic: comp_ids doesn't exist on plain FedJob objects, only on BaseFedJob. Because NumpyFedAvgRecipe uses a plain FedJob, calling add_cross_site_evaluation with persistor_id=None will raise an AttributeError at runtime.
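A common guard for this kind of bug is `getattr` with a default, so jobs that lack the attribute fall through to a clear error instead of an `AttributeError`. A minimal sketch with hypothetical stand-ins: `JobWithoutCompIds`, `JobWithCompIds`, `resolve_persistor_id`, and the `"persistor_id"` key are illustrative and not the PR's actual fix.

```python
class JobWithoutCompIds:
    """Hypothetical stand-in for a job that has no comp_ids attribute."""


class JobWithCompIds:
    """Hypothetical stand-in for a job that records component ids."""

    def __init__(self):
        self.comp_ids = {"persistor_id": "persistor"}


def resolve_persistor_id(job, persistor_id=None) -> str:
    # An explicitly passed id always wins.
    if persistor_id is not None:
        return persistor_id
    # getattr with a default avoids AttributeError on jobs lacking comp_ids.
    comp_ids = getattr(job, "comp_ids", None) or {}
    found = comp_ids.get("persistor_id")
    if found is None:
        raise ValueError("persistor_id was not given and could not be inferred from the job")
    return found
```

The explicit error tells the caller what to pass, which is more actionable than an `AttributeError` raised deep inside the utility.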

9 files reviewed, 1 comment

nvkevlu · 2025-12-30T16:57:34Z

/build

greptile-apps

Additional Comments (2)

nvflare/recipe/utils.py, line 39-50

style: CLIENT_MODEL_DIR constant is defined but never used in the PR. The run_pretrained_cse() function in job.py references CLIENT_MODEL_DIR directly without using this registry value.
examples/hello-world/hello-numpy-cross-val/job.py, line 38-39

style: CLIENT_MODEL_DIR is defined but never used anywhere in the code.

9 files reviewed, 2 comments

YuanTingHsieh · 2026-01-02T22:52:59Z

examples/hello-world/hello-numpy-cross-val/README.md

@@ -1,87 +1,246 @@
-# Hello Numpy Cross-Site Validation
+# Hello NumPy Cross-Site Validation


btw, what's the correct term, cross-site validation or cross-site evaluation?

@holgerroth @ZiyueXu77

YuanTingHsieh · 2026-01-02T22:58:52Z

examples/hello-world/hello-pt/job.py

    add_experiment_tracking(recipe, tracking_type="tensorboard")

+    if args.cross_site_eval:
+        add_cross_site_evaluation(recipe, model_locator_type="pytorch")


this is good, much cleaner than before.

YuanTingHsieh · 2026-01-02T22:59:33Z

examples/hello-world/hello-numpy-cross-val/job.py

+def run_pretrained_cse(n_clients: int):
+    """Run standalone cross-site evaluation with pre-trained models.


suggest "run_cse_only" or "run_cse_with_pretrained"

YuanTingHsieh · 2026-01-02T23:00:14Z

examples/hello-world/hello-numpy-cross-val/job.py

+    # Create a minimal FedJob for CSE-only workflow
+    job = FedJob(name="hello-numpy-cse", min_clients=n_clients)
+
+    # Configure model locator with pre-trained model locations
+    model_locator_id = job.to_server(
+        NPModelLocator(
+            model_dir=SERVER_MODEL_DIR,
+            model_name={"server_model_1": "server_1.npy", "server_model_2": "server_2.npy"},
+        )
+    )
+
+    # Add validation JSON generator to save results
+    job.to_server(ValidationJsonGenerator())
+
+    # Add cross-site evaluation controller
+    eval_controller = CrossSiteModelEval(
+        model_locator_id=model_locator_id,
+        submit_model_timeout=600,
+        validation_timeout=6000,
+    )
+    job.to_server(eval_controller)


For this part, I think we still need a CrossSiteEvalRecipe to do CSE only, without training.
Then we won't expose FedJob to the users.
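The suggestion amounts to a facade: the user passes settings, and the recipe builds the job internally. A minimal sketch of that design under stated assumptions. `CrossSiteEvalRecipeSketch`, `_Job`, and the tuple "components" are hypothetical stand-ins that only mirror the three server pieces in the diff above (model locator, JSON generator, CSE controller); no such recipe exists in this PR.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class _Job:
    """Internal stand-in for a job object the user never touches."""
    name: str
    min_clients: int
    server_components: List[Any] = field(default_factory=list)


class CrossSiteEvalRecipeSketch:
    """Hypothetical CSE-only recipe: callers pass settings, not job objects."""

    def __init__(self, name: str, min_clients: int, model_dir: str, model_files: Dict[str, str]):
        self._job = _Job(name=name, min_clients=min_clients)
        # Mirror the three server-side pieces from the diff, in the same order.
        self._job.server_components.append(("model_locator", model_dir, dict(model_files)))
        self._job.server_components.append(("validation_json_generator",))
        self._job.server_components.append(("cross_site_eval_controller",))

    def components(self) -> List[str]:
        """Names of the server components the recipe configured."""
        return [c[0] for c in self._job.server_components]
```

Because `_Job` stays private, the recipe can change how it wires components without breaking user code.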

YuanTingHsieh · 2026-01-02T23:01:12Z

examples/hello-world/hello-numpy-cross-val/README.md

+env = ProdEnv()
+recipe.export(env, output_path="/tmp/nvflare/prod/job_config")


You can also execute this in ProdEnv; you just need to provide the startup kit path.

Add cross-site evaluation utility and examples

511a00f

Copilot AI review requested due to automatic review settings December 30, 2025 16:24

Copilot started reviewing on behalf of nvkevlu December 30, 2025 16:24

greptile-apps bot reviewed Dec 30, 2025

fix consistency

535b2b6

Copilot AI reviewed Dec 30, 2025

greptile-apps bot reviewed Dec 30, 2025

nvkevlu and others added 2 commits December 30, 2025 11:56

fix PR issues

3b51953

Merge branch 'main' into combined_cross_site_eval_recipe

b03c6d3

nvkevlu mentioned this pull request Dec 30, 2025

Add recipe for hello cross site eval #3895 (Closed)

greptile-apps bot reviewed Dec 30, 2025

YuanTingHsieh requested a review from holgerroth January 2, 2026 22:51

YuanTingHsieh reviewed Jan 2, 2026