Conversation

@MengqinShen (Contributor) commented Dec 26, 2025

These changes make py/samples/evaluator-demo mirror the js/testapps/evals folder.

CHANGELOG:

  • Modified the index method in py/plugins/dev-local-vectorstore/src/genkit/plugins/dev_local_vectorstore/indexer.py to accept an IndexerRequest object, aligning with a more structured approach for indexing documents.
  • The old eval_demo.py was removed and replaced with a modular structure including eval_in_code.py, genkit_demo.py, index.py, pdf_rag.py, and setup.py. Various test flows were added.
  • Added new JSON datasets (cat_adoption_questions.jsonl, cat_adoption_questions_with_reference.json, dogfacts.json) and a new prompt file (hello.prompt) to support the expanded evaluation demo scenarios.
  • The README.md for the evaluator demo has been updated.
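The IndexerRequest change described above can be pictured with a minimal sketch. All names below (Document, IndexerRequest, the store internals) are illustrative stand-ins for this note, not the actual Genkit API:

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """A hypothetical document payload to be indexed."""
    content: str
    metadata: dict = field(default_factory=dict)

@dataclass
class IndexerRequest:
    """A structured request: documents plus indexing options in one object,
    rather than a bare list of documents."""
    documents: list[Document]
    options: dict = field(default_factory=dict)

class DevLocalVectorStoreIndexer:
    """Toy in-memory indexer showing the structured-request signature."""

    def __init__(self) -> None:
        self._store: list[Document] = []

    def index(self, request: IndexerRequest) -> None:
        # Accepting a request object keeps the signature stable as new
        # options are added, instead of growing positional parameters.
        self._store.extend(request.documents)

indexer = DevLocalVectorStoreIndexer()
indexer.index(IndexerRequest(documents=[Document('cats purr')]))
```

The design point is that a single request object can be validated as a whole and extended without breaking callers.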

@github-actions bot added the docs (Improvements or additions to documentation), python, config, and fix labels on Dec 26, 2025
@gemini-code-assist (Contributor) commented:
Summary of Changes

Hello @MengqinShen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed quickly.

This pull request undertakes a significant overhaul of the Python evaluator demo, bringing its functionality and structure in line with the JavaScript SDK. The changes focus on enhancing the core Genkit framework's input handling and evaluator plugin capabilities, particularly for the faithfulness metric. The demo itself is now more comprehensive, showcasing various evaluation scenarios, RAG flows, and programmatic evaluation, supported by new datasets and clearer documentation.

Highlights

  • Robust Input Handling: Changed direct dictionary access payload['input'] to payload.get('input') in py/packages/genkit/src/genkit/core/reflection.py for safer retrieval of input, preventing KeyError if the 'input' key is missing.
  • Vector Store Indexer API Update: Modified the index method in py/plugins/dev-local-vectorstore/src/genkit/plugins/dev_local_vectorstore/indexer.py to accept an IndexerRequest object, aligning with a more structured approach for indexing documents.
  • Enhanced Evaluators Plugin: The GenkitEvaluators plugin in py/plugins/evaluators/src/genkit/plugins/evaluators/plugin_api.py now supports more flexible initialization, removes unnecessary assertions for datapoint.reference, and significantly refactors the _faithfulness_eval to implement a two-step evaluation process (statement extraction and NLI check) with improved scoring and reasoning details. It also corrects metric type naming.
  • Refactored Evaluator Demo: The Python evaluator demo (py/samples/evaluator-demo/) has been completely refactored. The old eval_demo.py was removed and replaced with a modular structure including eval_in_code.py, genkit_demo.py, index.py, pdf_rag.py, and setup.py. This new structure introduces various evaluation flows, RAG capabilities, and programmatic evaluation examples, mirroring the JS SDK's demo.
  • New Demo Datasets and Prompts: Added new JSON datasets (cat_adoption_questions.jsonl, cat_adoption_questions_with_reference.json, dogfacts.json) and a new prompt file (hello.prompt) to support the expanded evaluation demo scenarios.
  • Documentation and Dependency Updates: The README.md for the evaluator demo has been updated to reflect the new setup and available flows. The pyproject.toml and uv.lock files were updated to include the pypdf dependency, necessary for PDF processing in the demo.
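The two-step faithfulness process mentioned in the highlights (statement extraction, then an NLI-style check per statement) can be sketched as follows. This is a hedged illustration only: in the real plugin both steps would be LLM-backed, whereas here the helpers are naive stand-ins and all names are hypothetical.

```python
# Step 1: break the answer into atomic statements.
# Step 2: for each statement, ask whether the retrieved context entails it.
# The score is the fraction of statements the context supports.

def extract_statements(answer: str) -> list[str]:
    # Stand-in for an LLM extraction prompt: split on sentence boundaries.
    return [s.strip() for s in answer.split('.') if s.strip()]

def nli_supported(statement: str, context: str) -> bool:
    # Stand-in for an NLI verdict prompt: naive substring entailment.
    return statement.lower() in context.lower()

def faithfulness_score(answer: str, context: str) -> dict:
    statements = extract_statements(answer)
    verdicts = [nli_supported(s, context) for s in statements]
    score = sum(verdicts) / len(statements) if statements else 0.0
    # Keeping per-statement verdicts gives the "reasoning details"
    # the summary refers to, not just a bare number.
    return {'score': score, 'details': list(zip(statements, verdicts))}
```

For example, an answer of "Cats purr. Cats bark" scored against the context "Cats purr when happy" supports one of two statements, yielding a score of 0.5 with a verdict attached to each statement.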



@gemini-code-assist bot left a comment:

Code Review

This pull request significantly overhauls the Python evaluator demo to match the functionality of the JS SDK. It introduces new flows, data, and prompts, providing a more comprehensive example of Genkit's evaluation capabilities. The changes also include important bug fixes in the evaluator plugin, such as correcting evaluator names and improving the logic for faithfulness evaluation.

My review has identified a couple of issues: a potential TypeError in the faithfulness evaluator when handling null context, and a critical return type mismatch in one of the new demo flows that would cause it to fail at runtime. I've provided code suggestions to address these issues. Overall, this is a valuable update that improves the evaluator demo and fixes underlying bugs.
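A null-safe guard of the kind the review suggests might look like the sketch below. The datapoint shape and the collect_context helper name are assumptions for illustration, not the actual fix proposed in the review.

```python
def collect_context(datapoint: dict) -> list[str]:
    """Return the evaluation context as a list of strings,
    tolerating a missing or explicitly-null 'context' field."""
    # .get() avoids a KeyError when the field is absent, and the
    # falsy check avoids a TypeError from iterating over None.
    context = datapoint.get('context')
    if not context:
        return []
    return [str(c) for c in context]
```

The same `.get()` pattern underlies the reflection-server change noted earlier, where `payload['input']` became `payload.get('input')`.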

@MengqinShen MengqinShen marked this pull request as ready for review December 26, 2025 23:23
@MengqinShen MengqinShen requested a review from yesudeep December 26, 2025 23:23
@MengqinShen MengqinShen merged commit 7b4abaa into main Dec 27, 2025
10 checks passed
@MengqinShen MengqinShen deleted the elisa/fix(py)/update-eval-demo branch December 27, 2025 01:11