Conversation

@MengqinShen (Contributor) commented Dec 26, 2025

These changes make py/samples/evaluator-demo mirror the js/testapps/evals folder.

CHANGELOG:

  • Modified the index method in py/plugins/dev-local-vectorstore/src/genkit/plugins/dev_local_vectorstore/indexer.py to accept an IndexerRequest object, aligning with a more structured approach for indexing documents.
  • The old eval_demo.py was removed and replaced with a modular structure including eval_in_code.py, genkit_demo.py, index.py, pdf_rag.py, and setup.py. Various test flows were added.
  • Added new JSON datasets (cat_adoption_questions.jsonl, cat_adoption_questions_with_reference.json, dogfacts.json) and a new prompt file (hello.prompt) to support the expanded evaluation demo scenarios.
  • The README.md for the evaluator demo has been updated.
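The IndexerRequest change described above can be pictured with a minimal sketch. All names below (Document, IndexerRequest, the store internals) are illustrative stand-ins for this note, not the actual Genkit API:

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """A hypothetical document payload to be indexed."""
    content: str
    metadata: dict = field(default_factory=dict)

@dataclass
class IndexerRequest:
    """A structured request: documents plus indexing options in one object,
    rather than a bare list of documents."""
    documents: list[Document]
    options: dict = field(default_factory=dict)

class DevLocalVectorStoreIndexer:
    """Toy in-memory indexer showing the structured-request signature."""

    def __init__(self) -> None:
        self._store: list[Document] = []

    def index(self, request: IndexerRequest) -> None:
        # Accepting a request object keeps the signature stable as new
        # options are added, instead of growing positional parameters.
        self._store.extend(request.documents)

indexer = DevLocalVectorStoreIndexer()
indexer.index(IndexerRequest(documents=[Document('cats purr')]))
```

The design point is that a single request object can be validated as a whole and extended without breaking callers.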

@github-actions bot added the docs (Improvements or additions to documentation), python, config, and fix labels on Dec 26, 2025
@gemini-code-assist (Contributor) commented:
Summary of Changes

Hello @MengqinShen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed quickly.

This pull request undertakes a significant overhaul of the Python evaluator demo, bringing its functionality and structure in line with the JavaScript SDK. The changes focus on enhancing the core Genkit framework's input handling and evaluator plugin capabilities, particularly for the faithfulness metric. The demo itself is now more comprehensive, showcasing various evaluation scenarios, RAG flows, and programmatic evaluation, supported by new datasets and clearer documentation.

Highlights

  • Robust Input Handling: Changed direct dictionary access payload['input'] to payload.get('input') in py/packages/genkit/src/genkit/core/reflection.py for safer retrieval of input, preventing KeyError if the 'input' key is missing.
  • Vector Store Indexer API Update: Modified the index method in py/plugins/dev-local-vectorstore/src/genkit/plugins/dev_local_vectorstore/indexer.py to accept an IndexerRequest object, aligning with a more structured approach for indexing documents.
  • Enhanced Evaluators Plugin: The GenkitEvaluators plugin in py/plugins/evaluators/src/genkit/plugins/evaluators/plugin_api.py now supports more flexible initialization, removes unnecessary assertions for datapoint.reference, and significantly refactors the _faithfulness_eval to implement a two-step evaluation process (statement extraction and NLI check) with improved scoring and reasoning details. It also corrects metric type naming.
  • Refactored Evaluator Demo: The Python evaluator demo (py/samples/evaluator-demo/) has been completely refactored. The old eval_demo.py was removed and replaced with a modular structure including eval_in_code.py, genkit_demo.py, index.py, pdf_rag.py, and setup.py. This new structure introduces various evaluation flows, RAG capabilities, and programmatic evaluation examples, mirroring the JS SDK's demo.
  • New Demo Datasets and Prompts: Added new JSON datasets (cat_adoption_questions.jsonl, cat_adoption_questions_with_reference.json, dogfacts.json) and a new prompt file (hello.prompt) to support the expanded evaluation demo scenarios.
  • Documentation and Dependency Updates: The README.md for the evaluator demo has been updated to reflect the new setup and available flows. The pyproject.toml and uv.lock files were updated to include the pypdf dependency, necessary for PDF processing in the demo.
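The two-step faithfulness process mentioned in the highlights (statement extraction, then an NLI-style check per statement) can be sketched as follows. This is a hedged illustration only: in the real plugin both steps would be LLM-backed, whereas here the helpers are naive stand-ins and all names are hypothetical.

```python
# Step 1: break the answer into atomic statements.
# Step 2: for each statement, ask whether the retrieved context entails it.
# The score is the fraction of statements the context supports.

def extract_statements(answer: str) -> list[str]:
    # Stand-in for an LLM extraction prompt: split on sentence boundaries.
    return [s.strip() for s in answer.split('.') if s.strip()]

def nli_supported(statement: str, context: str) -> bool:
    # Stand-in for an NLI verdict prompt: naive substring entailment.
    return statement.lower() in context.lower()

def faithfulness_score(answer: str, context: str) -> dict:
    statements = extract_statements(answer)
    verdicts = [nli_supported(s, context) for s in statements]
    score = sum(verdicts) / len(statements) if statements else 0.0
    # Keeping per-statement verdicts gives the "reasoning details"
    # the summary refers to, not just a bare number.
    return {'score': score, 'details': list(zip(statements, verdicts))}
```

For example, an answer of "Cats purr. Cats bark" scored against the context "Cats purr when happy" supports one of two statements, yielding a score of 0.5 with a verdict attached to each statement.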



@gemini-code-assist bot left a comment:

Code Review

This pull request significantly overhauls the Python evaluator demo to match the functionality of the JS SDK. It introduces new flows, data, and prompts, providing a more comprehensive example of Genkit's evaluation capabilities. The changes also include important bug fixes in the evaluator plugin, such as correcting evaluator names and improving the logic for faithfulness evaluation.

My review has identified a couple of issues: a potential TypeError in the faithfulness evaluator when handling null context, and a critical return type mismatch in one of the new demo flows that would cause it to fail at runtime. I've provided code suggestions to address these issues. Overall, this is a valuable update that improves the evaluator demo and fixes underlying bugs.
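A null-safe guard of the kind the review suggests might look like the sketch below. The datapoint shape and the collect_context helper name are assumptions for illustration, not the actual fix proposed in the review.

```python
def collect_context(datapoint: dict) -> list[str]:
    """Return the evaluation context as a list of strings,
    tolerating a missing or explicitly-null 'context' field."""
    # .get() avoids a KeyError when the field is absent, and the
    # falsy check avoids a TypeError from iterating over None.
    context = datapoint.get('context')
    if not context:
        return []
    return [str(c) for c in context]
```

The same `.get()` pattern underlies the reflection-server change noted earlier, where `payload['input']` became `payload.get('input')`.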

@MengqinShen MengqinShen marked this pull request as ready for review December 26, 2025 23:23
@MengqinShen MengqinShen requested a review from yesudeep December 26, 2025 23:23
@MengqinShen MengqinShen merged commit 7b4abaa into main Dec 27, 2025
10 checks passed
@MengqinShen MengqinShen deleted the elisa/fix(py)/update-eval-demo branch December 27, 2025 01:11