Conversation

@david-stephan
Contributor

No description provided.

@david-stephan changed the title from "Sed 4451 the performance of the analytics view is very poor on large datasets after re ingestion" to "SED-4451 the performance of the analytics view is very poor on large datasets after re ingestion" on Jan 6, 2026
@gemini-code-assist
Summary of Changes

Hello @david-stephan, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request directly addresses a reported performance issue with analytics views, particularly when dealing with large datasets after re-ingestion. The changes focus on optimizing time-series data handling through a significant dependency update, the introduction of more efficient and correctly typed database indexes, and a migration task to clean up any legacy, inefficient indexes. Additionally, new filtering capabilities have been added to allow for more targeted data queries, all contributing to a more responsive and performant analytics experience.

Highlights

  • Dependency Update: The step-framework dependency has been updated to version 2026.1.5-695bdfc4724a72343c993be9, likely bringing in performance improvements or necessary fixes.
  • Time-Series Indexing Improvements: New compound indexes have been introduced for time-series collections (timeseries and ReportNodeTimeSeries) based on taskId, planId, metricType, and timestamp. Crucially, the data type for the begin field in these indexes has been corrected from String to Long to ensure accurate and efficient indexing, which is vital for query performance on large datasets.
  • Migration Task for Index Cleanup: A new migration task, V29_2_TimeSeriesNewIndexes, has been added to automatically drop old, incorrectly typed time-series compound indexes from PostgreSQL collections. This ensures that the system utilizes the newly defined, correctly structured indexes.
  • Granular Data Filtering for Time-Series: A new includeGlobalEntities parameter has been implemented across time-series query endpoints and the FetchBucketsRequest DTO. This allows for more granular control over data filtering, enabling queries to restrict results to non-global entities and potentially optimizing query scope and performance.
  • Time-Series Ingestion Performance: The flushSeriesQueueSize for time-series ingestion has been increased to 20000, which may enhance the performance of data ingestion by allowing more data to be buffered before flushing.
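
To illustrate why the type of the begin field matters for the indexes described above, here is a minimal Java sketch. It is not the project's index code: the class and method names are hypothetical, and it only assumes that begin holds epoch-millisecond timestamps. It shows that the same values order differently when compared as numbers and when compared as strings, which is why a range scan over a String-typed key cannot follow time order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only: not part of step-framework.
public class BeginFieldOrdering {

    // Orders the values numerically, as a Long-typed index key would.
    static List<Long> sortAsLong(List<Long> begins) {
        List<Long> sorted = new ArrayList<>(begins);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    // Orders the same values lexicographically, as a String-typed key would.
    static List<String> sortAsString(List<Long> begins) {
        List<String> sorted = new ArrayList<>();
        for (Long begin : begins) {
            sorted.add(Long.toString(begin));
        }
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    public static void main(String[] args) {
        // Assumed sample values; the widths differ on purpose.
        List<Long> begins = List.of(1_700_000_000_000L, 999_999_999_999L, 20_000L);
        System.out.println(sortAsLong(begins));   // [20000, 999999999999, 1700000000000]
        System.out.println(sortAsString(begins)); // [1700000000000, 20000, 999999999999]
    }
}
```

With string ordering, the most recent timestamp sorts first because it starts with the character '1', so a query such as "begin between X and Y" would not map to a contiguous index range.
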
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces several changes to improve the performance of the analytics view, particularly for large datasets. The core of the changes involves adding and correcting database indexes for time-series data, including a migration task to clean up old, incorrect indexes. Additionally, a new mechanism for restricting data scope via a RestrictedScopeSession has been implemented, controlled by an includeGlobalEntities flag in API requests. While the changes are well-aligned with the performance improvement goal, I've found a critical issue with static initialization order that will lead to a NullPointerException, along with a few other medium to high severity issues related to code correctness and encapsulation that should be addressed.
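
The review does not show the affected class here, so the following is only a generic Java sketch of how a static initialization order problem can produce a NullPointerException. All class and field names are hypothetical. Static fields are initialized in textual order, so a static instance created before a static collection it depends on will see that collection as null.

```java
import java.util.List;

// Hypothetical illustration only: not the code under review.
public class StaticInitOrder {

    static class Broken {
        // Runs first: the constructor executes while NAMES is still null.
        static final Broken INSTANCE = new Broken();
        static final List<String> NAMES = List.of("taskId", "planId");
        final int count;

        Broken() {
            count = NAMES.size(); // NullPointerException during class initialization
        }
    }

    static class Fixed {
        // Dependencies are declared before the instance that uses them.
        static final List<String> NAMES = List.of("taskId", "planId");
        static final Fixed INSTANCE = new Fixed();
        final int count;

        Fixed() {
            count = NAMES.size();
        }
    }

    // Touches Broken and reports how its initialization failed.
    static String describeBroken() {
        try {
            return "count=" + Broken.INSTANCE.count;
        } catch (LinkageError e) {
            // The first access raises ExceptionInInitializerError wrapping the NPE;
            // later accesses raise NoClassDefFoundError instead.
            Throwable cause = e.getCause();
            return "failed: " + (cause == null ? e : cause).getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println("Fixed count=" + Fixed.INSTANCE.count);
        System.out.println("Broken " + describeBroken());
    }
}
```

Note that the caller does not see the NullPointerException directly: the JVM wraps it in an ExceptionInInitializerError, and the class stays unusable for the rest of the process. Reordering the declarations, as in Fixed, is one common remedy; lazy initialization is another.
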

2 participants