Optimize identify anomaly periods algorithm #11
Merged
What it does
The original implementation created and processed sliding windows one by one, which took hours on datasets with millions of rows. The new implementation keeps the detection logic unchanged but computes all window means at once with a vectorized cumulative-sum approach: each window sum is the difference of two entries of a single cumulative sum, so one O(n) pass replaces one pass per window and computation time drops dramatically.
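A minimal NumPy sketch of the cumulative-sum idea, assuming the series is a 1-D float array and the window is a fixed number of samples. The function names are hypothetical and are not taken from the PR's code; the loop version stands in for the original one-window-at-a-time approach.

```python
import numpy as np


def window_means_loop(values: np.ndarray, window: int) -> np.ndarray:
    """Naive approach: slice and average each window separately (O(n * window))."""
    n = len(values) - window + 1
    return np.array([values[i:i + window].mean() for i in range(n)])


def window_means_cumsum(values: np.ndarray, window: int) -> np.ndarray:
    """Vectorized approach: all window means from one cumulative sum (O(n))."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    # Prepend 0 so that csum[i] == values[:i].sum().
    csum = np.concatenate(([0.0], np.cumsum(values, dtype=float)))
    # Sum of values[i:i + window] is csum[i + window] - csum[i].
    return (csum[window:] - csum[:-window]) / window


data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(window_means_cumsum(data, 2))  # [1.5 2.5 3.5 4.5]
```

Both functions return one mean per window start. On very long series the cumulative sum accumulates floating-point rounding, so compare the two results with `np.allclose` rather than exact equality.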
This PR also aims to fix the excessive calculation time reported in a previous pull request.
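The PR does not describe how flagged windows become anomaly periods. Assuming each window is flagged by a simple comparison (for example `means > threshold`, a hypothetical criterion) and consecutive flagged windows form one period, the grouping step can also be vectorized instead of looping. The helper `flagged_runs` below is a hypothetical sketch, not the PR's code.

```python
import numpy as np


def flagged_runs(flags: np.ndarray) -> list[tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, for each run of True values."""
    # Pad with False on both sides so every run has a rising and a falling edge.
    padded = np.concatenate(([False], flags.astype(bool), [False]))
    # Nonzero diffs mark edges: +1 where a run starts, -1 just past where it ends.
    edges = np.flatnonzero(np.diff(padded.astype(np.int8)))
    # Edges alternate start, end, start, end, ...
    starts, ends = edges[0::2], edges[1::2]
    return list(zip(starts.tolist(), ends.tolist()))


# Hypothetical window means and threshold, for illustration only.
means = np.array([0.2, 0.9, 0.95, 0.1, 0.8])
print(flagged_runs(means > 0.5))  # [(1, 3), (4, 5)]
```

The indices refer to window start positions. Mapping them back to rows of the original data (for example, extending each end by `window - 1`) depends on how the detector defines a period, which this sketch does not assume.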
How to test
Initialize the AnomalyDetecion module with your custom outputs (e.g., CPU usage, memory usage). On a large dataset (e.g., millions of data points), you should observe a significant speedup when identifying anomalies.
Follow-ups
N/A
Review checklist