-
Notifications
You must be signed in to change notification settings - Fork 17
Update to handle multiDim histograms #207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request adds support for multi-dimensional (2D and 3D) histograms to the FLAF analysis framework and fixes a memory leak in the histogram merger. The changes enable linearization of 2D histograms into 1D representations with custom binning per y-bin range.
Changes:
- Added C++ function
rebinHistogramDictto rebin multi-dimensional histograms with custom binning per y-range - Extended Python histogram handling to support 1D, 2D, and 3D histograms throughout the analysis pipeline
- Fixed memory leak in histogram merger by setting
SetDirectory(0)on histograms during accumulation
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 16 comments.
Show a summary per file
| File | Description |
|---|---|
| include/HistHelper.h | Adds C++ rebinHistogramDict function for multi-dimensional histogram rebinning with custom y-range binning |
| Common/HistHelper.py | Extends RebinHisto and GetModel to support dict-based binning and multi-dimensional histograms; adds cache helper functions |
| Analysis/tasks.py | Adds variable flattening logic to extract individual variables from multi-dimensional variable definitions |
| Analysis/HistTupleProducer.py | Refactors to handle flattened variables and adds shifted tree processing for systematic uncertainties |
| Analysis/HistProducerFromNTuple.py | Updates bin counting logic to correctly handle 2D/3D histograms and adds dimension detection |
| Analysis/HistPlotter.py | Removes duplicate RebinHisto and getNewBins functions (now in HistHelper.py); adds 2D binning support |
| Analysis/HistMergerFromHists.py | Fixes memory leak by calling SetDirectory(0) when accumulating histograms and refactors file handling |
Analysis/HistTupleProducer.py
Outdated
| dfw_central.df = dfw_central.df.Define(f"{var}_bin", f"get_{var}_bin({var})") | ||
| dfw_central.colToSave.append(f"{var}_bin") | ||
|
|
||
| varToSave = Utilities.ListToVector(list(set(dfw_central.colToSave))) |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variables tmp_fileNames and snaps are used before they are initialized. They are first used on lines 168 and 169, but not initialized until lines 222-223. These variables should be initialized before their first use, likely at the beginning of the function or before line 166.
| varToSave = Utilities.ListToVector(list(set(dfw_central.colToSave))) | |
| varToSave = Utilities.ListToVector(list(set(dfw_central.colToSave))) | |
| tmp_fileNames = [] | |
| snaps = [] |
Analysis/HistTupleProducer.py
Outdated
| ) | ||
| if df_central.Filter("map_placeholder > 0").Count().GetValue() <= 0: | ||
| raise RuntimeError("no events passed map placeolder") | ||
| all_shifts_to_compute.extend(unc_cfg_dict["shape"].keys()) |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable all_shifts_to_compute is used with .extend() but is never initialized. It should be initialized as an empty list before this line, similar to how norm_uncertainties and scale_uncertainties are initialized.
Analysis/HistTupleProducer.py
Outdated
| raise RuntimeError("no events passed map placeolder") | ||
| all_shifts_to_compute.extend(unc_cfg_dict["shape"].keys()) | ||
|
|
||
| for unc in ["Central"] + all_rel_uncs_to_compute: |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable all_rel_uncs_to_compute is referenced but never defined or initialized in the visible code. This will cause a NameError at runtime.
| for unc in ["Central"] + all_rel_uncs_to_compute: | |
| for unc in ["Central"] + list(norm_uncertainties): |
Analysis/HistTupleProducer.py
Outdated
| df_central = createCentralQuantities( | ||
| df_central, col_types_central, col_names_central | ||
| ) |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function createCentralQuantities is called but is never imported. The import statement from FLAF.Common.HistHelper import * is commented out on line 13. Either uncomment this import or add an explicit import for createCentralQuantities.
Analysis/HistTupleProducer.py
Outdated
| f"weight_{unc}_{scale}" if unc != "Central" else "weight_Central" | ||
| ) | ||
| histTupleDef.DefineWeightForHistograms( | ||
| dfw=dfw_central, |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable dfw_central is used but never defined or initialized in the visible code. It should be created before being passed to DefineWeightForHistograms.
| } | ||
| } | ||
| return hist_output; | ||
| } |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rebinHistogramDict function lacks documentation. Add a comment block explaining the function's purpose, parameters, return value, and the algorithm used for rebinning multi-dimensional histograms into linearized 1D histograms.
| all_output_edges.erase(std::unique(all_output_edges.begin(), all_output_edges.end()), all_output_edges.end()); | ||
|
|
||
| // Create output histogram with variable binning | ||
| TH1D* hist_output = new TH1D("rebinned", "rebinned", all_output_edges.size() - 1, all_output_edges.data()); |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The histogram name and title are hardcoded as 'rebinned'. Consider accepting these as parameters or deriving them from the input histogram to make the function more flexible and preserve histogram metadata.
| TH1D* hist_output = new TH1D("rebinned", "rebinned", all_output_edges.size() - 1, all_output_edges.data()); | |
| TH1D* hist_output = new TH1D(hist_initial->GetName(), hist_initial->GetTitle(), all_output_edges.size() - 1, all_output_edges.data()); |
Analysis/HistProducerFromNTuple.py
Outdated
| for scale in scales: | ||
| for key, filter_to_apply_base in key_filter_dict.items(): | ||
| filter_to_apply_final = filter_to_apply_base | ||
| print("Saving Hist for unc/scale/key:", unc, scale, key) |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug print statement should be removed or converted to proper logging for production code.
|
|
||
| TH1D* rebinHistogramDict(TH1* hist_initial, int N_bins, | ||
| const std::vector<std::pair<float, float>>& y_bin_ranges, | ||
| const std::vector<std::vector<float>>& output_bin_edges) { |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function creates a new histogram with new but there's no clear ownership or deletion mechanism documented. Consider documenting that the caller is responsible for deleting the returned histogram, or use ROOT's memory management (e.g., adding the histogram to a directory).
| TH1D* rebinHistogramDict(TH1* hist_initial, int N_bins, | |
| const std::vector<std::pair<float, float>>& y_bin_ranges, | |
| const std::vector<std::vector<float>>& output_bin_edges) { | |
| /** | |
| * Rebin a 1D histogram using a dictionary of variable-width bin edges. | |
| * | |
| * The returned histogram is allocated with `new` and ownership is | |
| * transferred to the caller. The caller is responsible for managing | |
| * its lifetime, e.g. by: | |
| * - explicitly deleting it with `delete`, or | |
| * - adding it to a ROOT TDirectory/TFile so that ROOT manages | |
| * its deletion. | |
| */ | |
| TH1D* rebinHistogramDict(TH1* hist_initial, int N_bins, | |
| const std::vector<std::pair<float, float>>& y_bin_ranges, | |
| const std::vector<std::vector<float>>& output_bin_edges) { |
Common/HistHelper.py
Outdated
| # print(col_names_cache) | ||
| # if "kinFit_result" in col_names_cache: | ||
| # col_names_cache.remove("kinFit_result") |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment appears to contain commented-out code.
| # print(col_names_cache) | |
| # if "kinFit_result" in col_names_cache: | |
| # col_names_cache.remove("kinFit_result") | |
| # All cache columns are forwarded to the central DataFrame as-is. |
|
@cms-flaf-bot please test
|
|
pipeline#13793040 started |
|
pipeline#13793040 failed |
|
@cms-flaf-bot please test
|
|
pipeline#13793221 started |
|
I hate merge conflicts and apparently broke everything when I was trying to fix them 😢 |
|
pipeline#13793221 failed |
76b6877 to
0dddc08
Compare
|
@cms-flaf-bot please test
|
|
pipeline#13793622 started |
|
@cms-flaf-bot please test
|
|
pipeline#13793706 started |
|
pipeline#13793706 failed |
|
@cms-flaf-bot please test
|
|
pipeline#13793934 started |
|
pipeline#13793934 failed |
|
pipeline#13793622 failed |
|
@cms-flaf-bot please test
|
|
pipeline#13794271 started |
|
pipeline#13794271 failed |
|
@cms-flaf-bot please test
|
|
pipeline#13795444 started |
|
pipeline#13795444 passed |
|
@cms-flaf-bot please test
|
|
pipeline#13796018 started |
|
pipeline#13796018 failed |
|
Testing has to wait until cms-flaf/HH_bbWW#53 is merged |
|
@cms-flaf-bot please test
|
|
pipeline#13799766 started |
|
pipeline#13799766 failed |
Add 2D linearizing, multi-dim histograms, and fixed memory leak in histmerger