@aebid
Contributor

@aebid aebid commented Jan 16, 2026

Add 2D linearizing, multi-dim histograms, and fix memory leak in histmerger

Contributor

Copilot AI left a comment

Pull request overview

This pull request adds support for multi-dimensional (2D and 3D) histograms to the FLAF analysis framework and fixes a memory leak in the histogram merger. The changes enable linearization of 2D histograms into 1D representations with custom binning per y-bin range.

Changes:

  • Added C++ function rebinHistogramDict to rebin multi-dimensional histograms with custom binning per y-range
  • Extended Python histogram handling to support 1D, 2D, and 3D histograms throughout the analysis pipeline
  • Fixed memory leak in histogram merger by setting SetDirectory(0) on histograms during accumulation
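
The linearization described above can be sketched in plain Python. This is only an illustration under stated assumptions: the 2D histogram is modelled as a list of (x, y, weight) entries rather than a ROOT histogram, and the names `linearize_2d`, `y_bin_ranges`, and `x_edges_per_range` are hypothetical, not the PR's API. Each y-range gets its own x binning, and the per-range bins are laid out one after another in a single flat list.

```python
from bisect import bisect_right


def linearize_2d(entries, y_bin_ranges, x_edges_per_range):
    """Flatten 2D entries into one 1D list of bin contents.

    entries: iterable of (x, y, weight) tuples (stand-in for a 2D histogram).
    y_bin_ranges: list of (y_low, y_high) half-open ranges [y_low, y_high).
    x_edges_per_range: one ascending list of x edges per y-range.
    """
    if len(y_bin_ranges) != len(x_edges_per_range):
        raise ValueError("need one x binning per y-range")

    # Offset of each y-range's first bin inside the flat output.
    offsets = []
    total_bins = 0
    for edges in x_edges_per_range:
        offsets.append(total_bins)
        total_bins += len(edges) - 1

    contents = [0.0] * total_bins
    for x, y, weight in entries:
        for i, (y_low, y_high) in enumerate(y_bin_ranges):
            if y_low <= y < y_high:
                edges = x_edges_per_range[i]
                # Index of the x bin; entries outside the x edges are dropped.
                x_bin = bisect_right(edges, x) - 1
                if 0 <= x_bin < len(edges) - 1:
                    contents[offsets[i] + x_bin] += weight
                break
    return contents


entries = [(0.5, 0.1, 1.0), (1.5, 0.1, 2.0), (0.5, 0.7, 3.0), (2.5, 0.7, 4.0)]
y_bin_ranges = [(0.0, 0.5), (0.5, 1.0)]
x_edges_per_range = [[0.0, 1.0, 2.0], [0.0, 3.0]]
flat = linearize_2d(entries, y_bin_ranges, x_edges_per_range)
print(flat)  # [1.0, 2.0, 7.0]
```

In this toy input the first y-range contributes two bins and the second contributes one, so the flat output has three bins. How the PR's C++ function handles under/overflow and bin errors is not shown here and is not reproduced by this sketch.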

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 16 comments.

| File | Description |
| --- | --- |
| include/HistHelper.h | Adds C++ rebinHistogramDict function for multi-dimensional histogram rebinning with custom y-range binning |
| Common/HistHelper.py | Extends RebinHisto and GetModel to support dict-based binning and multi-dimensional histograms; adds cache helper functions |
| Analysis/tasks.py | Adds variable flattening logic to extract individual variables from multi-dimensional variable definitions |
| Analysis/HistTupleProducer.py | Refactors to handle flattened variables and adds shifted tree processing for systematic uncertainties |
| Analysis/HistProducerFromNTuple.py | Updates bin counting logic to correctly handle 2D/3D histograms and adds dimension detection |
| Analysis/HistPlotter.py | Removes duplicate RebinHisto and getNewBins functions (now in HistHelper.py); adds 2D binning support |
| Analysis/HistMergerFromHists.py | Fixes memory leak by calling SetDirectory(0) when accumulating histograms and refactors file handling |
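
As background for the bin counting change listed for HistProducerFromNTuple.py: ROOT histograms carry one underflow and one overflow bin per axis, so a 2D or 3D histogram has more cells than the product of its visible bins. The sketch below reproduces that arithmetic in plain Python. `total_cells` and `global_bin` are hypothetical helper names, and the indexing follows the convention documented for ROOT's `TH1::GetBin`; it is not code taken from this PR, whose actual counting logic is not shown here.

```python
def total_cells(n_bins_per_axis):
    """Number of cells including under/overflow, e.g. (nx+2)*(ny+2) in 2D."""
    cells = 1
    for n in n_bins_per_axis:
        cells *= n + 2
    return cells


def global_bin(bin_indices, n_bins_per_axis):
    """Linear index: binx + (nx+2) * (biny + (ny+2) * binz).

    bin_indices run from 0 (underflow) to n+1 (overflow) on each axis.
    """
    index = 0
    # Walk the axes from last to first so that x varies fastest.
    for b, n in zip(reversed(bin_indices), reversed(n_bins_per_axis)):
        if not 0 <= b <= n + 1:
            raise ValueError("bin index out of range")
        index = index * (n + 2) + b
    return index


print(total_cells([10]))            # 1D: 12
print(total_cells([10, 5]))         # 2D: 12 * 7 = 84
print(global_bin([3, 2], [10, 5]))  # 3 + 12 * 2 = 27
```

A loop that assumes `nx + 2` cells would therefore visit only a fraction of a 2D or 3D histogram, which is the kind of mismatch dimension detection is meant to avoid.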

dfw_central.df = dfw_central.df.Define(f"{var}_bin", f"get_{var}_bin({var})")
dfw_central.colToSave.append(f"{var}_bin")

varToSave = Utilities.ListToVector(list(set(dfw_central.colToSave)))

Copilot AI Jan 16, 2026

The variables tmp_fileNames and snaps are used before they are initialized. They are first used on lines 168 and 169, but not initialized until lines 222-223. These variables should be initialized before their first use, likely at the beginning of the function or before line 166.

Suggested change
- varToSave = Utilities.ListToVector(list(set(dfw_central.colToSave)))
+ varToSave = Utilities.ListToVector(list(set(dfw_central.colToSave)))
+ tmp_fileNames = []
+ snaps = []

Copilot uses AI. Check for mistakes.
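
The failure mode described in this comment is easy to reproduce in isolation. The sketch below is hypothetical: the function names and bodies are made up for illustration, and only the variable names `tmp_fileNames` and `snaps` come from the review comment. A local list that is appended to before it is assigned raises `UnboundLocalError`, and initializing it first avoids the problem.

```python
def collect_broken(names):
    for name in names:
        tmp_fileNames.append(name)  # used before the assignment below
    tmp_fileNames = []  # too late: Python already treats the name as local
    return tmp_fileNames


def collect_fixed(names):
    tmp_fileNames = []  # initialize before first use
    snaps = []
    for name in names:
        tmp_fileNames.append(name)
        snaps.append(f"snapshot of {name}")
    return tmp_fileNames, snaps


try:
    collect_broken(["a.root"])
except UnboundLocalError as err:
    print("broken version failed:", type(err).__name__)

print(collect_fixed(["a.root", "b.root"]))
```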
)
if df_central.Filter("map_placeholder > 0").Count().GetValue() <= 0:
raise RuntimeError("no events passed map placeholder")
all_shifts_to_compute.extend(unc_cfg_dict["shape"].keys())

Copilot AI Jan 16, 2026

The variable all_shifts_to_compute is used with .extend() but is never initialized. It should be initialized as an empty list before this line, similar to how norm_uncertainties and scale_uncertainties are initialized.

raise RuntimeError("no events passed map placeholder")
all_shifts_to_compute.extend(unc_cfg_dict["shape"].keys())

for unc in ["Central"] + all_rel_uncs_to_compute:

Copilot AI Jan 16, 2026

The variable all_rel_uncs_to_compute is referenced but never defined or initialized in the visible code. This will cause a NameError at runtime.

Suggested change
- for unc in ["Central"] + all_rel_uncs_to_compute:
+ for unc in ["Central"] + list(norm_uncertainties):

Comment on lines 126 to 128
df_central = createCentralQuantities(
df_central, col_types_central, col_names_central
)

Copilot AI Jan 16, 2026

The function createCentralQuantities is called but is never imported. The import statement from FLAF.Common.HistHelper import * is commented out on line 13. Either uncomment this import or add an explicit import for createCentralQuantities.

f"weight_{unc}_{scale}" if unc != "Central" else "weight_Central"
)
histTupleDef.DefineWeightForHistograms(
dfw=dfw_central,

Copilot AI Jan 16, 2026

The variable dfw_central is used but never defined or initialized in the visible code. It should be created before being passed to DefineWeightForHistograms.

}
}
return hist_output;
}

Copilot AI Jan 16, 2026

The rebinHistogramDict function lacks documentation. Add a comment block explaining the function's purpose, parameters, return value, and the algorithm used for rebinning multi-dimensional histograms into linearized 1D histograms.

all_output_edges.erase(std::unique(all_output_edges.begin(), all_output_edges.end()), all_output_edges.end());

// Create output histogram with variable binning
TH1D* hist_output = new TH1D("rebinned", "rebinned", all_output_edges.size() - 1, all_output_edges.data());

Copilot AI Jan 16, 2026

The histogram name and title are hardcoded as 'rebinned'. Consider accepting these as parameters or deriving them from the input histogram to make the function more flexible and preserve histogram metadata.

Suggested change
- TH1D* hist_output = new TH1D("rebinned", "rebinned", all_output_edges.size() - 1, all_output_edges.data());
+ TH1D* hist_output = new TH1D(hist_initial->GetName(), hist_initial->GetTitle(), all_output_edges.size() - 1, all_output_edges.data());

for scale in scales:
for key, filter_to_apply_base in key_filter_dict.items():
filter_to_apply_final = filter_to_apply_base
print("Saving Hist for unc/scale/key:", unc, scale, key)

Copilot AI Jan 16, 2026

Debug print statement should be removed or converted to proper logging for production code.
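
One way to follow this suggestion is Python's standard `logging` module. A minimal sketch, assuming a module-level logger; the helper `format_hist_message`, the wrapper `save_hist`, and the logger name are hypothetical and not part of the PR:

```python
import logging

logger = logging.getLogger("HistProducer")  # hypothetical logger name


def format_hist_message(unc, scale, key):
    """Build the message that the print statement used to emit."""
    return f"Saving Hist for unc/scale/key: {unc} {scale} {key}"


def save_hist(unc, scale, key):
    # DEBUG-level messages are hidden unless the user opts in.
    logger.debug(format_hist_message(unc, scale, key))


logging.basicConfig(level=logging.INFO)
save_hist("Central", "Central", "region_A")  # silent at INFO level

logger.setLevel(logging.DEBUG)
save_hist("Central", "Central", "region_A")  # now emitted via the root handler
```

With this layout the message stays available for debugging but no longer clutters the output of regular production runs.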

Comment on lines +166 to +169

TH1D* rebinHistogramDict(TH1* hist_initial, int N_bins,
const std::vector<std::pair<float, float>>& y_bin_ranges,
const std::vector<std::vector<float>>& output_bin_edges) {

Copilot AI Jan 16, 2026

The function creates a new histogram with new but there's no clear ownership or deletion mechanism documented. Consider documenting that the caller is responsible for deleting the returned histogram, or use ROOT's memory management (e.g., adding the histogram to a directory).

Suggested change
- TH1D* rebinHistogramDict(TH1* hist_initial, int N_bins,
- const std::vector<std::pair<float, float>>& y_bin_ranges,
- const std::vector<std::vector<float>>& output_bin_edges) {
+ /**
+ * Rebin a (possibly multi-dimensional) histogram into a linearized 1D
+ * histogram, using custom output bin edges for each y-range.
+ *
+ * The returned histogram is allocated with `new` and ownership is
+ * transferred to the caller. The caller is responsible for managing
+ * its lifetime, e.g. by:
+ * - explicitly deleting it with `delete`, or
+ * - adding it to a ROOT TDirectory/TFile so that ROOT manages
+ * its deletion.
+ */
+ TH1D* rebinHistogramDict(TH1* hist_initial, int N_bins,
+ const std::vector<std::pair<float, float>>& y_bin_ranges,
+ const std::vector<std::vector<float>>& output_bin_edges) {

Comment on lines 374 to 376
# print(col_names_cache)
# if "kinFit_result" in col_names_cache:
# col_names_cache.remove("kinFit_result")

Copilot AI Jan 16, 2026

This comment appears to contain commented-out code.

Suggested change
- # print(col_names_cache)
- # if "kinFit_result" in col_names_cache:
- # col_names_cache.remove("kinFit_result")
+ # All cache columns are forwarded to the central DataFrame as-is.

@aebid
Contributor Author

aebid commented Jan 16, 2026

@cms-flaf-bot please test

  • HH_bbWW_version=PR_54

@cms-flaf-bot
Collaborator

pipeline#13793040 started

@cms-flaf-bot
Collaborator

pipeline#13793040 failed

@aebid
Contributor Author

aebid commented Jan 16, 2026

@cms-flaf-bot please test

  • HH_bbWW_version=PR_54

@cms-flaf-bot
Collaborator

pipeline#13793221 started

@aebid
Contributor Author

aebid commented Jan 16, 2026

I hate merge conflicts and apparently broke everything when I was trying to fix them 😢

@cms-flaf-bot
Collaborator

pipeline#13793221 failed

@aebid aebid force-pushed the Add_MultiDim_Plots branch from 76b6877 to 0dddc08 on January 16, 2026 14:56
@aebid
Contributor Author

aebid commented Jan 16, 2026

@cms-flaf-bot please test

  • HH_bbWW_version=PR_54

@cms-flaf-bot
Collaborator

pipeline#13793622 started

@aebid
Contributor Author

aebid commented Jan 16, 2026

@cms-flaf-bot please test

  • HH_bbWW_version=PR_54

@cms-flaf-bot
Collaborator

pipeline#13793706 started

@cms-flaf-bot
Collaborator

pipeline#13793706 failed

@aebid
Contributor Author

aebid commented Jan 16, 2026

@cms-flaf-bot please test

  • HH_bbWW_version=PR_54

@cms-flaf-bot
Collaborator

pipeline#13793934 started

@cms-flaf-bot
Collaborator

pipeline#13793934 failed

@cms-flaf-bot
Collaborator

pipeline#13793622 failed

@aebid
Contributor Author

aebid commented Jan 16, 2026

@cms-flaf-bot please test

  • HH_bbWW_version=PR_54

@cms-flaf-bot
Collaborator

pipeline#13794271 started

@cms-flaf-bot
Collaborator

pipeline#13794271 failed

@aebid
Contributor Author

aebid commented Jan 16, 2026

@cms-flaf-bot please test

  • HH_bbWW_version=PR_54

@cms-flaf-bot
Collaborator

pipeline#13795444 started

@cms-flaf-bot
Collaborator

pipeline#13795444 passed

@aebid
Contributor Author

aebid commented Jan 16, 2026

@cms-flaf-bot please test

  • HH_bbWW_version=PR_54

@cms-flaf-bot
Collaborator

pipeline#13796018 started

@cms-flaf-bot
Collaborator

pipeline#13796018 failed

@aebid
Contributor Author

aebid commented Jan 17, 2026

Testing has to wait until cms-flaf/HH_bbWW#53 is merged

@aebid
Contributor Author

aebid commented Jan 17, 2026

@cms-flaf-bot please test

  • HH_bbWW_version=PR_54

@cms-flaf-bot
Collaborator

pipeline#13799766 started

@cms-flaf-bot
Collaborator

pipeline#13799766 failed

2 participants