Core Analysis Framework of the DESY CMS Higgs -> bb group
See also the code documentation Doxygen page
The code as well as the ntuples is independent of CMSSW. However, the build uses scram,
so the latest CMSSW version in any architecture should be fine.
cmsrel CMSSW_X_Y_Z
cd CMSSW_X_Y_Z/src
cmsenv
git clone https://github.com/desy-cms/analysis-tools.git Analysis/Tools
scram b -j4 USER_CXXFLAGS="-Wno-misleading-indentation"
The USER_CXXFLAGS="-Wno-misleading-indentation" option prevents a large number of warnings
from misleading indentation in modules of the boost library. The user can also define the
environment variable below in .bashrc or every time after the command cmsenv:
export USER_CXXFLAGS="-Wno-misleading-indentation"
Scale factors, efficiencies etc. can be obtained from the analysis-calibrations repository (see also the README.md in each run period directory). It is recommended to install it in the Analysis/Tools/data directory, for the framework loads the calibration files from that path:
cd $CMSSW_BASE/src
git clone https://github.com/desy-cms/analysis-calibrations.git Analysis/Tools/data/calibrations
The lists of ntuple files can be obtained from the analysis-ntuples repository (see also the README.md in each run period directory). The repository can be installed in a directory of your convenience, e.g.
cd $CMSSW_BASE/src
git clone https://github.com/desy-cms/analysis-ntuples.git Analysis/Tools/data/ntuples
The ntuple path names are the global ones for general grid access. If you are running your jobs on the NAF, it should
be faster to use the /pnfs/... path name. To make the conversion simply issue the following command once for each
installation of the analysis-ntuples as described above
ntuples_at_pnfs.sh
The script will modify all the files in $CMSSW_BASE/src/Analysis/Tools/data/ntuples.
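As an illustration only (a hedged sketch, not the actual ntuples_at_pnfs.sh implementation; the exact global prefix and the /pnfs mount point on the NAF may differ), the conversion amounts to rewriting a global XRootD path into the local dCache path:

```shell
# Hypothetical example of the kind of rewrite the script performs:
# replace a global "root://<redirector>//" prefix by a local /pnfs path.
echo "root://cms-xrd-global.cern.ch//store/user/someuser/ntuple_1.root" \
  | sed 's|^root://[^/]*//|/pnfs/desy.de/cms/tier2/|'
# -> /pnfs/desy.de/cms/tier2/store/user/someuser/ntuple_1.root
```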
A simple example macro can be found in Analysis/Tools/bin/AnalyserSimpleExample.cc. The example is a simple offline and online dijet selection (+ muon selection in the semileptonic case) using signal MC samples and triggers from the 2017 and 2018 periods. The macro uses a configuration file as input. Configuration files are available for both the all-hadronic and semileptonic cases in both the 2017 and 2018 periods:
- analyser_example_allhad_2017.cfg
- analyser_example_allhad_2018.cfg
- analyser_example_semilep_2017.cfg
- analyser_example_semilep_2018.cfg
To execute an example
cd Analysis/Tools/test
AnalyserSimpleExample -c analyser_example_semilep_2018.cfg
A python script to submit jobs to the NAF HTCondor queue, naf_submit.py, is available.
N.B.: So far the script does not bundle multiple jobs into a single submission, so be careful not to make too many submissions.
usage: naf_submit.py [-h] [--exe EXE] [--config CONFIG] [--ntuples NTUPLES] [--nfiles NFILES] [--json JSON] [--label LABEL] [--events EVENTS_MAX] [--test NJOBS] [--dir DIR] [--status] [--resubmit] [--expert]
Prepare, submit and check jobs to NAF HTCondor batch system
optional arguments:
-h, --help show this help message and exit
submission:
prepare and submit jobs
--exe EXE, -e EXE Executable (REQUIRED)
--config CONFIG, -c CONFIG Configuration file (REQUIRED)
--ntuples NTUPLES, -n NTUPLES List of ntuples file
--nfiles NFILES, -x NFILES Number of ntuple files per job
--json JSON, -j JSON JSON file with certified data
--label LABEL, -l LABEL user label for the submission
--events EVENTS_MAX override eventsMax in the config file (default = -1)
--test NJOBS         *** expert only ***: produce NJOBS jobs, no automatic submission
status:
show and modify status
--dir DIR an existing condor directory (REQUIRED)
--status -> returns the status of the jobs in --dir
--resubmit -> resubmits aborted and finished-with-error jobs in --dir
--expert -> *** expert mode ***
If you provide a configuration file containing the ntuples list and JSON parameters, you do not need to pass them on the command line; the script will read that information from the configuration file.
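For instance (using the parameter names from the [Info] blocks shown later in this document), with a configuration file containing

```
[Info]
...
ntuplesList = rootFileList.txt
json = certified_json.txt
```

the --ntuples and --json options can be omitted from the naf_submit.py command line.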
Using the example above, to submit to the NAF using HTCondor:
naf_submit.py --exe AnalyserSimpleExample --config analyser_example_semilep_2018.cfg --nfiles 2
After the jobs have been submitted, there will be a directory called Condor_AnalyserSimpleExample_analyser_example_semilep_2018 containing several subdirectories called job_xxxx. In each of these subdirectories there will be several files. Files to take note of are:
- <output>.root - the output file containing histograms etc. of your analysis
- finished.txt - a file that indicates whether the executable ran until the end (only if you are using the framework)
- job.submit - the HTCondor submit configuration file. If a job is not finished, you can resubmit that job manually by issuing the command
condor_submit job.submit
Once the directory Condor_AnalyserSimpleExample_analyser_example_semilep_2018
is created, one can use the status options, e.g.
naf_submit.py --dir Condor_AnalyserSimpleExample_analyser_example_semilep_2018 --status
will display a table like this:
*** STATUS OF JOBS ***
Condor_AnalyserSimpleExample_analyser_example_semilep_2018
----------------------------------------------------------------------------------------------------------
job finished running submitted aborted error condor_id (latest)
----------------------------------------------------------------------------------------------------------
job_0000 v v 15962854.0
job_0001 v v 15962854.1
job_0002 v 15962854.2
job_0003 v 15968568.0
job_0004 X 15962854.4
job_0005 v X 15962854.5
----------------------------------------------------------------------------------------------------------
N.B.: Good *finished* jobs will no longer appear in future "--status" calls
where v will appear as a green tick mark and X as a red cross.
If jobs were aborted or finished with error, you can resubmit all of those jobs with the command
naf_submit.py --dir Condor_AnalyserSimpleExample_analyser_example_semilep_2018 --resubmit
which, in the example above, will resubmit the jobs job_0004 and job_0005.
In the macro, the first thing done is the main Analyser instantiation:
#include "Analysis/Tools/interface/Analyser.h"
using namespace std;
using namespace analysis;
using namespace analysis::tools;
int main(int argc, char ** argv)
{
TH1::SetDefaultSumw2();
Analyser analyser(argc,argv);
...
The [Info] block in the .cfg file
[Info]
process = MssmHbb
eventsMax = -1
ntuplesList = rootFileList.txt
isMC = true
output = histograms.root
passes general information to initialise the Analyser in the macro. The process parameter is the name of the parent TDirectoryFile in the ntuple. To find it
root -l ntuple.root
# in the root prompt type
.ls
# it will show
KEY: TDirectoryFile MssmHbb;1 MssmHbb
where ntuple.root is any input ntuple file, such as the ones in your rootFileList.txt or in the analysis-ntuples.
Hopefully the other names of the config parameters are self-explanatory.
In data one must specify the json file with certified lumis, e.g.
[Info]
...
isMC = false
json = certified_json.txt
The event loop must start like this
for ( int i = 0 ; i < analyser.nEvents() ; ++i )
{
if ( ! analyser.event(i) ) continue;
// do actions and selections
}
where analyser.event(i) reads the event from the ntuple and performs some actions, such as applying generator event weights in MC and the selection of JSON-certified lumis in data.
The Analyser has several predefined selection methods that read parameters from the configuration file and apply them to the event. The selections must be performed within the event loop.
For example, if the analysis involves jets, one must define which collection to use, the minimum number of jets, the jet identification, the jet pT etc. In the configuration there is a block for jets with the relevant parameters:
[Jets]
jets = updatedPatJets
nMin = 2
id = tight
puId = loose
ptMin = 60
ptMin = 50
etaMax = 2.2
etaMax = 2.2
extendedFlavour = true
In the macro, the selections are performed within the event loop by calling the methods, which automatically read the parameters from the configuration, e.g.
// jet identification selection
if ( ! analyser.selectionJetId() ) continue;
if ( ! analyser.selectionJetPileupId() ) continue;
if ( ! analyser.selectionNJets() ) continue;
This will prepare a list of jets containing only the jets passing the required identification criteria and require the minimum number of jets defined in the configuration.
// 1st and 2nd jet kinematic selection, pt and eta
if ( ! analyser.selectionJet(1) ) continue;
if ( ! analyser.selectionJet(2) ) continue;
In the Analyser::selectionJet method the argument is the jet rank, i.e., 1 refers to the leading jet, 2 refers to the second leading jet. The method will select the jet according to the pt and eta criteria defined in the configuration.
For the b-tagging there is a block in the configuration file, where one defines the algorithm, working points etc.
[BTag]
nMin = 2
wp = medium
wp = medium
algorithm = deepflavour
loose = 0.0494
medium = 0.2770
tight = 0.7264
With this configuration, one intends to select events with at least the two leading jets tagged with the medium working point of the deepflavour algorithm. The thresholds of the working points must be specified.
To perform the selection in the event loop:
if ( ! analyser.selectionBJet(1) ) continue;
if ( ! analyser.selectionBJet(2) ) continue;
where the argument of the Analyser::selectionBJet is the jet rank as in the jet kinematic selection above.
Muon selection has its own configuration block and the procedure works in a similar way as jets
[Muons]
muons = slimmedMuons
nMin = 1
id = tight
ptMin = 13.
etaMax = 2.2
The selection code can be for example
// muon identification selection
if ( ! analyser.selectionMuonId() ) continue;
if ( ! analyser.selectionNMuons() ) continue;
// muon kinematic selection
if ( ! analyser.selectionMuons() ) continue;
Analyser::selectionMuons, used for the muon kinematic selection, differs from the method used for the jet kinematic selection: it builds a list of muons passing the required kinematic selection, and the event is accepted if there is at least the minimum number of muons required. This method is useful for the muon-jet association in analyses requiring jets containing a muon.
If one wants to select muons depending on their ranking, then one can use, like in the jets case,
// leading muon kinematic selection
if ( ! analyser.selectionMuon(1) ) continue;
Histograms can be predefined in the Analyser, e.g. for jets one can use the method jetHistograms, which receives as arguments the number of jets (which must be at most the minimum number of jets required) and the name of a directory. One can create as many directories as needed and fill the histograms at different steps of the analysis workflow, e.g. before the event loop:
analyser.jetHistograms(2,"initial");
analyser.jetHistograms(2,"final");
then within the event loop:
...
for ( int i = 0 ; i < analyser.nEvents() ; ++i )
{
if ( ! analyser.event(i) ) continue;
// do something
analyser.fillJetHistograms("initial");
// do something else
analyser.fillJetHistograms("final");
   ...
}