Hello,
I am not sure whether others are seeing this, but our HH->WWgg tagger plugin is very slow to start when run with workspaceStd.py on a signal file, for example with the command:
cmsRun Systematics/test/workspaceStd.py metaConditions=MetaData/data/MetaConditions/Era2017_RR-31Mar2018_v1.json campaign=Era2017_RR-31Mar2018_v2 dataset=GluGluToHHTo2G2Qlnu_node_cHHH1_TuneCP5_PSWeights_13TeV-powheg-pythia8 doHHWWggTag=1 HHWWggTagsOnly=1 maxEvents=500 doSystematics=1 dumpWorkspace=0 dumpTrees=1 useAAA=1 doHHWWggTagCutFlow=1 saveHHWWggFinalStateVars=1 HHWWggAnalysisChannel=SL HHWWgguseZeroVtx=1 doHHWWggDebug=1
(The WWgg customization used there is implemented in [1] and [2].) About 6 minutes pass between opening the input microAOD file and the start of event processing:
23-Nov-2020 15:09:08 CET Successfully opened file root://cms-xrd-global.cern.ch//store/user/alesauva/flashgg/2017_1/10_6_4/GluGluToHHTo2G2Qlnu_node_cHHH1_TuneCP5_PSWeights_13TeV-powheg-pythia8/2017_1-10_6_4-v0-RunIIFall17MiniAODv2-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/201105_142131/0000/myMicroAODOutputFile_36.root
Begin processing the 1st record. Run 1, Event 201310, LumiSection 2014 on stream 0 at 23-Nov-2020 15:15:16.486 CET
(15:09:08 --> 15:15:16)
However, when running over data, with the following command:
cmsRun Systematics/test/workspaceStd.py metaConditions=MetaData/data/MetaConditions/Era2017_RR-31Mar2018_v1.json campaign=Era2017_RR-31Mar2018_v2 dataset=/DoubleEG/spigazzi-Era2017_RR-31Mar2018_v2-legacyRun2FullV1-v0-Run2017B-31Mar2018-v1-d9c0c6cde5cc4a64343ae06f842e5085/USER doHHWWggTag=1 HHWWggTagsOnly=1 maxEvents=500 doSystematics=0 dumpWorkspace=0 dumpTrees=1 useAAA=1 processId=Data processType=Data doHHWWggTagCutFlow=1 saveHHWWggFinalStateVars=1 HHWWggAnalysisChannel=SL HHWWgguseZeroVtx=1
It takes only about 22 seconds between opening the input file and the start of event processing:
23-Nov-2020 16:11:27 CET Successfully opened file root://cms-xrd-global.cern.ch//store/user/spigazzi/flashgg/Era2017_RR-31Mar2018_v2/legacyRun2FullV1/DoubleEG/Era2017_RR-31Mar2018_v2-legacyRun2FullV1-v0-Run2017B-31Mar2018-v1/190606_094808/0000/myMicroAODOutputFile_519.root
Begin processing the 1st record. Run 297219, Event 255140966, LumiSection 124 on stream 0 at 23-Nov-2020 16:11:49.666 CET
(16:11:27 --> 16:11:49)
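To make this comparison repeatable, the delay can be read straight from the cmsRun log. The following is only a minimal sketch: the helper names `parse_ts` and `startup_delay` are hypothetical, and it assumes the log lines have the same timestamp format and wording as the two excerpts above (and an English locale, so that month names like "Nov" parse).

```python
import re
from datetime import datetime

# Timestamp as it appears in the log excerpts, e.g. "23-Nov-2020 15:15:16.486"
# (the fractional seconds are optional).
_TS = re.compile(r"(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}(?:\.\d+)?)")


def parse_ts(line):
    """Return the first timestamp found in a log line, or None."""
    match = _TS.search(line)
    if match is None:
        return None
    text = match.group(1)
    fmt = "%d-%b-%Y %H:%M:%S.%f" if "." in text else "%d-%b-%Y %H:%M:%S"
    return datetime.strptime(text, fmt)


def startup_delay(log_lines):
    """Seconds between the first 'Successfully opened file' line and the
    'Begin processing the 1st record' line, or None if either is missing."""
    opened = None
    for line in log_lines:
        if opened is None and "Successfully opened file" in line:
            opened = parse_ts(line)
        elif opened is not None and "Begin processing the 1st record" in line:
            started = parse_ts(line)
            if started is not None:
                return (started - opened).total_seconds()
    return None
```

Applied to the two excerpts above, this gives about 368 seconds for the signal job and about 23 seconds for the data job.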
When I discussed this with Simone in the past, one hypothesis was that the slow startup for signal comes from an unnecessarily long ordering of, or pass through, the MC scaling input files. This is quite inconvenient for local debugging, since every small fix takes at least 6 minutes to test.
If anyone knows why this is happening or has a potential solution, it would be much appreciated.
Thank you,
Abe
[1] https://github.com/atishelmanch/flashgg/blob/HHWWgg_dev/Systematics/python/HHWWggCustomize.py
[2] https://github.com/atishelmanch/flashgg/blob/HHWWgg_dev/Taggers/plugins/HHWWggTagProducer.cc