WorkspaceStd Slow to Start for Signal #1247

@atishelmanch

Hello,

I am not sure whether others are seeing this, but running our HH->WWgg tagger plugin with workspaceStd on a signal file is slow to start. For example, with the command:

cmsRun Systematics/test/workspaceStd.py metaConditions=MetaData/data/MetaConditions/Era2017_RR-31Mar2018_v1.json campaign=Era2017_RR-31Mar2018_v2 dataset=GluGluToHHTo2G2Qlnu_node_cHHH1_TuneCP5_PSWeights_13TeV-powheg-pythia8 doHHWWggTag=1 HHWWggTagsOnly=1 maxEvents=500 doSystematics=1 dumpWorkspace=0 dumpTrees=1 useAAA=1 doHHWWggTagCutFlow=1 saveHHWWggFinalStateVars=1 HHWWggAnalysisChannel=SL HHWWgguseZeroVtx=1 doHHWWggDebug=1

(There is some WWgg-specific customization there, implemented in [1] and [2].) About 6 minutes pass between opening the input microAOD file and the start of event processing:

23-Nov-2020 15:09:08 CET Successfully opened file root://cms-xrd-global.cern.ch//store/user/alesauva/flashgg/2017_1/10_6_4/GluGluToHHTo2G2Qlnu_node_cHHH1_TuneCP5_PSWeights_13TeV-powheg-pythia8/2017_1-10_6_4-v0-RunIIFall17MiniAODv2-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/201105_142131/0000/myMicroAODOutputFile_36.root

Begin processing the 1st record. Run 1, Event 201310, LumiSection 2014 on stream 0 at 23-Nov-2020 15:15:16.486 CET

(15:09 ---> 15:15)

However, when running over data with the following command:

cmsRun Systematics/test/workspaceStd.py metaConditions=MetaData/data/MetaConditions/Era2017_RR-31Mar2018_v1.json campaign=Era2017_RR-31Mar2018_v2 dataset=/DoubleEG/spigazzi-Era2017_RR-31Mar2018_v2-legacyRun2FullV1-v0-Run2017B-31Mar2018-v1-d9c0c6cde5cc4a64343ae06f842e5085/USER doHHWWggTag=1 HHWWggTagsOnly=1 maxEvents=500 doSystematics=0 dumpWorkspace=0 dumpTrees=1 useAAA=1 processId=Data processType=Data doHHWWggTagCutFlow=1 saveHHWWggFinalStateVars=1 HHWWggAnalysisChannel=SL HHWWgguseZeroVtx=1

it takes only about 22 seconds between opening the input file and the start of event processing:

23-Nov-2020 16:11:27 CET Successfully opened file root://cms-xrd-global.cern.ch//store/user/spigazzi/flashgg/Era2017_RR-31Mar2018_v2/legacyRun2FullV1/DoubleEG/Era2017_RR-31Mar2018_v2-legacyRun2FullV1-v0-Run2017B-31Mar2018-v1/190606_094808/0000/myMicroAODOutputFile_519.root
Begin processing the 1st record. Run 297219, Event 255140966, LumiSection 124 on stream 0 at 23-Nov-2020 16:11:49.666 CET

(16:11:27 --> 16:11:49)
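
For anyone wanting to reproduce the comparison, here is a minimal sketch of how the two delays above could be read off a saved cmsRun log. It assumes the log lines have the same shape as the excerpts quoted in this issue; the function names (`parse_stamp`, `startup_delay`) are hypothetical helpers, not part of flashgg or CMSSW.

```python
import re
from datetime import datetime

# Timestamp shape taken from the log excerpts in this issue, e.g.
# "23-Nov-2020 15:09:08" or "23-Nov-2020 15:15:16.486".
# Assumption: month abbreviations are English (strptime's %b is locale-dependent).
STAMP = re.compile(r"(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}(?:\.\d+)?)")


def parse_stamp(text):
    """Return the first timestamp found in text as a datetime, or None."""
    match = STAMP.search(text)
    if match is None:
        return None
    raw = match.group(1)
    fmt = "%d-%b-%Y %H:%M:%S.%f" if "." in raw else "%d-%b-%Y %H:%M:%S"
    return datetime.strptime(raw, fmt)


def startup_delay(log_lines):
    """Seconds between the first 'opened file' line and the first-record line.

    Returns None if either marker line is missing from the log.
    """
    opened = started = None
    for line in log_lines:
        if opened is None and "Successfully opened file" in line:
            opened = parse_stamp(line)
        elif started is None and "Begin processing the 1st record" in line:
            started = parse_stamp(line)
    if opened is None or started is None:
        return None
    return (started - opened).total_seconds()


if __name__ == "__main__":
    # Lines copied from the signal log above; the file path is shortened here.
    signal_log = [
        "23-Nov-2020 15:09:08 CET Successfully opened file root://...",
        "Begin processing the 1st record. Run 1, Event 201310, LumiSection 2014 "
        "on stream 0 at 23-Nov-2020 15:15:16.486 CET",
    ]
    print(startup_delay(signal_log))
```

Applied to the two excerpts, this gives roughly 368 seconds for the signal job and roughly 23 seconds for the data job.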

From a past discussion with Simone, one hypothesis for the slow start on signal is an unnecessarily long ordering of, or loop over, the MC scaling input files. This is quite inconvenient for local debugging, since each small fix takes at least 6 minutes to test.

If anyone knows why this is happening or has a potential solution, it would be much appreciated.

Thank you,
Abe

[1] https://github.com/atishelmanch/flashgg/blob/HHWWgg_dev/Systematics/python/HHWWggCustomize.py
[2] https://github.com/atishelmanch/flashgg/blob/HHWWgg_dev/Taggers/plugins/HHWWggTagProducer.cc
