Context: we are implementing an independent HS3 consumer (pyhs3) and validating it against quickFit NLL values on an ATLAS diHiggs (bbγγ) workspace exported via RooFit's HS3 JSON export. This is a request to make the evaluation semantics of binned-data-against-continuous-pdf recoverable from the exported file.
cc @cburgard @Phmonski
The situation
The exported datasets are binned:
{ "name": "AsimovData_0_Run2HM_1", "type": "binned",
"axes": [{ "name": "atlas_invMass_Run2HM_1",
"min": 105.0, "max": 160.0, "nbins": 220 }],
"contents": [0.344, 0.342, "..."] } // Σ ≈ 40.7 events
while the paired channel pdf is a continuous mixture_dist over the same variable. This is presumably faithful to the original workspace (a binned Asimov RooDataHist), so not an export bug per se — but the file does not say how RooFit evaluates this pairing:
(A) bin centers: log L = Σ_b c_b · log pdf(x_b^center)
(B) bin integrals: log L = Σ_b c_b · log ∫_bin_b pdf(x) dx
(cf. IntegrateBins / binned-likelihood attributes). The two differ by a parameter-dependent amount, so an independent consumer reproducing the NLL curve cannot know which to implement. We have filed a matching HS3-spec issue asking for the semantics to be definable: hep-statistics-serialization-standard/hep-statistics-serialization-standard#93.
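To make the size of the ambiguity concrete, here is a minimal toy sketch of conventions (A) and (B) on the exported axis (105 to 160, 220 bins). Everything about the pdf is an assumption for illustration: a truncated Gaussian plus a falling exponential with made-up parameters `MU`, `K`, `FRAC`, and toy Asimov-like counts scaled to 40.7 events. It is not the workspace's `mixture_dist` and does not use RooFit or pyhs3.

```python
import math
import numpy as np

# Axis taken from the exported dataset shown above.
LO, HI, NBINS = 105.0, 160.0, 220
edges = np.linspace(LO, HI, NBINS + 1)
centers = 0.5 * (edges[:-1] + edges[1:])
width = (HI - LO) / NBINS

# Toy shape parameters (assumptions, not taken from the workspace).
MU, K, FRAC = 125.0, 0.03, 0.3

_erf = np.vectorize(math.erf)


def _phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def pdf(x, sigma):
    """Toy mixture density, normalized on [LO, HI]."""
    x = np.asarray(x, dtype=float)
    g_norm = sigma * math.sqrt(2.0 * math.pi) * (
        _phi((HI - MU) / sigma) - _phi((LO - MU) / sigma)
    )
    gauss = np.exp(-0.5 * ((x - MU) / sigma) ** 2) / g_norm
    expo = K * np.exp(-K * x) / (math.exp(-K * LO) - math.exp(-K * HI))
    return FRAC * gauss + (1.0 - FRAC) * expo


def cdf(x, sigma):
    """Analytic CDF of the toy mixture on [LO, HI]."""
    x = np.asarray(x, dtype=float)
    g_lo = _phi((LO - MU) / sigma)
    g_cdf = (_phi((x - MU) / sigma) - g_lo) / (_phi((HI - MU) / sigma) - g_lo)
    e_cdf = (math.exp(-K * LO) - np.exp(-K * x)) / (
        math.exp(-K * LO) - math.exp(-K * HI)
    )
    return FRAC * g_cdf + (1.0 - FRAC) * e_cdf


# Toy Asimov-like bin contents generated at sigma = 1.5 (assumption).
counts = 40.7 * np.diff(cdf(edges, 1.5))


def logl_centers(sigma):
    """(A): sum_b c_b * log pdf(x_b^center)."""
    return float(np.sum(counts * np.log(pdf(centers, sigma))))


def logl_integrals(sigma):
    """(B): sum_b c_b * log of the pdf integral over bin b."""
    return float(np.sum(counts * np.log(np.diff(cdf(edges, sigma)))))


def delta(sigma):
    """(B) minus (A), with the constant N * log(bin width) removed."""
    return logl_integrals(sigma) - logl_centers(sigma) - counts.sum() * math.log(width)


for sigma in (1.0, 1.5, 3.0):
    print(f"sigma={sigma}: (B) - (A) - N*log(width) = {delta(sigma):.3e}")
```

The constant `N * log(width)` is subtracted so that only the parameter-dependent part of the mismatch remains; in this toy that remainder changes with `sigma`, which is the effect an independent consumer cannot resolve from the file alone.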
Suggestions (increasing order of usefulness)
- document the evaluation convention the export assumes;
- include the relevant evaluation options (e.g. IntegrateBins, binned-likelihood attributes) in the export when they are set;
- optionally support exporting the dataset in the representation the fit actually used.
Related question
For absolute-NLL comparisons: quickFit/RooFit NLLs include data-only constants (e.g. the −log N! of the extended term and constraint normalization constants) that are not part of the serialized model. A short statement in the RooFitHS3 docs of which constants RooFit's createNLL includes would make cross-tool validation much less archaeological.
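As a stopgap on the consumer side, a parameter-independent constant drops out when NLL values are compared relative to a common reference point. The sketch below illustrates this for a Poisson extended term with and without log N!. It is a toy: `extended_term` and the yield values are hypothetical, and it makes no claim about which constants `createNLL` actually includes.

```python
import math


def extended_term(nu, n_obs, include_constant):
    """Negative log of a Poisson extended term for expected yield nu."""
    term = nu - n_obs * math.log(nu)
    if include_constant:
        # log N!, written with lgamma so non-integer Asimov N is allowed.
        term += math.lgamma(n_obs + 1.0)
    return term


N_OBS = 40.7  # total of the Asimov dataset quoted above
yields = [35.0, 40.7, 45.0]  # hypothetical scan points

with_const = [extended_term(nu, N_OBS, True) for nu in yields]
without_const = [extended_term(nu, N_OBS, False) for nu in yields]

# The absolute values differ by the same offset at every scan point ...
offsets = [a - b for a, b in zip(with_const, without_const)]

# ... so differences relative to a reference point (here nu = N_OBS) agree.
delta_with = [v - with_const[1] for v in with_const]
delta_without = [v - without_const[1] for v in without_const]

print(offsets)
print(delta_with)
print(delta_without)
```

This only helps for shape comparisons of the NLL curve; matching absolute values still requires knowing which constants the reference tool adds.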