
[RF] HS3 export: continuous pdf paired with binned dataset has unrecoverable evaluation semantics #22598

@kratsg

Description

Context: we are implementing an independent HS3 consumer (pyhs3) and validating it against quickFit NLL values on an ATLAS diHiggs (bbγγ) workspace exported with RooFit's HS3 JSON export. This is a request to make the evaluation semantics of a binned dataset paired with a continuous pdf recoverable from the exported file.

cc @cburgard @Phmonski

The situation

The exported datasets are binned:

{ "name": "AsimovData_0_Run2HM_1", "type": "binned",
  "axes": [{ "name": "atlas_invMass_Run2HM_1",
             "min": 105.0, "max": 160.0, "nbins": 220 }],
  "contents": [0.344, 0.342, "..."] }   // Σ ≈ 40.7 events

while the paired channel pdf is a continuous mixture_dist over the same variable. This is presumably faithful to the original workspace (a binned Asimov RooDataHist), so it is not an export bug per se. However, the file does not say how RooFit evaluates this pairing:

(A) bin centers:    log L = Σ_b c_b · log pdf(x_b^center)
(B) bin integrals:  log L = Σ_b c_b · log ∫_bin_b pdf(x) dx
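The two conventions are easy to write down, which makes the ambiguity concrete. A minimal sketch in plain Python, assuming a regular 1D axis given by min/max/nbins as in the excerpt above. Because the real contents are elided, it uses a made-up flat dataset and a toy exponential pdf with a made-up slope as a stand-in for the mixture_dist; all function names are hypothetical and are not pyhs3 or RooFit API.

```python
import math

# Axis values copied from the exported excerpt; everything else is a toy.
AXIS = {"min": 105.0, "max": 160.0, "nbins": 220}
LO, HI = AXIS["min"], AXIS["max"]


def regular_edges(axis):
    """Bin edges of a regular axis described by min/max/nbins."""
    width = (axis["max"] - axis["min"]) / axis["nbins"]
    return [axis["min"] + i * width for i in range(axis["nbins"] + 1)]


def exp_pdf(x, lam):
    """Toy exponential density exp(-lam*(x-LO)), normalized on [LO, HI], lam > 0."""
    norm = (1.0 - math.exp(-lam * (HI - LO))) / lam
    return math.exp(-lam * (x - LO)) / norm


def exp_cdf(x, lam):
    """Cumulative distribution of exp_pdf on [LO, HI]."""
    return (1.0 - math.exp(-lam * (x - LO))) / (1.0 - math.exp(-lam * (HI - LO)))


def logl_centers(contents, edges, pdf):
    """Convention (A): sum_b c_b * log pdf(bin center)."""
    return sum(c * math.log(pdf(0.5 * (a + b)))
               for c, a, b in zip(contents, edges[:-1], edges[1:]))


def logl_integrals(contents, edges, cdf):
    """Convention (B): sum_b c_b * log of the pdf integrated over the bin."""
    return sum(c * math.log(cdf(b) - cdf(a))
               for c, a, b in zip(contents, edges[:-1], edges[1:]))


edges = regular_edges(AXIS)
contents = [0.185] * AXIS["nbins"]  # made-up flat counts (sum 40.7), NOT the real Asimov data
lam = 0.03                          # made-up slope
log_l_a = logl_centers(contents, edges, lambda x: exp_pdf(x, lam))
log_l_b = logl_integrals(contents, edges, lambda x: exp_cdf(x, lam))
```

For this toy, `log_l_b - log_l_a` is dominated by the parameter-independent Σ_b c_b · log Δx_b (Δx_b = 0.25 here) plus a small slope-dependent remainder; only the remainder changes the shape of the NLL curve.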

(cf. IntegrateBins / binned-likelihood attributes). The two differ by a parameter-dependent amount, so an independent consumer reproducing the NLL curve cannot know which to implement. We have filed a matching HS3-spec issue asking for the semantics to be definable: hep-statistics-serialization-standard/hep-statistics-serialization-standard#93.
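One way to see why the difference is parameter dependent rather than a harmless offset: assume a smooth normalized pdf f, bin centers x_b and widths Δ_b, and apply the standard midpoint-rule expansion to each bin integral:

```latex
\int_{x_b-\Delta_b/2}^{x_b+\Delta_b/2} f(x)\,dx
  = \Delta_b\, f(x_b) + \frac{\Delta_b^{3}}{24}\, f''(x_b) + O(\Delta_b^{5})
\quad\Longrightarrow\quad
\log L_B - \log L_A
  = \sum_b c_b \log\Delta_b
  + \sum_b c_b\, \frac{\Delta_b^{2}}{24}\, \frac{f''(x_b)}{f(x_b)}
  + O(\Delta_b^{4})
```

The first sum depends only on the data and the binning. The second depends on the parameters through f''/f, e.g. λ² for an exponential e^{-λx} and ((x_b − μ)² − σ²)/σ⁴ for a Gaussian peak, so it scales roughly like Δ_b²/σ² and is largest where the binning is coarse relative to the signal width.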

Suggestions (in increasing order of usefulness)

  • document the evaluation convention the export assumes;
  • include the relevant evaluation options (e.g. IntegrateBins, binned-likelihood attributes) in the export when they are set;
  • optionally support exporting the dataset in the representation the fit actually used.
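On the consumer side, the second suggestion would let an evaluator pick the convention from the file instead of guessing. A minimal sketch; the key name "integrate_bins" and the `options` dict are purely hypothetical (neither HS3 nor RooFit's JSON export is claimed to define them), and the function refuses to guess when the setting is absent.

```python
import math


def binned_logl(contents, edges, pdf, cdf, options):
    """Evaluate convention (A) or (B) as recorded in `options`; never guess.

    `options` stands in for wherever an exporter might record the setting.
    The key "integrate_bins" is hypothetical: it is not claimed to exist
    in HS3 or in RooFit's JSON export.
    """
    if "integrate_bins" not in options:
        raise ValueError("bin-evaluation convention is not recorded in the file")
    bins = list(zip(contents, edges[:-1], edges[1:]))
    if options["integrate_bins"]:
        # (B): log of the pdf integrated over each bin
        return sum(c * math.log(cdf(b) - cdf(a)) for c, a, b in bins)
    # (A): log of the pdf evaluated at each bin center
    return sum(c * math.log(pdf(0.5 * (a + b))) for c, a, b in bins)


# Usage with a toy quadratic pdf f(x) = 3x^2/8 on [0, 2] and two unit-count bins.
def pdf(x):
    return 3.0 * x * x / 8.0


def cdf(x):
    return x ** 3 / 8.0


edges, contents = [0.0, 1.0, 2.0], [1.0, 1.0]
log_l_a = binned_logl(contents, edges, pdf, cdf, {"integrate_bins": False})
log_l_b = binned_logl(contents, edges, pdf, cdf, {"integrate_bins": True})
```

A quadratic pdf is used because for a linear pdf the midpoint rule is exact, and (A) and (B) would coincide on unit-width bins.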

Related question

For absolute-NLL comparisons: quickFit/RooFit NLLs include data-only constants (e.g. the log N! term from the extended Poisson factor, and constraint normalization constants) that are not part of the serialized model. A short statement in the RooFitHS3 docs of which constants RooFit's createNLL includes would make cross-tool validation much less archaeological.
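To illustrate which pieces are parameter-independent constants, here is a minimal sketch of an extended Poisson term and fixed-width Gaussian constraint terms with the constants split out. Which of them RooFit's createNLL actually keeps is exactly the open question, so nothing below describes RooFit's behaviour, and all function names are hypothetical.

```python
import math


def extended_nll_kernel(n_obs, nu):
    """Extended Poisson term without the data-only constant: nu - n*log(nu)."""
    return nu - n_obs * math.log(nu)


def extended_nll_full(n_obs, nu):
    """Full -log Poisson(n | nu) = nu - n*log(nu) + log n!.

    log n! is computed as lgamma(n + 1), which also covers non-integer
    Asimov totals such as n ~ 40.7.
    """
    return extended_nll_kernel(n_obs, nu) + math.lgamma(n_obs + 1.0)


def gauss_constraint_nll(theta, mu=0.0, sigma=1.0, with_norm=True):
    """-log Normal(theta | mu, sigma); the normalization is constant if sigma is fixed."""
    nll = 0.5 * ((theta - mu) / sigma) ** 2
    if with_norm:
        nll += 0.5 * math.log(2.0 * math.pi) + math.log(sigma)
    return nll


def constant_offset(n_obs, constraint_sigmas):
    """Total parameter-independent offset between the 'full' and 'kernel' forms."""
    return math.lgamma(n_obs + 1.0) + sum(
        0.5 * math.log(2.0 * math.pi) + math.log(s) for s in constraint_sigmas)
```

If two tools differ only by such terms, their NLL difference is the same at every parameter point, which is a cheap check to run before comparing absolute values.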

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests