BBNet

A fast neural network framework for the cosmological computation of Big-Bang Nucleosynthesis (BBN) primordial light-element abundances.

This repository contains the reference implementation accompanying the paper "Accurate neural network emulator for primordial light element abundances" (F. Zhang, H. Diao, B. Li, J. Meyers, P. R. Shapiro). It provides the training and evaluation scripts for two emulator instances — one trained on data from PArthENoPE v3.0 and one trained on data from AlterBBN v2.2 — together with the pre-trained weights used in the paper.


Overview

High-accuracy numerical BBN solvers are a well-known computational bottleneck in cosmological inference. A single full-network call of PArthENoPE (26 nuclides, 100 reactions) can take on the order of $10$ to $30$ seconds of CPU time, making MCMC-style parameter exploration prohibitively expensive, particularly for extended cosmologies that introduce additional free parameters.

BBNet is a deep-learning emulator that learns the mapping

$$ (\,\Omega_{\mathrm{b}} h^{2},\; \tau_{n},\; \Delta N_{\mathrm{eff}},\; \kappa_{10}\,) \;\longmapsto\; (\,Y_{\mathrm{P}},\; \mathrm{D/H}\,) $$

from training data generated by full numerical BBN solvers. The architecture (ResMLPWithAttn) is a multi-layer perceptron with residual connections and a multi-head self-attention block. Once trained, it produces $Y_{\mathrm{P}}$ and $\mathrm{D/H}$ on the millisecond scale per sample on a single NVIDIA A100 GPU — a speed-up of order $10^{3}$–$10^{4}$ relative to the underlying solvers, while preserving sub-per-mille relative accuracy on held-out test data.

The emulator is designed as a drop-in replacement for the BBN-prediction step inside parameter-inference pipelines. It is not a BBN solver; it reproduces the output of an existing solver (here PArthENoPE or AlterBBN) on the parameter ranges spanned by the training set.

Scope note. This repository ships the BBN emulator and its evaluation scripts only. The modified versions of PArthENoPE and AlterBBN that generate the training data, as well as any external MCMC driver, are external to this codebase.


Repository structure

BBNet/
├── train_bbn_parthenope.py     # Training entry point for the PArthENoPE-trained emulator
├── train_bbn_alterbbn.py       # Training entry point for the AlterBBN-trained emulator
├── mape_for_parthenope.py      # Evaluation / error metrics on the PArthENoPE test set
├── mape_for_alterbbn_exp.py    # Evaluation on the AlterBBN test set (incl. expert mode)
├── weights/                    # Pre-trained model weights and normalisation scalers
└── LICENSE                     # MIT License

The repository currently contains:

  • two training scripts, one per BBN backend (PArthENoPE, AlterBBN);
  • two evaluation scripts that compute MPE / MAPE / RMSPE on the test set, with the AlterBBN evaluator additionally supporting the hierarchical expert inference mode described in the paper;
  • the pre-trained checkpoints under weights/ referenced in the paper.

A more complete description of the paper-level framework — including the modified solvers, the training-data generation pipeline, and the integration into Bayesian inference — is given in the Paper-level framework section below.


Method summary

For full details please consult the paper. The most important specifications relevant to using this code are:

Inputs and parameter ranges (uniform Latin-hypercube sampling, with a log-uniform prior on $\kappa_{10}$):

| Parameter | Range |
|---|---|
| $\Omega_{\mathrm{b}} h^{2}$ | $[0.005,\,0.1]$ |
| $\tau_{n}$ (s) | $[875.5,\,884.5]$ |
| $\Delta N_{\mathrm{eff}}$ | $[-1.0,\,1.0]$ |
| $\kappa_{10}$ | $[0,\,1000]$, sampled on $\log_{10}\kappa_{10}\in[-7,\,3]$ |

Outputs: $Y_{\mathrm{P}}$ and $\mathrm{D/H}$.

Training data: $20{,}000$ samples per BBN code, with an $8:1:1$ split into training, validation, and test sets. PArthENoPE is run in its complete nuclear network configuration (26 nuclides, 100 reactions); AlterBBN is run in RK2_halfstep mode (failsafe=7).
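The sampling and split described above can be sketched in plain Python as follows. This is a minimal illustration, not the data-generation code used for the paper: the function names `latin_hypercube` and `split_8_1_1`, the dictionary keys, and the random seed are hypothetical, and the actual abundances would still have to be computed by the BBN solvers for each sampled point.

```python
import random

# Ranges from the parameter table; kappa_10 is drawn in log10 space.
RANGES = {
    "omega_b_h2": (0.005, 0.1),
    "tau_n": (875.5, 884.5),
    "delta_n_eff": (-1.0, 1.0),
    "log10_kappa10": (-7.0, 3.0),
}


def latin_hypercube(n_samples, ranges, rng):
    """Draw n_samples points with one point per stratum in every dimension."""
    columns = {}
    for name, (lo, hi) in ranges.items():
        # One uniform draw inside each of the n equal-width strata, then
        # shuffle so that strata are paired at random across dimensions.
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(unit)
        columns[name] = [lo + u * (hi - lo) for u in unit]
    samples = []
    for i in range(n_samples):
        row = {name: columns[name][i] for name in ranges}
        # Convert the log-uniform coordinate back to kappa_10 itself.
        row["kappa10"] = 10.0 ** row.pop("log10_kappa10")
        samples.append(row)
    return samples


def split_8_1_1(samples, rng):
    """Shuffle and split into training, validation and test subsets (8:1:1)."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = (8 * len(shuffled)) // 10
    n_val = len(shuffled) // 10
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


rng = random.Random(0)  # arbitrary seed, for reproducibility of the sketch
samples = latin_hypercube(20_000, RANGES, rng)
train, val, test = split_8_1_1(samples, rng)
print(len(train), len(val), len(test))  # 16000 2000 2000
```

Sampling $\log_{10}\kappa_{10}$ uniformly and exponentiating afterwards is what gives every decade of $\kappa_{10}$ the same number of training points.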

Architecture (ResMLPWithAttn): a linear projection to a 4096-dimensional hidden representation followed by a GELU activation, an 8-head self-attention block with a residual connection, $N$ residual MLP blocks with dropout ($p=0.3$), and a final linear head to $(Y_{\mathrm{P}},\mathrm{D/H})$. All inputs and outputs are standardised to zero mean and unit variance before training, so the per-checkpoint scaler statistics are required at inference time.
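To illustrate why the scaler statistics must travel with each checkpoint, here is a minimal sketch of column-wise standardisation and its inverse. The class name `Standardiser` and the toy numbers are hypothetical; the scripts in this repository may store and apply their scalers differently. The sketch uses the population standard deviation, as scikit-learn's `StandardScaler` does, and assumes no column is constant.

```python
import math


class Standardiser:
    """Column-wise zero-mean, unit-variance scaling with stored statistics."""

    def fit(self, rows):
        n = len(rows)
        n_cols = len(rows[0])
        self.mean = [sum(r[j] for r in rows) / n for j in range(n_cols)]
        # Population standard deviation (divide by n, not n - 1)
        self.std = [
            math.sqrt(sum((r[j] - self.mean[j]) ** 2 for r in rows) / n)
            for j in range(n_cols)
        ]
        return self

    def transform(self, rows):
        return [
            [(v - m) / s for v, m, s in zip(r, self.mean, self.std)]
            for r in rows
        ]

    def inverse_transform(self, rows):
        return [
            [v * s + m for v, m, s in zip(r, self.mean, self.std)]
            for r in rows
        ]


# Toy (Y_P, D/H) rows, purely illustrative and not taken from any solver
rows = [[0.245, 2.4e-5], [0.247, 2.5e-5], [0.249, 2.6e-5]]
scaler = Standardiser().fit(rows)
scaled = scaler.transform(rows)              # what the network is trained on
restored = scaler.inverse_transform(scaled)  # back to physical units
```

A network trained on `scaled` emits standardised outputs, so the same `mean` and `std` are needed to recover physical $Y_{\mathrm{P}}$ and $\mathrm{D/H}$ at inference time.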

Optimisation: AdamW with initial learning rate $5\times 10^{-5}$, weight decay $\lambda_W=10^{-5}$, gradient-norm clipping at $1.0$, and a ReduceLROnPlateau scheduler (factor $0.5$, patience $10$ epochs on validation loss). Batch size $16$.
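The two safeguards named above, gradient-norm clipping and a plateau-based learning-rate schedule, behave roughly as in the following plain-Python sketch. It is a simplified stand-in for PyTorch's `clip_grad_norm_` and `ReduceLROnPlateau`, not the training code itself: the names `clip_grad_norm` and `PlateauSchedule` are hypothetical, and the improvement test here is a strict comparison without PyTorch's relative threshold or cooldown options.

```python
import math


def clip_grad_norm(grads, max_norm=1.0):
    """Rescale a gradient vector so that its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    return [g * max_norm / norm for g in grads]


class PlateauSchedule:
    """Multiply the learning rate by `factor` when validation loss stalls."""

    def __init__(self, lr=5e-5, factor=0.5, patience=10):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        # Reduce once the run of non-improving epochs exceeds the patience
        if self.bad_epochs > self.patience:
            self.lr *= self.factor
            self.bad_epochs = 0
        return self.lr


clipped = clip_grad_norm([3.0, 4.0])  # norm 5 is scaled down to norm 1

sched = PlateauSchedule()
# One improving epoch followed by eleven epochs with no improvement
lrs = [sched.step(1.0) for _ in range(12)]
```

In this sketch the learning rate stays at $5\times 10^{-5}$ through ten stalled epochs and is halved on the eleventh.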

Loss:

$$ \mathcal{L}_{\mathrm{total}} \;=\; \mathcal{L}_{\mathrm{MAE}} \;+\; \lambda_{\mathrm{smooth}}\,\mathcal{L}_{\mathrm{smooth}} \;+\; \lambda_{\mathrm{D/H}}\,\mathcal{L}_{\mathrm{sMAPE}}, $$

with $\lambda_{\mathrm{smooth}}=0.1$, $\lambda_{\mathrm{D/H}}=50$, and the input-perturbation scale $\sigma=0.02$. The sMAPE term acts on the deuterium-to-hydrogen ratio in the physical (de-standardised) domain.
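How the three terms combine can be sketched as below. Only the weights, the perturbation scale, and the fact that the sMAPE term acts on physical $\mathrm{D/H}$ come from the description above. The concrete forms are assumptions of this sketch: the smoothness term is taken to be the mean absolute change in the prediction under Gaussian input noise of scale $\sigma$, and sMAPE uses the convention $2|\hat{y}-y|/(|\hat{y}|+|y|)$. Consult the paper and the training scripts for the definitions actually used.

```python
import random

LAMBDA_SMOOTH = 0.1
LAMBDA_DH = 50.0
SIGMA = 0.02


def mae(pred, target):
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def smape(pred, target, eps=1e-30):
    """Symmetric MAPE as a fraction; one common convention among several."""
    return sum(
        2.0 * abs(p - t) / (abs(p) + abs(t) + eps)
        for p, t in zip(pred, target)
    ) / len(pred)


def total_loss(model, inputs, targets, dh_mean, dh_std, rng):
    """Combine the three terms for a batch of standardised samples.

    model           : callable, standardised input row -> (Y_P, D/H), standardised
    targets         : standardised (Y_P, D/H) rows
    dh_mean, dh_std : scaler statistics used to de-standardise D/H
    """
    preds = [model(x) for x in inputs]
    flat_pred = [v for p in preds for v in p]
    flat_true = [v for t in targets for v in t]
    l_mae = mae(flat_pred, flat_true)

    # Smoothness (assumed form): predictions should move little under
    # small Gaussian perturbations of the inputs.
    noisy = [[xi + SIGMA * rng.gauss(0.0, 1.0) for xi in x] for x in inputs]
    flat_noisy = [v for x in noisy for v in model(x)]
    l_smooth = mae(flat_noisy, flat_pred)

    # sMAPE on D/H after mapping back to physical units
    dh_pred = [p[1] * dh_std + dh_mean for p in preds]
    dh_true = [t[1] * dh_std + dh_mean for t in targets]
    l_smape = smape(dh_pred, dh_true)

    return l_mae + LAMBDA_SMOOTH * l_smooth + LAMBDA_DH * l_smape


# Toy check with a constant model and illustrative scaler statistics
rng = random.Random(0)
loss = total_loss(lambda x: (0.0, 0.0), [[0.1, 0.2, 0.3, 0.4]],
                  [(0.0, 1.0)], dh_mean=2.5e-5, dh_std=1e-6, rng=rng)
```

Evaluating the D/H term in physical units keeps the relative error meaningful across the orders of magnitude that $\mathrm{D/H}$ spans, which a loss on standardised values alone would not.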

Expert mode (AlterBBN only). Because $\mathrm{D/H}$ from AlterBBN spans several orders of magnitude, the AlterBBN emulator can optionally run in a two-stage expert mode: a base model produces a preliminary $\widehat{\mathrm{D/H}}_{\mathrm{base}}$, and the input is then routed to one of two specialist networks trained on the bands $[10^{-7},\,10^{-5})$ and $[10^{-5},\,10^{-3}]$. All three networks share the same ResMLPWithAttn backbone but carry their own scaler statistics. Expert mode is activated via the --exp flag in mape_for_alterbbn_exp.py.
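The routing logic can be sketched as follows. The function `route_dh` and the three callables are hypothetical stand-ins: each is assumed to take physical inputs and return physical $(Y_{\mathrm{P}}, \mathrm{D/H})$, with its own scaler applied internally. Two further assumptions of this sketch are that the specialist's full output replaces the base prediction, and that the base prediction is kept when $\widehat{\mathrm{D/H}}_{\mathrm{base}}$ falls outside both bands.

```python
def route_dh(x, base_model, expert_low, expert_high):
    """Two-stage prediction: base estimate first, then a band specialist."""
    y_p_base, dh_base = base_model(x)
    if 1e-7 <= dh_base < 1e-5:
        return expert_low(x)
    if 1e-5 <= dh_base <= 1e-3:
        return expert_high(x)
    # Outside both bands: keep the base prediction (assumption of this sketch)
    return y_p_base, dh_base


# Toy stand-ins that only show which branch is taken; values are illustrative
base = lambda x: (0.25, x[0])
low = lambda x: ("low", x[0])
high = lambda x: ("high", x[0])

print(route_dh([3e-6], base, low, high)[0])  # low
print(route_dh([1e-5], base, low, high)[0])  # high
print(route_dh([5e-2], base, low, high)[0])  # 0.25
```

The band edges follow the half-open and closed intervals quoted above, so a base estimate of exactly $10^{-5}$ goes to the upper-band specialist.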


Installation

The code targets PyTorch on CPU or CUDA-capable GPU. A minimal environment is:

# Create and activate an environment (example)
conda create -n bbnet python=<TODO: python-version>
conda activate bbnet

# Install PyTorch matching your CUDA setup, see https://pytorch.org/get-started/locally/
pip install torch

# Additional dependencies used by the scripts
pip install numpy scipy scikit-learn pandas tqdm

The exact pinned versions used in the paper are not yet provided. A requirements.txt / environment.yml will be added in a future release; in the meantime please refer to the import statements at the top of each script.

The repository itself does not require a build step: clone and run.

git clone https://github.com/Hdiao112/BBNet.git
cd BBNet

Training

Two training scripts are provided, one per backend. Each script expects a training dataset generated by the corresponding (modified) BBN solver, containing the four physical inputs and the two abundance outputs.

Training data is not bundled with this repository. It can either be regenerated using the modified PArthENoPE / AlterBBN codes referenced in the paper, or — when made available — downloaded from <TODO: data-release-URL>.

# PArthENoPE-trained emulator
python train_bbn_parthenope.py \
    --data <path-to-parthenope-training-data> \
    --out  <path-to-output-checkpoint-dir>

# AlterBBN-trained emulator (base model)
python train_bbn_alterbbn.py \
    --data <path-to-alterbbn-training-data> \
    --out  <path-to-output-checkpoint-dir>

The flags shown above are illustrative and may not match the scripts verbatim. Please run python train_bbn_parthenope.py --help and python train_bbn_alterbbn.py --help for the authoritative argument list.

Each training run produces a model checkpoint together with the input/output normalisation scalers used during training. Both files are required at inference time.

For the AlterBBN expert mode, the two band-specific expert checkpoints are trained on the same data filtered to the corresponding $\mathrm{D/H}$ band; see Appendix A of the paper for details.


Evaluation

Two evaluation scripts compute the percentage-error metrics reported in the paper — RMSPE, MAPE, and MPE — on the held-out test set:

$$ \mathrm{MPE} \;=\; \frac{1}{N}\sum_{i=1}^{N}\frac{\hat{y}_i - y_i}{y_i}\times 100\%, \qquad \mathrm{RMSPE} \;=\; \sqrt{\frac{1}{N}\sum_{i=1}^{N}\! \left(\frac{\hat{y}_i - y_i}{y_i}\times 100\%\right)^{2}}. $$
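For readers who want to check the metrics on their own predictions, the definitions translate directly into a few lines of Python. This is a minimal sketch, not the code in the evaluation scripts, and the function names are hypothetical. MAPE is taken to be the mean of the absolute percentage errors, the usual definition, which the formulas above do not spell out.

```python
import math


def percentage_errors(pred, true):
    """Signed percentage error of each prediction relative to the true value."""
    return [100.0 * (p - t) / t for p, t in zip(pred, true)]


def mpe(pred, true):
    errs = percentage_errors(pred, true)
    return sum(errs) / len(errs)


def mape(pred, true):
    errs = percentage_errors(pred, true)
    return sum(abs(e) for e in errs) / len(errs)


def rmspe(pred, true):
    errs = percentage_errors(pred, true)
    return math.sqrt(sum(e * e for e in errs) / len(errs))


# Toy values: percentage errors of about +1 %, -1 % and 0 %
true = [1.0, 2.0, 4.0]
pred = [1.01, 1.98, 4.0]
print(mpe(pred, true), mape(pred, true), rmspe(pred, true))
```

In the toy case the signed errors cancel, so MPE is close to zero while MAPE is about 0.67 % and RMSPE about 0.82 %. This is why MPE is read as a measure of bias and the other two as measures of scatter.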

# Evaluate the PArthENoPE-trained emulator
python mape_for_parthenope.py \
    --weights <checkpoint-path> \
    --data    <path-to-parthenope-test-data>

# Evaluate the AlterBBN-trained emulator (base model only)
python mape_for_alterbbn_exp.py \
    --weights <base-checkpoint-path> \
    --data    <path-to-alterbbn-test-data>

# Evaluate the AlterBBN-trained emulator with expert routing
python mape_for_alterbbn_exp.py \
    --weights      <base-checkpoint-path> \
    --expert1      <expert1-checkpoint-path> \
    --expert2      <expert2-checkpoint-path> \
    --data         <path-to-alterbbn-test-data> \
    --exp

The exact flag names above are placeholders matching the script semantics described in the paper, not necessarily verbatim. Please consult --help on each script for the authoritative interface.

For reference, the percentage-error metrics reported in the paper on the held-out test sets are:

| Backend | Output | RMSPE (%) | MAPE (%) | MPE (%) |
|---|---|---|---|---|
| PArthENoPE | $Y_{\mathrm{P}}$ | 0.0158 | 0.0064 | $-0.0011$ |
| PArthENoPE | $\mathrm{D/H}$ | 0.0503 | 0.0331 | $0.0013$ |
| AlterBBN | $Y_{\mathrm{P}}$ | 0.0175 | 0.0055 | $-0.0014$ |
| AlterBBN | $\mathrm{D/H}$ | 0.0799 | 0.0455 | $0.0002$ |

Using the pre-trained weights

The weights/ directory contains the checkpoints used to produce the figures and tables in the paper. To use them directly for inference, point the evaluation scripts at the relevant files in weights/:

python mape_for_parthenope.py \
    --weights weights/<parthenope-checkpoint> \
    --data    <path-to-test-data>

Each checkpoint is paired with its scaler statistics, and the two must be loaded together to reproduce the reported metrics. The exact filenames inside weights/ are listed in that directory.

The pre-trained weights apply only within the parameter ranges given in Method summary. Predictions outside those ranges are extrapolations and have not been validated.
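A simple guard against accidental extrapolation is to check each parameter point before calling the emulator. The sketch below is a hypothetical helper, not part of the repository: the name `out_of_range` and the dictionary keys are illustrative, and for $\kappa_{10}$ it uses the sampled interval $\log_{10}\kappa_{10}\in[-7,\,3]$ as the bound.

```python
# Training ranges from the Method summary; kappa_10 uses the sampled
# log10 interval [-7, 3].
TRAINING_RANGES = {
    "omega_b_h2": (0.005, 0.1),
    "tau_n": (875.5, 884.5),
    "delta_n_eff": (-1.0, 1.0),
    "kappa10": (1e-7, 1e3),
}


def out_of_range(params):
    """Return the names of parameters that lie outside the training ranges."""
    return [
        name
        for name, (lo, hi) in TRAINING_RANGES.items()
        if not lo <= params[name] <= hi
    ]


inside = {"omega_b_h2": 0.02, "tau_n": 880.0, "delta_n_eff": 0.0, "kappa10": 1e-3}
outside = dict(inside, delta_n_eff=1.5)
print(out_of_range(inside))   # []
print(out_of_range(outside))  # ['delta_n_eff']
```

Inside a sampler, a non-empty result would typically be treated as a rejected point instead of being passed to the network.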


Paper-level framework

The paper presents BBNet as a framework — a standardised pipeline of data generation, training, and inference that can be repeated for new BBN codes or new physical models. The full pipeline involves the following components, only some of which are part of this repository:

  1. Modified BBN solvers. PArthENoPE v3.0 and AlterBBN v2.2 modified to accept $\Delta N_{\mathrm{eff}}$ and $\kappa_{10}$ as inputs and to use a consistent set of nuclear reaction rates for the dominant PNG / DPG / DDN / DDP channels. These solvers are released separately by the authors and are not contained in this repository.
  2. Training-data generation. $20{,}000$ Latin-hypercube samples per solver. The data files themselves are not bundled in this repository.
  3. Emulator training and evaluation. Provided here, in train_bbn_*.py and mape_for_*.py.
  4. Pre-trained weights. Provided here, under weights/.
  5. Bayesian inference / MCMC integration. Outside the scope of this repository. The paper notes that, because each BBNet evaluation costs a few milliseconds, the emulator can be plugged into existing MCMC pipelines without modifying the sampler itself.

Planned and optional extensions

The paper explicitly outlines several future directions. None of them are implemented in this repository at present:

  • emulators for the primordial $^{3}\mathrm{He}/\mathrm{H}$ and $^{7}\mathrm{Li}/\mathrm{H}$ abundances, intended to address the lithium problem;
  • treatment of nuclear reaction rates as additional free parameters;
  • training instances for additional BSM scenarios beyond the dark-radiation + stiff-fluid extension considered here;
  • ready-made bindings to standard inference packages.

Contributions in these directions are welcome — see Contributing.


Citation

If you use BBNet in academic work, please cite the accompanying paper:

@article{BBNet2025,
  title   = {Accurate neural network emulator for primordial light element abundances},
  author  = {Zhang, Fan and Diao, Hang and Li, Bohua and Meyers, Joel and Shapiro, Paul R.},
  journal = {<TODO: journal>},
  year    = {<TODO: year>},
  eprint  = {<TODO: arXiv-id>},
  archivePrefix = {arXiv},
  doi     = {<TODO: doi>}
}

If you also use the pre-trained weights or the training scripts directly, please cite this repository in addition to the paper.


License

This project is released under the MIT License.


Contributing

Bug reports, fixes, and contributions extending the framework to additional BBN codes or to additional primordial abundances ($^{3}\mathrm{He}/\mathrm{H}$, $^{7}\mathrm{Li}/\mathrm{H}$, etc.) are welcome. Please open an issue or a pull request on GitHub.


Contact

For questions about the code or the paper, please open a GitHub issue or contact the corresponding author of the paper (B. Li, bohuali@gxu.edu.cn).
