Compile MOSES evolutionary program search to Boltzmann sampling on thermodynamic hardware — bridging metta-moses and Extropic/thrml.
- Overview
- Why this matters
- Installation
- Quick start
- How it works
- API reference
- Results
- Hyperon integration outlook
- Cross-project comparison
- Project structure
- Contributing
- Sister Projects
- Acknowledgements
- Appendix: LBM compilation path (optional)
- License
MOSES-THRML compiles MOSES's evolutionary program learning into Boltzmann sampling that runs on Extropic's Thermodynamic Sampling Unit (TSU). You give it a fitness function and knob space; it runs parallel Metropolis-Hastings chains (equivalent to TSU Gibbs sampling) and returns the best program. 77 tests verify end-to-end correctness against metta-moses. Together with PLN-THRML, ECAN-THRML, QuantiMORK-THRML, and Geodesic-THRML, it completes the Hyperon AGI stack on thermodynamic hardware.
New to MOSES? (30-second primer)
MOSES (Meta-Optimizing Semantic Evolutionary Search) evolves short programs to fit a target dataset. Instead of training neural network weights, it searches over symbolic boolean expressions — flipping "knobs" to change program structure.
| Component | Meaning | Analogy |
|---|---|---|
| Knob vector | Binary encoding of a program's structure | A genome in genetic algorithms |
| Fitness | How well the program matches target data | Accuracy on a test set |
| Exemplar | The best program found so far in a subpopulation | The fittest individual |
| Deme | A subpopulation exploring variations of one exemplar | An island in island-model GA |
| Simulated annealing | Accept random knob flips; sometimes accept worse to escape local optima | Shaking a puzzle box to unstick pieces |
The search starts hot (accepting almost any change) and cools down (accepting only improvements), converging on the best program. This temperature schedule is exactly what TSU hardware does natively.
New to TSU / Boltzmann sampling? (30-second primer)
A Thermodynamic Sampling Unit (TSU) is Extropic's chip that uses physical thermal noise to sample from Boltzmann distributions. Instead of simulating randomness in software, the hardware is the random sampler.
| TSU concept | Meaning | Analogy |
|---|---|---|
| pbit | A probabilistic bit — fluctuates between two states based on its energy (hardware uses Ising ±1; software often maps to 0/1) | A biased coin that flips itself |
| Energy E(x) | How "costly" a configuration x is | Height on a fitness landscape |
| Boltzmann distribution | P(x) ∝ exp(−E/T) — lower energy = higher probability | Water settling into valleys |
| Gibbs sampling | Update each pbit conditioned on its neighbors until equilibrium | Repeatedly re-rolling each die given the others |
| Temperature T | Controls exploration vs exploitation | High T = random walk; low T = greedy descent |
The key insight: MOSES's Metropolis-Hastings acceptance criterion
min(1, exp(ΔFitness/T)) and TSU's Gibbs update rule both converge to the
same Boltzmann distribution P(x) ∝ exp(−E/T). The algorithms differ in
mechanism (MH proposes-then-accepts; Gibbs updates each pbit conditionally),
but they target the same stationary distribution — so TSU hardware can
replace the software SA loop without changing the distribution being sampled.
See arXiv:2510.23972 for the TSU architecture.
Technical summary: Two formal bridges prove MOSES runs natively on TSU.
(1) SA ↔ Gibbs: both MOSES's MH acceptance min(1, exp(Δfitness/T))
and TSU's Gibbs sampling target the same Boltzmann distribution with energy
E = −fitness: P(x) ∝ exp(fitness/T). The mechanisms differ (MH
propose-accept vs Gibbs conditional update) but the stationary distribution
is identical.
(2) Deme parallelism ↔ multi-chain: MOSES metapopulation's multiple
demes = multiple parallel Markov chains on the same TSU chip (n_chains).
An optional third path compiles boolean programs (SDNF) to RBM weights via LBM Theorem 1 (external, non-Hyperon) — see Appendix: LBM compilation path.
MOSES's core task is searching a combinatorial program space — trying knob flips and accepting or rejecting them. Each architecture parallelizes this differently:
| CPU (metta-moses) | GPU (estimated) | TSU (this project) | |
|---|---|---|---|
| Parallelism | Sequential knob flips | Batched fitness evaluation | All chains sample simultaneously |
| Bottleneck | Program space size × evaluations | Memory bandwidth | Mixing time (energy barrier height) |
| Best fit | Small knob spaces, exact fitness | Large population batches | Knob space fits on-chip¹ |
On a TSU, the knob space compiles into pbit states updated via block Gibbs passes — physical thermal noise drives configurations toward the same Boltzmann equilibrium that MOSES's MH chains target, so the hardware natively samples from the correct distribution.
¹ Knob spaces exceeding a single TSU chip require multi-chip partitioning with communication overhead. Mixing time depends on energy landscape structure and can grow sharply for landscapes with tall barriers.
The TSU architecture paper (arXiv:2510.23972) reports ~10,000× lower energy per sample vs GPU baselines on DTM sampling (E_cell ≈ 2 femtojoules). MOSES program search is expected to benefit similarly but has not been independently benchmarked.
git clone --recurse-submodules https://github.com/xiaohanma-oss/MOSES-THRML.git
cd MOSES-THRML
pip install -e . # core only (numpy)
pip install -e ".[metta]" # + MeTTa bridge (requires hyperon)
pip install -e ".[dev]" # + pytest for running testsThe metta-moses submodule provides test baselines. If you cloned without
--recurse-submodules, rungit submodule update --init.
from moses_thrml import (
KnobSpace, make_xor_dataset, make_energy_fn,
GibbsSchedule, run_thermodynamic_search, diagnose_convergence,
)
import numpy as np
# Define the problem
dataset = make_xor_dataset(3)
def xor_program(inputs, bits):
active = np.where(bits == 1)[0]
if len(active) == 0:
return np.zeros(len(inputs), dtype=np.int8)
return (np.sum(inputs[:, active], axis=1) % 2).astype(np.int8)
energy_fn = make_energy_fn(dataset, xor_program, lambda_=0.0)
# Run thermodynamic search (= MOSES SA on TSU)
schedule = GibbsSchedule(n_warmup=300, n_samples=1000, T_start=1.0, T_end=0.05)
result = run_thermodynamic_search(n_knobs=3, energy_fn=energy_fn,
schedule=schedule, n_chains=30)
print(f"Best program: {result.best_bits}") # [1 1 1]
print(f"Accuracy: {result.best_fitness:.2%}") # 100.00%
# Convergence diagnostics
diag = diagnose_convergence(result)
print(f"R-hat: {diag['r_hat']:.3f}, ESS: {diag['ess']:.0f}")from hyperon import MeTTa
from moses_thrml.metta import register_all
metta = MeTTa()
register_all(metta)
# Search for the best XOR-3 program
results = metta.run('!(thrml-moses-search 3 30 300 1000 xor-3)')
# => (bits 1 1 1)
# Score a candidate program
results = metta.run('!(thrml-moses-score 1 1 1 xor-3)')
# => (fitness 1.000000)pytest tests/ -v # all 77 tests
pytest -m slow -v # end-to-end + scalability tests- Represent —
KnobSpacemaps a program tree to a binary pbit vector (encode/decode/neighbors) - Score — Fitness function compiled to energy:
E = −fitness(BehavioralScore + ComplexityPenalty) - Search — Parallel MH chains with annealing schedule = TSU Boltzmann sampling at β = 1/T
- Select — Exemplar selection from best chains; deme management on CPU
CPU side TSU side
──────────────────────── ────────────────────────
Representation building run_thermodynamic_search()
KnobSpace(exemplar) → parallel MH chains
(= Boltzmann sampling)
Fitness compilation
fitness → E = -fitness → energy landscape
Exemplar selection (deme.py) ← best_bits from chains
| MOSES concept | MOSES-THRML construct | TSU hardware |
|---|---|---|
| Program (boolean expression) | Binary knob vector | pbit state |
| Fitness function | Energy E = −fitness | Energy landscape |
| SA accept criterion | Metropolis-Hastings min(1, exp(−ΔE/T)) |
Gibbs sampling update |
| Temperature schedule | GibbsSchedule(T_start, T_end) |
Inverse temperature β = 1/T |
| Knob flip (mutation) | Bit flip proposal | pbit thermal fluctuation |
| Deme (subpopulation) | Parallel Markov chain | Independent chain on chip |
| SDNF formula (optional LBM path) | RBM weights (LBM Thm.1) | Bipartite EBM on DTCA |
| Metapopulation | Multi-chain orchestration | n_chains parameter |
| Exemplar selection | CPU-side argmin(E) | Readout → CPU |
Representation:
| Function/Class | Module | Description |
|---|---|---|
KnobSpace |
knobs.py |
Program tree ↔ binary pbit vector (encode/decode/neighbors) |
Fitness:
| Function/Class | Module | Description |
|---|---|---|
Dataset |
fitness.py |
Named tuple: inputs + labels |
BehavioralScore |
fitness.py |
Accuracy-based fitness component |
ComplexityPenalty |
fitness.py |
Program size regularization |
CompositeEnergy |
fitness.py |
E = −fitness (combines behavioral + complexity) |
make_energy_fn |
fitness.py |
Build energy function from dataset + program evaluator |
make_xor_dataset, make_mux_dataset |
fitness.py |
Canonical test problem datasets |
EBM compilation (optional — external tool, not part of Hyperon ecosystem):
| Function/Class | Module | Description |
|---|---|---|
SDNFClause, SDNFFormula |
ebm.py |
SDNF representation for boolean programs |
RBMWeights |
ebm.py |
RBM weight container |
sdnf_to_rbm_weights |
ebm.py |
LBM Theorem 1: SDNF → RBM |
rbm_free_energy, rbm_to_energy_fn |
ebm.py |
RBM weights → energy function (marginalise hidden units) |
xor_as_sdnf |
ebm.py |
XOR-n as canonical SDNF formula |
Search:
| Function/Class | Module | Description |
|---|---|---|
GibbsSchedule |
search.py |
Annealing schedule (T_start, T_end, warmup, samples) |
SamplingResult |
search.py |
Container for chain histories + best solution |
run_thermodynamic_search |
search.py |
Parallel MH chains = TSU Boltzmann sampling |
extract_best |
search.py |
Extract best bits from sampling result |
diagnose_convergence |
search.py |
R-hat, ESS convergence diagnostics |
Deme management:
| Function/Class | Module | Description |
|---|---|---|
Deme |
deme.py |
Single deme: exemplar + knob space + search result |
Metapopulation |
deme.py |
Multi-deme orchestration (exemplar selection on CPU) |
run_deme |
deme.py |
Run thermodynamic search for one deme |
| Function | Purpose |
|---|---|
register_all(metta) |
Register thrml-moses-search, thrml-moses-score, thrml-moses-represent with a MeTTa runner |
The thrml-moses-search operator runs thermodynamic search on a named problem;
thrml-moses-score evaluates a candidate program — see Quick start for examples.
77 tests, all passing.
| Problem | Description | Optimal accuracy |
|---|---|---|
xor-3 |
XOR parity of 3 inputs | 100% (bits=[1,1,1]) |
xor-4 |
XOR parity of 4 inputs | 100% (bits=[1,1,1,1]) |
mux-2 |
2-address multiplexer | 100% |
tests/test_knobs.py — 21 tests: encode/decode roundtrip, neighbourhood
tests/test_fitness.py — 13 tests: E = -fitness monotonicity, problem datasets
tests/test_ebm.py — 18 tests: LBM Theorem 1 soundness (satisfying < non-satisfying)
tests/test_search.py — 16 tests: convergence, diagnostics, reproducibility
tests/test_deme.py — 9 tests: deme management, metapopulation
See PLN-THRML README for the full heterogeneous pipeline design (Control → Compile → Sample).
This project contributes the MOSES tier: evolutionary program search compiled to TSU Boltzmann sampling, handling program optimization alongside PLN's factor-graph inference and ECAN's LBM attention diffusion. All three workloads are TSU-native — co-location on one chip via time-multiplexing or spatial partitioning is an open question (depends on knob space size, chain count, and mixing time).
| PLN-THRML | ECAN-THRML | MOSES-THRML | |
|---|---|---|---|
| Compiles | PLN inference rules | ECAN attention diffusion | MOSES program search |
| TSU primitive | Factor graph Gibbs | Lattice-Boltzmann fluid collision+streaming | EBM Gibbs (SA equiv.) |
| Variables | TruthValue (s, c) | STI density field | knob vector (binary) |
| CPU handles | Rule structure (Q_logic) | Goal/boundary conditions | Representation building |
| TSU handles | Posterior sampling | Diffusion/advection | Deme-local knob search |
| Formal basis | Boltzmann factor graph | Navier-Stokes ↔ Lattice-Boltzmann | MH ↔ Gibbs |
moses_thrml/ Main package
__init__.py Public API re-exports
knobs.py KnobSpace: program tree ↔ binary pbit vector
fitness.py BehavioralScore + ComplexityPenalty → CompositeEnergy
ebm.py (optional) LBM Theorem 1: SDNF boolean programs → RBM weights
search.py run_thermodynamic_search: parallel MH chains = TSU Gibbs
deme.py run_deme + Metapopulation: population management
metta/ MeTTa integration layer (optional, requires hyperon)
__init__.py Entry point — exports register_all()
rules.py Grounded ops: thrml-moses-search, thrml-moses-score
dispatch.metta MeTTa dispatch rules
vendor/
metta-moses/ iCog Labs upstream (git submodule) — test baselines
tests/
conftest.py Shared fixtures and tolerance constants
test_knobs.py 21 tests: encode/decode roundtrip, neighbourhood
test_fitness.py 13 tests: E = -fitness monotonicity, problem datasets
test_ebm.py 18 tests: LBM Theorem 1 soundness (optional path)
test_search.py 16 tests: convergence, diagnostics, reproducibility
test_deme.py 9 tests: deme management, metapopulation
docs/
interactive_overview.html Interactive visualization (open in browser)
simple_example.html Simple example visualization
See CONTRIBUTING.md for development setup, testing, code conventions, and pull request guidelines.
Five projects compiling Hyperon's cognitive architecture to thermodynamic hardware:
| Project | What it compiles |
|---|---|
| PLN-THRML | Probabilistic inference → Boltzmann energy tables |
| ECAN-THRML | Attention diffusion → Lattice Boltzmann simulation |
| MOSES-THRML | Program evolution → Boltzmann sampling |
| QuantiMORK-THRML | Predictive coding → wavelet-sparse factor graphs |
| Geodesic-THRML | Unified geodesic scheduler for all above |
- metta-moses — iCog Labs
- thrml — Extropic AI factor graph library
An alternative compilation path uses Logical Boltzmann Machine (LBM) Theorem 1 (Tran, Mota, Garcez 2025) to compile boolean programs (SDNF) directly into RBM weights. This is an external tool (not part of the Hyperon ecosystem) and is not required by the main search pipeline.
from moses_thrml import xor_as_sdnf, sdnf_to_rbm_weights, rbm_to_energy_fn
# MOSES program (as SDNF) → RBM weights → energy function → thermodynamic search
formula = xor_as_sdnf(3) # 4 clauses, 3 variables
weights = sdnf_to_rbm_weights(formula) # LBM Theorem 1
energy_fn = rbm_to_energy_fn(weights) # free energy (hidden units marginalised)
result = run_thermodynamic_search(n_knobs=3, energy_fn=energy_fn, n_chains=20)
assert formula.is_satisfied_by(result.best_bits) # finds satisfying assignmentImplementation: ebm.py (280 lines, 18 tests in
test_ebm.py).
MIT — Copyright (c) 2026 Xiaohan Ma