A lightweight framework for model-based multi-objective optimization with built-in checkpointing and reproducibility.
Key Features:
- Config-First Design: Entire experiment specified in config (dict or YAML)
- Multi-Oracle Support: Optimize multiple independent objectives with different input dimensions
- Checkpointing: Automatic save/resume/rollback functionality
- Reproducible: Built-in seed management
- Extensible: Easy to add custom models, algorithms, and generators
Option 1: pip install (Recommended)
Install in development mode (editable):
cd /path/to/mpBAX
pip install -e .

Install normally:

pip install .

With optional PyTorch support (for DANetModel):

pip install -e ".[torch]"

With development tools (pytest):

pip install -e ".[dev]"

All optional dependencies:

pip install -e ".[all]"

Option 2: PYTHONPATH (No installation)

If you prefer not to install:

export PYTHONPATH=/path/to/mpBAX:$PYTHONPATH

Dependencies:
- Core: numpy>=1.20.0, pyyaml>=5.0
- Optional: torch>=1.13.0 (for DANetModel plugin)
- Dev: pytest>=7.0 (for running tests)
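Either way, a quick sanity check that the package is importable:

python -c "import mpbax"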
Create a single-file optimization script:
import numpy as np
from mpbax.core.engine import Engine
from mpbax.core.model import DummyModel
from mpbax.core.algorithm import GreedySampling
# Define oracle function
def quadratic(X: np.ndarray) -> np.ndarray:
"""X has shape (n, d), returns Y with shape (n, 1)"""
return np.sum((X - 0.5)**2, axis=1, keepdims=True)
# Configure optimization
config = {
'seed': 42,
'max_loops': 10,
'checkpoint': {'dir': 'checkpoints', 'freq': 1},
'oracles': [{
'name': 'quadratic',
'input_dim': 2,
'n_initial': 20,
'function': {'class': quadratic}, # Pass function directly
'model': {'class': DummyModel} # Pass class directly
}],
'algorithm': {
'class': GreedySampling,
'params': {'input_dims': [2], 'n_propose': 10, 'n_candidates': 1000}
}
}
# Run optimization
engine = Engine(config)
engine.run()

That's it! mpBAX will:
- Generate 20 initial samples
- Evaluate them with the quadratic function
- Train a model on the data
- Use the model to propose 10 new candidates
- Repeat for 10 loops total
- Save checkpoints every loop
For larger projects, use YAML configs and import paths instead of instances:
# config.yaml
oracles:
  - function:
      class: 'myproject.oracles.quadratic'  # Import path string

See examples/ for more examples.
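A YAML config can then be loaded into a dict and handed to the engine; a minimal sketch, assuming Engine resolves import-path strings the same way it accepts direct instances:

import yaml
from mpbax.core.engine import Engine

with open('config.yaml') as f:
    config = yaml.safe_load(f)  # plain dict; import-path strings resolved by the framework

Engine(config).run()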
mpBAX v2 introduces flexible parameter placement for cleaner configs:
Default Generator Shortcut:
# Use default uniform generator with custom number of samples
'oracles': [{
'input_dim': 2,
'generate': {'params': {'n': 20}}, # No need to specify 'class'
...
}]

Parameter-Level Placement:
# Put input_dim where it's used (in model params)
'oracles': [{
'generate': {'params': {'n': 20, 'd': 2}},
'model': {
'class': DummyModel,
'params': {'input_dim': 2} # Input dim specified here
}
}]

Custom Generator with All Params:
# Pass all params (n, d, custom) to generator
'generate': {
'class': lhs_generator,
'params': {'n': 20, 'd': 4, 'criterion': 'maximin'}
}

Cleaner Top-Level Config:
# Use 'training' instead of 'model' to avoid confusion
config = {
'training': {'mode': 'finetune'}, # Clear and unambiguous
'oracles': [...]
}

All old config patterns remain supported for backward compatibility.
See examples/ directory:
- 01_basic_optimization.py - Simple single-objective example
- 02_multi_oracle.py - Multi-objective optimization
- 03_multi_output.py - Oracle with multiple outputs
- 04_checkpointing.py - Resume and rollback
- 05_custom_model.py - Implementing custom models
- 06_danet_model.py - Using DANetModel plugin
Oracle functions are the core of mpBAX - they represent your expensive simulation or objective function.
Shape Convention:
- Input X: Shape (n, d), where n = number of samples and d = input dimension
- Output Y: Shape (n, k), where k ≥ 1 (number of outputs per sample)
Example - Single Output:
def my_oracle(X: np.ndarray) -> np.ndarray:
"""
Args:
X: Input with shape (n, d)
Returns:
Y: Output with shape (n, 1)
"""
    return np.sum(X**2, axis=1, keepdims=True)  # Shape (n, 1)

Example - Multi-Output:
def multi_output_oracle(X: np.ndarray) -> np.ndarray:
"""
Args:
X: Input with shape (n, d)
Returns:
Y: Output with shape (n, 3) # Three outputs per input
"""
y1 = np.sum(X**2, axis=1, keepdims=True)
y2 = np.sum(X, axis=1, keepdims=True)
y3 = np.max(X, axis=1, keepdims=True)
    return np.hstack([y1, y2, y3])  # Shape (n, 3)

Important: Always use keepdims=True or reshape to ensure Y is 2D, never 1D.
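The common pitfall and its two fixes:

X = np.random.rand(5, 2)

y_bad = np.sum(X**2, axis=1)                  # shape (5,)   - 1D, breaks downstream code
y_good = np.sum(X**2, axis=1, keepdims=True)  # shape (5, 1) - correct
y_fixed = y_bad.reshape(-1, 1)                # shape (5, 1) - equivalent fix via reshape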
Everything is specified in a config dict/YAML:
- Oracle functions: Import paths (YAML) or direct instances (Python)
- Models: Per-oracle model configuration with hyperparameters
- Generators: Custom initial sample generators (optional)
- Algorithm: Proposal strategy with parameters
Each oracle operates independently with its own:
- Input dimension (can differ between oracles)
- Number of initial samples
- Model and hyperparameters
- Data storage and checkpoints
Algorithms receive predictions from all models and propose candidates for each oracle.
Multi-Output: Single oracle returns multiple values
def oracle(X): # Shape (n, d)
    return Y  # Shape (n, 3) - three outputs per input

Multi-Oracle: Multiple independent oracles
config['oracles'] = [
{'name': 'obj1', 'input_dim': 2, ...},
{'name': 'obj2', 'input_dim': 3, ...}
]

Automatic checkpointing enables:
- Resume: Continue from latest checkpoint
- Rollback: Delete checkpoints after a specific loop
- Reproducibility: Complete experiment state saved
Directory structure:
checkpoints/
├── config.yaml
├── state.pkl
├── oracle_name_1/
│ ├── data_0.pkl, data_1.pkl, ...
│ ├── model_0_final.pkl, model_1_final.pkl, ...
│ └── model_0_best.pkl, ... (if checkpoint_mode='both')
└── oracle_name_2/
└── ...
Here's a complete config demonstrating all available options:
# Top-level settings
seed: 42
max_loops: 10
# Checkpointing configuration
checkpoint:
dir: 'checkpoints'
freq: 1
resume_from: null # or 'latest' or loop number
# Global training settings (renamed from 'model' to avoid confusion)
training:
mode: 'retrain' # 'retrain' or 'finetune'
checkpoint_mode: 'final' # 'final', 'best', or 'both'
# Oracle configurations (list - one per oracle)
oracles:
- name: 'my_objective'
input_dim: 4 # Can also be in model.params or generate.params.d
n_initial: 20 # Can also be in generate.params.n
# Oracle function
function:
class: 'myproject.oracles.my_oracle' # or direct instance
params: {} # Empty dict calls factory with no args
# Optional: Custom generator (null = uniform [0,1]^d)
generate: null
# Alternative patterns:
# generate:
# params: {n: 20} # Use default generator with custom n
# generate:
# class: 'myproject.generators.lhs_generator'
# params: {n: 20, d: 4, criterion: 'maximin'} # All params here
# Model for this oracle
model:
class: 'DummyModel' # or 'DANetModel' or custom
params: {} # Can include input_dim here instead of oracle level
# Algorithm configuration
algorithm:
class: 'GreedySampling'
params:
input_dims: [4]
n_propose: 10
    n_candidates: 1000

| Field | Type | Description |
|---|---|---|
| seed | int | Random seed for reproducibility |
| max_loops | int | Maximum optimization loops |
| checkpoint | dict | Checkpointing configuration |
| training | dict | Global training settings (formerly model) |
| oracles | list | Oracle configurations (see below) |
| algorithm | dict | Algorithm configuration |
Note: training was renamed from model to avoid confusion with per-oracle model field. The old name is still supported with a deprecation warning.
| Field | Type | Default | Description |
|---|---|---|---|
| dir | str | required | Checkpoint directory path |
| freq | int | 1 | Save frequency (1 = every loop) |
| resume_from | str/int/null | null | Resume point ('latest', loop number, or null) |
Usage examples:
Resume from latest checkpoint:
config['checkpoint']['resume_from'] = 'latest'

Resume from specific loop:

config['checkpoint']['resume_from'] = 5

Rollback to previous checkpoint:
from mpbax.core.checkpoint import CheckpointManager
manager = CheckpointManager('checkpoints')
manager.delete_checkpoints_after(loop=3)  # Delete loops 4+

| Field | Type | Default | Description |
|---|---|---|---|
| mode | str | 'retrain' | Training mode: 'retrain' or 'finetune' |
| checkpoint_mode | str | 'final' | Model checkpoint: 'final', 'best', or 'both' |
Training Modes:
Retrain (default) - Creates fresh model instance each loop, trains on all accumulated data from scratch. Simple and suitable for most models.
config['training'] = {'mode': 'retrain'}

Finetune - Reuses the model instance from the previous loop and continues training without resetting weights. Preserves normalization parameters (e.g., X_mu, X_sigma from loop 0). Ideal for neural networks.

config['training'] = {'mode': 'finetune'}

Checkpoint Modes:
- 'final': Save model after training completes (used for resumption)
- 'best': Save best model if tracked via get_best_model_snapshot()
- 'both': Save both final and best models
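For example, to keep both snapshots while finetuning:

config['training'] = {'mode': 'finetune', 'checkpoint_mode': 'both'}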
| Field | Type | Description |
|---|---|---|
| name | str | Oracle identifier (used for checkpoint dirs) |
| input_dim | int (optional) | Input dimensionality (can be in model.params or generate.params.d) |
| n_initial | int (optional) | Number of initial samples (can be in generate.params.n) |
| function | dict | {'class': func_or_path, 'params': {}} |
| generate | dict/null | Custom generator, or null for uniform [0,1]^d |
| model | dict | {'class': model_class_or_path, 'params': {}} |
Flexible Parameter Placement:
The framework supports multiple ways to specify input_dim and n_initial:
# Pattern 1: Traditional (oracle level)
oracles:
- input_dim: 4
n_initial: 20
# Pattern 2: input_dim in model.params
oracles:
- model:
params: {input_dim: 4}
# Pattern 3: n in generate.params (default generator shortcut)
oracles:
- input_dim: 4
generate:
params: {n: 20}
# Pattern 4: All params in custom generator
oracles:
- generate:
class: 'myproject.generators.custom'
params: {n: 20, d: 4, scale: 0.5}
model:
    params: {input_dim: 4}

At least one location must specify each parameter. The framework validates that required params exist somewhere in the config.
| Field | Type | Description |
|---|---|---|
| class | str/class | Algorithm class or import path |
| params | dict | Algorithm parameters |
Built-in algorithms:
- RandomSampling: Random proposals
- GreedySampling: Greedy selection from candidates (requires n_candidates)
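A config sketch for random proposals; this assumes RandomSampling lives alongside GreedySampling in mpbax.core.algorithm:

from mpbax.core.algorithm import RandomSampling

config['algorithm'] = {
    'class': RandomSampling,
    'params': {'input_dims': [2], 'n_propose': 10}
}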
Models can access loop tracking metadata to weight recent data. This is useful for adaptive optimization where recent samples are more informative about promising regions.
def train(self, X, Y, metadata=None):
if metadata and 'loop_indices' in metadata:
loop_indices = metadata['loop_indices']
max_loop = np.max(loop_indices)
# Weight recent data higher (e.g., 10x)
weights = np.where(loop_indices == max_loop, 10.0, 1.0)
        # Use weights in loss function

The DataHandler automatically tracks which loop each sample came from, making this information available to models during training.
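As an illustration of where such weights might go, a minimal weighted least-squares fit in plain numpy (generic math, not part of the mpBAX API):

import numpy as np

def weighted_fit(X, Y, weights):
    """Solve the weighted normal equations (X^T W X) w = X^T W Y."""
    Xw = X * weights[:, None]  # scale each row by its weight
    return np.linalg.solve(X.T @ Xw, X.T @ (Y * weights[:, None]))  # shape (d, k)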
See DANetModel for a complete implementation example using PyTorch with sample weighting.
mpBAX/
├── mpbax/ # Core package
│ ├── core/ # Core components
│ │ ├── engine.py # Main orchestrator
│ │ ├── evaluator.py # Oracle wrapper
│ │ ├── data_handler.py # Data storage
│ │ ├── model.py # Model interface
│ │ ├── algorithm.py # Algorithm interface
│ │ └── checkpoint.py # Checkpointing
│ └── plugins/ # Extensions
│ └── models/ # Model plugins
│ └── da_net.py # Neural network model
├── examples/ # Example scripts
├── tests/ # Test suite
├── README.md
└── config.yaml # Config template
- Simplicity: Use the simplest and most robust approach
- Minimal Dependencies: numpy, pyyaml, pickle - that's it
- No Assumptions: Common API, no isinstance checks
- Shape Conventions: X is (n, d), Y is (n, k) where k ≥ 1
Custom Model:
from mpbax.core.model import BaseModel
class MyModel(BaseModel):
def __init__(self, input_dim, my_param=1.0):
super().__init__(input_dim)
self.my_param = my_param
def train(self, X, Y, metadata=None):
# Training logic here
self.is_trained = True
def predict(self, X):
# Prediction logic here
        return Y_pred  # Shape (n, k)

Use in config:
'model': {
'class': MyModel, # or 'myproject.models.MyModel'
'params': {'my_param': 2.0}
}

Custom Algorithm:
from mpbax.core.algorithm import BaseAlgorithm
class MyAlgorithm(BaseAlgorithm):
def __init__(self, input_dims, n_propose, my_param=1.0):
super().__init__(input_dims, n_propose)
self.my_param = my_param
def propose(self, fn_pred_list):
# fn_pred_list: list of predict functions (one per oracle)
# Return: list of X arrays (one per oracle)
X_list = []
for i, fn_pred in enumerate(fn_pred_list):
X_candidates = self.generate_candidates(self.input_dims[i])
# Selection logic here
X_proposed = self.select_best(X_candidates, fn_pred)
X_list.append(X_proposed)
        return X_list

Custom Generator:
Generators are functions that generate initial samples. They receive all params during the actual call (not during instantiation):
import numpy as np

def lhs_generator(n, d, criterion='maximin'):
    """Latin Hypercube Sampling generator.

    Args:
        n: Number of samples to generate
        d: Input dimensionality
        criterion: LHS sampling criterion (accepted but ignored in this simple sketch)

    Returns:
        X with shape (n, d)
    """
    # Stratify [0, 1] into n bins per dimension, draw one sample per bin,
    # then shuffle the bin order independently in each dimension
    rng = np.random.default_rng()
    X = (np.arange(n) + rng.random((d, n))) / n  # shape (d, n), one sample per bin
    for row in X:
        rng.shuffle(row)
    return X.T  # shape (n, d)

Use in config:
'generate': {
'class': lhs_generator, # or import path
'params': {
'n': 20, # Number of samples
'd': 4, # Dimensionality
'criterion': 'maximin' # Custom parameter
}
}

The generator receives all params specified in the config during the call. This allows flexible signatures beyond the standard (n, d) pattern.
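Conceptually, the framework ends up making a call equivalent to the following (a sketch of the calling convention, not the actual internals):

params = {'n': 20, 'd': 4, 'criterion': 'maximin'}
X_init = lhs_generator(**params)  # X_init has shape (20, 4)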
Run tests:
# All tests
python -m pytest tests/
# Specific component
python tests/test_model.py
# With pytest
pytest tests/ -v

Test structure:

- test_data_handler.py - DataHandler tests
- test_evaluator.py - Evaluator tests
- test_model.py - Model interface tests
- test_algorithm.py - Algorithm tests
- test_checkpoint.py - Checkpointing tests
- test_engine.py - Integration tests
See tests/README.md for details.
DANetModel - PyTorch neural network surrogate:
- Automatic input normalization (preserved across loops)
- Sample weighting for recent data
- Adaptive epochs (initial vs finetuning)
- Early stopping
- GPU support
See mpbax/plugins/models/README.md for full documentation.
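A minimal per-oracle config sketch using the plugin; the 'DANetModel' class string and the input_dim placement are documented above, while any further hyperparameters should be checked against the plugin README:

'model': {
    'class': 'DANetModel',      # or pass the class directly
    'params': {'input_dim': 4}  # input_dim may live here instead of the oracle level
}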
Adding a Plugin:
- Create file in mpbax/plugins/{category}/
- Inherit from appropriate base class
- Implement required methods
- Document in plugin README