Note: set `LIBGL_ALWAYS_SOFTWARE=1` when running eval (forces software rendering).
This is a fork of PufferLib for the class APM_5DA01_TP on multiagent systems.
This repository includes puffer_ctf, a custom C version of a Capture-the-Flag environment for autonomous marine vehicles inspired by pyquaticus. Two teams (blue and red) compete to grab the opposing team's flag and return it to their base, while tagging opponents who venture into their territory.
Installation:

```sh
uv venv
uv pip install -e .
source .venv/bin/activate
python setup.py build_ext --inplace --force
```

Test and play the environment manually with keyboard controls (press space to toggle human override):

```sh
puffer eval puffer_ctf --train.device cpu
```

Training:

```sh
puffer train puffer_ctf --train.device cpu --wandb
```

The reference throughput on a laptop is around ~80K SPS (steps per second).
Evaluation:

```sh
puffer eval puffer_ctf --train.device cpu --load-model-path latest
puffer eval puffer_ctf --train.device cpu --wandb --load-id <wandb_run_id>
```

The project is open-ended: extend the environment and/or the learning algorithm and evaluate the impact. It can be based on this repo or on pyquaticus, a fork of the original pyquaticus.
- Use a different algorithm: try MAPPO, IPPO, or other MARL algorithms.
- Modify the observation space: add or remove features and study the effect on learned behavior.
- Modify the reward function: shape rewards to encourage different emergent behaviors.
- Add role specialization: assign each agent a fixed role (attacker / defender) and train specialized policies.
- Algorithm: you can tweak PufferLib's core training algorithm in `pufferlib/pufferl.py` (see the TODOs).
- Observation space / environment changes: modify `pufferlib/ocean/ctf/ctf.h` (in C!), for instance `compute_obs()` to change the local observation computation.
- Rewards: edit `compute_rewards` in `pufferlib/ocean/ctf/ctf.h`.
- Role specialization: augment the observation space with a one-hot encoding of the role in `pufferlib/ocean/ctf/ctf.h`.
- If you encounter errors on your laptop or want to run larger experiments, check this for remote access to the school's computing cluster.
- If your machine has a GPU, you can speed up training by using it (e.g., `--train.device cuda` instead of `--train.device cpu`).
- Start with small changes and test them out in a quick training run (e.g., `puffer train puffer_ctf --train.device cpu --train.total-timesteps 25_000_000`) to verify they work as expected.
- Use WandB to track experiments and compare results across different configurations.
- You can debug functions in Python (e.g., reward or observation computation) by adding `breakpoint()` where needed (e.g., in `step()` in `pufferlib/ocean/ctf/ctf.py`). This will drop you into an interactive pdb session. Note that this can be tricky when running with multiple workers, so you may want to test with a single environment (see the comment in `config/ocean/ctf.ini`).