This repository contains the code for "Subjective Depth & Timescale Transformers," a research project implementing and evaluating dynamic transformer architectures. It includes two novel models, the Subjective Depth Transformer (SDT) and the Subjective Timescale Transformer (STT), which leverage Bayesian surprise signals to dynamically route computation, learning where and when to compute.
The framework is built on PyTorch, Hugging Face Transformers, and Accelerate, with configuration managed by Hydra.
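Hydra composes a run from YAML config files. As a rough illustration of the kind of settings such a config might hold, here is a minimal sketch; every key and value below is an assumption for illustration, not the repository's actual schema:

```yaml
# Illustrative sketch only -- field names and values are assumptions,
# not the actual contents of the repository's config files.
model:
  type: stt            # stt | sdt | mod
  n_layers: 6
training:
  batch_size: 8
  max_steps: 500
output_dir: outputs/laptop
```

Any of these values can be overridden from the command line using Hydra's standard `key=value` syntax.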
Clone the repository; the training scripts handle the rest of the environment setup automatically.

```bash
git clone https://github.com/your-username/dynamic-transformers.git
cd dynamic-transformers
```

The easiest way to run training is with the provided scripts. They automatically set up a Python virtual environment with uv, install dependencies, and launch the training run. Evaluation is performed automatically at the end of training, with results saved to eval_results.json in the model's output directory.
For local development and debugging, use train_mac.sh. This script runs a small-scale training job using the laptop.yaml configuration.

```bash
chmod +x train_mac.sh
./train_mac.sh
```

For full-scale training on a SLURM cluster, use train_gpu.sh. This script submits a job to the cluster and uses accelerate for distributed training.
```bash
chmod +x train_gpu.sh
sbatch train_gpu.sh
```

You can customize the run by editing the script or by setting environment variables. For example, to log the run to Weights & Biases, set the WANDB_RUN variable:

```bash
WANDB_RUN=my-awesome-experiment sbatch train_gpu.sh
```

The model type (e.g., stt, sdt, mod) and other parameters can be modified directly within the train_gpu.sh script.
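The environment-variable pattern above relies on standard shell defaulting. A minimal sketch of how a script like train_gpu.sh might consume such variables (MODEL_TYPE and EXTRA_ARGS are hypothetical names chosen for illustration; only WANDB_RUN appears in the actual script invocation above):

```bash
# Sketch of env-var defaulting as a launch script might do it.
# MODEL_TYPE and EXTRA_ARGS are hypothetical illustrative names.
MODEL_TYPE="${MODEL_TYPE:-stt}"   # falls back to stt when unset
EXTRA_ARGS=""
if [ -n "${WANDB_RUN:-}" ]; then
  # forward the run name to the trainer when W&B logging is requested
  EXTRA_ARGS="--wandb-run $WANDB_RUN"
fi
echo "launching model=$MODEL_TYPE $EXTRA_ARGS"
```

With this pattern, `MODEL_TYPE=sdt sbatch train_gpu.sh` would switch models without editing the script, while an unset variable silently keeps the default.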