# MoLiNER

MoLiNER is a model for motion analysis, language-based segmentation, and retrieval. More detailed documentation is available in the `docs` directory.
## Getting Started

To get started with MoLiNER, set up the environment and install the required dependencies.
1. Clone the repository:

   ```bash
   git clone https://github.com/raideno/MoLiNER.git
   cd MoLiNER
   ```

2. Set up the Python environment:
   ```bash
   python -m venv .venv

   # On macOS and Linux
   source .venv/bin/activate
   # On Windows
   .venv\Scripts\activate

   # Install the dependencies
   pip install -r requirements.txt
   ```

3. Hugging Face authentication (IMPORTANT):
   Rename `.env.example` to `.env` and replace the `xxx` placeholders with the appropriate values.
4. TMR pretrained weights:

   ```bash
   bash scripts/download-tmr-pretrained-models.sh
   ```

## Data

This project supports multiple datasets: Babel, HumanML3D, and KIT-ML (upcoming). All data downloading, preprocessing, etc. is handled automatically. The only requirement is to set `HUGGING_FACE_TOKEN` inside `.env` as instructed in the previous step; this is needed because the raw data is hosted on Hugging Face in a private repository.
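The `.env` file mentioned above is a plain `KEY=VALUE` file containing `HUGGING_FACE_TOKEN`. As a sketch of how such a file is typically consumed, here is a minimal stdlib loader (the project may rely on a library such as `python-dotenv` instead; this loader is an assumption for illustration):

```python
# Minimal .env loader -- illustrative only; the project's actual loading
# mechanism may differ (e.g. python-dotenv).
import os

def load_dotenv(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"')
            # export to the process environment if not already set
            os.environ.setdefault(key.strip(), values[key.strip()])
    return values
```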
All data-related code is available inside `src/data`, and the associated configuration inside `configs/data`.
A set of predefined data pipelines with filtering is already available and listed in the [Data Pipelines](#data-pipelines) section. It is also possible to create a custom data pipeline with your own filtering; instructions are available in `docs/create-data-pipeline.md`.
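To give an intuition of the kind of filtering a custom pipeline might apply, here is a hypothetical predicate that drops annotated spans shorter than a minimum number of frames (the span representation and function name are assumptions, not the project's API):

```python
# Hypothetical filtering step: keep only spans covering at least
# `min_frames` frames. The span dict layout is illustrative only.
def filter_short_spans(spans, min_frames=8):
    """Drop (start, end, text) span dicts shorter than min_frames."""
    return [s for s in spans if s["end"] - s["start"] >= min_frames]
```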
## Training

To train a new model, use the `train-model.py` script.
- Create the model configuration:
  Duplicate the `configs/model/moliner.base.yaml` file and name it as you wish.
- Specify the modules:
  Replace all the `???` entries in the `.yaml` file with one of the possible values for each module.
- Start the training:

  ```bash
  HYDRA_FULL_ERROR=1 TOKENIZERS_PARALLELISM=false python train-model.py \
      model=<MODEL_NAME> \
      data=<DATA_PIPELINE_NAME> \
      trainer.accelerator=cuda \
      +trainer.devices=[0]
  ```

Notes:

- For more control over the trainer, you can edit `configs/trainer.yaml`.
- `<MODEL_NAME>` should be set to the name of the file you just created, without the `.yaml` extension.
- `<DATA_PIPELINE_NAME>` possible values can be found in `configs/data` and are also listed in the [Data Pipelines](#data-pipelines) section.
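As an illustration of the model-configuration step, a duplicated config might look like the fragment below. Every key and value here is a placeholder: the real module names and options must be taken from `configs/model/moliner.base.yaml` itself.

```yaml
# configs/model/my-moliner.yaml -- hypothetical example; the keys and
# values below are placeholders, not the project's actual schema.
motion_encoder: tmr        # was `???` in the duplicated file
prompt_encoder: deberta    # was `???` in the duplicated file
```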
## Data Pipelines

| Data Variant | Description |
|---|---|
| `babel/base` | Babel dataset for motion-language segmentation. |
| `babel/separate` | Frame and sequence annotations are put in different samples. |
| `babel/20/base` | Babel dataset with sequence-level annotations. |
| `babel/20/standardized/chunking/16` | Babel dataset with chunk-based annotations, 16 frames per span. |
| `babel/20/standardized/windowing/16` | Babel dataset with window-based annotations, 16 frames per span. |
| `hml3d/base` | HumanML3D dataset for 3D motion-language tasks. |
| `mixed/base` | A mix of the HML3D and Babel datasets. |
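For intuition about the chunking and windowing variants above (16 frames per span), the sketch below contrasts non-overlapping chunks with overlapping sliding windows. The exact parameters the project uses (stride, tail handling) are assumptions:

```python
# Illustrative sketch only: how chunk-based vs window-based spans over a
# motion of n_frames frames might be produced. Stride and tail handling
# are assumptions, not the project's actual settings.
def chunk_spans(n_frames: int, size: int = 16):
    """Non-overlapping chunks: [0, 16), [16, 32), ... (tail kept, possibly shorter)."""
    return [(s, min(s + size, n_frames)) for s in range(0, n_frames, size)]

def window_spans(n_frames: int, size: int = 16, stride: int = 8):
    """Overlapping sliding windows: [0, 16), [8, 24), ..."""
    return [(s, s + size) for s in range(0, n_frames - size + 1, stride)]
```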
**RUN_DIR**: Once training starts, a directory will be created inside the `out` directory; model weights, logs, etc. will be stored there. This directory is referred to as `run_dir` in the rest of the documentation.
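Since later commands take `run_dir=<path_to_run_dir>`, a small helper like the following can locate the most recent run. The layout of `out/` (one subdirectory per run) is an assumption here:

```python
# Hypothetical convenience helper -- assumes each training run creates
# one subdirectory under `out/`, which is what the README implies.
from pathlib import Path

def latest_run_dir(out_dir: str = "out") -> Path:
    """Return the most recently modified subdirectory of out_dir."""
    runs = [p for p in Path(out_dir).iterdir() if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"no runs found under {out_dir}")
    return max(runs, key=lambda p: p.stat().st_mtime)
```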
## Configurations
This project uses Hydra for configuration management. This allows for a flexible and composable way to configure experiments.
The main configuration files are located in the `configs/` directory.
- `defaults.yaml`: contains global default settings.
- `train-model.yaml`, `test-model.yaml`, etc.: main configuration files for the different scripts.
- `configs/model/moliner.yaml`: configuration for the model architecture (e.g., encoders, decoders).
- `configs/data/`: configuration for datasets.
- `configs/trainer.yaml`: configuration for the PyTorch Lightning trainer.
You can override any configuration setting from the command line. For example:
```bash
python train-model.py data=<data-name> model=<model-name> trainer.max_epochs=100
```

This command will train the specified model on the specified dataset for 100 epochs.
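Conceptually, each dotted-key override like `trainer.max_epochs=100` updates one leaf of a nested config tree. The toy sketch below shows that idea; it is a simplified illustration, not Hydra's actual implementation:

```python
# Simplified model of Hydra-style dotted-key overrides. Illustration
# only -- Hydra's real override grammar is far richer.
def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Apply `a.b=c` style overrides to a nested config dict, in place."""
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        node = config
        *parents, leaf = dotted_key.split(".")
        for key in parents:
            node = node.setdefault(key, {})
        # interpret integers; fall back to the raw string otherwise
        try:
            node[leaf] = int(raw)
        except ValueError:
            node[leaf] = raw
    return config
```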
Refer to Hydra's documentation for more information on how to use Hydra.
## Weights Extraction

Before running evaluation or inference, you might want to extract the model weights from the PyTorch Lightning checkpoint for easier loading.
```bash
python extract.py run_dir=<path_to_run_dir>
```

This will save the model weights in the run directory.
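For context, a Lightning `.ckpt` bundles the model's `state_dict` with optimizer state, epoch counters, etc., and the `LightningModule` attribute prefix (often `"model."`) usually needs stripping before the weights can be loaded into the bare model. What `extract.py` actually does is an assumption; the sketch below uses a plain dict standing in for `torch.load(...)` output:

```python
# Conceptual sketch of checkpoint weight extraction. The "model."
# prefix and the checkpoint layout are assumptions about what
# extract.py handles, shown here with plain dicts instead of tensors.
def extract_state_dict(checkpoint: dict, prefix: str = "model.") -> dict:
    """Keep only the weights, with the Lightning attribute prefix removed."""
    state = checkpoint["state_dict"]
    return {
        key[len(prefix):]: value
        for key, value in state.items()
        if key.startswith(prefix)
    }
```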
## Evaluation

To evaluate the model's performance on in-motion retrieval tasks:
```bash
HYDRA_FULL_ERROR=1 python evaluate.mlp.py \
    run_dir=<path_to_run_dir> \
    device=cuda:1 \
    score=0.5
```

This supports the HumanML3D, Babel sequence-level, and Babel frame-level datasets.
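The `score=0.5` argument is a confidence threshold: only predictions scoring at or above it are kept. The prediction format below is an assumption for illustration, not the project's actual output schema:

```python
# Illustration of score thresholding. The (start, end, label, score)
# tuple format is hypothetical, not the project's actual output.
def threshold_predictions(predictions, score=0.5):
    """Keep predicted spans whose confidence is >= the threshold."""
    return [p for p in predictions if p[3] >= score]
```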
To evaluate the model on segmentation tasks with the Babel frame-level dataset:
```bash
HYDRA_FULL_ERROR=1 python evaluate.locate.py \
    run_dir=<path_to_run_dir> \
    device=cuda:1 \
    score=0.5
```

Upcoming...
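As background on segmentation evaluation: the usual matching criterion between a predicted span and a ground-truth span is temporal IoU. Whether `evaluate.locate.py` uses exactly this metric is an assumption:

```python
# Temporal intersection-over-union between two (start, end) frame spans.
# Shown as general background; the project's exact metric is assumed.
def span_iou(a: tuple, b: tuple) -> float:
    """IoU of two half-open (start, end) spans; 0.0 when disjoint."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0
```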
