
Unlocking Non-Invasive Brain-to-Text

This repository contains the code for the paper "Unlocking Non-Invasive Brain-to-Text". The preprint is available on arXiv at https://arxiv.org/abs/2505.13446. If you find this code helpful in your work, please cite the paper:

@article{jayalath2025unlocking,
  title={{Unlocking Non-Invasive Brain-to-Text}},
  author={Jayalath, Dulhan and Landau, Gilad and Parker Jones, Oiwi},
  journal={arXiv preprint arXiv:2505.13446},
  year={2025}
}

Quick start

  1. Install requirements with pip install -r requirements.txt.
  2. Download all or some of the LibriBrain, Armeni, Gwilliams, and Broderick (Natural Speech) datasets.
  3. Modify the paths in data/dataset_configs.yaml: point root to each dataset's BIDS root directory and set cache to where you would like preprocessed data to be kept (see the sketch after this list).
  4. Make sure you have a Weights & Biases account and are logged in from your console (e.g. via wandb login).
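For reference, a dataset entry in data/dataset_configs.yaml might look like the sketch below. This is illustrative only; consult the file itself for the real schema. root and cache are the two fields the quick start asks you to edit.

```yaml
# Illustrative sketch -- check data/dataset_configs.yaml for the actual schema.
libribrain:
  root: /data/libribrain/bids       # BIDS root of the downloaded dataset
  cache: /data/cache/libribrain     # where preprocessed data will be written

armeni2022:
  root: /data/armeni2022/bids
  cache: /data/cache/armeni2022
```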

Note: although we use sensor position data for LibriBrain in our experiments, it is not currently publicly released. To train on LibriBrain without this data, you must pass --har_type gating; otherwise, training will silently fail. This option uses dataset-conditional linear convolutions rather than a spatial attention module, and performance should be similar.
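To give a rough picture of the gating idea, here is a minimal, hypothetical PyTorch sketch of a dataset-conditional linear convolution. It is not the repository's implementation; it only illustrates the general mechanism: each dataset gets its own 1x1 convolution over sensor channels, selected by a dataset index, so no sensor position information is needed.

```python
# Hypothetical sketch of a dataset-conditional linear convolution
# (the idea behind --har_type gating); NOT the repository's code.
import torch
import torch.nn as nn

class DatasetConditionalConv(nn.Module):
    """One 1x1 convolution over sensor channels per dataset."""

    def __init__(self, n_datasets: int, in_channels: int, out_channels: int):
        super().__init__()
        # kernel_size=1 makes each conv a learned linear remap of channels.
        self.convs = nn.ModuleList(
            nn.Conv1d(in_channels, out_channels, kernel_size=1)
            for _ in range(n_datasets)
        )

    def forward(self, x: torch.Tensor, dataset_idx: int) -> torch.Tensor:
        # x: (batch, sensors, time) -> (batch, out_channels, time)
        return self.convs[dataset_idx](x)

# Route each batch through the convolution belonging to its source dataset.
layer = DatasetConditionalConv(n_datasets=3, in_channels=306, out_channels=128)
meg = torch.randn(8, 306, 250)   # fake MEG batch: 8 samples, 306 sensors, 250 steps
out = layer(meg, dataset_idx=0)  # shape: (8, 128, 250)
```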

Training a model

All results during training and evaluation will be logged to a project called word-to-sent in your Weights & Biases account.

Train a word predictor

Single dataset: python train.py --vocab_size 250 --dset libribrain

Joint training: python train.py --vocab_size 250 --dset gwilliams2022 --aux_dsets armeni2022 libribrain

Train a word predictor and evaluate with rescoring

python train.py --vocab_size 250 --dset libribrain --post_proc --no_llm_api
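Rescoring re-ranks the word predictor's candidate sentences with a language model. The sketch below shows the general idea only, using a small Hugging Face model; the repository's actual rescoring method, models, and candidate generation may differ.

```python
# Minimal n-best rescoring sketch (illustrative, not the repo's exact method):
# keep the candidate sentence that a causal LM finds most probable.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def lm_log_prob(sentence: str) -> float:
    """Summed log-probability of a sentence under the LM."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, HF returns the mean token NLL as `loss`.
        loss = lm(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)  # undo the mean over predicted tokens

candidates = [  # hypothetical n-best list from a word predictor
    "the cat sat on the mat",
    "the cat sat on the hat",
    "cat the sat mat on the",
]
print(max(candidates, key=lm_log_prob))
```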

Train a word predictor and evaluate with both rescoring and in-context methods

If you want to use in-context LLM API methods, please register on the Anthropic console, set the ANTHROPIC_API_KEY environment variable to your API key, and make sure you have sufficient credits. We estimate that evaluating a dataset in full costs around $15-$20 (less if you use --limit_eval_samples).

python train.py --vocab_size 250 --dset libribrain --post_proc
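As a rough sketch of what an in-context correction call can look like (the evaluation code's actual prompts, models, and post-processing may differ), here is a minimal example with the Anthropic Python SDK:

```python
# Illustrative in-context correction call via the Anthropic SDK;
# the repository's actual prompting strategy may differ.
# Requires ANTHROPIC_API_KEY to be set in your environment.
import anthropic

client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY automatically

predicted = "the cat sat no the mat"  # hypothetical word-predictor output
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # assumed model choice
    max_tokens=64,
    messages=[{
        "role": "user",
        "content": "Correct this noisy transcript and output only the "
                   f"corrected sentence: {predicted}",
    }],
)
print(message.content[0].text)
```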

Evaluate with random noise inputs to a trained model

python train.py --vocab_size 250 --dset libribrain --test_ckpt /path/to/trained/ckpt --random_noise_inputs
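This is a sanity-check baseline: feeding noise in place of real MEG should collapse decoding to chance. Conceptually (a sketch, not what the flag actually executes):

```python
# Conceptual sketch of the --random_noise_inputs baseline
# (not the actual implementation behind the flag).
import torch

def noise_like(batch: torch.Tensor) -> torch.Tensor:
    """Replace a real MEG batch with Gaussian noise of the same shape."""
    return torch.randn_like(batch)

# If a trained model still decodes text from noise, the evaluation is leaking.
```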

Usage tips

  • Use --limit_eval_samples <int> to reduce the number of sentences you evaluate with to save API costs.
  • Use a trained word predictor for evaluation by supplying --test_ckpt /path/to/ckpt (checkpoints are saved to ./checkpoints/* automatically).
  • Use --name <run-name> to change the name of the run logged to Weights & Biases.
  • Use --predict_oov if you want to train and use an OOV position predictor during evaluation.
