
# Cho2026_Tokenizer

🔗 A Systematic Evaluation of Sample-Level Tokenization Strategies for MEG Foundation Models (Preprint DOI: 10.1101/2025.09.25.678554)

💡 Please email SungJun Cho at sungjun.cho@ndcn.ox.ac.uk or simply open a GitHub issue if you have any questions or concerns.

## ⚡️ Getting Started

This repository contains all the scripts necessary to reproduce the analyses and figures presented in the manuscript. It is divided into three main directories.

| Directory | Description |
| --- | --- |
| `scripts` | Scripts for training the tokenizer and MEG-GPT models and conducting subsequent analyses. |
| `supplementary` | Scripts for additional data inspection, post-hoc analysis, and visualization. |
| `testing` | Scripts for debugging and testing. |

For detailed descriptions of the scripts in each directory, please consult the README file located within each respective folder.

In addition, the models directory contains configuration files specifying the hyperparameters used for all models trained in this work. Corresponding tables summarizing these hyperparameters are provided within the same directory.

## 🎯 Requirements

This repository builds on the osl-foundation software package, which provides the MEG-GPT model and its associated tokenizers. To start, please install osl-foundation and set up its environment by following the installation guide here.

The scripts used in this paper rely on the following dependencies:

```
python==3.10.4
tensorflow==2.11.0
tensorflow-probability==0.19.0
osl-dynamics==2.1.8
osl-foundation==0.0.1
```

Once these steps are complete, you can clone or download this repository to your preferred directory, and you're ready to begin!
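The setup steps above can be sketched as shell commands. This is a minimal sketch, not the official install procedure: the environment name `cho2026` is arbitrary, the pinned packages are assumed to be installable via `pip`, and `osl-foundation` itself should be installed by following its own installation guide as noted above.

```shell
# Minimal setup sketch (assumptions: conda is available, and the
# pinned packages below are installable from PyPI; install
# osl-foundation separately per its installation guide).
conda create -n cho2026 python=3.10.4 -y
conda activate cho2026
pip install tensorflow==2.11.0 tensorflow-probability==0.19.0 osl-dynamics==2.1.8

# Clone this repository into your preferred directory.
git clone https://github.com/OHBA-analysis/Cho2026_Tokenizer.git
cd Cho2026_Tokenizer
```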

### PyTorch version

For a PyTorch implementation of the tokenizers, please refer to the EphysTokenizer repository.

### Notes on hardware

All scripts in this repository were executed on the Oxford Biomedical Research Computing (BMRC) servers. Our experiments were run using two NVIDIA GPUs (V100 or A100) with CUDA 11.7.0.

## 🪪 License

Copyright (c) 2026 SungJun Cho and OHBA Analysis Group. Cho2026_Tokenizer is free and open-source software licensed under the MIT License.
