
Neural network brain decoding

License: MIT

This repository contains analysis code for the paper:

Linking human and artificial neural representations of language.
Jon Gauthier and Roger P. Levy.
2019 Conference on Empirical Methods in Natural Language Processing.

This repository is open-source under the MIT License. If you would like to reuse our code or otherwise extend our work, please cite our paper:

 @inproceedings{gauthier2019linking,
   title={Linking human and artificial neural representations of language},
   author={Gauthier, Jon and Levy, Roger P.},
   booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing},
   year={2019}
 }

About the codebase

We structure our data analysis pipeline, from model fine-tuning to representation analysis, using Nextflow. The entire pipeline is specified in the file main.nf.
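For readers unfamiliar with Nextflow, a pipeline is a set of processes whose inputs and outputs Nextflow wires together and schedules, locally or on a cluster. The sketch below is a minimal, hypothetical illustration of the Nextflow DSL; the process name, script, and output are invented for illustration and are not the actual contents of main.nf:

// Hypothetical sketch of a Nextflow process; not the real main.nf.
params.finetune_steps = 250  // default value, overridable from the command line

process finetune_bert {
    output:
    path "checkpoint/"

    script:
    """
    # 'finetune.py' is a placeholder script name
    python finetune.py --steps ${params.finetune_steps} --out checkpoint/
    """
}

workflow {
    finetune_bert()  // run the single illustrative process
}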

Visualizations and statistical tests are done in Jupyter notebooks stored in the notebooks directory.

Running the code

Hardware requirements

  • ~2 TB disk space (for storing brain images, model checkpoints, etc.)
  • 8 GB RAM or more
  • 1 GPU with more than 4 GB of GPU memory (for fine-tuning BERT models)

We strongly suggest running this pipeline on a distributed computing cluster to save time. The full pipeline completes in several days on an MIT high-performance computing cluster.

If you don't have a GPU or this much disk space to spare but still wish to run the pipeline, please ping me and we can make special resource-saving arrangements.

Software requirements

There are only two software requirements:

  1. Nextflow is used to manage the data processing pipeline. Installing Nextflow is as simple as running the following command:

    wget -qO- https://get.nextflow.io | bash

    This installation script places a nextflow binary in your working directory. Later commands in this README assume that this binary is on your PATH; see the quick check after this list.

  2. Singularity retrieves and runs the software containers necessary for the pipeline. It is likely already available on your computing cluster. If not, please see the Singularity installation instructions.
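
As a quick sanity check (a hypothetical shell session; adapt paths as needed), put the downloaded binary on your PATH and confirm that both tools respond:

export PATH="$PWD:$PATH"   # make the freshly downloaded nextflow binary visible
nextflow -version          # print Nextflow version information
singularity --version      # print Singularity version information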

The pipeline is otherwise fully automated, so all other dependencies (data, BERT, etc.) will be automatically retrieved.

Starting the pipeline

Check out the repository at the emnlp2019-final tag and run the following command in the repository root:

nextflow run main.nf
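
For reference, a complete checkout-and-run sequence might look like the following (the clone URL is an assumption based on the repository name; substitute the actual location):

# Clone the repository and check out the tagged EMNLP 2019 code (URL assumed)
git clone https://github.com/hans/nn-decoding.git
cd nn-decoding
git checkout emnlp2019-final
nextflow run main.nf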

Configuring the pipeline

For technical configuration (e.g. customizing how this pipeline will be deployed on a cluster), see the file nextflow.config. The pipeline is configured by default to run locally, but can be easily farmed out across a computing cluster.

A configuration for the SLURM framework is given in nextflow.slurm.config. If your cluster uses a framework other than SLURM, adapting to it may be as simple as changing a few settings in that file. See the Nextflow documentation on cluster computing for more information.
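
For example, one plausible way to run with the SLURM settings is to layer that file on top of the default configuration using Nextflow's -c flag:

nextflow run main.nf -c nextflow.slurm.config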

For model configuration (e.g. customizing hyperparameters), see the header of the main pipeline in main.nf. Each parameter, written as params.X, can be overridden with a command-line flag of the same name. For example, to run the whole pipeline with BERT models fine-tuned for 500 steps rather than 250, we could simply execute

nextflow run main.nf --finetune_steps 500
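
Nextflow also caches completed tasks, so an interrupted or reconfigured run can pick up where it left off via the built-in -resume flag:

nextflow run main.nf --finetune_steps 500 -resume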

Analysis and visualization

The notebooks directory contains Jupyter notebooks for producing the visualizations and statistical analyses in the paper (and much more).

After the Nextflow pipeline completes, you can load and run these notebooks by starting a Jupyter session in the directory where you launched the pipeline. The notebooks require TensorFlow and standard Python data science tools. I recommend using my tensorflow Singularity image as follows:

singularity run library://jon/default/tensorflow:1.12.0-cpu jupyter lab
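
If you cannot open a browser from your compute environment, one alternative (a hypothetical invocation; the notebook filename is a placeholder) is to execute a notebook headlessly with the same image:

singularity exec library://jon/default/tensorflow:1.12.0-cpu jupyter nbconvert --to notebook --execute notebooks/<notebook-name>.ipynb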


