lihuibng / spikegpt

This project forked from ridgerchu/spikegpt


Implementation of "SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks"

License: BSD 2-Clause "Simplified" License

C++ 0.13% Python 50.72% Cuda 49.15%


SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

SpikeGPT is a lightweight generative language model with pure binary, event-driven spiking activation units. The arXiv paper on SpikeGPT can be found here.

If you are interested in SpikeGPT, feel free to join our Discord using this link!

This repo is inspired by the RWKV-LM.

If you find yourself struggling with environment configuration, consider using the Docker image for SpikeGPT available on GitHub.

Training on Enwik8

  1. Download the enwik8 dataset by visiting the following link: enwik8 dataset.

  2. Modify the training, validation, and test set paths in the train.py script to match the directory where you've extracted the files. For example, if you've extracted the files to a directory named enwik8_data, your train.py script should be updated as follows:

    # Set the paths for the datasets
    datafile_train = "path/to/enwik8_data/train"
    datafile_valid = "path/to/enwik8_data/validate"
    datafile_test = "path/to/enwik8_data/test"
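
The enwik8 archive is a single file of raw bytes, and the conventional split is 90% for training, 5% for validation, and 5% for testing. A minimal sketch of producing the three files (the helper names and output directory are illustrative, not part of this repo):

```python
import os

def split_enwik8(raw: bytes) -> dict:
    """Split raw enwik8 bytes into the conventional 90%/5%/5% pieces."""
    n_train = len(raw) * 90 // 100
    n_valid = len(raw) * 5 // 100
    return {
        "train": raw[:n_train],
        "validate": raw[n_train:n_train + n_valid],
        "test": raw[n_train + n_valid:],
    }

def write_splits(raw: bytes, out_dir: str = "enwik8_data") -> None:
    """Write each piece to out_dir under the names train.py expects."""
    os.makedirs(out_dir, exist_ok=True)
    for name, chunk in split_enwik8(raw).items():
        with open(os.path.join(out_dir, name), "wb") as f:
            f.write(chunk)
```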
    

Pre-Training on a Large Corpus

  1. Pre-Training on a Large Corpus:

    • To begin, pre-tokenize your corpus data.
    • For custom data, use the jsonl2binidx tool to convert your data.
    • If you prefer pre-tokenized data, consider using the pre-tokenized Pile, which uses the GPT-NeoX 20B tokenizer (the same tokenizer used by GPT-NeoX and Pythia).
    • If resources are limited, you may use just one file from the dataset instead of the entire collection.
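
Conversion tools of this kind typically expect a .jsonl file with one JSON object per line, each carrying the document under a "text" field. A sketch of writing a corpus into that shape (the "text" field name follows the common convention; check the jsonl2binidx documentation for the exact format it expects):

```python
import json

def write_jsonl(documents, path):
    """Write one JSON object per line, each holding a single document."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in documents:
            f.write(json.dumps({"text": doc}, ensure_ascii=False) + "\n")
```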
  2. Configuring the Training Script:

    • In train.py, uncomment line 82 to enable MMapIndexedDataset as the dataset class.
    • Change datafile_train to the filename of your binidx file.
    • Important: Do not include the .bin or .idx file extensions.
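
The dataset loader finds the pair of files by appending .bin and .idx to the prefix itself, which is why the extensions must be left off. A small guard (a hypothetical helper, not part of train.py) that strips a mistakenly included extension:

```python
def binidx_prefix(path: str) -> str:
    """Return the binidx path prefix with any .bin or .idx extension removed."""
    for ext in (".bin", ".idx"):
        if path.endswith(ext):
            return path[: -len(ext)]
    return path
```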
  3. Starting Multi-GPU Training:

    • Utilize Hugging Face's Accelerate to begin training on multiple GPUs.
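
With Accelerate installed (pip install accelerate), the usual pattern is a one-time interactive configuration followed by a launch; the exact answers during configuration (number of GPUs, mixed precision, and so on) depend on your hardware:

```shell
# One-time interactive setup: choose number of GPUs, precision, etc.
accelerate config

# Launch training across the configured GPUs
accelerate launch train.py
```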

Fine-Tuning on WikiText-103

  1. Downloading Pre-Tokenized WikiText-103:

  2. Fine-Tuning the Model:

    • Use the same approach as in pre-training for fine-tuning your model with this dataset.
    • Important: Set a smaller learning rate than during the pre-training stage to avoid catastrophic forgetting. A recommended learning rate is around 3e-6.
    • Adjust the batch size to fit your hardware and memory; there is no single value that suits every setup.
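
As an illustration of the advice above, one common pattern is to start fine-tuning near 3e-6 and decay from there. The exponential decay and the final value here are illustrative choices, not settings prescribed by this repo:

```python
def finetune_lr(step: int, total_steps: int,
                lr_init: float = 3e-6, lr_final: float = 1e-6) -> float:
    """Exponentially interpolate the learning rate from lr_init down to lr_final."""
    if total_steps <= 1:
        return lr_init
    frac = step / (total_steps - 1)          # 0.0 at the first step, 1.0 at the last
    return lr_init * (lr_final / lr_init) ** frac
```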

Inference with Prompt

You can run inference with either your own customized model or our pre-trained model. Our pre-trained model is available here; it was trained on 5B tokens of OpenWebText.

  1. Download our pre-trained model and place it in the root directory of this repo.
  2. Modify the 'context' variable in run.py to your custom prompt.
  3. Run run.py.
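
Conceptually, run.py feeds the prompt through the model and samples one token at a time. A toy sketch of that loop with a stand-in model (the real run.py uses the SpikeGPT model and its tokenizer; everything below is illustrative):

```python
import math
import random

def generate(model, prompt_tokens, n_new, temperature=1.0):
    """Autoregressively extend prompt_tokens by n_new sampled tokens.

    `model` is any callable mapping the current token sequence to a
    list of logits over the vocabulary for the next position.
    """
    tokens = list(prompt_tokens)
    for _ in range(n_new):
        logits = [l / temperature for l in model(tokens)]
        m = max(logits)                          # subtract max for numerical stability
        probs = [math.exp(l - m) for l in logits]
        total = sum(probs)
        probs = [p / total for p in probs]       # softmax
        next_id = random.choices(range(len(probs)), weights=probs, k=1)[0]
        tokens.append(next_id)
    return tokens
```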

Fine-Tuning on NLU Tasks

  1. Change the path on line 17 of the script in the 'NLU' folder to your model path.
  2. Run the script.

Citation

If you find SpikeGPT useful in your work, please cite the following source:

@article{zhu2023spikegpt,
        title = {SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks},
        author = {Zhu, Rui-Jie and Zhao, Qihang and Li, Guoqi and Eshraghian, Jason K.},
        journal = {arXiv preprint arXiv:2302.13939},
        year    = {2023}
}

Contributors

ridgerchu, jeshraghian, eddiem3, eltociear
