Giter Site home page Giter Site logo

hosikchoi / h3 Goto Github PK

View Code? Open in Web Editor NEW

This project forked from hazyresearch/h3

0.0 0.0 0.0 7.16 MB

Language Modeling with the H3 State Space Model

License: Apache License 2.0

Shell 0.01% JavaScript 0.11% C++ 1.86% Python 0.37% C 0.02% PHP 0.28% Assembly 88.08% CSS 0.05% Pawn 4.93% Cuda 0.80% Makefile 0.01% HTML 2.32% CMake 0.12% POV-Ray SDL 1.04% Dockerfile 0.02%

h3's Introduction

Hungry Hungry Hippos (H3)

This repository provides the official implementation of H3 from the following paper.

Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Daniel Y. Fu*, Tri Dao*, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher Ré
International Conference on Learning Representations, 2023. Notable top-25% (spotlight).
Paper: https://arxiv.org/abs/2212.14052

H3

Code & model release

You can find model weights on the Hugging Face Hub here (under "Files and Versions" for each model):

Loading weights and running inference

Examples of how to load the weights and run inference are given in benchmarks/benchmark_generation.py and examples/generate_text_h3.py.

Here's an example of how to download and run our 125M model (you may need to install FlashAttention):

git lfs install
git clone https://huggingface.co/danfu09/H3-125M

git clone https://github.com/HazyResearch/H3.git

PYTHONPATH=$(pwd)/H3 python H3/examples/generate_text_h3.py --ckpt H3-125M/model.pt --prompt "Hungry Hungry Hippos: Towards Language Modeling With State Space Models is a new language model that" --dmodel 768 --nlayer 12 --attn-layer-idx 6 --nheads=12

You should get an output like this (may change due to sampling in the text generation):

Hungry Hungry Hippos: Towards Language Modeling With State Space Models is a new language model that uses state-space models to create a human-like vocabulary that can help improve human understanding and judgment of language. It takes a human's past experience of language, and tries to capture their cognitive patterns. State Space Models helps the researchers make sense of language in its own terms, which helps users learn about their language of choice. State Space Models is used to develop a set of languages for researchers in an effort to help them develop more intelligent language models. The goal is to increase and develop a human-like language model using state space models. It is hoped that it will aid people to do more work to develop a language that is more

Here's the summary of model sizes for each model:

Model dmodel nlayer nheads
125M 768 12 12
355M 1024 24 16
1.3B 2048 24 16
2.7B 2560 32 20

See examples/README.md for examples about how to load all these models and run them!

Acknowledgments

Some of the files related to S4D and HiPPO initialization are adapted from the https://github.com/HazyResearch/state-spaces.

Citation

If you use this codebase, or otherwise found our work valuable, please cite:

@inproceedings{fu2023hungry,
  title={Hungry {H}ungry {H}ippos: Towards Language Modeling with State Space Models},
  author={Fu, Daniel Y. and Dao, Tri and Saab, Khaled K. and Thomas, Armin W.
  and Rudra, Atri and R{\'e}, Christopher},
  booktitle={International Conference on Learning Representations},
  year={2023}
}

h3's People

Contributors

danfu09 avatar tridao avatar eltociear avatar kashif avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.