
TransformerHub

This is a repository of transformer-like models (Transformer, GPT, BERT, ViT, and more to come), built along my journey into the fascinating field of deep learning. It aims to implement the different forms of the transformer architecture: seq2seq (the original design from the "Attention Is All You Need" paper), encoder-only, decoder-only, and unified models.

These models are not meant to be state of the art on any task. Instead, they serve to train my advanced programming skills and to provide a reference for anyone who shares a love of deep learning and machine intelligence.

This work is inspired by, and would not be possible without, the open-source repositories of NanoGPT, ViT, MAE, CLIP, and OpenCLIP. A huge thanks to their authors for open-sourcing their models!

This repository also maintains a paper list of recent progress in transformer models.

Features

This repository features the following designs:

  • Transformer Architectures:
    • Encoder-only
    • Decoder-only
    • Encoder-Decoder
    • Unified (In Progress)
  • Attention Modules (a minimal mask sketch follows this list):
    • Unmasked Attention (Transformer, BERT)
    • Causal Masked Attention (Transformer, GPT)
    • Prefix Causal Attention (T5)
    • Sliding-Window Attention (Mistral)
  • Position Embeddings (a minimal embedding sketch follows this list):
    • Fixed Position Embedding (Transformer)
    • Learnable Position Embedding (Transformer, BERT)
    • Rotary Position Embedding (RoFormer)
    • Extrapolable Position Embedding (Length-Extrapolatable Transformer)
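
As a rough illustration of how the attention variants above differ, the sketch below builds the corresponding boolean masks in PyTorch. This is a minimal, self-contained example for reference only, not the code in this repository; the function names and the `prefix_len`/`window` parameters are illustrative.

```python
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    """Causal (GPT-style) mask: position i may attend to positions j <= i."""
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

def prefix_causal_mask(seq_len: int, prefix_len: int) -> torch.Tensor:
    """Prefix-causal (T5-style) mask: the first `prefix_len` tokens are fully
    visible to every position; the remaining tokens are attended causally."""
    mask = causal_mask(seq_len)
    mask[:, :prefix_len] = True
    return mask

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Sliding-window (Mistral-style) mask: causal attention restricted to the
    most recent `window` tokens."""
    too_far = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=-window)
    return causal_mask(seq_len) & ~too_far

# Unmasked (BERT-style) attention simply uses no mask: every position attends to
# every other position. A boolean mask like the ones above, where True marks an
# allowed query-key pair, can be passed as `attn_mask` to
# torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask).
```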
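As a companion illustration for the position-embedding variants listed above, the following sketch contrasts the fixed sinusoidal embedding of the original Transformer with rotary position embedding (RoFormer). Again, this is an illustrative example under assumed names and shapes, not this repository's implementation; a learnable position embedding would simply be an `nn.Embedding(max_len, dim)` indexed by position.

```python
import math
import torch

def sinusoidal_embedding(seq_len: int, dim: int) -> torch.Tensor:
    """Fixed sin/cos position embedding (original Transformer); added to the
    token embeddings before the first layer. `dim` is assumed to be even."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    inv_freq = torch.exp(-math.log(10000.0) * torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    pe = torch.zeros(seq_len, dim)
    pe[:, 0::2] = torch.sin(pos * inv_freq)
    pe[:, 1::2] = torch.cos(pos * inv_freq)
    return pe

def apply_rotary(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotary position embedding (RoFormer): rotate adjacent channel pairs of the
    queries/keys by a position-dependent angle inside attention, instead of
    adding a position vector to the input. `x` has shape (..., seq_len, dim)."""
    seq_len, dim = x.shape[-2], x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    angles = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1) * inv_freq  # (seq_len, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # Rotate each (x1, x2) pair and re-interleave the channels.
    return torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1).flatten(-2)
```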

Current Progress

Currently working on implementing DINO, a variant of ViT trained in a self-supervised manner.

Model        Implemented  Trained  Evaluated
Transformer  Yes          No       No
GPT          Yes          No       No
BERT         Yes          Yes      No
ViT          Yes          No       No
MAE          No           No       No
CLIP         No           No       No

DISCLAIMER: Because of the popularity and versatility of Transformers, many course assignments involve implementing part of, or the entire, Transformer model. This repository was developed purely for self-training purposes and can serve as a reference for implementing a Transformer model, but directly copying from this repo is strictly prohibited and violates the code of conduct of most academic institutions.

For those who need a refresher on what the Transformer is or what its detailed architecture looks like, please refer to the well-illustrated Annotated Transformer blog: http://nlp.seas.harvard.edu/annotated-transformer/#background

Here is a poem generated by LLaMA2, an open-source LLM released by Meta AI:

Attention is all you need,
To understand what's said and read.
Transformers learn relations,
Through multi-head attentions.
Encoder, decoder architecture,
Learns features for good imagery.
Training on large datasets,
Its performance quickness gets.
Built on top of sequence to sequence,
Its parallel computing saves time to flex.
Understanding language, text and voice,
With deep learning that gave it its poise.
Task agnostic, wide usability,
Driving progress in AI agility.
Pushing NLP to new heights,
Transformers show their might.

