
TinyGPT

This repository contains an implementation of TinyGPT, a small-scale GPT model created from scratch.

Key Points:

Architecture: TinyGPT is built on the transformer architecture, which is widely used in natural language processing. Like other GPT models, it is decoder-only: a stack of identical decoder blocks, each containing a self-attention layer and a feed-forward neural network.

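Self-Attention: Self-attention lets the model weigh how relevant each token in the input sequence is when computing the representation of a given token. Each token's output is a weighted sum of value vectors from the sequence, with weights computed from the similarity between that token's query vector and each token's key vector. This lets the model capture contextual dependencies across the sequence.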
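
The following is a minimal, dependency-free sketch of single-head causal self-attention, assuming the standard scaled dot-product formulation softmax(Q Kᵀ / sqrt(d)) V with a mask that hides future positions. The function names are hypothetical and not taken from the repository; a real implementation would use a tensor library and split embed_dim = 768 across num_heads = 12 heads of 64 dimensions each.

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k: T query/key vectors of dimension d; v: T value vectors.
    Returns (outputs, weights), where weights is the T x T attention matrix.
    """
    T, d = len(q), len(q[0])
    dv = len(v[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for i in range(T):
        # Position i may attend only to positions 0..i (no look-ahead).
        scores = [sum(a * b for a, b in zip(q[i], k[j])) * scale for j in range(i + 1)]
        w = softmax(scores)
        # Future positions are masked out, i.e. they receive weight 0.
        weights.append(w + [0.0] * (T - i - 1))
        # Output i is the attention-weighted sum of the visible value vectors.
        outputs.append([sum(w[j] * v[j][c] for j in range(i + 1)) for c in range(dv)])
    return outputs, weights


# Three 2-dimensional token vectors used as queries, keys and values.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = causal_self_attention(x, x, x)
print(out[0])  # [1.0, 0.0]: the first token can only attend to itself
```

Each row of the weight matrix sums to 1 over the visible positions, and everything above the diagonal is zero.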

Decoder Blocks: TinyGPT stacks multiple decoder blocks (num_decoder_blocks = 12 in the configuration below), each feeding its output into the next. Each block consists of a self-attention layer, a feed-forward neural network, and layer normalization. The self-attention layer is masked so that each position attends only to itself and earlier positions, capturing dependencies between tokens without looking ahead at tokens that have not been generated yet.
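
A sketch of how one decoder block could be wired, and how several are stacked. It assumes the pre-norm arrangement used by GPT-2 (layer normalization before each sublayer, with a residual connection around it); the README does not say whether TinyGPT normalizes before or after each sublayer. Learned LayerNorm gain and bias are omitted, and the attention and feed-forward sublayers are passed in as hypothetical callables.

```python
import math


def layer_norm(x, eps=1e-5):
    # Normalize one vector to zero mean and unit variance (learned gain/bias omitted).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def add(x, y):
    # Element-wise vector addition, used for the residual connections.
    return [a + b for a, b in zip(x, y)]


def decoder_block(xs, attention, feed_forward):
    """One pre-norm decoder block.

    xs: list of T token vectors.
    attention: callable mapping T vectors to T vectors (mixes information across positions).
    feed_forward: callable mapping one vector to one vector (applied to each position).
    """
    # Sublayer 1: causal self-attention on normalized inputs, plus a residual connection.
    attn_out = attention([layer_norm(x) for x in xs])
    xs = [add(x, a) for x, a in zip(xs, attn_out)]
    # Sublayer 2: position-wise feed-forward network, plus a residual connection.
    return [add(x, feed_forward(layer_norm(x))) for x in xs]


def decoder_stack(xs, blocks):
    # Apply the blocks in order; each block consumes the previous block's output.
    for attention, feed_forward in blocks:
        xs = decoder_block(xs, attention, feed_forward)
    return xs


def zero_attention(vs):
    # Stand-in attention sublayer that outputs zeros.
    return [[0.0] * len(v) for v in vs]


def zero_feed_forward(v):
    # Stand-in feed-forward sublayer that outputs zeros.
    return [0.0] * len(v)


xs = [[1.0, 2.0], [3.0, 4.0]]
print(decoder_stack(xs, [(zero_attention, zero_feed_forward)] * 12))  # [[1.0, 2.0], [3.0, 4.0]]
```

With both sublayers returning zeros, the residual connections carry the input through all 12 blocks unchanged; real sublayers add their contributions on top of that path.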

Attention Dropout: To improve generalization, dropout is applied to the attention weights during training: at each step a fraction of them (attn_dropout = 0.1, i.e. 10%) is randomly set to zero. This discourages the model from relying on any single attention connection and encourages more robust representations.
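
A minimal sketch of attention dropout as inverted dropout on the post-softmax attention weights. It assumes the common convention (the one PyTorch's nn.Dropout follows) of dividing surviving values by 1 - p during training and doing nothing at inference; p corresponds to attn_dropout = 0.1, and the function name is hypothetical.

```python
import random


def attention_dropout(weights, p=0.1, training=True, rng=random):
    """Inverted dropout on a T x T matrix of post-softmax attention weights.

    In training, each weight is zeroed with probability p and each surviving
    weight is divided by (1 - p), so every entry keeps its expected value.
    At inference (training=False) the weights are returned unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return [row[:] for row in weights]
    keep = 1.0 - p
    return [[w / keep if rng.random() >= p else 0.0 for w in row] for row in weights]


# Causal attention weights for three tokens (each row sums to 1).
weights = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.2, 0.3, 0.5]]
train_w = attention_dropout(weights, p=0.1, rng=random.Random(0))
eval_w = attention_dropout(weights, p=0.1, training=False)
```

After dropout the rows of train_w no longer sum exactly to 1; the rescaling only guarantees that each weight keeps its original expected value.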

Embedding Dropout: Embedding dropout regularizes the model during training by randomly setting a fraction (embed_dropout = 0.1) of the token embeddings to zero, which reduces overfitting and improves the model's ability to generalize.
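
A sketch of the embedding stage with dropout. It assumes GPT-2-style learned position embeddings added to the token embeddings, followed by element-wise inverted dropout (rate embed_dropout = 0.1) on the sum; the README's wording could also describe a variant that zeroes entire token vectors, and which one TinyGPT uses is not stated. The function name is hypothetical, the tables hold random placeholder values, and embed_dim is shrunk to 8 for readability.

```python
import random


def embed_with_dropout(token_ids, tok_table, pos_table, p=0.1, training=True, rng=random):
    """Token + position embedding lookup followed by inverted dropout.

    tok_table: vocab_size rows of embed_dim floats (token embeddings).
    pos_table: max_len rows of embed_dim floats (assumed learned position embeddings).
    """
    if len(token_ids) > len(pos_table):
        raise ValueError("sequence is longer than max_len")
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    out = []
    for pos, tid in enumerate(token_ids):
        # Add the token's embedding to the embedding of its position.
        vec = [t + s for t, s in zip(tok_table[tid], pos_table[pos])]
        if training and p > 0.0:
            # Zero each element with probability p; rescale survivors by 1 / (1 - p).
            vec = [x / keep if rng.random() >= p else 0.0 for x in vec]
        out.append(vec)
    return out


rng = random.Random(0)
vocab_size, max_len, embed_dim = 100, 20, 8  # embed_dim shrunk from 768 for illustration
tok_table = [[rng.gauss(0.0, 0.02) for _ in range(embed_dim)] for _ in range(vocab_size)]
pos_table = [[rng.gauss(0.0, 0.02) for _ in range(embed_dim)] for _ in range(max_len)]

train_x = embed_with_dropout([5, 17, 42], tok_table, pos_table, p=0.1, rng=rng)
eval_x = embed_with_dropout([5, 17, 42], tok_table, pos_table, training=False)
```

Because max_len = 20, this sketch rejects sequences longer than 20 tokens instead of silently truncating them.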


Configuration

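| Setting | Value | Description |
|---|---|---|
| attn_dropout | 0.1 | Dropout rate applied to the attention weights |
| embed_dropout | 0.1 | Dropout rate applied to the embeddings |
| ff_dropout | 0.1 | Dropout rate in the feed-forward networks |
| vocab_size | 100 | Number of tokens in the vocabulary |
| max_len | 20 | Maximum sequence length (context window) |
| num_heads | 12 | Attention heads per self-attention layer |
| embed_dim | 768 | Embedding (hidden) dimension |
| num_decoder_blocks | 12 | Number of stacked decoder blocks |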
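
The configuration can be captured in a small frozen dataclass, and its values used for a back-of-the-envelope parameter count. The class and function are hypothetical, and the count rests on assumptions the README does not confirm: a feed-forward hidden size of 4 × embed_dim, biases on every linear layer, learned position embeddings, two LayerNorms per block plus a final one, and an output head tied to the token embedding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TinyGPTConfig:
    # Values copied from the configuration table; the class itself is hypothetical.
    attn_dropout: float = 0.1
    embed_dropout: float = 0.1
    ff_dropout: float = 0.1
    vocab_size: int = 100
    max_len: int = 20
    num_heads: int = 12
    embed_dim: int = 768
    num_decoder_blocks: int = 12

    @property
    def head_dim(self) -> int:
        # Each head works on an equal slice of the embedding, so embed_dim must divide evenly.
        if self.embed_dim % self.num_heads:
            raise ValueError("embed_dim must be divisible by num_heads")
        return self.embed_dim // self.num_heads


def estimate_params(cfg: TinyGPTConfig, ff_mult: int = 4) -> int:
    """Rough parameter count under the assumed GPT-2-style choices listed above."""
    d, h = cfg.embed_dim, ff_mult * cfg.embed_dim
    embeddings = cfg.vocab_size * d + cfg.max_len * d  # token + learned position tables
    attention = 3 * (d * d + d) + (d * d + d)  # Q/K/V projections + output projection
    feed_forward = (d * h + h) + (h * d + d)  # two linear layers with biases
    layer_norms = 2 * (2 * d)  # two LayerNorms per block (gain + bias)
    per_block = attention + feed_forward + layer_norms
    final_norm = 2 * d
    # The output head is assumed to be tied to the token embedding, adding no parameters.
    return embeddings + cfg.num_decoder_blocks * per_block + final_norm


cfg = TinyGPTConfig()
print(cfg.head_dim)  # 64
print(estimate_params(cfg))  # 85148160
```

Under these assumptions the model has roughly 85 million parameters, almost all of them in the 12 decoder blocks; with vocab_size = 100 and max_len = 20 the embedding tables contribute only 92,160. The block dimensions match GPT-2 small (12 layers, 12 heads, 768-dimensional embeddings), so TinyGPT appears to be small mainly in its vocabulary and context length.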
