yasho191 / llm
License: Apache License 2.0

LLM Basics

This repository contains the implementation of encoder and decoder modules of the Transformer architecture. The encoder with classifier is utilized for classification tasks, while the decoder is used for language modeling.
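Both modules are built around scaled dot-product attention. As background, here is a minimal pure-Python sketch of that operation (illustrative only; the repository's actual implementation may differ):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of vectors (rows)."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Output is the weight-averaged combination of value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

The encoder stacks this attention with feed-forward layers and attaches a classifier head; the decoder applies it causally for language modeling.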

Code Structure

  • task_report: This folder contains the problem statement for the project and the final analysis report.
  • Experiments.ipynb: This Jupyter notebook contains the results and analyses of all three parts of the project.
  • llm: This directory contains all internal code for implementing the Transformer, including utilities, tokenizers, etc.
  • results: Contains result files generated during experimentation.

llm

This folder contains the code files for all the implemented Transformer architectures, including the standard implementation and the ALiBi implementation for Part 3. The ALiBi implementation can be found in the file transformer_exploration.py.
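The ALiBi idea (Press et al.) drops learned positional embeddings and instead adds a head-specific linear bias to the attention scores. A small sketch of the bias construction, assuming power-of-two head counts as in the paper (not the repository's exact code):

```python
def alibi_slopes(n_heads):
    # Geometric slope sequence 2^(-8/n), 2^(-16/n), ..., 2^(-8)
    # (the paper's recipe for power-of-two head counts).
    return [2 ** (-8 * (i + 1) / n_heads) for i in range(n_heads)]

def alibi_bias(seq_len, slope):
    # Bias added to pre-softmax attention scores: -slope * distance
    # between query position i and key position j; future positions
    # are masked for causal language modeling.
    return [[-slope * (i - j) if j <= i else float("-inf")
             for j in range(seq_len)]
            for i in range(seq_len)]
```

Because the penalty grows linearly with distance, a model trained on short sequences can extrapolate to longer ones at inference time.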

  • Minor changes were made to the utilities.py file to accommodate the output of the transformer models.
  • Major changes were made to the main.py file to include all three parts of the project.
  • Minor changes were made to the tokenizer.py file.
  • No changes were made to the dataset.py file.

Running the Code

To run the code, navigate to the project directory and execute the following commands:

cd LLM
python3 llm/main.py task_type

Replace task_type with one of the following:

  • part1: Task 1 - Classifier training (Part A)
  • part2: Task 2 - Language Model training
  • part3a: Task 3 - Architecture Exploration (AliBi Implementation)
  • part3b: Task 3 - Performance Improvement (Encoder + Classifier) Exploration
  • part3b_best: Task 3 - Performance Improvement - Best Model (Encoder + Classifier)
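The dispatch on task_type might look like the following sketch, which only assumes the five task strings listed above (the repository's actual main.py may be structured differently):

```python
import argparse

# The five task types documented above; the dispatcher itself is hypothetical.
TASKS = ("part1", "part2", "part3a", "part3b", "part3b_best")

def parse_task(argv=None):
    """Parse the task_type CLI argument, rejecting unknown values."""
    parser = argparse.ArgumentParser(description="LLM Basics runner")
    parser.add_argument("task_type", choices=TASKS,
                        help="which part of the project to run")
    return parser.parse_args(argv).task_type
```

Using choices makes argparse exit with a usage message when an unknown task_type is passed.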

If you do not wish to run the code for every part, the Experiments.ipynb notebook contains the results and visualizations obtained in each part.

References

  1. Vaswani, A., et al. (2017). "Attention is All You Need." arXiv:1706.03762.
  2. Press, O., Smith, N. A., & Lewis, M. (2021). "Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation." arXiv:2108.12409.

Additional Resources

For a detailed explanation of the Transformer architecture, refer to Andrej Karpathy's YouTube video.

Stanford CS224N: NLP with Deep Learning. YouTube video.