
The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling

Home Page: https://adityang.github.io/kan-gpt/

License: MIT License

Dockerfile 0.03% Makefile 1.76% Python 92.60% Shell 0.21% Jupyter Notebook 5.40%
gpt kanformers kolmogorov-arnold-networks kolmogorov-arnold-representation llm text-generation transformers

kan-gpt's Introduction

KAN-GPT


The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling

Install it from PyPI

pip install kan_gpt
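
To sanity-check the install, try importing the package:

python -c "import kan_gpt"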

Citation

If you find our work useful, please cite us!

@misc{GANESH2024KANGPT,
  author       = {Aditya Nalgunda Ganesh},
  title        = {KAN-GPT: The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling},
  year         = {2024},
  month        = {May},
  note         = {Release 1.0.0, 9th May 2024},
  url          = {https://github.com/AdityaNG/kan-gpt/}
}

Usage

Refer to the KAN_GPT.ipynb and kan_gpt/prompt.py for usage examples. The following is an outline of how to use the model:

import torch

from kan_gpt.model import GPT
from transformers import GPT2Tokenizer

model_config = GPT.get_default_config()
model_config.model_type = "gpt2"
model_config.vocab_size = 50257
model_config.block_size = 1024
model = GPT(model_config)

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

prompt = "Bangalore is often described as the "

prompt_encoded = tokenizer.encode(
  text=prompt, add_special_tokens=False
)

x = torch.tensor(prompt_encoded).unsqueeze(0)

model.eval()
y = model.generate(x, 50)  # sample 50 tokens

result = tokenizer.decode(y[0])

print(result)

# Bangalore is often described as the Silicon Valley of India.
# The city has witnessed rapid growth in the past two decades.....
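
To reuse the trained weights later (for example with the prompting script described below), standard PyTorch serialization works; the filename here is only illustrative, and the training script may save checkpoints in its own format:

# Save the weights (illustrative path)
torch.save(model.state_dict(), "model.pth")

# Later: restore into a freshly constructed model
model.load_state_dict(torch.load("model.pth"))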

Setup for Development

# Download Repo
git clone https://github.com/AdityaNG/kan-gpt
cd kan-gpt
git pull

# Download Dataset
./scripts/download_webtext.sh
./scripts/download_tinyshakespeare.sh

# Install dependencies for development
pip install -r requirements.txt
pip install -e .

Train

Use the following dummy scripts to make sure everything is working as expected:

WANDB_MODE=offline CUDA_VISIBLE_DEVICES="" python3 -m kan_gpt.train --architecture MLP --batch_size 1 --dummy_dataset --device cpu --max_iters 200
WANDB_MODE=offline CUDA_VISIBLE_DEVICES="" python3 -m kan_gpt.train --architecture KAN --batch_size 1 --dummy_dataset --device cpu --max_iters 200

Then make use of the training script:

python -m kan_gpt.train
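
For example, a longer run on a GPU can reuse the flags demonstrated in the dummy commands above (the values here are illustrative, not recommended settings):

python -m kan_gpt.train --architecture KAN --batch_size 32 --device cuda --max_iters 5000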

Prompt

You can prompt the model to produce text as follows:

python -m kan_gpt.prompt --prompt "Bangalore is often described as the " --model_path (checkpoint)
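
For example, with weights saved as model.pth (a hypothetical filename):

python -m kan_gpt.prompt --prompt "Bangalore is often described as the " --model_path model.pth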

Results

We train and compare KAN-GPT against an equivalent MLP-GPT model on the Tiny Shakespeare dataset. We observe that KAN-GPT performs slightly better than MLP-GPT. Further experiments are planned to investigate this gap more deeply. The results are shown below:

Metrics
(Plots: training loss, cross-entropy, and perplexity curves comparing KAN-GPT and MLP-GPT.)

TODOs

  • Integrate minGPT and pykan
  • Dataset downloading script for WebText
  • PyTorch Dataset parser for WebText
  • PyTorch Dataset parser for tinyshakespeare
  • Mini training POC for KAN-GPT
    • Integrate KAN training logic from KAN.train_kan
    • Train a dummy batch w/o any memory issues
  • Mini training POC for MLP-GPT
  • Train MLP-GPT on the webtext dataset as a baseline
  • Train KAN-GPT on the webtext dataset as a baseline
  • Metrics comparing KAN-GPT and MLP-GPT
  • Auto Save checkpoints
  • Auto Save checkpoints to W&B
  • Auto Download model weights from git / huggingface
  • W&B hyperparam sweep script
  • Script to load checkpoint in interactive mode
  • Reduce requirements.txt constraints
  • Define pydantic model for training and sweep args
  • Pruning the package, get rid of unused code
  • Migrate training script to PyTorch Lightning
  • Documentation: mkdocs gh-deploy
  • Integrate with efficient-kan
  • Test Cases
    • KAN: Forward-Backward test
    • GPT: Forward-Backward test
    • KAN_GPT: Forward-Backward test
    • EFFICIENT_KAN: Forward-Backward test

Development

Read the CONTRIBUTING.md file.

kan-gpt's People

Contributors

adityang, dependabot[bot], eltociear, gyunggyung, themattbin, wektorz, yumemio


kan-gpt's Issues

CUDA out of memory

import torch
from torch import nn

# Assumed imports for this snippet: KAN from the pykan package and
# NewGELU from kan-gpt's bundled minGPT code.
from kan import KAN
from kan_gpt.mingpt.model import NewGELU


class KanMLP(nn.Module):
    """KAN-based replacement for a transformer MLP block."""

    def __init__(
        self,
        in_features=1152,
        hidden_features=None,
        out_features=None,
        drop=0.0,
    ):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.mlp = nn.ModuleDict(
            dict(
                c_fc=KAN(width=[in_features, hidden_features]),
                c_proj=KAN(width=[hidden_features, out_features]),
                act=NewGELU(),
                dropout=nn.Dropout(drop),  # was hardcoded to 0.0, ignoring `drop`
            )
        )
        m = self.mlp
        self.mlpf = lambda x: m.dropout(
            m.c_proj(m.act(m.c_fc(x)))
        )  # MLP forward

    def forward(self, x):
        return self.mlpf(x)


net = KanMLP(1152, 1152 * 4).to("cuda")
x = torch.rand(size=(4, 4096 * 4, 1152)).to("cuda")
net(x)  # was `nex(x)`, a typo

When the number of tokens reaches a certain size, the following error occurs:

 CUDA out of memory.
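
One possible mitigation, sketched below and untested: the KanMLP block applies the same transformation to every token independently, so (assuming it accepts batched 3D input, as in the report above) the forward pass can be chunked along the sequence dimension to bound peak activation memory. This helps most under torch.no_grad(); during training, autograd still retains activations for every chunk.

import torch

def chunked_forward(net, x, chunk_size=1024):
    # Split (batch, seq_len, features) along the token dimension and
    # run the block one chunk at a time. Valid only because the block
    # has no cross-token interaction.
    outs = [net(chunk) for chunk in x.split(chunk_size, dim=1)]
    return torch.cat(outs, dim=1)

x = torch.rand(4, 4096 * 4, 1152, device="cuda")
with torch.no_grad():
    y = chunked_forward(net, x)  # `net` is the KanMLP from the report above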

Import module error

After installing kan_gpt with pip install kan_gpt on Colab, I get this error when trying to import the library and the model. I copied the code from the GitHub README.

(Screenshot of the error attached: Screenshot_20240508-120050.png)

Where is PyPI

Describe the bug

Where is the PyPI page? I searched the internet and still can't find it. The PyPI badge in the README also does not redirect me to the right page.

Train scripts fail because of missing tinyshakespeare dataset

The problem is in the .ipynb file:

# Download Repo
%cd /content
!git clone https://github.com/AdityaNG/kan-gpt
%cd kan-gpt
!git pull
# Download Dataset
!./scripts/download_webtext.sh
# Install dependencies
!pip install -r requirements.txt
!pip install -e .

Fix: add !./scripts/download_tinyshakespeare.sh as shown below:

# Download Repo
%cd /content
!git clone https://github.com/AdityaNG/kan-gpt
%cd kan-gpt
!git pull
# Download Dataset
!./scripts/download_webtext.sh
!./scripts/download_tinyshakespeare.sh
# Install dependencies
!pip install -r requirements.txt
!pip install -e .

have a nice day

Increase Test Coverage

Is your feature request related to a problem? Please describe.

The current test coverage is around 60%; it would be good to add test cases covering at least 80% of the code to ensure there are minimal regressions.

Describe the solution you'd like
We can see which files and lines are covered by the existing test cases (defined in tests/) by clicking on the code coverage badge in the README or by following this link:
https://codecov.io/gh/AdityaNG/kan-gpt

These coverage reports can also be generated locally using:

make test
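
If you prefer calling the tooling directly, an equivalent invocation (assuming the Makefile target wraps pytest with the pytest-cov plugin, which is an assumption about the build setup) would be:

pytest --cov=kan_gpt tests/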

Additional context
The goal is to add more test cases for the following folders. You may use the original repo's test cases as references.

This is a great starter issue for anyone interested in sinking their teeth into the repo :)

Relax scikit-learn version

I am attempting to use kan-gpt in a project that requires scikit-learn>=1.2.2. However, kan-gpt pins an exact version of scikit-learn in the requirements.txt file (scikit_learn==1.1.3). If possible, could this be relaxed to scikit-learn>=1.1.3?
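
Concretely, the requested change to requirements.txt would be:

# requirements.txt
scikit_learn>=1.1.3  # relaxed from: scikit_learn==1.1.3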
