
The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling

Home Page: https://adityang.github.io/kan-gpt/

License: MIT License

Dockerfile 0.03% Makefile 1.76% Python 92.60% Shell 0.21% Jupyter Notebook 5.40%
gpt kanformers kolmogorov-arnold-networks kolmogorov-arnold-representation llm text-generation transformers

kan-gpt's Introduction

KAN-GPT


The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling

Install it from PyPI

pip install kan_gpt
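
To sanity-check the install, try importing the package:

python -c "import kan_gpt"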

Citation

If you find our work useful, please cite us!

@misc{GANESH2024KANGPT,
  author       = {Aditya Nalgunda Ganesh},
  title        = {KAN-GPT: The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling},
  year         = {2024},
  month        = {May},
  note         = {Release 1.0.0, 9th May 2024},
  url          = {https://github.com/AdityaNG/kan-gpt/}
}

Usage

Refer to the KAN_GPT.ipynb and kan_gpt/prompt.py for usage examples. The following is an outline of how to use the model:

import torch

from kan_gpt.model import GPT
from transformers import GPT2Tokenizer

model_config = GPT.get_default_config()
model_config.model_type = "gpt2"
model_config.vocab_size = 50257
model_config.block_size = 1024
model = GPT(model_config)

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

prompt = "Bangalore is often described as the "

prompt_encoded = tokenizer.encode(
  text=prompt, add_special_tokens=False
)

x = torch.tensor(prompt_encoded).unsqueeze(0)

model.eval()
y = model.generate(x, 50)  # sample 50 tokens

result = tokenizer.decode(y[0])

print(result)

# Bangalore is often described as the Silicon Valley of India.
# The city has witnessed rapid growth in the past two decades.....
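
To reuse the trained weights later (for example with the prompting script described below), standard PyTorch serialization works; the filename here is only illustrative, and the training script may save checkpoints in its own format:

# Save the weights (illustrative path)
torch.save(model.state_dict(), "model.pth")

# Later: restore into a freshly constructed model
model.load_state_dict(torch.load("model.pth"))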

Setup for Development

# Download Repo
git clone https://github.com/AdityaNG/kan-gpt
cd kan-gpt
git pull

# Download Dataset
./scripts/download_webtext.sh
./scripts/download_tinyshakespeare.sh

# Install dependencies for development
pip install -r requirements.txt
pip install -e .

Train

Use the following dummy scripts to make sure everything is working as expected:

WANDB_MODE=offline CUDA_VISIBLE_DEVICES="" python3 -m kan_gpt.train --architecture MLP --batch_size 1 --dummy_dataset --device cpu --max_iters 200
WANDB_MODE=offline CUDA_VISIBLE_DEVICES="" python3 -m kan_gpt.train --architecture KAN --batch_size 1 --dummy_dataset --device cpu --max_iters 200

Then make use of the training script:

python -m kan_gpt.train
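
For example, a longer run on a GPU can reuse the flags demonstrated in the dummy commands above (the values here are illustrative, not recommended settings):

python -m kan_gpt.train --architecture KAN --batch_size 32 --device cuda --max_iters 5000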

Prompt

You can prompt the model to produce text as follows:

python -m kan_gpt.prompt --prompt "Bangalore is often described as the " --model_path (checkpoint)
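
For example, with weights saved as model.pth (a hypothetical filename):

python -m kan_gpt.prompt --prompt "Bangalore is often described as the " --model_path model.pth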

Results

We train and compare KAN-GPT against an equivalent MLP-GPT model on the Tiny Shakespeare dataset. We observe that KAN-GPT performs slightly better than MLP-GPT. Further experiments are planned to investigate this gap more deeply. The results are shown below:

Metrics
(Plots: training loss, cross-entropy, and perplexity curves comparing KAN-GPT and MLP-GPT.)

TODOs

  • Integrate minGPT and pykan
  • Dataset downloading script for WebText
  • PyTorch Dataset parser for WebText
  • PyTorch Dataset parser for tinyshakespeare
  • Mini training POC for KAN-GPT
    • Integrate KAN training logic from KAN.train_kan
    • Train a dummy batch w/o any memory issues
  • Mini training POC for MLP-GPT
  • Train MLP-GPT on the webtext dataset as a baseline
  • Train KAN-GPT on the webtext dataset as a baseline
  • Metrics comparing KAN-GPT and MLP-GPT
  • Auto Save checkpoints
  • Auto Save checkpoints to W&B
  • Auto Download model weights from git / huggingface
  • W&B hyperparam sweep script
  • Script to load checkpoint in interactive mode
  • Reduce requirements.txt constraints
  • Define pydantic model for training and sweep args
  • Pruning the package, get rid of unused code
  • Migrate training script to PyTorch Lightning
  • Documentation: mkdocs gh-deploy
  • Integrate with efficient-kan
  • Test Cases
    • KAN: Forward-Backward test
    • GPT: Forward-Backward test
    • KAN_GPT: Forward-Backward test
    • EFFICIENT_KAN: Forward-Backward test

Development

Read the CONTRIBUTING.md file.

kan-gpt's People

Contributors

adityang, dependabot[bot], eltociear, gyunggyung, themattbin, wektorz, yumemio


kan-gpt's Issues

CUDA out of memory

import torch
from torch import nn

# Assumed imports for this snippet: KAN from the pykan package and
# NewGELU from kan-gpt's bundled minGPT code.
from kan import KAN
from kan_gpt.mingpt.model import NewGELU


class KanMLP(nn.Module):
    """KAN-based replacement for a transformer MLP block."""

    def __init__(
        self,
        in_features=1152,
        hidden_features=None,
        out_features=None,
        drop=0.0,
    ):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.mlp = nn.ModuleDict(
            dict(
                c_fc=KAN(width=[in_features, hidden_features]),
                c_proj=KAN(width=[hidden_features, out_features]),
                act=NewGELU(),
                dropout=nn.Dropout(drop),  # was hardcoded to 0.0, ignoring `drop`
            )
        )
        m = self.mlp
        self.mlpf = lambda x: m.dropout(
            m.c_proj(m.act(m.c_fc(x)))
        )  # MLP forward

    def forward(self, x):
        return self.mlpf(x)


net = KanMLP(1152, 1152 * 4).to("cuda")
x = torch.rand(size=(4, 4096 * 4, 1152)).to("cuda")
net(x)  # was `nex(x)`, a typo

When the number of tokens reaches a certain size, the following error occurs:

 CUDA out of memory.
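
One possible mitigation, sketched below and untested: the KanMLP block applies the same transformation to every token independently, so (assuming it accepts batched 3D input, as in the report above) the forward pass can be chunked along the sequence dimension to bound peak activation memory. This helps most under torch.no_grad(); during training, autograd still retains activations for every chunk.

import torch

def chunked_forward(net, x, chunk_size=1024):
    # Split (batch, seq_len, features) along the token dimension and
    # run the block one chunk at a time. Valid only because the block
    # has no cross-token interaction.
    outs = [net(chunk) for chunk in x.split(chunk_size, dim=1)]
    return torch.cat(outs, dim=1)

x = torch.rand(4, 4096 * 4, 1152, device="cuda")
with torch.no_grad():
    y = chunked_forward(net, x)  # `net` is the KanMLP from the report above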

Import module error

After installing kan_gpt with pip install kan_gpt on Colab, I get this error when trying to import the library and the model. I copied the code from the GitHub README.

(Screenshot of the error attached: Screenshot_20240508-120050.png)

Where is PyPI

Describe the bug

Where is the PyPI page? I searched the internet and still can't find it. The PyPI badge in the README also does not redirect me to the right page.

Train scripts fail because of missing tinyshakespeare dataset

The problem is in the .ipynb file:

# Download Repo
%cd /content
!git clone https://github.com/AdityaNG/kan-gpt
%cd kan-gpt
!git pull
# Download Dataset
!./scripts/download_webtext.sh
# Install dependencies
!pip install -r requirements.txt
!pip install -e .

Fix: add !./scripts/download_tinyshakespeare.sh as shown below:

# Download Repo
%cd /content
!git clone https://github.com/AdityaNG/kan-gpt
%cd kan-gpt
!git pull
# Download Dataset
!./scripts/download_webtext.sh
!./scripts/download_tinyshakespeare.sh
# Install dependencies
!pip install -r requirements.txt
!pip install -e .

have a nice day

Increase Test Coverage

Is your feature request related to a problem? Please describe.

The current test coverage is around 60%; it would be good to add test cases covering at least 80% of the code to ensure there are minimal regressions.

Describe the solution you'd like
We can see which files and lines are covered by the existing test cases (defined in tests/) by clicking on the code coverage badge in the README or by following this link:
https://codecov.io/gh/AdityaNG/kan-gpt

These coverage reports can also be generated locally using:

make test
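
If you prefer calling the tooling directly, an equivalent invocation (assuming the Makefile target wraps pytest with the pytest-cov plugin, which is an assumption about the build setup) would be:

pytest --cov=kan_gpt tests/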

Additional context
The goal is to add more test cases for the following folders. You may use the original repo's test cases as references.

This is a great starter issue for anyone interested in sinking their teeth into the repo :)

Relax scikit-learn version

I am attempting to use kan-gpt in a project that requires scikit-learn>=1.2.2. However, kan-gpt pins an exact version of scikit-learn in the requirements.txt file (scikit_learn==1.1.3). If possible, could this be relaxed to scikit-learn>=1.1.3?
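
Concretely, the requested change to requirements.txt would be:

# requirements.txt
scikit_learn>=1.1.3  # relaxed from: scikit_learn==1.1.3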
