Comments (5)
Thanks for the hint about the static library; after several tries, I was able to compile the test_gpt2cu program successfully with this command:
nvcc -O3 --use_fast_math test_gpt2.cu /usr/local/cuda/lib64/libcublas_static.a /usr/local/cuda/lib64/libcublasLt_static.a /usr/local/cuda/lib64/libcudart_static.a /usr/local/cuda/lib64/libculibos.a -lpthread -ldl -lrt -o test_gpt2cu
There were some warnings:
nvlink warning : Skipping incompatible '/usr/lib/x86_64-linux-gnu/libpthread.a' when searching for -lpthread
nvlink warning : Skipping incompatible '/usr/lib/x86_64-linux-gnu/libdl.a' when searching for -ldl
nvlink warning : Skipping incompatible '/usr/lib/x86_64-linux-gnu/librt.a' when searching for -lrt
but test_gpt2cu worked.
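For reference, the same static link can usually be written with `-l` flags instead of absolute `.a` paths; this is a sketch following the documented cuBLAS static-link recipe (`-lcublas_static -lculibos`), so exact flag availability depends on your CUDA install:

```shell
# equivalent static link using library search flags instead of absolute paths;
# -cudart static links the CUDA runtime statically (this is nvcc's default)
nvcc -O3 --use_fast_math test_gpt2.cu \
  -cudart static \
  -L/usr/local/cuda/lib64 \
  -lcublas_static -lcublasLt_static -lculibos \
  -lpthread -ldl -lrt -o test_gpt2cu
```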
Compiling train_gpt2cu also worked, but it seems the 8 GB of VRAM on an RTX 2070 are not enough; the CUDA driver falls back to shared system memory and becomes very slow:
[System]
Device 0: NVIDIA GeForce RTX 2070
enable_tf32: 0
[GPT-2]
max_seq_len: 1024
vocab_size: 50257
num_layers: 12
num_heads: 12
channels: 768
num_parameters: 124439808
batch size: 4
sequence length: 1024
train dataset num_batches: 74
val dataset num_batches: 8
num_activations: 2456637440
val loss 4.513920
step 1/74: train loss 4.367857 (49702.271082 ms)
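The numbers in the log explain the spill. A back-of-envelope sketch, assuming fp32 storage throughout and the usual AdamW accounting (params + grads + two optimizer moment buffers — an assumption about the training loop, not something the log states):

```python
# Figures taken from the log above (GPT-2 124M, B=4, T=1024)
num_parameters = 124_439_808
num_activations = 2_456_637_440
bytes_per_float = 4  # everything stored as fp32

params_gib = num_parameters * bytes_per_float / 1024**3
# assumed: params + grads + AdamW m and v -> 4 parameter-sized buffers
optimizer_state_gib = 4 * params_gib
activations_gib = num_activations * bytes_per_float / 1024**3
total_gib = optimizer_state_gib + activations_gib

print(f"activations alone: {activations_gib:.2f} GiB")  # ~9.15 GiB
print(f"rough total:       {total_gib:.2f} GiB")        # ~11 GiB
```

The activations alone are already above the 2070's 8 GiB, so the driver has no choice but to page into host memory.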
from llm.c.
Hi, I made progress on this one. Almost there, but not 100%; read on. I wanted to share my learnings so far. Not surprisingly, the whole static/shared-library aspect is a red herring for getting things to work in Windows Subsystem for Linux (WSL).
Here are repeatable steps to get train_gpt2cu to run in WSL. Really fun to see it execute!
- Install Ubuntu fresh via wsl. I'm sharing the kernel version that I used:
~/dev/llm.c$ uname -r
5.15.146.1-microsoft-standard-WSL2
- Get the latest packages and tools that you'll need:
sudo apt update
sudo apt upgrade
sudo apt install gcc
sudo apt install python3-pip
- Restart the Linux instance
- Key point! Get the CUDA 12.2 toolkit. Neither 12.4 nor the distro default (sudo apt install nvidia-cuda-toolkit) seems to work. Follow the instructions here: https://developer.nvidia.com/cuda-12-2-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=WSL-Ubuntu&target_version=2.0&target_type=runfile_local
- Follow the post-install instructions printed by the installer:
- PATH includes /usr/local/cuda-12.2/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-12.2/lib64, or, add /usr/local/cuda-12.2/lib64 to /etc/ld.so.conf and run ldconfig as root
- Git clone away and follow the llm.c CUDA instructions.
- Enjoy!
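The PATH and LD_LIBRARY_PATH step above can go in ~/.bashrc, for example (assuming the default /usr/local/cuda-12.2 install prefix):

```shell
# Append to ~/.bashrc, then restart the shell (or `source ~/.bashrc`).
# The ${VAR:+:${VAR}} form avoids a trailing colon when the variable is unset.
export PATH=/usr/local/cuda-12.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```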
P.S.
python3 train_gpt2.py
fails with
Running pytorch 2.2.2+cu121
using device: cuda
wrote gpt2_tokenizer.bin
loading weights from pretrained gpt: gpt2
loading cached tokens in data/tiny_shakespeare_val.bin
wrote gpt2_124M.bin
Traceback (most recent call last):
File "/home/colin/dev/llm.c/train_gpt2.py", line 403, in <module>
write_state(model, x, y, logits, loss, "gpt2_124M_debug_state.bin")
File "/home/colin/dev/llm.c/train_gpt2.py", line 279, in write_state
grads = {name: param.grad.cpu() for name, param in model.named_parameters()}
File "/home/colin/dev/llm.c/train_gpt2.py", line 279, in <dictcomp>
grads = {name: param.grad.cpu() for name, param in model.named_parameters()}
AttributeError: 'NoneType' object has no attribute 'cpu'
so there is one more (Python?) dependency issue to sort out. Sadly, you can work around it by commenting out the line
#write_state(model, x, y, logits, loss, "gpt2_124M_debug_state.bin")
This is obviously a silly workaround; I'll need to spend more time on resolving it later. But getting close to running on WSL!
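A less destructive workaround would be to guard the dict comprehension against parameters whose .grad is still None. A minimal sketch with hypothetical stand-in classes (the real script iterates over model.named_parameters() on a PyTorch module):

```python
# Stand-ins illustrating the AttributeError in the traceback above:
# a parameter that never took part in the loss keeps grad=None, and
# calling .cpu() on None raises. These classes are hypothetical.
class FakeGrad:
    def cpu(self):
        return self  # a real tensor would return a CPU copy here

class FakeParam:
    def __init__(self, grad):
        self.grad = grad

params = {
    "wte.weight": FakeParam(FakeGrad()),  # gradient was populated
    "unused.bias": FakeParam(None),       # gradient never populated
}

# guarded version of the comprehension in write_state():
grads = {name: p.grad.cpu() for name, p in params.items()
         if p.grad is not None}
print(sorted(grads))  # ['wte.weight']
```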
If for some weird reason you're not on Ubuntu 22.04 ... Ubuntu used to ship a notoriously old Python, so I'm pretty sure that would be your issue. But even on Linux it was a struggle to get PyTorch and CUDA versions that match up, just a few months back.
(Incredible that CUDA works at all on WSL! Too many of the things I love about Linux don't work on it: block devices, kernel options, networking; I've even had issues with pipes! I did love coLinux ... too bad it didn't survive 😢)
You should try a fresh pull though, this project is growing fast!
I've filed an issue with NVIDIA; I'll post back if I hear anything.
Resolved after a git fetch! I didn't see any changes in train_gpt2.py that would have addressed it, so perhaps the fix was elsewhere. I didn't update the Python version.
For the record, here are the versions of Ubuntu and Python with which this is working:
Welcome to Ubuntu 22.04.4 LTS (GNU/Linux 5.15.146.1-microsoft-standard-WSL2 x86_64)
~/dev/llm.c$ python3 --version
Python 3.10.12
Thanks, dagelf and tzzipproth, for your suggestions and ideas. WSL is now working end to end!