Comments (5)
Thanks for the hint about the static library; after several tries, I was able to compile the test_gpt2cu program successfully with this command:
nvcc -O3 --use_fast_math test_gpt2.cu /usr/local/cuda/lib64/libcublas_static.a /usr/local/cuda/lib64/libcublasLt_static.a /usr/local/cuda/lib64/libcudart_static.a /usr/local/cuda/lib64/libculibos.a -lpthread -ldl -lrt -o test_gpt2cu
There were some warnings:
nvlink warning : Skipping incompatible '/usr/lib/x86_64-linux-gnu/libpthread.a' when searching for -lpthread
nvlink warning : Skipping incompatible '/usr/lib/x86_64-linux-gnu/libdl.a' when searching for -ldl
nvlink warning : Skipping incompatible '/usr/lib/x86_64-linux-gnu/librt.a' when searching for -lrt
but test_gpt2cu worked.
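For reference, the same static link can usually be written with `-l` flags instead of absolute `.a` paths; this is a sketch following the documented cuBLAS static-link recipe (`-lcublas_static -lculibos`), so exact flag availability depends on your CUDA install:

```shell
# equivalent static link using library search flags instead of absolute paths;
# -cudart static links the CUDA runtime statically (this is nvcc's default)
nvcc -O3 --use_fast_math test_gpt2.cu \
  -cudart static \
  -L/usr/local/cuda/lib64 \
  -lcublas_static -lcublasLt_static -lculibos \
  -lpthread -ldl -lrt -o test_gpt2cu
```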
Compiling train_gpt2cu also worked, but it seems the 8 GB of VRAM on an RTX 2070 are not enough; the CUDA driver falls back to shared system memory and becomes very slow:
[System]
Device 0: NVIDIA GeForce RTX 2070
enable_tf32: 0
[GPT-2]
max_seq_len: 1024
vocab_size: 50257
num_layers: 12
num_heads: 12
channels: 768
num_parameters: 124439808
batch size: 4
sequence length: 1024
train dataset num_batches: 74
val dataset num_batches: 8
num_activations: 2456637440
val loss 4.513920
step 1/74: train loss 4.367857 (49702.271082 ms)
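The numbers in the log explain the spill. A back-of-envelope sketch, assuming fp32 storage throughout and the usual AdamW accounting (params + grads + two optimizer moment buffers — an assumption about the training loop, not something the log states):

```python
# Figures taken from the log above (GPT-2 124M, B=4, T=1024)
num_parameters = 124_439_808
num_activations = 2_456_637_440
bytes_per_float = 4  # everything stored as fp32

params_gib = num_parameters * bytes_per_float / 1024**3
# assumed: params + grads + AdamW m and v -> 4 parameter-sized buffers
optimizer_state_gib = 4 * params_gib
activations_gib = num_activations * bytes_per_float / 1024**3
total_gib = optimizer_state_gib + activations_gib

print(f"activations alone: {activations_gib:.2f} GiB")  # ~9.15 GiB
print(f"rough total:       {total_gib:.2f} GiB")        # ~11 GiB
```

The activations alone are already above the 2070's 8 GiB, so the driver has no choice but to page into host memory.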
from llm.c.
Hi, I made progress on this one. Almost there, but not 100%; read on. I wanted to share my learnings so far. Not surprisingly, the whole static/shared-library aspect is a red herring for getting things to work in Windows Subsystem for Linux (WSL).
Here are repeatable steps to get train_gpt2cu to run in WSL. Really fun to see it execute!
- Install Ubuntu fresh via wsl. I'm sharing the kernel version that I used:
~/dev/llm.c$ uname -r
5.15.146.1-microsoft-standard-WSL2
- Get the latest packages and tools that you'll need:
sudo apt update
sudo apt upgrade
sudo apt install gcc
sudo apt install python3-pip
- Restart the Linux instance
- Key point! Get the CUDA 12.2 toolkit. Neither 12.4 nor the distro default (sudo apt install nvidia-cuda-toolkit) seems to work. Follow the instructions here: https://developer.nvidia.com/cuda-12-2-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=WSL-Ubuntu&target_version=2.0&target_type=runfile_local
- Follow the post-install instructions printed by the installer:
- PATH includes /usr/local/cuda-12.2/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-12.2/lib64, or, add /usr/local/cuda-12.2/lib64 to /etc/ld.so.conf and run ldconfig as root
- Git clone away and follow the llm.c CUDA instructions.
- Enjoy!
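The PATH and LD_LIBRARY_PATH step above can go in ~/.bashrc, for example (assuming the default /usr/local/cuda-12.2 install prefix):

```shell
# Append to ~/.bashrc, then restart the shell (or `source ~/.bashrc`).
# The ${VAR:+:${VAR}} form avoids a trailing colon when the variable is unset.
export PATH=/usr/local/cuda-12.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```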
P.S.
python3 train_gpt2.py
fails with
Running pytorch 2.2.2+cu121
using device: cuda
wrote gpt2_tokenizer.bin
loading weights from pretrained gpt: gpt2
loading cached tokens in data/tiny_shakespeare_val.bin
wrote gpt2_124M.bin
Traceback (most recent call last):
File "/home/colin/dev/llm.c/train_gpt2.py", line 403, in <module>
write_state(model, x, y, logits, loss, "gpt2_124M_debug_state.bin")
File "/home/colin/dev/llm.c/train_gpt2.py", line 279, in write_state
grads = {name: param.grad.cpu() for name, param in model.named_parameters()}
File "/home/colin/dev/llm.c/train_gpt2.py", line 279, in <dictcomp>
grads = {name: param.grad.cpu() for name, param in model.named_parameters()}
AttributeError: 'NoneType' object has no attribute 'cpu'
so there is one more (Python?) dependency issue to sort out. Sadly, you can work around it by commenting out the line
#write_state(model, x, y, logits, loss, "gpt2_124M_debug_state.bin")
This is obviously a silly workaround; I'll need to spend more time on resolving it later. But getting close to running on WSL!
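A less destructive workaround would be to guard the dict comprehension against parameters whose .grad is still None. A minimal sketch with hypothetical stand-in classes (the real script iterates over model.named_parameters() on a PyTorch module):

```python
# Stand-ins illustrating the AttributeError in the traceback above:
# a parameter that never took part in the loss keeps grad=None, and
# calling .cpu() on None raises. These classes are hypothetical.
class FakeGrad:
    def cpu(self):
        return self  # a real tensor would return a CPU copy here

class FakeParam:
    def __init__(self, grad):
        self.grad = grad

params = {
    "wte.weight": FakeParam(FakeGrad()),  # gradient was populated
    "unused.bias": FakeParam(None),       # gradient never populated
}

# guarded version of the comprehension in write_state():
grads = {name: p.grad.cpu() for name, p in params.items()
         if p.grad is not None}
print(sorted(grads))  # ['wte.weight']
```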
If for some weird reason you're not on Ubuntu 22.04 ... Ubuntu used to ship a notoriously old Python, so I'm pretty sure that would be your issue. But even on Linux it was a struggle to get PyTorch and CUDA versions that match up, just a few months back.
(Incredible that CUDA works at all on WSL! Too many of the things I love about Linux don't work on it: block devices, kernel options, networking; I've even had issues with pipes! I did love coLinux ... too bad it didn't survive 😢)
You should try a fresh pull though, this project is growing fast!
I've filed an issue with NVIDIA; I'll post back if I hear anything.
Resolved after a git fetch! I didn't see any changes in train_gpt2.py that would have addressed it, so perhaps the fix was elsewhere. I didn't update the Python version.
For the record, here are the versions of Ubuntu and Python with which this is working:
Welcome to Ubuntu 22.04.4 LTS (GNU/Linux 5.15.146.1-microsoft-standard-WSL2 x86_64)
~/dev/llm.c$ python3 --version
Python 3.10.12
Thanks, dagelf and tzzipproth, for your suggestions and ideas. WSL is now working end to end!