
nanogpt's Introduction

nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs. It is a rewrite of minGPT that prioritizes teeth over education. Still under active development, but currently the file train.py reproduces GPT-2 (124M) on OpenWebText, running on a single 8XA100 40GB node in about 4 days of training. The code itself is plain and readable: train.py is a ~300-line boilerplate training loop and model.py a ~300-line GPT model definition, which can optionally load the GPT-2 weights from OpenAI. That's it.

[repro124m: training loss curve for the GPT-2 (124M) reproduction]

Because the code is so simple, it is very easy to hack to your needs, train new models from scratch, or finetune pretrained checkpoints (e.g. biggest one currently available as a starting point would be the GPT-2 1.3B model from OpenAI).

install

pip install torch numpy transformers datasets tiktoken wandb tqdm

Dependencies:

  • pytorch <3
  • numpy <3
  • transformers for huggingface transformers <3 (to load GPT-2 checkpoints)
  • datasets for huggingface datasets <3 (if you want to download + preprocess OpenWebText)
  • tiktoken for OpenAI's fast BPE code <3
  • wandb for optional logging <3
  • tqdm for progress bars <3

quick start

If you are not a deep learning professional and you just want to feel the magic and get your feet wet, the fastest way to get started is to train a character-level GPT on the works of Shakespeare. First, we download it as a single (1MB) file and turn it from raw text into one large stream of integers:

python data/shakespeare_char/prepare.py
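
Under the hood this is quite simple; a hedged sketch of roughly what prepare.py does (the real script also downloads input.txt and writes a meta.pkl with the vocabulary so sample.py can decode):

import numpy as np

text = open('data/shakespeare_char/input.txt', 'r', encoding='utf-8').read()
chars = sorted(set(text))                              # the character-level "vocabulary"
stoi = {ch: i for i, ch in enumerate(chars)}           # char -> integer id
ids = np.array([stoi[ch] for ch in text], dtype=np.uint16)
n = int(0.9 * len(ids))
ids[:n].tofile('data/shakespeare_char/train.bin')      # first 90% -> train
ids[n:].tofile('data/shakespeare_char/val.bin')        # last 10% -> val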

This creates a train.bin and val.bin in that data directory. Now it is time to train your GPT. The size of it very much depends on the computational resources of your system:

I have a GPU. Great, we can quickly train a baby GPT with the settings provided in the config/train_shakespeare_char.py config file:

python train.py config/train_shakespeare_char.py

If you peek inside it, you'll see that we're training a GPT with a context size of up to 256 characters, 384 feature channels, and it is a 6-layer Transformer with 6 heads in each layer. On one A100 GPU this training run takes about 3 minutes and the best validation loss is 1.4697. Based on the configuration, the model checkpoints are being written into the --out_dir directory out-shakespeare-char. So once the training finishes we can sample from the best model by pointing the sampling script at this directory:

python sample.py --out_dir=out-shakespeare-char

This generates a few samples, for example:

ANGELO:
And cowards it be strawn to my bed,
And thrust the gates of my threats,
Because he that ale away, and hang'd
An one with him.

DUKE VINCENTIO:
I thank your eyes against it.

DUKE VINCENTIO:
Then will answer him to save the malm:
And what have you tyrannous shall do this?

DUKE VINCENTIO:
If you have done evils of all disposition
To end his power, the day of thrust for a common men
That I leave, to fight with over-liking
Hasting in a roseman.

lol ¯\_(ツ)_/¯. Not bad for a character-level model after 3 minutes of training on a GPU. Better results are quite likely obtainable by instead finetuning a pretrained GPT-2 model on this dataset (see finetuning section later).

I only have a macbook (or other cheap computer). No worries, we can still train a GPT but we want to dial things down a notch. I recommend getting the bleeding edge PyTorch nightly (select it here when installing) as it is currently quite likely to make your code more efficient. But even without it, a simple train run could look as follows:

python train.py config/train_shakespeare_char.py --device=cpu --compile=False --eval_iters=20 --log_interval=1 --block_size=64 --batch_size=12 --n_layer=4 --n_head=4 --n_embd=128 --max_iters=2000 --lr_decay_iters=2000 --dropout=0.0

Here, since we are running on CPU instead of GPU we must set both --device=cpu and also turn off PyTorch 2.0 compile with --compile=False. Then when we evaluate we get a bit more noisy but faster estimate (--eval_iters=20, down from 200), our context size is only 64 characters instead of 256, and the batch size only 12 examples per iteration, not 64. We'll also use a much smaller Transformer (4 layers, 4 heads, 128 embedding size), and decrease the number of iterations to 2000 (and correspondingly usually decay the learning rate to around max_iters with --lr_decay_iters). Because our network is so small we also ease down on regularization (--dropout=0.0). This still runs in about ~3 minutes, but gets us a loss of only 1.88 and therefore also worse samples, but it's still good fun:

python sample.py --out_dir=out-shakespeare-char --device=cpu

Generates samples like this:

GLEORKEN VINGHARD III:
Whell's the couse, the came light gacks,
And the for mought you in Aut fries the not high shee
bot thou the sought bechive in that to doth groan you,
No relving thee post mose the wear

Not bad for ~3 minutes on a CPU, for a hint of the right character gestalt. If you're willing to wait longer, feel free to tune the hyperparameters, increase the size of the network, the context length (--block_size), the length of training, etc.

Finally, on Apple Silicon Macbooks and with a recent PyTorch version make sure to add --device=mps (short for "Metal Performance Shaders"); PyTorch then uses the on-chip GPU that can significantly accelerate training (2-3X) and allow you to use larger networks. See Issue 28 for more.
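
If you're unsure whether your install can use MPS, a quick check from a Python shell (standard PyTorch API):

import torch
print(torch.backends.mps.is_available())  # True on Apple Silicon with a recent PyTorch build
print(torch.backends.mps.is_built())      # True if this PyTorch build was compiled with MPS support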

reproducing GPT-2

A more serious deep learning professional may be more interested in reproducing GPT-2 results. So here we go - we first tokenize the dataset, in this case the OpenWebText, an open reproduction of OpenAI's (private) WebText:

python data/openwebtext/prepare.py

This downloads and tokenizes the OpenWebText dataset. It will create a train.bin and val.bin which hold the GPT-2 BPE token ids in one sequence, stored as raw uint16 bytes. Then we're ready to kick off training. To reproduce GPT-2 (124M) you'll want at least an 8X A100 40GB node and run:

torchrun --standalone --nproc_per_node=8 train.py config/train_gpt2.py

This will run for about 4 days using PyTorch Distributed Data Parallel (DDP) and go down to loss of ~2.85. Now, a GPT-2 model just evaluated on OWT gets a val loss of about 3.11, but if you finetune it it will come down to ~2.85 territory (due to an apparent domain gap), making the two models ~match.
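
As an aside, the train.bin/val.bin files are just flat arrays of uint16 token ids, so they are easy to inspect directly; this mirrors how train.py memory-maps them (paths assume the prepare step above):

import numpy as np

train_ids = np.memmap('data/openwebtext/train.bin', dtype=np.uint16, mode='r')
print(f"{len(train_ids):,} tokens")  # roughly 9B tokens for OpenWebText
print(train_ids[:10])                # the first few GPT-2 BPE token ids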

If you're in a cluster environment and you are blessed with multiple GPU nodes you can make GPU go brrrr e.g. across 2 nodes like:

# Run on the first (master) node with example IP 123.456.123.456:
torchrun --nproc_per_node=8 --nnodes=2 --node_rank=0 --master_addr=123.456.123.456 --master_port=1234 train.py
# Run on the worker node:
torchrun --nproc_per_node=8 --nnodes=2 --node_rank=1 --master_addr=123.456.123.456 --master_port=1234 train.py

It is a good idea to benchmark your interconnect (e.g. iperf3). In particular, if you don't have Infiniband then also prepend NCCL_IB_DISABLE=1 to the above launches. Your multinode training will work, but most likely crawl. By default checkpoints are periodically written to the --out_dir. We can sample from the model by simply python sample.py.

Finally, to train on a single GPU simply run the python train.py script. Have a look at all of its args, the script tries to be very readable, hackable and transparent. You'll most likely want to tune a number of those variables depending on your needs.

baselines

OpenAI GPT-2 checkpoints allow us to get some baselines in place for openwebtext. We can get the numbers as follows:

$ python train.py config/eval_gpt2.py
$ python train.py config/eval_gpt2_medium.py
$ python train.py config/eval_gpt2_large.py
$ python train.py config/eval_gpt2_xl.py

and observe the following losses on train and val:

model        params  train loss  val loss
gpt2         124M    3.11        3.12
gpt2-medium  350M    2.85        2.84
gpt2-large   774M    2.66        2.67
gpt2-xl      1558M   2.56        2.54

However, we have to note that GPT-2 was trained on (closed, never released) WebText, while OpenWebText is just a best-effort open reproduction of this dataset. This means there is a dataset domain gap. Indeed, taking the GPT-2 (124M) checkpoint and finetuning on OWT directly for a while reaches loss down to ~2.85. This then becomes the more appropriate baseline w.r.t. reproduction.

finetuning

Finetuning is no different than training, we just make sure to initialize from a pretrained model and train with a smaller learning rate. For an example of how to finetune a GPT on new text go to data/shakespeare and run prepare.py to download the tiny shakespeare dataset and render it into a train.bin and val.bin, using the OpenAI BPE tokenizer from GPT-2. Unlike OpenWebText this will run in seconds. Finetuning can take very little time, e.g. on a single GPU just a few minutes. Run an example finetuning like:

python train.py config/finetune_shakespeare.py

This will load the config parameter overrides in config/finetune_shakespeare.py (I didn't tune them much though). Basically, we initialize from a GPT-2 checkpoint with init_from and train as normal, except shorter and with a small learning rate. If you're running out of memory try decreasing the model size (they are {'gpt2', 'gpt2-medium', 'gpt2-large', 'gpt2-xl'}) or possibly decreasing the block_size (context length). The best checkpoint (lowest validation loss) will be in the out_dir directory, e.g. in out-shakespeare by default, per the config file. You can then sample from it with python sample.py --out_dir=out-shakespeare:

THEODORE:
Thou shalt sell me to the highest bidder: if I die,
I sell thee to the first; if I go mad,
I sell thee to the second; if I
lie, I sell thee to the third; if I slay,
I sell thee to the fourth: so buy or sell,
I tell thee again, thou shalt not sell my
possession.

JULIET:
And if thou steal, thou shalt not sell thyself.

THEODORE:
I do not steal; I sell the stolen goods.

THEODORE:
Thou know'st not what thou sell'st; thou, a woman,
Thou art ever a victim, a thing of no worth:
Thou hast no right, no right, but to be sold.

Whoa there, GPT, entering some dark place over there. I didn't really tune the hyperparameters in the config too much, feel free to try!
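
For reference, a finetuning config of this kind just overrides a handful of the globals defined in train.py; the values below are illustrative, not the exact contents of config/finetune_shakespeare.py:

# illustrative finetuning overrides (not the repo's exact values)
out_dir = 'out-shakespeare'
dataset = 'shakespeare'
init_from = 'gpt2'               # or 'gpt2-medium', 'gpt2-large', 'gpt2-xl' if memory allows
always_save_checkpoint = False   # only save when val loss improves
batch_size = 1
gradient_accumulation_steps = 32
max_iters = 200                  # a short finetune
learning_rate = 3e-5             # much smaller than when training from scratch
decay_lr = False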

sampling / inference

Use the script sample.py to sample either from pre-trained GPT-2 models released by OpenAI, or from a model you trained yourself. For example, here is a way to sample from the largest available gpt2-xl model:

python sample.py \
    --init_from=gpt2-xl \
    --start="What is the answer to life, the universe, and everything?" \
    --num_samples=5 --max_new_tokens=100

If you'd like to sample from a model you trained, use the --out_dir to point the code appropriately. You can also prompt the model with some text from a file, e.g. python sample.py --start=FILE:prompt.txt.

efficiency notes

For simple model benchmarking and profiling, bench.py might be useful. It's identical to what happens in the meat of the training loop of train.py, but omits much of the other complexities.

Note that the code by default uses PyTorch 2.0. At the time of writing (Dec 29, 2022) this makes torch.compile() available in the nightly release. The improvement from the one line of code is noticeable, e.g. cutting down iteration time from ~250ms / iter to 135ms / iter. Nice work PyTorch team!
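
For reference, that one line is just the standard PyTorch 2.0 API, roughly what train.py does when compile=True (shown here with a stand-in module):

import torch
import torch.nn as nn

model = nn.Linear(128, 128)   # stand-in for the GPT module from model.py
model = torch.compile(model)  # requires PyTorch 2.0; no other code changes needed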

todos

  • Investigate and add FSDP instead of DDP
  • Eval zero-shot perplexities on standard evals (e.g. LAMBADA? HELM? etc.)
  • Finetune the finetuning script, I think the hyperparams are not great
  • Schedule for linear batch size increase during training
  • Incorporate other embeddings (rotary, alibi)
  • Separate out the optim buffers from model params in checkpoints I think
  • Additional logging around network health (e.g. gradient clip events, magnitudes)
  • Few more investigations around better init etc.

troubleshooting

Note that by default this repo uses PyTorch 2.0 (i.e. torch.compile). This is fairly new and experimental, and not yet available on all platforms (e.g. Windows). If you're running into related error messages, try to disable it by adding the --compile=False flag. This will slow down the code but at least it will run.

For some context on this repository, GPT, and language modeling it might be helpful to watch my Zero To Hero series. Specifically, the GPT video is popular if you have some prior language modeling context.

For more questions/discussions feel free to stop by #nanoGPT on Discord.

acknowledgements

All nanoGPT experiments are powered by GPUs on Lambda labs, my favorite Cloud GPU provider. Thank you Lambda labs for sponsoring nanoGPT!

nanogpt's People

Contributors

adambala, akashmjn, ankandrew, apivovarov, ctjlewis, danielgross, gnobre, ho2103, johnwildauer, jorahn, karpathy, kjslag, kovkev, laihoe, lantiga, lutzroeder, micropanda123, nat, nynyg, okuvshynov, otaviogood, pwhiddy, python273, ramtingh, ryouze, snehalraj, venusatuluri, vinjn, yassineyousfi, ymurenko

nanogpt's Issues

Training on M1 "MPS"

Most people do not have access to 8XA100 40GB systems, but a single M1 Max laptop with 64 GB memory could host the training. How difficult would it be to port this code to "MPS"?

Got stuck at dataset = load_dataset("openwebtext")

The last outputs are "
Downloading builder script: 2.86kB [00:00, 3.19MB/s]
Downloading builder script: 2.86kB [00:00, 3.01MB/s]
Downloading builder script: 2.86kB [00:00, 3.07MB/s]
Downloading builder script: 2.86kB [00:00, 2.45MB/s]
Downloading builder script: 2.86kB [00:00, 2.98MB/s]
Downloading metadata: 1.15kB [00:00, 1.24MB/s]
Downloading builder script: 2.86kB [00:00, 3.08MB/s]
Downloading metadata: 1.15kB [00:00, 1.47MB/s]
Downloading metadata: 1.15kB [00:00, 1.21MB/s]
Downloading metadata: 1.15kB [00:00, 1.45MB/s]
Downloading metadata: 1.15kB [00:00, 1.49MB/s]
Downloading metadata: 1.15kB [00:00, 1.11MB/s] ",
and the code keeps being pending.

Out of Memory

I ran into the problem that my VM ran out of memory when I ran prepare.py on the OpenWebText data. It would be nice to see something like minimum required specs in the documentation.
Cheers :)

Just a question

If I understand correctly, you have at most 600,000 iterations times batches of 12, which is roughly 7M training examples fed to the transformer, far smaller than the 9B tokens of the training set. I must be missing something, right? Thanks

More a question - is there an easy way to test generation?

Hi, love this project as a way to learn from scratch with local development. I was able to finetune the model, generate the checkpoints, generate the samples.

Is there an easy way to test out text generation (i.e. complete a text mask, or Q&A)? I tried combining a few other projects out there but can't figure it out easily; it only spits out Shakespeare samples.

Am I trying to do something that isn't possible? :)

PyTorch-nightly dependency chain

Heads up! (to anyone who played with this when it was released)

The repo's readme has:

Code by default now uses PyTorch 2.0. At the time of writing (Dec 29, 2022) this makes torch.compile() available in the nightly release.

This is just at the tail end of the compromised PyTorch-nightly dependency chain between December 25th and December 30th, 2022, which was reported on the PyTorch blog.

Perhaps another dependency is on the transformers package

When I try to run the finetune_shakespeare script I get the following error:

Initializing from OpenAI GPT-2 weights: gpt2-xl
Traceback (most recent call last):
File "/Users/amir/Projects/nano-gpt/nanoGPT/train.py", line 137, in
model = GPT.from_pretrained(init_from, override_args)
File "/Users/amir/Projects/nano-gpt/nanoGPT/model.py", line 160, in from_pretrained
from transformers import GPT2LMHeadModel
ModuleNotFoundError: No module named 'transformers'

Installing the transformers package from PyPI allowed the script to run without an error.

GPT with UNet architecture gets the loss down to ~1.0 with no significant computation costs.

TLDR:
This is a bit of a tangent from what the original repo is for (education), but I think it's an interesting finding. Basically, by using the following architecture adapted from the UNet paper, we can use significantly deeper models while the computation cost is only slightly higher.

self.transformer = nn.ModuleDict(dict(
            wte=nn.Embedding(config.vocab_size, config.n_embd),
            wpe=nn.Embedding(config.block_size, config.n_embd),
            drop=nn.Dropout(config.dropout),
            compressing=nn.ModuleList([
                nn.ModuleList([Block(config) for _ in range(5)] + [BlockCompressing(config)]),
                nn.ModuleList([Block(config) for _ in range(2)] + [BlockCompressing(config)]),
                nn.ModuleList([Block(config) for _ in range(2)] + [BlockCompressing(config)]),
                nn.ModuleList([Block(config) for _ in range(2)] + [BlockCompressing(config)]),
                nn.ModuleList([Block(config) for _ in range(2)] + [BlockCompressing(config)])
            ]),
            middle=nn.ModuleList(
                [Block(config) for _ in range(100)]),
            expanding=nn.ModuleList([
                nn.ModuleList([BlockExpanding(config)] + [Block(config) for _ in range(2)]),
                nn.ModuleList([BlockExpanding(config)] + [Block(config) for _ in range(2)]),
                nn.ModuleList([BlockExpanding(config)] + [Block(config) for _ in range(2)]),
                nn.ModuleList([BlockExpanding(config)] + [Block(config) for _ in range(2)]),
                nn.ModuleList([BlockExpanding(config)] + [Block(config) for _ in range(2)]),
            ]),
            ln_f=nn.LayerNorm(config.n_embd),
        ))

The idea:
Using the same word block size throughout (in our case 1024) seems pretty wasteful. Words like "the", "a", "she" etc. don't contain much information, meaning we could compress the input significantly. This could be applied to sentences too: most sentences don't contain that much information and could be stored in fewer vectors.

The other observation is that most of the words are kept (because of the low dropout rate) and only need to be shifted. Shifting can be done with only one block, and we can assume that the missing words can be expressed with far fewer than 1024 vectors.

Using all these assumptions, most of the information is copied from the input to the output. This brings us to the UNet architecture: UNet was designed for segmentation, and its architecture is based on similar assumptions.

Adopting a UNet-style architecture has one major advantage: a shorter word block size. It brings the computation of the attention layer down from 1024 × 1024 = 1,048,576 to 32 × 32 = 1,024. This is a significant reduction, meaning the middle layers can be much deeper.
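
Purely as an illustration (this is not the author's actual BlockCompressing/BlockExpanding implementation), a compressing/expanding pair along the sequence dimension could look roughly like this, assuming (B, T, C) activations with T divisible by 2:

import torch
import torch.nn as nn

class BlockCompressing(nn.Module):
    """Merge every pair of adjacent tokens into one, halving the sequence length."""
    def __init__(self, config):
        super().__init__()
        self.proj = nn.Linear(2 * config.n_embd, config.n_embd)

    def forward(self, x):  # x: (B, T, C) with T even
        B, T, C = x.shape
        return self.proj(x.view(B, T // 2, 2 * C))  # (B, T/2, C)

class BlockExpanding(nn.Module):
    """Split each token back into two, doubling the sequence length."""
    def __init__(self, config):
        super().__init__()
        self.proj = nn.Linear(config.n_embd, 2 * config.n_embd)

    def forward(self, x):  # x: (B, T, C)
        B, T, C = x.shape
        return self.proj(x).view(B, 2 * T, C)  # (B, 2T, C)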

I don't know how well this architecture would work for other tasks, like translation, vision or chatGPT. But for this particular case, it seems to work well. The idea is also not new; after a few hours of searching, I found this paper from 2019: https://arxiv.org/pdf/1910.10488v1.pdf.

You can find the full model here: https://github.com/englertbruno/nanoGPT/blob/master/model_gpt_unet.py

Google Coral

Hey, all!

I purchased a Google Coral and wanted to start exploring AI / ML

This is hardly an issue, more a question:
Is Google Coral compatible with Torch? Can I use it with nanoGPT to get some benefit?
I also own a GeForce 2080.

I'm new to the game and thought this project would be a good place to start!
Thank you!

Running train.py on 2060 GPU

"Hello! I've been trying to run the train.py on a 2060 GPU, but this device does not support dtype=torch.bfloat16. What changes would I have to make to achieve my goal? Or can I only train on an Ampere architecture GPU for now? Thank you very much for sharing this project!"

GPU specs for finetuning gpt2-xl

I am able to finetune the gpt2-xl model on an AWS c5.9xlarge CPU instance (72 GB memory) (slow, but doable).
It looks like a single A100 40GB is capable of achieving this at much better speed.

I don't have access to an A100, but I do have some more affordable multi-GPU machines.
I could not finetune the model on multiple GPUs even with collective memory > 40 GB.

So it appears that GPU memory has to be at least 40GB per device.
Does that sound right?

Training on AMD Ryzen 5 5600H with Radeon Graphics, 3301 Mhz (RTX 3050 Laptop), 6 Cores, 12 Threads

*Edit: Here is my Python Version and packages list, including NVIDIA CUDA info.

Python 3.8.10

aiohttp==3.8.3
aiosignal==1.3.1
async-timeout==4.0.2
attrs==22.2.0
blobfile==2.0.0
certifi==2022.12.7
charset-normalizer==2.1.1
colorama==0.4.6
datasets==2.8.0
dill==0.3.6
filelock==3.9.0
frozenlist==1.3.3
fsspec==2022.11.0
huggingface-hub==0.11.1
idna==3.4
lxml==4.9.2
multidict==6.0.4
multiprocess==0.70.14
numpy==1.24.1
packaging==23.0
pandas==1.5.2
Pillow==9.4.0
pyarrow==10.0.1
pycryptodomex==3.16.0
python-dateutil==2.8.2
pytz==2022.7
PyYAML==6.0
regex==2022.10.31
requests==2.28.1
responses==0.18.0
six==1.16.0
tiktoken==0.1.2
torch==1.13.1+cu117
torchaudio==0.13.1+cu117
torchvision==0.14.1+cu117
tqdm==4.64.1
typing-extensions==4.4.0
urllib3==1.26.13
xxhash==3.2.0
yarl==1.8.2

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_19:00:59_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

Following all the steps in the README.md, when prepare.py ran after the extraction I started having some problems. I will copy the console errors below:

Loading cached split indices for dataset at C:\Users\jeanc\.cache\huggingface\datasets\openwebtext\plain_text\1.0.0\85b3ae7051d2d72e7c5fdf6dfb462603aaa26e9ed506202bf3a24d261c6c40a1\cache-e592ba88a0d6344c.arrow and C:\Users\jeanc\.cache\huggingface\datasets\openwebtext\plain_text\1.0.0\85b3ae7051d2d72e7c5fdf6dfb462603aaa26e9ed506202bf3a24d261c6c40a1\cache-9fc622ec8039deff.arrow
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 265, in run_path
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
Traceback (most recent call last):
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 265, in run_path
  File "<string>", line 1, in <module>
    return _run_module_code(code, init_globals, run_name,
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\spawn.py", line 116, in spawn_main
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 97, in _run_module_code
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 97, in _run_module_code
    exitcode = _main(fd, parent_sentinel)
    _run_code(code, mod_globals, init_globals,
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\spawn.py", line 125, in _main
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 87, in _run_code
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 87, in _run_code
    prepare(preparation_data)
    exec(code, run_globals)
    exec(code, run_globals)
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\spawn.py", line 236, in prepare
  File "C:\AI\nanoGPT\data\openwebtext\prepare.py", line 43, in <module>
  File "C:\AI\nanoGPT\data\openwebtext\prepare.py", line 43, in <module>
    _fixup_main_from_path(data['init_main_from_path'])
    tokenized = split_dataset.map(
    tokenized = split_dataset.map(
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\spawn.py", line 287, in _fixup_main_from_path
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\datasets\dataset_dict.py", line 816, in map
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\datasets\dataset_dict.py", line 816, in map
    main_content = runpy.run_path(main_path,
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\AI\nanoGPT\data\openwebtext\prepare.py", line 43, in <module>
    tokenized = split_dataset.map(
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\datasets\dataset_dict.py", line 816, in map
    {
    {
    {
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\datasets\dataset_dict.py", line 817, in <dictcomp>
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\datasets\dataset_dict.py", line 817, in <dictcomp>
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\datasets\dataset_dict.py", line 817, in <dictcomp>
    k: dataset.map(
    k: dataset.map(
    k: dataset.map(
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\datasets\arrow_dataset.py", line 2926, in map
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\datasets\arrow_dataset.py", line 2926, in map
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\datasets\arrow_dataset.py", line 2926, in map
    with Pool(nb_of_missing_shards, initargs=initargs, initializer=initializer) as pool:
    with Pool(nb_of_missing_shards, initargs=initargs, initializer=initializer) as pool:
    with Pool(nb_of_missing_shards, initargs=initargs, initializer=initializer) as pool:
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\context.py", line 119, in Pool
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\context.py", line 119, in Pool
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
    return Pool(processes, initializer, initargs, maxtasksperchild,
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\pool.py", line 212, in __init__
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\pool.py", line 212, in __init__
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\pool.py", line 212, in __init__
    self._repopulate_pool()
    self._repopulate_pool()
    self._repopulate_pool()
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\pool.py", line 303, in _repopulate_pool
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\pool.py", line 303, in _repopulate_pool
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\pool.py", line 303, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
    return self._repopulate_pool_static(self._ctx, self.Process,
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\pool.py", line 326, in _repopulate_pool_static
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\pool.py", line 326, in _repopulate_pool_static
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\pool.py", line 326, in _repopulate_pool_static
    w.start()
    w.start()
    w.start()
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\process.py", line 121, in start
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\process.py", line 121, in start
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\process.py", line 121, in start
    self._popen = self._Popen(self)
    self._popen = self._Popen(self)
    self._popen = self._Popen(self)
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\context.py", line 327, in _Popen
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\context.py", line 327, in _Popen
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\context.py", line 327, in _Popen
    return Popen(process_obj)
    return Popen(process_obj)
    return Popen(process_obj)
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\popen_spawn_win32.py", line 45, in __init__
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\popen_spawn_win32.py", line 45, in __init__
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\popen_spawn_win32.py", line 45, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
    prep_data = spawn.get_preparation_data(process_obj._name)
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\spawn.py", line 154, in get_preparation_data
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\spawn.py", line 154, in get_preparation_data
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
    _check_not_importing_main()
    _check_not_importing_main()
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\spawn.py", line 134, in _check_not_importing_main
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\spawn.py", line 134, in _check_not_importing_main
  File "C:\Users\jeanc\AppData\Local\Programs\Python\Python38\lib\site-packages\multiprocess\spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
    raise RuntimeError('''
RuntimeError:

    
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

I made sure to set num_proc = 3

After this I tried some code edits (following suggestions for similar errors in other scripts online), but none of them were successful.

Finally, I tried ChatGPT: my first prompt was the prepare.py code, asking what dataset is used in it.

I had the following answer:

The dataset being used is the "openwebtext" dataset, which is loaded using the load_dataset function from the datasets module (presumably from the Hugging Face transformers library). This dataset is then split into a train and validation set using the train_test_split method, with the validation set being a very small fraction (0.0005) of the total dataset. The resulting train and validation sets are stored in the split_dataset dictionary, with the train set having approximately 8,009,762 examples and the validation set having approximately 4,007 examples.

My next prompt was (sorry for my low-level comprehension of PyTorch and AI):

how to use this dataset in this code

# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

Then I got this answer:

To use the openwebtext dataset in the provided code, you would need to modify the code to first load the openwebtext dataset using the load_dataset function and then apply the required processing steps (tokenization and concatenation) to the dataset. You can then use the resulting tokenized dataset for training or evaluation.
Here is an example of how you could modify the code to use the openwebtext dataset:

import numpy as np
import tiktoken
from datasets import load_dataset

# Load the openwebtext dataset.
dataset = load_dataset("openwebtext")

# Tokenize the dataset using the encoding function and process function defined in the original code.
enc = tiktoken.get_encoding("gpt2")
def process(example):
    ids = enc.encode_ordinary(example['text'])
    ids.append(enc.eot_token)
    out = {'ids': ids, 'len': len(ids)}
    return out

tokenized = dataset.map(process, remove_columns=['text'])

# Concatenate all the ids in the dataset into one large file.
arr_len = np.sum(tokenized['len'])
filename = 'openwebtext.bin'
dtype = np.uint16
arr = np.memmap(filename, dtype=dtype, mode='w+', shape=(arr_len,))

idx = 0
for example in tqdm(tokenized):
    arr[idx : idx + example['len']] = example['ids']
    idx += example['len']
arr.flush()

You can then use the openwebtext.bin file for training or evaluation. To read the file, you can use np.memmap as shown in the original code.

So now the code is running and looks like this:

# saves the openwebtext dataset to a binary file for training. following was helpful:
# https://github.com/HazyResearch/flash-attention/blob/main/training/src/datamodules/language_modeling_hf.py

from tqdm import tqdm
import numpy as np
import tiktoken
from datasets import load_dataset  # huggingface datasets

# number of workers in .map() call
# good number to use is ~order number of cpu cores // 2
num_proc = 3

# takes 54GB in huggingface .cache dir, about 8M documents (8,013,769)
dataset = load_dataset("openwebtext")

# owt by default only contains the 'train' split, so create a test split
split_dataset = dataset["train"].train_test_split(test_size=0.0005, seed=2357, shuffle=True)
split_dataset['val'] = split_dataset.pop('test')  # rename the test split to val

# this results in:
# >>> split_dataset
# DatasetDict({
#     train: Dataset({
#         features: ['text'],
#         num_rows: 8009762
#     })
#     val: Dataset({
#         features: ['text'],
#         num_rows: 4007
#     })
# })

# Tokenize the dataset using the encoding function and process function defined in the original code.
enc = tiktoken.get_encoding("gpt2")


def process(example):
    ids = enc.encode_ordinary(example['text'])
    ids.append(enc.eot_token)
    out = {'ids': ids, 'len': len(ids)}
    return out


tokenized = dataset.map(process, remove_columns=['text'])

# Concatenate all the ids in the dataset into one large file.
arr_len = np.sum(tokenized['len'])
filename = 'openwebtext.bin'
dtype = np.uint16
arr = np.memmap(filename, dtype=dtype, mode='w+', shape=(arr_len,))

idx = 0
for example in tqdm(tokenized):
    arr[idx: idx + example['len']] = example['ids']
    idx += example['len']
arr.flush()

# train.bin is ~17GB, val.bin ~8.5MB
# train has ~9B tokens (9,035,582,198)
# val has ~4M tokens (4,434,897)

# to read the bin files later, e.g. with numpy:
# m = np.memmap('train.bin', dtype=np.uint16, mode='r')
# m[:10]
# array([  50256,  52429,  52429,  52429,  52429,  52429,  52429,  52429,
#         52429,  52429], dtype=uint16)

# to look up the token values, use the enc.decode_single() method. e.g.:
# >>> enc.decode_single(50256)
# '

Now I will continue to learn about this new world. Thank you @karpathy for the code; once everything is set up, I will learn how to adapt and apply this tool to the world around me.

I could barely comprehend what happened after the first prepare.py run, so I am still sorting some things out. I hope this helps someone, someday.

The main difference is that num_proc is absent from the .map() call (besides the loop being written differently); that is probably why it seems to work now. I will test later with the original code, without num_proc, the way you originally wrote it @karpathy.

*Edit 2

Yes, without num_proc being passed it was only using 1 core. I ran the previous prepare.py version; now I have edited it to include num_proc:

# saves the openwebtext dataset to a binary file for training. following was helpful:
# https://github.com/HazyResearch/flash-attention/blob/main/training/src/datamodules/language_modeling_hf.py
from torch import multiprocessing
from tqdm import tqdm
import numpy as np
import tiktoken
from datasets import load_dataset  # huggingface datasets

# number of workers in .map() call
# good number to use is ~order number of cpu cores // 2
num_proc = 3

# takes 54GB in huggingface .cache dir, about 8M documents (8,013,769)
dataset = load_dataset("openwebtext")

# owt by default only contains the 'train' split, so create a test split
split_dataset = dataset["train"].train_test_split(test_size=0.0005, seed=2357, shuffle=True)
split_dataset['val'] = split_dataset.pop('test')  # rename the test split to val

# this results in:
# >>> split_dataset
# DatasetDict({
#     train: Dataset({
#         features: ['text'],
#         num_rows: 8009762
#     })
#     val: Dataset({
#         features: ['text'],
#         num_rows: 4007
#     })
# })

# Tokenize the dataset using the encoding function and process function defined in the original code.
enc = tiktoken.get_encoding("gpt2")

if __name__ == '__main__':
    # Enable the creation of child processes.
    multiprocessing.freeze_support()

    # Load the openwebtext dataset.
    dataset = load_dataset("openwebtext")

    # Tokenize the dataset using the encoding function and process function defined in the original code.
    enc = tiktoken.get_encoding("gpt2")


    def process(example):
        ids = enc.encode_ordinary(example['text'])
        ids.append(enc.eot_token)
        out = {'ids': ids, 'len': len(ids)}
        return out


    # Set the number of worker processes to use.
    num_proc = num_proc

    tokenized = split_dataset.map(process, remove_columns=['text'], num_proc=num_proc)

    # Concatenate all the ids in the dataset into one large file.
    arr_len = np.sum(tokenized['len'])
    filename = 'openwebtext.bin'
    dtype = np.uint16
    arr = np.memmap(filename, dtype=dtype, mode='w+', shape=(arr_len,))

    idx = 0
    for example in tqdm(tokenized):
        arr[idx: idx + example['len']] = example['ids']
        idx += example['len']
    arr.flush()

# train.bin is ~17GB, val.bin ~8.5MB
# train has ~9B tokens (9,035,582,198)
# val has ~4M tokens (4,434,897)

# to read the bin files later, e.g. with numpy:
# m = np.memmap('train.bin', dtype=np.uint16, mode='r')
# m[:10]
# array([  50256,  52429,  52429,  52429,  52429,  52429,  52429,  52429,
#         52429,  52429], dtype=uint16)

# to look up the token values, use the enc.decode_single() method. e.g.:
# >>> enc.decode_single(50256)
# '

How to load the GPT-2 model

Can you give an example of how to use the official GPT-2 model?
I downloaded it successfully via https://raw.githubusercontent.com/openai/gpt-2/master/download_model.py,
moved and renamed model.ckpt.data-00000-of-00001 to /out/ckpt.pt,
but I got some pickle errors when loading it.

python sample.py config\eval_gpt2.py

Traceback (most recent call last):
File "d:\work\AI\nanoGPT\sample.py", line 35, in
checkpoint = torch.load(ckpt_path, map_location=device)
File "c:\Users\orosa\anaconda3\envs\evn_nanogpt\lib\site-packages\torch\serialization.py", line 795, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "c:\Users\orosa\anaconda3\envs\evn_nanogpt\lib\site-packages\torch\serialization.py", line 1002, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\x03'.

# note: each worker gets a different seed

I'm confused about this line from train.py:

torch.manual_seed(1337 + gpu_id) # note: each worker gets a different seed

Doesn't that mean each of your DDP model instances will have a different random seed? Shouldn't the models on each DDP worker be identical, as we want to shard the data not the model.

my gpu only supports float16, how do i train a model?

PS F:\downloads\nanoGPT-master\nanoGPT-master> python train.py --learning_rate=0.005 --wandb_log=True
Overriding: learning_rate = 0.005
Overriding: wandb_log = True
Traceback (most recent call last):
File "train.py", line 101, in
ctx = nullcontext() if device_type == 'cpu' else torch.amp.autocast(device_type=device_type, dtype=ptdtype)
File "F:\python3.8.10\lib\site-packages\torch\amp\autocast_mode.py", line 225, in init
raise RuntimeError('Current CUDA Device does not support bfloat16. Please switch dtype to float16.')
RuntimeError: Current CUDA Device does not support bfloat16. Please switch dtype to float16.

Proposal for a slightly improved minimal configuration system

Thanks @karpathy for this nice little project. I also really enjoyed watching the Youtube lecture that goes with it!

In my proposal I'm addressing the comment here https://github.com/karpathy/nanoGPT/blob/master/configurator.py#L12

A slight improvement over the existing system would be to rely on TOML configuration files, instead of Python files. TOML is part of the standard Python library since 3.11. The code replacement in train.py would roughly look like this:

import tomllib

a = 5
b = "beta"
c = True

config_str = """
a = 10
b = "alpha"
c = true
new = "test"
"""

config = tomllib.loads(config_str)

difference = set(config).difference(globals())

if difference:
    print(f"Not a configurable value: {difference}")

globals().update(config)

The main advantages are:

  • Remove the exec() statement, so it does not execute arbitrary Python code written in the config files.
  • Notify users of typos or values that cannot be configured.
  • The existing .py config files are basically valid TOML files. So can just be renamed.

I'd be happy to implement a solution along the lines suggested above, if welcome.

A question on getting garbage in sample.py (Generator)

Hi,

I am not an expert in Transformers or DNN, but I followed the steps for training tiny_shakespeare and sampled some tokens after training. I see that it is generating garbage.

Here is what I have used:

  1. Running on a CPU in a cheap laptop (with character tokenization)

  2. Training command: python train.py config/train_shakespeare_char.py --device=cpu --compile=False --eval_iters=20 --log_interval=1 --block_size=64 --batch_size=8 --max_iters=5000

  3. Sample command: python sample.py --out_dir=out-shakespeare-char

How do I generate some meaningful shakespeare text? Do I increase max_iters?

train_shakespeare_char.py:

# train a miniature character-level shakespeare model
# good for debugging and playing on macbooks and such

out_dir = 'out-shakespeare-char'
eval_interval = 250 # keep frequent because we'll overfit
eval_iters = 200
log_interval = 10 # don't print too too often

# we expect to overfit on this small dataset, so only save when val improves
always_save_checkpoint = False

wandb_log = False # override via command line if you like
wandb_project = 'shakespeare-char'
wandb_run_name = 'mini-gpt'

dataset = 'shakespeare_char'
batch_size = 64
block_size = 256 # context of up to 128 previous characters

# baby GPT model :)
n_layer = 6
n_head = 6
n_embd = 384
dropout = 0.2

learning_rate = 1e-3 # with baby networks can afford to go a bit higher
max_iters = 5000
lr_decay_iters = 5000 # make equal to max_iters usually
min_lr = 1e-4 # learning_rate / 10 usually
beta2 = 0.99 # make a bit bigger because number of tokens per iter is small

warmup_iters = 100 # not super necessary potentially

# on macbook also add
# device = 'cpu' # run on cpu only
# compile = False # do not torch compile the model

Try using gelu approximate = 'tanh'

Suggestion

I noticed that you are defining a fused GELU similar to the one used in the Google BERT repository. As a tip, the GELU function in PyTorch has an optional approximate argument that you can set to 'tanh', which should give you equivalent results but with slightly better efficiency. I'm not sure if using this option would go against the spirit of your project, but I just wanted to let you know about it.
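
Concretely, the built-in looks like this (standard PyTorch API):

import torch.nn as nn

# equivalent to the BERT-style "fused"/tanh-approximated GELU, computed by PyTorch's own kernel
gelu = nn.GELU(approximate='tanh')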

Regardless, I think your ability to extract useful insights from complex systems is impressive and inspiring!

Jax/Flax Rewrite

Thanks for the incredibly lucid GPT implementation!

I've started rewriting nanoGPT in Jax/Flax as a test-bed to play with the new jax.experimental.pjit API. Thought I'd put it here for anyone who's interested.
https://github.com/jenkspt/gpt-jax

Also, following Jax convention -- I figured it might be reasonable to try torch.compile with the entire training update step (rather than just the forward pass) i.e https://github.com/jenkspt/gpt-jax/blob/c4e38cc35264c0eab9508bf0180c5f6e52753938/train.py#L60-L76

Thank you

Andrej, thank you for everything -- you have been a great teacher to many of us.

Please feel free to close this issue now :)

Stop words?

I'm trying to use nanoGPT to generate Python code, and I don't find a stop words implementation in the code right now, so what I'm getting is this:

Write a hello world function in Python3. Generate only code and no human language. Stop with double new line.
def hello_world():
... print("Hello world")
Hello world, you are nice
Output:
Hello world, you are nice
More about Python 3
Python 3 is a pretty new version of Python.
It has a lot new features like multithreading

I did make a naive change to model.py in GPT.generate, as shown below, but it's not working.

if idx_next in self.stop:
    break

I wonder if there's any way to let it learn to stop with "\n\n" without fine tuning.
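
For what it's worth, a hedged sketch of a hard stop at inference time (no finetuning needed), assuming batch size 1 and a stop_ids set built with tiktoken; the variable names here are illustrative:

import tiktoken

enc = tiktoken.get_encoding("gpt2")
stop_ids = set(enc.encode_ordinary("\n\n"))  # token id(s) that encode a double newline

# inside GPT.generate, right after idx_next is sampled (shape (1, 1) for batch size 1):
#     if idx_next.item() in stop_ids:
#         break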

OpenWebTextCorpus DataLoader

I created an OpenWebTextCorpus DataLoader for training, and I thought you might find it useful.

It automatically downloads the tar file from google drive, which is a 12GB compressed file (instead of 54GB from the datasets package, but it does require gdown as a dependency). It's an IterableDataset, and encodes the batches on the fly (no preprocessing). All the encoding/batching is done in the collate_fn.

I didn't submit this as a pull request, as it requires a complete rewrite of your training code :)

Nevertheless, I'm happy to do so if you find it useful. Thanks!

import os
import random
import tarfile
from functools import partial
from typing import Optional, List, Callable
from itertools import islice
import torch
import tiktoken
import gdown


class OpenWebTextCorpus(torch.utils.data.IterableDataset):
    url = "https://drive.google.com/uc?id=1EA5V0oetDCOke7afsktL_JDQ-ETtNOvx"

    def __init__(self, tar_filename: str):
        super().__init__()
        self.tar_filename = tar_filename
        if not os.path.exists(self.tar_filename):
            self.download_file()  # instance method; was incorrectly called on the class

    @property
    def document_count(self):
        return 8013769

    def download_file(self, quiet: Optional[bool] = False):
        """8M documents (8,013,769) - 12GB Download
        https://skylion007.github.io/OpenWebTextCorpus/
        https://drive.google.com/drive/folders/1IaD_SIIB-K3Sij_-JjWoPy_UrWqQRdjx
        """
        gdown.download(OpenWebTextCorpus.url, self.tar_filename, quiet=quiet)

    def read_tar_file__xz(self):
        with tarfile.open(
            self.tar_filename, mode="r", encoding="utf-8"
        ) as inside_tar:
            for xz_file in inside_tar:
                with inside_tar.extractfile(xz_file) as inside_xz:
                    with tarfile.open(
                        fileobj=inside_xz, mode="r:xz", encoding="utf-8"
                    ) as txt_directory:
                        for txt_file in txt_directory:
                            yield txt_directory.extractfile(
                                txt_file
                            ).read().decode("utf-8")

    def get_stream(self):
        return self.read_tar_file__xz()

    def __iter__(self):
        worker_info = torch.utils.data.get_worker_info()
        if not worker_info:
            return self.get_stream()
        else:
            return islice(
                self.get_stream(), worker_info.id, None, worker_info.num_workers
            )


def collate_fn(
    batch: List[str],
    encoder: Callable,
    block_size: int,
    dtype: torch.dtype,
    use_dynamic_batching: bool,
):
    encoded = encoder.encode_ordinary_batch(batch)
    max_length = (
        min(block_size, max([len(input_ids) for input_ids in encoded]))
        if use_dynamic_batching
        else block_size
    )
    input_ids = torch.empty((len(batch), max_length), dtype=dtype).fill_(
        encoder.eot_token
    )
    targets = torch.empty((len(batch), max_length), dtype=dtype).fill_(
        encoder.eot_token
    )
    for index in range(len(encoded)):
        block = encoded[index]
        if len(block) - max_length > 0:
            # sample tokens
            start = random.randrange(0, len(block) - max_length)
            block = block[start : start + max_length + 1]
        l = len(block[:-1][:max_length])
        input_ids[index, :l] = torch.tensor(block[:-1][:max_length])
        targets[index, :l] = torch.tensor(block[1:][:max_length])
    return {
        "input_ids": input_ids,
        "targets": targets,
    }


def get_data_loader(
    tar_filename: str,
    block_size: Optional[int] = 1024,
    batch_size: Optional[int] = 4,
    num_workers: Optional[int] = 4,
    prefetch_factor: Optional[int] = 8,
    dtype: Optional[torch.dtype] = torch.int64,
    use_dynamic_batching: Optional[bool] = False,
):
    iterable_dataset = OpenWebTextCorpus(tar_filename=tar_filename)
    encoder = tiktoken.get_encoding("gpt2")
    assert (
        torch.tensor(encoder.max_token_value).type(dtype).item()
        == encoder.max_token_value
    ), "`dtype` does not cover the full range of values for the vocabulary"
    collate_fn_partial = partial(
        collate_fn,
        encoder=encoder,
        block_size=block_size,
        dtype=dtype,
        use_dynamic_batching=use_dynamic_batching,
    )
    return torch.utils.data.DataLoader(
        dataset=iterable_dataset,
        batch_size=batch_size,
        num_workers=num_workers,
        prefetch_factor=prefetch_factor,
        collate_fn=collate_fn_partial,
        pin_memory=True,
    )
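
Example usage of the loader above (the tar path is simply wherever you want the ~12GB archive saved):

loader = get_data_loader("openwebtext.tar", block_size=1024, batch_size=4)
batch = next(iter(loader))
print(batch["input_ids"].shape, batch["targets"].shape)  # torch.Size([4, 1024]) each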

Remove @torch.jit.script decorator when compiling the model?

Hey there, should we remove the @torch.jit.script decorator for the fused_gelu activation function if we compile the model? I benchmarked both and found the inference time to be substantially faster without the decorator. This was tested on an NVIDIA A6000 card. Thanks!
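
For context, a minimal sketch of the two variants being compared, assuming the tanh-approximation GELU the repo uses (the function name here is illustrative):

import math
import torch

def new_gelu(x):
    """BERT/GPT-2 style tanh approximation of GELU (plain eager version)."""
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * torch.pow(x, 3.0))))

scripted_gelu = torch.jit.script(new_gelu)  # the @torch.jit.script-decorated variant in question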

Pluck last token before lm_head(x) during inference?

Reading the forward code, the generate code, and the training code it looks like you only need all logits during training, and that during generation (inference) only the last logit is actually used.

Since lm_head is a substantial op, it could make sense to pluck the last token before lm_head as opposed to after during inference?

I.e. something like:

# if we are given some desired targets also calculate the loss
loss = None
if targets is not None:
    logits = self.lm_head(x)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1), ignore_index=-1)
else:
    logits = self.lm_head(x[:, [-1], :])  # keep only the last position (list index preserves the time dim)

return logits, loss

Haven't tested that code, apologies for any typos/errors.

CUDA out of memory

Ran on an NVIDIA V100 PCIe3 32GB and got the following OOM error after step 0:

RuntimeError: CUDA out of memory. Tried to allocate 786.00 MiB (GPU 0; 15.78 GiB total capacity; 14.28 GiB already allocated; 620.69 MiB free; 14.32 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

checkpoints don't seem to be working

I'm testing out train.py on Google Colab but no checkpoints are created, even after iteration 1000.

I'm using this command:

!cd /content/nanoGPT/ && python train.py --dataset=shakespeare --compile=False --n_layer=4 --n_head=4 --n_embd=64 --eval_iters=20 --block_size=64 --batch_size=8 --init_from=gpt2 --always_save_checkpoint=True --dtype=float32

Cuda out of Memory

Tried on a cluster using multiple nodes.
Example:

  1. run "torchrun --nproc_per_node=2 --nnodes=2 --node_rank=0 --master_addr=<Master node's IP>--master_port=1234 train.py --dataset=shakespeare --dtype=float16 --batch_size=2 --compile=False" on the Master node
  2. ssh to the second node
  3. run "torchrun --nproc_per_node=2 --nnodes=2 --node_rank=1 --master_addr=<Master node's IP> --master_port=1234 train.py --dataset=shakespeare --dtype=float16 --batch_size=2 --compile=False"

Got errors: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 394.00 MiB (GPU 0; 15.78 GiB total capacity; 5.17 GiB already allocated; 8.68 GiB free; 5.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Dataset load

Hello, I have an issue while loading the dataset in prepare.py (for openwebtext). The download and the extraction complete successfully, but the generation of the train split raises an error.

I've already tried to look for the file 0180327-a95f1342cd685fb7d22805aa720870d2.txt in the archive and add it manually to the extracted dataset, but it doesn't work. ignore_verifications is False.

If you need more information, I can provide whatever you need.

Thanks for your help

Config :

  • AMD Ryzen 5 5600X
  • NVIDIA 3060 Ti (CUDA 11.7)
  • 32 GB RAM (3200 MHz / CAS 16)
  • Windows 10 64-bit
  • Python 3.9.13 (virtualenv)

Computing checksums of downloaded files. They can be used for integrity verification. You can disable this by passing ignore_verifications=True to load_dataset
Computing checksums: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:08<00:00,  8.82s/it]
C:\Users\emili\Desktop\nanoGPT\venv\lib\site-packages\datasets\download\download_manager.py:431: FutureWarning: 'num_proc' was deprecated in version 2.6.2 and will be removed in 3.0.0. Pass `DownloadConfig(num_proc=<num_proc>)` to the initializer instead.
  warnings.warn(
Extracting data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20610/20610 [05:27<00:00, 62.85it/s]
Generating train split:   0%|▋  | 35271/8013769 [01:43<2:24:57, 917.33 examples/s]
Traceback (most recent call last):
  File "C:\Users\emili\Desktop\nanoGPT\venv\lib\site-packages\datasets\builder.py", line 1570, in _prepare_split_single
    for key, record in generator:
  File "C:\Users\emili\.cache\huggingface\modules\datasets_modules\datasets\openwebtext\85b3ae7051d2d72e7c5fdf6dfb462603aaa26e9ed506202bf3a24d261c6c40a1\openwebtext.py", line 85, in _generate_examples
    with open(filepath, encoding="utf-8") as f:
  File "C:\Users\emili\Desktop\nanoGPT\venv\lib\site-packages\datasets\streaming.py", line 69, in wrapper
    return function(*args, use_auth_token=use_auth_token, **kwargs)
  File "C:\Users\emili\Desktop\nanoGPT\venv\lib\site-packages\datasets\download\streaming_download_manager.py", line 445, in xopen
    return open(main_hop, mode, *args, **kwargs)
OSError: [Errno 22] Invalid argument: 'C:\\Users\\emili\\.cache\\huggingface\\datasets\\downloads\\extracted\\85b7a70ee547a4372aa7cf8fab0e93cd8849e09e1cba8454c1d113746400e918\\0180327-a95f1342cd685fb7d22805aa720870d2.txt'    

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\emili\Desktop\nanoGPT\data\openwebtext\prepare.py", line 15, in <module>
    dataset = load_dataset("openwebtext")
  File "C:\Users\emili\Desktop\nanoGPT\venv\lib\site-packages\datasets\load.py", line 1757, in load_dataset
    builder_instance.download_and_prepare(
  File "C:\Users\emili\Desktop\nanoGPT\venv\lib\site-packages\datasets\builder.py", line 860, in download_and_prepare
    self._download_and_prepare(
  File "C:\Users\emili\Desktop\nanoGPT\venv\lib\site-packages\datasets\builder.py", line 1611, in _download_and_prepare
    super()._download_and_prepare(
  File "C:\Users\emili\Desktop\nanoGPT\venv\lib\site-packages\datasets\builder.py", line 953, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "C:\Users\emili\Desktop\nanoGPT\venv\lib\site-packages\datasets\builder.py", line 1449, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
  File "C:\Users\emili\Desktop\nanoGPT\venv\lib\site-packages\datasets\builder.py", line 1606, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset

Error when using PyTorch 2.0 (compile=True)

Ran train.py using "torchrun --standalone --nproc_per_node=1 train.py --dataset=shakespeare --dtype=float32 --batch_size=8 --compile=True" and got the following error:

master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
Traceback (most recent call last):
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 1087, in run_node
c return node.target(*args, **kwargs)
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 916, in torch_dispatch
r = func(*args, **kwargs)
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_ops.py", line 284, in call
return self._op(*args, **kwargs or {})
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_prims_common/wrappers.py", line 209, in _fn
result = fn(*args, **kwargs)
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_prims_common/wrappers.py", line 119, in _fn
result = fn(**bound.arguments)
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_refs/init.py", line 974, in add
return prims.add(a, b)
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_ops.py", line 284, in call
return self._op(*args, **kwargs or {})
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_prims/init.py", line 349, in elementwise_meta
shape = utils.extract_shape(*args
, allow_cpu_scalar_tensors=True)
AttributeError: module 'torch._prims.utils' has no attribute 'extract_shape'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 1046, in get_fake_value
return wrap_fake_exception(
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 721, in wrap_fake_exception
return fn()
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 1047, in
lambda: run_node(tx.output, node, args, kwargs, nnmodule)
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 1099, in run_node
raise RuntimeError(
RuntimeError: Failed running call_function (*(FakeTensor(FakeTensor(..., device='meta', size=(8, 1024, 768)), cuda:0), FakeTensor(FakeTensor(..., device='meta', size=(1, 1024, 768)), cuda:0)), **{}):
module 'torch._prims.utils' has no attribute 'extract_shape'
(scroll up for backtrace)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/p/work/liux3790/nanoGPT/train.py", line 241, in
losses = estimate_loss()
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/p/work/liux3790/nanoGPT/train.py", line 202, in estimate_loss
logits, loss = model(X, Y)
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1488, in _call_impl
return forward_call(*args, **kwargs)
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1153, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1106, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0]) # type: ignore[index]
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1488, in _call_impl
return forward_call(*args, **kwargs)
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 83, in forward
return self.dynamo_ctx(self._orig_mod.forward)(*args, **kwargs)
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 212, in _fn
return fn(*args, **kwargs)
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 330, in catch_errors
return hijacked_callback(frame, cache_size, hooks)
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 480, in _convert_frame
result = inner_convert(frame, cache_size, hooks)
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 103, in _fn
return fn(*args, **kwargs)
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 94, in time_wrapper
r = func(*args, **kwargs)
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 339, in _convert_frame_assert
return _compile(
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 400, in _compile
out_code = transform_code_object(code, transform)
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object
transformations(instructions, code_options)
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 387, in transform
tracer.run()
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1692, in run
super().run()
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 538, in run
and self.step()
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 501, in step
getattr(self, inst.opname)(inst)
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 137, in impl
self.push(fn_var.call_function(self, self.popn(nargs), {}))
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_dynamo/variables/builtin.py", line 288, in call_function
return wrap_fx_proxy(tx, proxy, **options)
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 752, in wrap_fx_proxy
return wrap_fx_proxy_cls(
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 787, in wrap_fx_proxy_cls
example_value = get_fake_value(proxy.node, tx)
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 1066, in get_fake_value
raise TorchRuntimeError() from e
torch._dynamo.exc.TorchRuntimeError:

from user code:
File "/p/work/liux3790/nanoGPT/model.py", line 137, in forward
x = self.transformer.drop(tok_emb + pos_emb)

Set torch._dynamo.config.verbose=True for more information

You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 266712) of binary: /p/home/liux3790/miniconda3/bin/python
Traceback (most recent call last):
File "/p/home/liux3790/miniconda3/bin/torchrun", line 8, in
sys.exit(main())
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 346, in wrapper
return f(*args, **kwargs)
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/distributed/run.py", line 779, in main
run(args)
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/distributed/run.py", line 770, in run
elastic_launch(
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/p/home/liux3790/miniconda3/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

train.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2023-01-17_21:51:28
host : nid06029.cm.cluster
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 266712)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

what is the main speed up trick for nanoGPT?

question

Out of curiosity, does anyone know the main trick behind nanoGPT training GPT-2 so quickly (from the ~1 week I was used to down to ~1 day, it seems)? https://github.com/karpathy/nanoGPT After a discussion it seems it's only a smaller batch size... it then appears to reach the same val loss faster because of this. Is that really the main trick? Also, doesn't a smaller batch size give us more uncertainty on the loss estimate? How do we know the two models truly perform the same? E.g., confidence intervals on means depend on sqrt(N).

cross: https://www.quora.com/unanswered/What-is-the-main-speed-up-trick-s-for-NanoGPT-from-Andrej-Karpathy
cross2: https://www.reddit.com/r/learnmachinelearning/comments/10w84m4/what_is_the_main_speedup_tricks_for_nanogpt_from/

related3: https://ai.stackexchange.com/questions/39186/why-do-llms-need-massive-distributed-training-across-nodes-if-the-models-fit
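
On the sqrt(N) point raised above: the standard error of a mean loss estimated from N samples shrinks roughly as 1/sqrt(N), which is easy to see with a toy sketch (purely illustrative numbers):

# toy sketch: uncertainty of a mean loss estimate shrinks ~ 1/sqrt(n)
import torch

torch.manual_seed(0)
losses = 1.5 + 0.8 * torch.randn(100_000)    # pretend per-sample losses (mean 1.5, std 0.8)

for n in (64, 1_024, 16_384):
    sample = losses[:n]
    se = (sample.std() / n ** 0.5).item()    # standard error of the mean
    print(f"n={n:6d}  mean={sample.mean().item():.3f}  +/- {se:.4f}")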

Issue with running prepare.py

I received the following error while running python prepare.py:

Traceback (most recent call last):
File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 1570, in _prepare_split_single
for key, record in generator:
File "C:\Users\fresh\.cache\huggingface\modules\datasets_modules\datasets\openwebtext\85b3ae7051d2d72e7c5fdf6dfb462603aaa26e9ed506202bf3a24d261c6c40a1\openwebtext.py", line 85, in _generate_examples
with open(filepath, encoding="utf-8") as f:
File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\streaming.py", line 69, in wrapper
return function(*args, use_auth_token=use_auth_token, **kwargs)
File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\download\streaming_download_manager.py", line 445, in xopen
return open(main_hop, mode, *args, **kwargs)
OSError: [Errno 22] Invalid argument: 'C:\Users\fresh\.cache\huggingface\datasets\downloads\extracted\f03a89c11b1133c3973ac7aed71b6be5c62feb33c5ec06cffb06511974f7194e\0015896-b1054262f7da52a0518521e29c8e352c.txt'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Users\fresh\Downloads\nanoGPT\data\openwebtext\prepare.py", line 14, in
dataset = load_dataset("openwebtext")
File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\load.py", line 1757, in load_dataset
builder_instance.download_and_prepare(
File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 860, in download_and_prepare
self._download_and_prepare(
File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 1611, in _download_and_prepare
super()._download_and_prepare(
File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 953, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 1449, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 1606, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset

Make wandb training logs public

Hello, again Andrej,

Would you mind making the training logs public so we can follow your progress in reproducing GPT2?

You can do this by clicking on the lock icon in your W&B workspace (see the attached make_public screenshot), and maybe drop a link in the readme.

Thanks =)

Finetune code translation tasks

Is this repository more or less ready for finetuning on code translation tasks? E.g. I'd like to explore some ideas for converting Figma files into framework-specific code.

I'm new to LLMs; any advice on how to approach such a task is very welcome.

Another thank you

Andrej, I taught myself most of what I know about ML by copying your code, trying to understand every line, and then hacking it into something new of my own.

This repo is just one more gift to humanity. Thank you very much! I admire you! Close this issue if you like :)
