This is the code for the paper Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets by Alethea Power, Yuri Burda, Harri Edwards, Igor Babuschkin, and Vedant Misra
pip install -e .
./scripts/train.py
License: MIT License
nvm
Thank you for sharing your implementation.

If I understood correctly, the toy example in this paper is to train a network (Transformer) to solve an equation of the form a ∘ b = c, given a, b, and the operation ∘. To translate it for the Transformer, we tokenize everything and add end-of-sentence <|EOS|> tokens in the following fashion (which is suggested by this code):

<|EOS|> <a> <OP> <b> <=> <c> <|EOS|>

where <a>, <b>, and <c> are integers.

By design, we may use

<|EOS|> <a> <OP> <b> <=> <?> <|EOS|>

as the input, where <?> is a placeholder token for the solution to the equation. The output of the Transformer can then be the predicted equation

<|EOS|> <a> <OP> <b> <=> <c_> <|EOS|>

where <c_> indicates the predicted token c. And the target should be the correct, full equation:

<|EOS|> <a> <OP> <b> <=> <c> <|EOS|>

We would then calculate the loss based on the second-to-last tokens, <c> and <c_>.

However, in this implementation, the input is the first 6 tokens, i.e. <|EOS|> <a> <OP> <b> <=> <c>, while the target is the last 6 tokens, i.e. <a> <OP> <b> <=> <c> <|EOS|>. The attached figure shows the first batch of x (input) and y (target) obtained while debugging. In the figure, 0 indicates the <|EOS|> token, 1 indicates the '<=>' token, and 6 indicates the '**2+' operation (a conditional operation whose result depends on whether <a> is odd or even).

The problem is that the solution is already in the input x. Therefore, I think the model is trained on a wrong task.
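For concreteness, the shift-by-one split described above can be sketched as follows (an illustrative sketch; the token ids here are hypothetical stand-ins, not the repository's actual encoding):

```python
# Illustrative shift-by-one split for next-token training.
# Hypothetical token ids: 0 = <|EOS|>, 1 = <=>, 6 = the '**2+' operation.
EOS, EQ, OP = 0, 1, 6

def make_example(a_tok, b_tok, c_tok):
    """Tokenize <|EOS|> a OP b <=> c <|EOS|> and split into input/target."""
    tokens = [EOS, a_tok, OP, b_tok, EQ, c_tok, EOS]
    x = tokens[:-1]  # input: first 6 tokens (note: includes <c>)
    y = tokens[1:]   # target: last 6 tokens, shifted left by one
    return x, y

x, y = make_example(a_tok=12, b_tok=34, c_tok=50)
print(x)  # [0, 12, 6, 34, 1, 50]
print(y)  # [12, 6, 34, 1, 50, 0]
```

Note that whether <c> actually leaks into the prediction depends on the attention mask: with a causal (autoregressive) mask, the position whose target is <c> can only attend to tokens up to <=>.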
Thanks for the very interesting paper! I have two questions regarding modular division:

x ∘ y = x / y (mod p) for 0 ≤ x < p, 0 < y < p

where p = 97. Doesn't x / y (mod p) produce fractional results? How would you get the cross-entropy loss (against these fractional targets) then? I tried staring at the code but couldn't really connect the dots :(
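For what it's worth, modular division is conventionally defined via the modular multiplicative inverse rather than ordinary fraction arithmetic, so x / y (mod p) is always an integer in [0, p). A minimal sketch (my own illustration, not the repository's code), using Fermat's little theorem since p = 97 is prime:

```python
p = 97  # prime modulus used in the paper

def mod_div(x, y, p=p):
    """x / y (mod p) := x * y^{-1} (mod p), where y^{-1} is the modular
    inverse of y. For prime p, Fermat's little theorem gives
    y^{-1} = y^(p-2) mod p, so the result is always an integer in [0, p)."""
    y_inv = pow(y, p - 2, p)  # modular inverse via three-argument pow
    return (x * y_inv) % p

print(mod_div(12, 5))          # 80, an ordinary integer token
print(mod_div(12, 5) * 5 % p)  # 12, i.e. (x / y) * y == x (mod p)
```

Since every result is one of the p integer tokens, cross-entropy against a one-hot target works exactly as for the other operations.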
OpenAI open-sourced grok 3 years ago. lol.
Perfect troll OpenAI. 👌🏽
This is so ...
Elon goes bad every day 😬🙄
(Look at the commit history)
@ilyasut didn't exit(1) himself.
How to fix that?
I've always wondered whether the least amount of work went into Grok. Just a finetune of the Llama 7B model, what say?
The make_data.py file in scripts invokes create_data_files(args.data_directory), where the default directory as set up is data. The function invokes

ArithmeticTokenizer.create_token_file(data_dir)
ArithmeticDataset.create_dataset_files(data_dir)

both of which are not defined in the corresponding classes. This breaks the code when running multiple experiments.
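A hypothetical sketch of what the missing create_token_file helper could look like (the method name comes from the call site above; the class body and token set below are guesses, not the repository's actual logic):

```python
import os

class ArithmeticTokenizer:
    # Hypothetical vocabulary: EOS, the '=' sign, and the residues mod 97.
    tokens = ["<|eos|>", "="] + [str(i) for i in range(97)]

    @classmethod
    def create_token_file(cls, data_dir):
        """Write one token per line so the tokenizer can be rebuilt from disk."""
        os.makedirs(data_dir, exist_ok=True)
        path = os.path.join(data_dir, "tokens.txt")
        with open(path, "w") as f:
            f.write("\n".join(cls.tokens))
        return path
```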
This is the troll that got me to join Github 🤣
What! 🫨
Hello,
Can you include which versions of the libraries used in this project are compatible with it? Someone has an open pull request for issues with pytorch-lightning, but even after fixing that, there are still issues with my version of torch. It would be helpful to know what was used to reproduce the results.
It is not xAI's Grok, haha.
Why is this bug happening?
(myenv) shyamaluser@Shyamals-iMac grok % ./scripts/train.py
Namespace(random_seed=-1, gpu=0, max_epochs=None, max_steps=100000, batchsize=0, n_layers=2, n_heads=4, d_model=128, dropout=0.0, weight_noise=0.0, non_linearity='relu', max_context_len=50, math_operator='+', operand_length=None, train_data_pct=5, warmup_steps=10, anneal_lr_steps=100000, anneal_lr=False, max_lr=0.001, weight_decay=0, weight_decay_kind='to_zero', noise_factor=0, save_activations=False, save_outputs=False, logdir='/Users/shyamaluser/grok', datadir='/Users/shyamaluser/grok/data')
Traceback (most recent call last):
File "/Users/shyamaluser/grok/./scripts/train.py", line 14, in <module>
print(grok.training.train(hparams))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/shyamaluser/grok/grok/training.py", line 703, in train
model = TrainableTransformer(hparams).float()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/shyamaluser/grok/grok/training.py", line 50, in __init__
self.hparams = hparams # type: ignore
^^^^^^^^^^^^
File "/Users/shyamaluser/grok/myenv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in __setattr__
super().__setattr__(name, value)
AttributeError: property 'hparams' of 'TrainableTransformer' object has no setter
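The traceback above is a known version incompatibility: in newer PyTorch Lightning releases, hparams is a read-only property on LightningModule, so the direct assignment self.hparams = hparams in grok/training.py fails. The usual fix is to call self.save_hyperparameters(hparams) instead. A minimal stand-alone sketch of the failure mode and the workaround (the classes below are simplified stand-ins, not the real Lightning API):

```python
from argparse import Namespace

class LightningModuleLike:
    """Stand-in mimicking newer PyTorch Lightning: hparams is a
    read-only property, populated via save_hyperparameters()."""
    def __init__(self):
        self._hparams = None

    @property
    def hparams(self):
        return self._hparams

    def save_hyperparameters(self, hparams):
        # The supported way to store hyperparameters.
        self._hparams = hparams

class TrainableTransformerLike(LightningModuleLike):
    def __init__(self, hparams):
        super().__init__()
        # self.hparams = hparams            # AttributeError: no setter
        self.save_hyperparameters(hparams)  # supported path

model = TrainableTransformerLike(Namespace(n_layers=2, n_heads=4))
print(model.hparams.n_layers)  # 2
```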
Gotta set up the functions correctly first; the README would be a perfect place to document that, to get it working faster.
Grok Grok Grok