This is the code for the paper Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets by Alethea Power, Yuri Burda, Harri Edwards, Igor Babuschkin, and Vedant Misra
pip install -e .
./scripts/train.py
License: MIT License
nvm
Thank you for sharing your implementation.

If I understood correctly, the toy example in this paper is to train a network (Transformer) to solve an equation of the form a ∘ b = c, given a, b, and the operation ∘. To translate it for the Transformer, we tokenize everything and add end-of-sentence <|EOS|> tokens in the following fashion (which is suggested by this code):

<|EOS|> <a> <OP> <b> <=> <c> <|EOS|>

where <a>, <b>, and <c> are integers.

By design, we may use

<|EOS|> <a> <OP> <b> <=> <?> <|EOS|>

as the input, where <?> is a placeholder token for the solution to the equation. The output of the Transformer can then be the predicted equation

<|EOS|> <a> <OP> <b> <=> <c_> <|EOS|>

where <c_> indicates the predicted token c. And the target should be the correct, full equation:

<|EOS|> <a> <OP> <b> <=> <c> <|EOS|>

We would then calculate the loss based on the second-to-last tokens, <c> and <c_>.

However, in this implementation, the input is the first 6 tokens, i.e. <|EOS|> <a> <OP> <b> <=> <c>, while the target is the last 6 tokens, i.e. <a> <OP> <b> <=> <c> <|EOS|>. The attached figure shows the first batch of x (input) and y (target) obtained while debugging. In the figure, 0 indicates the <|EOS|> token, 1 indicates the '<=>' token, and 6 indicates the '**2+' operation (a conditional operation whose result depends on whether <a> is odd or even).

The problem is that the solution is already in the input x. Therefore, I think the model is trained on a wrong task.
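For concreteness, the shift-by-one split described above can be sketched as follows (an illustrative sketch; the token ids here are hypothetical stand-ins, not the repository's actual encoding):

```python
# Illustrative shift-by-one split for next-token training.
# Hypothetical token ids: 0 = <|EOS|>, 1 = <=>, 6 = the '**2+' operation.
EOS, EQ, OP = 0, 1, 6

def make_example(a_tok, b_tok, c_tok):
    """Tokenize <|EOS|> a OP b <=> c <|EOS|> and split into input/target."""
    tokens = [EOS, a_tok, OP, b_tok, EQ, c_tok, EOS]
    x = tokens[:-1]  # input: first 6 tokens (note: includes <c>)
    y = tokens[1:]   # target: last 6 tokens, shifted left by one
    return x, y

x, y = make_example(a_tok=12, b_tok=34, c_tok=50)
print(x)  # [0, 12, 6, 34, 1, 50]
print(y)  # [12, 6, 34, 1, 50, 0]
```

Note that whether <c> actually leaks into the prediction depends on the attention mask: with a causal (autoregressive) mask, the position whose target is <c> can only attend to tokens up to <=>.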
Thanks for the very interesting paper! I have two questions regarding modular division:

x ∘ y = x / y (mod p) for 0 ≤ x < p, 0 < y < p

where p = 97. Doesn't x / y (mod p) produce fractional results? How would you get the cross-entropy loss (against these fractional targets) then? I tried staring at the code but couldn't really connect the dots :(
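For what it's worth, modular division is conventionally defined via the modular multiplicative inverse rather than ordinary fraction arithmetic, so x / y (mod p) is always an integer in [0, p). A minimal sketch (my own illustration, not the repository's code), using Fermat's little theorem since p = 97 is prime:

```python
p = 97  # prime modulus used in the paper

def mod_div(x, y, p=p):
    """x / y (mod p) := x * y^{-1} (mod p), where y^{-1} is the modular
    inverse of y. For prime p, Fermat's little theorem gives
    y^{-1} = y^(p-2) mod p, so the result is always an integer in [0, p)."""
    y_inv = pow(y, p - 2, p)  # modular inverse via three-argument pow
    return (x * y_inv) % p

print(mod_div(12, 5))          # 80, an ordinary integer token
print(mod_div(12, 5) * 5 % p)  # 12, i.e. (x / y) * y == x (mod p)
```

Since every result is one of the p integer tokens, cross-entropy against a one-hot target works exactly as for the other operations.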
OpenAI open-sourced grok 3 years ago. lol.
Perfect troll OpenAI. 👌🏽
This is so ...
Elon goes bad every day 😬🙄
(Look at the commit history)
@ilyasut didn't exit(1) himself.
How to fix that?
I've always wondered whether the least amount of work went into Grok. Just a finetune of the Llama 7B model, what say?
The make_data.py file in scripts invokes create_data_files(args.data_directory), where the default directory as set up is data. The function invokes

ArithmeticTokenizer.create_token_file(data_dir)
ArithmeticDataset.create_dataset_files(data_dir)

both of which are not defined in the corresponding classes. This breaks the code when running multiple experiments.
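A hypothetical sketch of what the missing create_token_file helper could look like (the method name comes from the call site above; the class body and token set below are guesses, not the repository's actual logic):

```python
import os

class ArithmeticTokenizer:
    # Hypothetical vocabulary: EOS, the '=' sign, and the residues mod 97.
    tokens = ["<|eos|>", "="] + [str(i) for i in range(97)]

    @classmethod
    def create_token_file(cls, data_dir):
        """Write one token per line so the tokenizer can be rebuilt from disk."""
        os.makedirs(data_dir, exist_ok=True)
        path = os.path.join(data_dir, "tokens.txt")
        with open(path, "w") as f:
            f.write("\n".join(cls.tokens))
        return path
```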
This is the troll that got me to join Github 🤣
What! 🫨
Hello,
Can you include which versions of the libraries used in this project are compatible with it? Someone has an open pull request for issues with pytorch-lightning, but even after fixing that, there are still issues with my version of torch. It would be helpful to know what was used to reproduce the results.
It is not xAI's Grok, haha.
Why is this bug happening?
(myenv) shyamaluser@Shyamals-iMac grok % ./scripts/train.py
Namespace(random_seed=-1, gpu=0, max_epochs=None, max_steps=100000, batchsize=0, n_layers=2, n_heads=4, d_model=128, dropout=0.0, weight_noise=0.0, non_linearity='relu', max_context_len=50, math_operator='+', operand_length=None, train_data_pct=5, warmup_steps=10, anneal_lr_steps=100000, anneal_lr=False, max_lr=0.001, weight_decay=0, weight_decay_kind='to_zero', noise_factor=0, save_activations=False, save_outputs=False, logdir='/Users/shyamaluser/grok', datadir='/Users/shyamaluser/grok/data')
Traceback (most recent call last):
File "/Users/shyamaluser/grok/./scripts/train.py", line 14, in <module>
print(grok.training.train(hparams))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/shyamaluser/grok/grok/training.py", line 703, in train
model = TrainableTransformer(hparams).float()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/shyamaluser/grok/grok/training.py", line 50, in __init__
self.hparams = hparams # type: ignore
^^^^^^^^^^^^
File "/Users/shyamaluser/grok/myenv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in __setattr__
super().__setattr__(name, value)
AttributeError: property 'hparams' of 'TrainableTransformer' object has no setter
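The traceback above is a known version incompatibility: in newer PyTorch Lightning releases, hparams is a read-only property on LightningModule, so the direct assignment self.hparams = hparams in grok/training.py fails. The usual fix is to call self.save_hyperparameters(hparams) instead. A minimal stand-alone sketch of the failure mode and the workaround (the classes below are simplified stand-ins, not the real Lightning API):

```python
from argparse import Namespace

class LightningModuleLike:
    """Stand-in mimicking newer PyTorch Lightning: hparams is a
    read-only property, populated via save_hyperparameters()."""
    def __init__(self):
        self._hparams = None

    @property
    def hparams(self):
        return self._hparams

    def save_hyperparameters(self, hparams):
        # The supported way to store hyperparameters.
        self._hparams = hparams

class TrainableTransformerLike(LightningModuleLike):
    def __init__(self, hparams):
        super().__init__()
        # self.hparams = hparams            # AttributeError: no setter
        self.save_hyperparameters(hparams)  # supported path

model = TrainableTransformerLike(Namespace(n_layers=2, n_heads=4))
print(model.hparams.n_layers)  # 2
```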
Gotta set up the functions correctly first; the README would be a perfect place to document that, to get it working faster.
Grok Grok Grok