kmeng01 / memit
Mass-editing thousands of facts into a transformer memory (ICLR 2023)
Home Page: https://memit.baulab.info
License: MIT License
What is the difference between the multi-counterfact and counterfact datasets? Which one should I choose to reproduce the results in the paper?
Is it normal for the edit to stop optimizing after the first step?
Command:
python -m experiments.evaluate --alg_name=MEMIT --model_name="EleutherAI/gpt-j-6B" --hparams_fname=EleutherAI_gpt-j-6B.json --ds_name=cf --dataset_size_limit=1000 --num_edits=1
Example:
```
Tying optimization objective to 27
Recording initial value of v*
loss 10.159 = 10.159 + 0.0 + 0.0 avg prob of [ bishop] 0.00010042281064670533
loss 7.014 = 7.006 + 0.005 + 0.003 avg prob of [ bishop] 0.0024381510447710752
[the line above repeats unchanged for every remaining optimization step]
Init norm 67.62352752685547 | Delta norm 31.999399185180664 | Target norm 75.150146484375
```
It seems like the NLL loss could be further optimized?
Dear authors:
I have read two of your articles on knowledge editing and benefited a lot from them.
Sorry to bother you; there are two questions I would like to confirm with you:
In MEMIT's Equation 9, why is the sum split into two terms? It looks like they could be combined into a single range [1, u]. Does the specific split into [1, n] and [n+1, u] have any special meaning?
It looks like ROME could also perform batch editing? The original paper computes a single pair [k*, v*] and then updates W_proj. But what if I compute [k*_1, k*_2...k*_n] with [v*_1, v*_2...v*_n], and then update W_proj? Wouldn't ROME then also support batch editing?
Looking forward to your reply, thank you.
[1] Locating and Editing Factual Associations in GPT
[2] MASS-EDITING MEMORY IN A TRANSFORMER
[Issue desc]
I've tried to run MEMIT on a machine with 2 × 32GB GPUs (Tesla V100), with 10k edits.
However, it fails: due to the large weight size (GPT-J-6B), the model and the data (even with a small batch_size) cannot fit on a single GPU.
[Comment for Enhancement]
I would like to ask whether there is an implementation supporting multiple GPUs for MEMIT?
I mean the model could be placed on a single GPU, and every batch of data consumed by the model could be placed on another GPU; overall, a non-distributed 2-GPU (or more) training implementation.
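As a rough illustration of that split (a plain-PyTorch sketch, not something the repo provides): keep the model parameters on one device, stage each data batch on the other, and move the batch over only for the forward pass. The device names here are placeholders that fall back to CPU when fewer GPUs are available.

```python
import torch
import torch.nn as nn

# Placeholder devices: on the 2x V100 machine these would be "cuda:0"/"cuda:1".
model_device = "cuda:0" if torch.cuda.is_available() else "cpu"
data_device = "cuda:1" if torch.cuda.device_count() > 1 else model_device

model = nn.Linear(16, 4).to(model_device)        # stand-in for the big model
batch = torch.randn(8, 16, device=data_device)   # batch staged on the other GPU

# Move the batch onto the model's device only for the forward pass, so it
# does not occupy the model GPU between steps.
out = model(batch.to(model_device))
```

Note that for a model whose weights alone exceed one GPU, sharding the layers themselves across devices (e.g. Hugging Face transformers' `device_map="auto"`) is likely the more relevant approach than staging data on a second GPU.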
Hi authors,
MEMIT is an interesting work.
When I run: python -m experiments.evaluate --alg_name=MEMIT --model_name=EleutherAI/gpt-j-6B --hparams_fname=EleutherAI_gpt-j-6B.json --num_edits=10 --use_cache
There is an error about CUDA out of memory:
File "\memit-main\memit\memit_main.py", line 97, in
weights_copy = {k: v.detach().clone() for k, v in weights.items()}
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 24.00 GiB total capacity; 23.15 GiB already allocated; 0 bytes free; 23.16 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
My local is 24GB 3090 GPU.
Could you help me figure out how to run the MEMIT code? How should I revise the configuration file (EleutherAI_gpt-j-6B.json) in order to reduce memory usage?
Thank you very much.
MEMIT is very nice.
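One knob the error message itself points at is the allocator's max_split_size_mb, which mitigates fragmentation (though it does not shrink the model's total footprint; reducing num_edits or values in the hparams file may still be necessary). A sketch, with 128 MiB as an example value:

```shell
# Reduce CUDA allocator fragmentation before launching; this is only a
# mitigation for the ">> allocated" fragmentation case in the error message.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
python -m experiments.evaluate --alg_name=MEMIT --model_name=EleutherAI/gpt-j-6B \
  --hparams_fname=EleutherAI_gpt-j-6B.json --num_edits=10 --use_cache
```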
Hey,
Thanks for sharing your work!
I have a question about how you chose to spread the residual across the remaining layers at each update step (Eq. 20).
You chose the updated values as:
M' = M + residual / (L - l + 1)
claiming it spreads the residual equally across the updated layers. But actually, if there are 4 updated layers:
the first layer will provide 1/4 of the residual,
the second layer will provide 1/12 (= 1/3 - 1/4) of the residual,
the third layer will provide 1/6 (= 1/2 - 1/3) of the residual,
and the fourth layer will provide 1/2 (= 1 - 1/2) of the residual.
Shouldn't the correct update be:
M' = M + residual * (l - first_edited_layer + 1) / (L - first_edited_layer + 1)?
Thanks
Great job. I ran some of the code in this repo. After causal tracing, I still don't know which layer is important for knowledge storage. Is that choice only heuristic?
When I use the --conserve_memory parameter, the following error occurs. It seems that this method does not have this parameter.
Traceback (most recent call last):
File "/root/.local/conda/envs/memit/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/.local/conda/envs/memit/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/root/code/memit/experiments/evaluate.py", line 301, in
main(
File "/root/code/memit/experiments/evaluate.py", line 148, in main
edited_model, weights_copy = apply_algo(
TypeError: apply_memit_to_model() got an unexpected keyword argument 'return_orig_weights_device'
What is the difference between multi-counterfact and counterfact?
When locally running memit.ipynb, an error occurs as follows:
Retrieving covariance statistics for EleutherAI_gpt-j-6B @ transformer.h.3.mlp.fc_out.
Attempting to download EleutherAI_gpt-j-6B/wikipedia_stats/transformer.h.3.mlp.fc_out_float32_mom2_100000.npz from https://memit.baulab.info/data/stats/EleutherAI_gpt-j-6B/wikipedia_stats/transformer.h.3.mlp.fc_out_float32_mom2_100000.npz.
Unable to download due to [Errno 17] File exists: 'data'. Computing locally....
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
notebooks/memit/memit_main.py:44, in apply_memit_to_model(model, tok, requests, hparams, copy, return_orig_weights, cache_template)
...
461 )
463 # if not using an existing config, then create a new config on the fly
464 if not builder_config:
ValueError: BuilderConfig 20200501.en not found. Available: ['20220301.aa',...
The error seems to occur because the data folder in the root directory is missing, which could be because Git ignores empty folders. To resolve the issue, simply add an empty data folder to the root directory; this should allow the script to run without encountering the ValueError.
I sincerely hope you could tell me how to handle this error.
Traceback (most recent call last):
File "/home/lyn/miniconda3/envs/memit/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/lyn/miniconda3/envs/memit/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/lyn/memit/experiments/evaluate.py", line 299, in
main(
File "/home/lyn/memit/experiments/evaluate.py", line 146, in main
edited_model, weights_copy = apply_algo(
File "/home/lyn/memit/memit/memit_main.py", line 44, in apply_memit_to_model
deltas = execute_memit(model, tok, requests, hparams, cache_template=cache_template)
File "/home/lyn/memit/memit/memit_main.py", line 196, in execute_memit
adj_k = torch.linalg.solve(
torch._C._LinAlgError: linalg.solve: The diagonal element 2 is zero, the solve could not be completed because the input matrix is singular.
I just run with this command:
CUDA_VISIBLE_DEVICES=2 python3 -m experiments.evaluate --alg_name=MEMIT --model_name=/home/lyn/EleutherAI/gpt-j-6B --hparams_fname=EleutherAI_gpt-j-6B.json --num_edits=1
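A common workaround for this class of failure (a general numerical-linear-algebra technique, not something specific to this repo) is to add a small ridge term to the matrix before solving, so the system becomes numerically nonsingular. Illustrated with NumPy for brevity; the same idea applies to the torch.linalg.solve call in memit_main.py:

```python
import numpy as np

# A matrix that is exactly singular, as in the reported error
# (its second diagonal element is zero).
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 2.0]])
b = np.array([1.0, 0.0, 4.0])

# Adding a small multiple of the identity (ridge regularization) makes the
# solve well-posed; eps trades solution accuracy against stability.
eps = 1e-6
x = np.linalg.solve(A + eps * np.eye(3), b)
print(x)  # close to [1, 0, 2]
```

Whether regularizing is appropriate here depends on why the covariance/key matrix became singular in the first place (e.g. too few or duplicated keys), so treat this as a diagnostic aid rather than a fix.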
The paper uses an A6000 GPU with 48GB of RAM, but my workstation has 4 A5500s with 24GB of RAM each. Can I use the method suggested in the paper by separating out the model editing and model running? Or is there a way for me to run it in parallel across my GPUs? My current idea is to use a library called transformer-utils that uses a smaller model. I'm getting a message that I'm running out of memory when running the model editing.
I understand this was written for autoregressive models, but do you think you could apply it to the decoder portion of an encoder-decoder model? For example, if the input is "What is the capital of France?" and the decoder outputs "The capital of France is", could you change the appropriate MLP to point to Rome, with the subject being France?
When reading the ROME paper, I found this sample from CounterFact:
{
"case_id": 1067,
"pararel_idx": 21136,
"requested_rewrite": {
"prompt": "The headquarters of {} is in",
"entity": "Gazi University",
"relation_id": "P159",
"target_new": {"str": "Glasgow", "id": "Q4093"},
"target_true": {"str": "Ankara", "id": "Q3640"},
},
"paraphrase_prompts": [
"The headquarter of Gazi University is located in",
"Gazi University is headquartered in",
],
"neighborhood_prompts": [
"The headquarter of TRT Haber is located in",
"Agricultural Bank is headquartered in",
"TRT Avaz is based in",
"AnadoluJet's headquarters are in",
"The headquarters of National Intelligence Organization is in",
"The headquarter of MKE Ankaragücü is in",
"The headquarters of Agricultural Bank is in",
"The headquarter of Turkish Red Crescent is located in",
"Turkish Historical Society is headquartered in",
"Gençlerbirliği S.K. is headquartered in",
],
"generation_prompts": [
"Gazi University's headquarters is surrounded by",
"Gazi University's headquarters is surrounded by",
"The headquarters of Gazi University is surrounded by restaurants including",
"The headquarters of Gazi University is surrounded by restaurants including",
"The headquarters of Gazi University is surrounded by restaurants including",
"Gazi University's headquarters is surrounded by",
"The headquarters of Gazi University is surrounded by restaurants including",
"One can get to Gazi University's headquarters by navigating",
"One can get to Gazi University's headquarters by navigating",
"One can get to Gazi University's headquarters by navigating",
],
}
But the actual dataset, which can be found here (https://memit.baulab.info/data/dsets/counterfact.json), has a different format for the paraphrase prompts. Here is an example (showing only the paraphrase prompts):
{
...
"paraphrase_prompts": [
"Shayna does this and Yossel goes still and dies. Danielle Darrieux, a native",
"An album was recorded for Capitol Nashville but never released. Danielle Darrieux spoke the language"
],
...
}
We notice the apparently random sentences at the start of each paraphrase prompt. The code does not seem to filter out these prefixes.
If this is not an error, why is there this difference? And what is its impact on the evaluation procedure?
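Whether or not the prefixes turn out to be intentional, a hypothetical helper (not part of the repo, and assuming each random prefix is a complete sentence ending in ". ") makes it easy to inspect the paraphrases without them:

```python
# Hypothetical helper, not in the repo: drop the random leading sentence
# from a CounterFact paraphrase prompt. Assumes the prefix ends with ". ".
def strip_prefix(paraphrase: str) -> str:
    return paraphrase.rsplit(". ", 1)[-1]

p = ("Shayna does this and Yossel goes still and dies. "
     "Danielle Darrieux, a native")
print(strip_prefix(p))  # Danielle Darrieux, a native
```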
I'm trying to replicate MEMIT on GPTJ-6B, and I'm getting the following error (just on the first request/prompt example in your memit.ipynb notebook):
Traceback (most recent call last):
File "notebooks/my_new_file_copying_your_interactive_notebook.py", line 58, in <module>
model_new, orig_weights = demo_model_editing(
File "/memit/notebooks/experiments/py/demo.py", line 50, in demo_model_editing
model_new, orig_weights = apply_method(
File "/memit/notebooks/memit/memit_main.py", line 44, in apply_memit_to_model
deltas = execute_memit(model, tok, requests, hparams, cache_template=cache_template)
File "/memit/notebooks/memit/memit_main.py", line 160, in execute_memit
cur_zs = get_module_input_output_at_words(
File "/memit/notebooks/memit/compute_z.py", line 212, in get_module_input_output_at_words
l_input, l_output = repr_tools.get_reprs_at_word_tokens(
File "/memit/notebooks/rome/repr_tools.py", line 32, in get_reprs_at_word_tokens
return get_reprs_at_idxs(
File "/memit/notebooks/rome/repr_tools.py", line 150, in get_reprs_at_idxs
_process(tr.input, batch_idxs, "in")
File "/memit/notebooks/rome/repr_tools.py", line 131, in _process
cur_repr = cur_repr[0] if type(cur_repr) is tuple else cur_repr
IndexError: tuple index out of range
I also tried running the non-notebook version via experiments.evaluate and got the exact same error. Some debugging printouts revealed that the error occurred when calling _process for the input: the tuple cur_repr was empty, with a batch_idxs value of [[7]]. Thus, I'm unable to apply MEMIT and move forward with the evaluation. Is anyone else running into this issue, and if so, how were you able to resolve it?
Could this process be used on Llama 3 8b? I find that "Llama 3 is an auto-regressive language model that uses an optimized transformer architecture."
I have a task to update the knowledge in Llama 3 for the Phaser game engine. It currently gives mixed responses using both Phaser 2 and Phaser 3 code examples. We'd like it to either learn the difference or forget everything it 'knows' about Phaser 2 (and 3 if necessary - I have an effective RAG to provide Phaser 3 knowledge in this case).
(Llama 3 is just a 'for instance', we're not locked into any single model at the moment. I'm about to start trying a variety of them to see if there's something more suited to this task).