Comments (15)
I also have an unrelated question. Why is `WEIGHT_INIT_SAMPLES=0` in the `all_exp3.sh` file? The other files have it set to 1000.
from flad.
Another error I encountered is when I try to run `multirun_create_weight_inits.py`. I get an error on this line which says that `weight_init_only` is not a valid attribute. To fix this I just commented it out, since I noticed it is indeed not an attribute of the class, but then I get an error here which says `AttributeError: 'DataParallel' object has no attribute 'name_or_path'`. To fix this error I changed the line to `model_name = model.module.name_or_path.replace("/","-")`.
After all these changes I've been stuck here for quite a while: the progress bar doesn't seem to be running and the end-time estimate is not populated. Do let me know if you have any suggestions to fix these a different way.
EDIT: I checked in after a few hours and it seems to be running (the ETA is ~50 hours for T0Mix alongside the 3B model).
> I also have an unrelated question. Why is `WEIGHT_INIT_SAMPLES=0` in the `all_exp3.sh` file? The other files have it set to 1000.
`WEIGHT_INIT_SAMPLES` is only used for the UCB1 algorithm, which has an explicit reward-initialization phase. See algorithms 1 and 2 from the paper (https://arxiv.org/pdf/2302.00674.pdf).
In reality, you could also initialize the rewards for EXP3, but we didn't for our experiments.
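For intuition, the reward-initialization phase can be sketched as follows. This is only a minimal illustration, not the repo's actual implementation; `observe_reward` is a hypothetical callback standing in for computing one reward observation for one auxiliary dataset:

```python
import math

def initialize_rewards(num_arms, samples_per_arm, observe_reward):
    """UCB1's explicit reward-initialization phase: sample each arm
    (auxiliary dataset) `samples_per_arm` times before the main loop.
    With samples_per_arm == 0 this phase is skipped entirely, which is
    why EXP3 can run with WEIGHT_INIT_SAMPLES=0."""
    rewards = [0.0] * num_arms
    counts = [0] * num_arms
    for arm in range(num_arms):
        for _ in range(samples_per_arm):
            rewards[arm] += observe_reward(arm)
            counts[arm] += 1
    return rewards, counts

def ucb1_select(rewards, counts, t):
    """After initialization, pick the arm maximizing mean reward plus
    an exploration bonus (the UCB1 score) at step t."""
    return max(range(len(rewards)),
               key=lambda k: rewards[k] / counts[k]
               + math.sqrt(2 * math.log(t) / counts[k]))
```

UCB1 needs every `counts[k]` to be nonzero before it can score the arms, which is why its scripts set a positive `WEIGHT_INIT_SAMPLES`; EXP3 keeps a probability distribution over arms instead, so it has no such requirement.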
> Another error I encountered is when I try to run `multirun_create_weight_inits.py`. I get an error on this line which says that `weight_init_only` is not a valid attribute. To fix this I just commented it out, since I noticed it is indeed not an attribute of the class, but then I get an error here which says `AttributeError: 'DataParallel' object has no attribute 'name_or_path'`. To fix this error I changed the line to `model_name = model.module.name_or_path.replace("/","-")`.
> After all these changes I've been stuck here for quite a while: the progress bar doesn't seem to be running and the end-time estimate is not populated. Do let me know if you have any suggestions to fix these a different way.
> EDIT: I checked in after a few hours and it seems to be running (the ETA is ~50 hours for T0Mix alongside the 3B model).
Regarding the weight initialization: the script currently runs over 2 quantities of weight-initialization samples (this line) and 5 different seeds (this line).
You can reduce those to a single weight-initialization sample quantity and a single seed, which will significantly speed up the initialization.
Thanks for pointing this out; I think I know what the problem is. At one point I moved the weight initialization into the trainer class (here), but didn't make it compatible with the `multirun_train_mixed` script.
For now, the solution is to compute gradients prior to the Exploit-only method, as you're currently doing, and I will add that information to the instructions. Thank you for finding this bug!
Let me know if it still doesn't work for some reason after pre-computing the gradients.
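For context, the quantity being precomputed here is the alignment between each auxiliary dataset's gradient and the target dataset's gradient. A minimal sketch of a cosine-similarity alignment, written over plain lists rather than actual model parameters:

```python
import math

def gradient_alignment(aux_grad, target_grad):
    """Cosine similarity between two flattened gradient vectors.
    Positive alignment means the auxiliary batch's update direction
    also reduces the target-task loss."""
    dot = sum(a * t for a, t in zip(aux_grad, target_grad))
    norm_a = math.sqrt(sum(a * a for a in aux_grad))
    norm_t = math.sqrt(sum(t * t for t in target_grad))
    return dot / (norm_a * norm_t)
```

In the real setup the vectors come from flattening the model's per-parameter gradients, which is what makes the precomputation step expensive for a 3B-parameter model.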
Thanks for the reference, I'll check it out!
For the alignment computation, I did limit it to only one seed and one weight-initialization sample value, but it's still taking around 40 hours on an A100 (80 GB). Also, does the program write any intermediate outputs? It did create a directory, but it's still empty (around 30 hours have passed). Just wanted to confirm.
Update: It did not work. I did not save all the logs in a text file (in hindsight I should have), but this is the only log still on the terminal.
Not quite sure what's wrong though.
I ran this command: `python3 src/multirun_create_weight_inits.py --target_dataset $TARGET_DATASET --auxiliary_dataset $AUXILIARY_DATASET`
Also, this folder was created but is empty: `FLAD/outputs/weight_inits/T5_LM_3B/T0Mixture/copa/42/1000`
Okay, I've made a few fixes and have been able to run the `all_exploit.sh` script and the `multirun_create_weight_inits.py` script.
Try pulling the newest version of the codebase and let me know if you can run `all_exploit.sh` and `multirun_create_weight_inits.py`.
Thank you for the quick response! Just to confirm, should I still be running the gradient alignment computations first, or can they be run in parallel now?
Also, I did catch this error once: `AttributeError: 'DataParallel' object has no attribute 'name_or_path'`
It has to do with how the model was initialized by Hugging Face. However, after my bug fixes it has disappeared for me. In case it's still there for you, let me know and I'll make changes for that as well.
The solution that I found for that is to change lines 1639-1640 from:

```python
# Initialize weights if needed
if self.args.weight_initialization_samples > 0:
    self._initialize_weights(train_dataloader, target_dataloader, model)
```

to:

```python
# Initialize weights if needed
if self.args.weight_initialization_samples > 0:
    if hasattr(model, "name_or_path"):
        self._initialize_weights(train_dataloader, target_dataloader, model)
    else:
        self._initialize_weights(train_dataloader, target_dataloader, self.model)
```
I'm hesitant to make that change unless it's required, though, because it will affect both the EXP3 and UCB1 trainers as well.
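An alternative that avoids branching at every call site is to unwrap the container once before reading model attributes. This is just the standard PyTorch idiom sketched in plain Python, not code from the repo: `DataParallel` stores the wrapped model as `.module`, so a plain model passes through unchanged.

```python
def unwrap_model(model):
    """Return the underlying module when `model` is wrapped in
    torch.nn.DataParallel (which exposes it as `model.module`);
    otherwise return the model itself."""
    return model.module if hasattr(model, "module") else model

# Usage: safe whether or not the model was wrapped.
# model_name = unwrap_model(model).name_or_path.replace("/", "-")
```

Because the helper is a no-op for unwrapped models, calling it in the shared trainer code should be harmless for the EXP3 and UCB1 trainers too.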
> Thank you for the quick response! Just to confirm, should I still be running the gradient alignment computations first, or can they be run in parallel now?
The `all_exploit.sh` script should take care of the alignment computations.
Got it!
I'll rerun it and update you with the outcome 🙌
> Got it! I'll rerun it and update you with the outcome 🙌
Sorry for so many back-and-forths. When I was debugging the `all_exploit.sh` script, it was running, so I assumed it was also calculating the alignment, but it actually doesn't. So then I went back to compute the alignments with `multirun_create_weight_inits.py`, and I realized why it was taking such a long time. The `weight_init_only` flag actually is important; I must have removed its use at some point, so I've added it back in.
Now I've successfully been able to precompute the alignments and then train the exploit-only model.
To replicate our experiments you do need to run `multirun_create_weight_inits.py` first; then you can run the `all_exploit.sh` script. Pull the newest update and let me know if that fixes it for you.
Thanks for all the fixes!
I've started the run. The ETA is just 20 minutes; however, I did have to change the lines you mentioned above to fix the DataParallel error.
Any update on this? Have you succeeded with the exploit-only baseline?
Have you been able to run either the EXP3 or UCB1 methods?
Just checking to see if you've found any other bugs.
Hey
So I did manage to run the exploit-only baseline after precomputing gradients. I could run EXP3 without it too, so I haven't retested it. I did not get time to rerun the UCB1 baseline; I'll update you in case I hit any issues there.
In terms of issues, the only one was the DataParallel error, and adding the changes you suggested above fixed it.
So I'm closing this issue, as everything seems to be fine now.
Thanks for all the help!