Hi, I tried your BBTv2 code but could not reproduce the results reported in your paper.
python deepbbt.py --model_name "roberta-large" --task_name "snli" --n_prompt_tokens 50 --intrinsic_dim 500 --k_shot 16 --device "cuda:0" --seed 42 --loss_type "ce" --cat_or_add "add" --random_proj "normal" --sigma1 1 --sigma2 0.2 --popsize 20 --bound 0 --budget 8000 --print_every 50 --eval_every 100
Done. Elapsed time: 39.49383888641993 (mins)
Evaluate on test data...
Evaluate data in 75.54 seconds!
[tester]
SNLIMetric: acc=0.5509975570032574, hinge=2.8394456026220167, ce=11.656479801339513
Test acc: 0.551
This is higher than the other gradient-free baselines, but much lower than the number reported in your paper (60.62).
I'm wondering why. Do I need to tune the random seed?
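In case it helps, this is the seed sweep I was planning to try next. It is only a sketch: the seed list is my own guess (not from the paper), and the loop just prints each command rather than launching it.

```shell
# Dry-run seed sweep: echo each command instead of running it.
# The flag set mirrors my single run above; the seed list {8, 13, 42, 50, 60}
# is my own choice, not taken from the paper.
for seed in 8 13 42 50 60; do
  echo python deepbbt.py --model_name "roberta-large" --task_name "snli" \
    --n_prompt_tokens 50 --intrinsic_dim 500 --k_shot 16 --seed "$seed" \
    --loss_type "ce" --cat_or_add "add" --random_proj "normal" \
    --sigma1 1 --sigma2 0.2 --popsize 20 --budget 8000
done
```

I would then average the test accuracy over the seeds before comparing to the paper, in case 60.62 is a multi-seed mean rather than a single run.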