This is the repository for PET.
Requirements:
- dgl 0.7.0
- pytorch 1.8.0
- tensorboard 2.7.0
- scikit-learn 0.22.1 (imported as sklearn)
- torchmetrics 0.7.3 (for efficient model evaluation, especially when using multiple GPUs)
- elasticsearch-dsl 7.4.0 (requires local elasticsearch lib)
- elasticsearch
Run the following in a tmux session to start an Elasticsearch daemon:
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.1-linux-x86_64.tar.gz
tar -xf elasticsearch-7.17.1-linux-x86_64.tar.gz
cd elasticsearch-7.17.1/bin/
./elasticsearch
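Before indexing, it can help to confirm that the daemon is actually answering. The snippet below is a minimal sketch, not part of this repository: it assumes Elasticsearch listens on its default HTTP address `http://localhost:9200` with security disabled, and the helper names `es_is_up` and `wait_for_es` are hypothetical.

```python
import time
import urllib.error
import urllib.request


def es_is_up(url="http://localhost:9200", timeout=2.0):
    """Return True if an HTTP GET on `url` answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: treat as "not up".
        return False


def wait_for_es(url="http://localhost:9200", retries=30, delay=2.0, timeout=2.0):
    """Poll `url` until it responds or the retries run out."""
    for _ in range(retries):
        if es_is_up(url, timeout=timeout):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_es()` in a second terminal returns `True` once the node has finished starting, and `False` if it never responds within the retry budget.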
Note: The retrieve pool is stored in the same order as the shuffled dataset. If you re-split or re-shuffle the dataset, you must rerun the retrieval step to rebuild the pool.
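To see why a re-shuffle invalidates the pool, here is a toy illustration. It assumes the pool is indexed by row position, which is how the note above reads. The sample ids and the `retrieved-for-...` strings are made up and do not reflect the real file format.

```python
# Toy data: row position -> sample id, in the order used when the pool was built.
order_at_build = ["s3", "s0", "s2", "s1"]

# Hypothetical pool stored positionally: entry i belongs to row i of that order.
pool = [f"retrieved-for-{sid}" for sid in order_at_build]

# A later re-shuffle puts different samples at the same row positions.
order_after_reshuffle = ["s1", "s3", "s0", "s2"]

# Rows whose pool entry no longer matches the sample now sitting at that row.
stale_rows = [
    i for i, sid in enumerate(order_after_reshuffle)
    if pool[i] != f"retrieved-for-{sid}"
]
print(stale_rows)
```

In this toy case every row ends up paired with another sample's retrieval results, which is why the pool has to be rebuilt rather than reused.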
Under the utils folder, to produce the retrieve pool of size k for DATASET, run:
python insert_es.py --dataset DATASET
python pre_search_new.py --dataset DATASET --ret-size k
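If you need pools for several datasets, the two commands above can be wrapped in a small driver. This is only a sketch under assumptions: it uses just the script names and flags shown above, assumes it is launched from the repository root so that `utils` is a valid working directory, and the helper names `retrieval_commands` and `build_pools` are hypothetical.

```python
import subprocess
import sys


def retrieval_commands(dataset, ret_size):
    """Build the two commands from this README for one dataset."""
    return [
        [sys.executable, "insert_es.py", "--dataset", dataset],
        [sys.executable, "pre_search_new.py", "--dataset", dataset,
         "--ret-size", str(ret_size)],
    ]


def build_pools(datasets, ret_size, dry_run=True):
    """Print the commands for each dataset, and run them unless dry_run is set."""
    for dataset in datasets:
        for cmd in retrieval_commands(dataset, ret_size):
            print(" ".join(cmd))
            if not dry_run:
                # check=True stops at the first failing step, so the search
                # step never runs after a failed insert.
                # cwd="utils" assumes the script is started from the repo root.
                subprocess.run(cmd, check=True, cwd="utils")
```

For example, `build_pools(["tmall", "taobao"], ret_size=10)` only prints the four commands. The value 10 is a placeholder for k, not a recommended setting. Pass `dry_run=False` to execute them.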
Under the test folder:
For the CTR task, run:
python run_PET_sequential.py --dataset tmall --in_size 16 --lr 5e-4 --wd 1e-4 --batch_size 100
python run_PET_sequential.py --dataset taobao --in_size 16 --lr 1e-4 --wd 5e-4 --batch_size 200
python run_PET_sequential.py --dataset alipay --in_size 32 --lr 5e-4 --wd 5e-4 --batch_size 100
For the top-N recommendation task, run:
python run_PET_rec.py --dataset ml-1m --batch_size 100
python run_PET_rec.py --dataset lastfm --batch_size 500