georgetown-ir-lab / cedr
Code for CEDR: Contextualized Embeddings for Document Ranking, accepted at SIGIR 2019.
License: MIT License
I can't find anything about the mean average precision (MAP) of your new system (CEDR). Am I missing something, or did you really not measure it? Since it's the most common evaluation metric in IR, I wonder why you didn't even mention it in the paper.
Furthermore, these resources could be relevant to you:
I want to ask if it would be convenient to release the BERT baseline model checkpoint?
Thanks~
Sean, there are some peculiar tqdm setups in CEDR. Could we remove the leave=False and the ncols settings, and perhaps set the units and some descriptions?
If you agree, I can submit a PR.
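If helpful, here is a rough sketch of the kind of configuration I have in mind (the loop and names are illustrative, not CEDR's actual code):

from tqdm import tqdm

# illustrative only: drop leave=False/ncols and instead give the bar a description and unit
for pair in tqdm(range(1000), desc='train pairs', unit='pair'):
    pass  # training step would go here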
We'd like to make use of the more generic transformers library. There is some migration information at https://huggingface.co/transformers/migration.html
We're trying to upgrade a BertRanker:
class VanillaBertTransformerRanker(BertRanker):
    def __init__(self):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
        self.bert = AutoModel.from_pretrained('bert-base-uncased')
        self.dropout = torch.nn.Dropout(0.1)
        self.cls = torch.nn.Linear(self.BERT_SIZE, 1)

    <snip>

        for layer in result:
            cls_output = layer[:, 0]
            cls_result = []
            for i in range(cls_output.shape[0] // BATCH):
                cls_result.append(cls_output[i*BATCH:(i+1)*BATCH])
            cls_result = torch.stack(cls_result, dim=2).mean(dim=2)
            cls_results.append(cls_result)
We needed to do some casting in train.py, e.g.:
torch.tensor(record['query_tok']).to(torch.int64)
However, the shapes get a bit out of kilter in encode_bert: before the stack, we end up with size [4] rather than [4,768]
# vanilla_bert
# shape of first tensor: torch.Size([283, 768])
# shape of first result: torch.Size([8, 283, 768])
# shape of first cls_result: torch.Size([4, 768])
# shape of first cls_result before stack: torch.Size([4, 768])
# shape of cls_result before stack: 2
# shape of first cls_result after stack: torch.Size([768])
# shape of cls_result after stack: 4
# shape of first cls_result: torch.Size([4, 768])
# shape of first cls_result before stack: torch.Size([4, 768])
# shape of cls_result before stack: 2
# ...
# transformer bert
# shape of first tensor: torch.Size([283, 768])
# shape of first result: torch.Size([8, 283, 768])
# shape of first cls_result before stack: torch.Size([4, 768])
# shape of cls_result before stack: 2
# shape of first cls_result after stack: torch.Size([768])
# shape of cls_result after stack: 4
# shape of first cls_result before stack: torch.Size([4])
# shape of cls_result before stack: 2
# ---> error
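For what it's worth, a hedged guess at the cause: the old pytorch-pretrained-bert API returned a list of per-layer outputs, while transformers' AutoModel returns last_hidden_state plus (optionally) a tuple of hidden states, so iterating over the top-level output instead of the per-layer tensors would produce exactly this kind of shape mismatch. A minimal sketch of pulling out the per-layer states (assuming transformers >= 4.x; this is not the repo's actual encode_bert code):

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased', output_hidden_states=True)

enc = tokenizer('example query', 'example document text', return_tensors='pt')
with torch.no_grad():
    out = model(**enc)

layers = out.hidden_states            # tuple of 13 tensors: embeddings + 12 encoder layers
print(len(layers), layers[-1].shape)  # each tensor is [batch, seq_len, 768]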
Hi,
I use Indri and run:
awk '{print $3}' /data1/liushu/cedr-master/data/robust/*.run | python /data1/liushu/cedr-master/cedr/extract_docs_from_index.py indri /data1/liushu/cedr-master/index-robust04-20220531/ > /data1/liushu/cedr-master/data/robust/documents.tsv
But an error was reported:
Traceback (most recent call last):
File "/data1/liushu/cedr-master/cedr/extract_docs_from_index.py", line 61, in <module>
main_cli()
File "/data1/liushu/cedr-master/cedr/extract_docs_from_index.py", line 49, in main_cli
doc_extractor = INDEX_MAP[args.index_type](args.index_path)
File "/data1/liushu/cedr-master/cedr/extract_docs_from_index.py", line 8, in indri_doc_extractor
index = pyndri.Index(path)
File "/home/ps/anaconda3/envs/aiwin/lib/python3.6/site-packages/pyndri/__init__.py", line 52, in __init__
super(Index, self).__init__(*args, **kwargs)
OSError: ../src/Parameters.cpp(469): Couldn't open parameter file 'index-robust04-20220531/manifest' for reading.
How can I fix it?
Thank you very much!
I got the Robust04 dataset files as follows:
1.TREC-Disk-4.tar.gz
2.TREC-Disk-5.tar.gz
and after unzipping those files, I get a lot of files with names like "LAL010289", each containing many documents marked up with HTML-like tags.
Could you give me some advice on what I should do next?
Should I move the document files into the same folder and install the Indri engine to index them?
Thank you very much!
I found that in every training epoch, 16*32 training pairs are randomly selected from the training data. I wonder whether this random choice affects model learning, since the training data changes every epoch. I also wonder whether such a small amount of training data is enough for good performance on the validation and test sets.
Hi,
I have noticed that in rerank.py you call train.run_model passing 5 arguments. However, train.run_model is defined with only 4 arguments.
It seems like the out_path should be removed from this call.
Thanks for your work,
Kind regards.
I couldn't find some of the training pairs in the qrels.
Also, is the Robust evaluation data used as training data?
I am also wondering about the detailed settings for the models (PACRR, KNRM, DRMM) with non-fine-tuned BERT in Table 1 of https://arxiv.org/pdf/1904.07094.pdf.
Thanks for your great work!
Some of the code used for validation is a bit difficult to read due to the use of setdefault(). Would you accept a PR replacing this with defaultdict?
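For illustration, a hypothetical before/after (the dict structure here is made up, not the repo's actual validation code):

from collections import defaultdict

# current style with setdefault:
run = {}
run.setdefault('q1', {})['d1'] = 2.5

# proposed style with defaultdict:
run_dd = defaultdict(dict)
run_dd['q1']['d1'] = 2.5

assert run == dict(run_dd)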
Craig
When I try to train a Vanilla BERT model, I use something like this:
python3 train.py \
--model vanilla_bert \
--datafiles data/wt/queries.tsv data/wt/document.tsv \
--qrels data/wt/qrels \
--train_pairs data/wt/train.wt12.pairs \
--valid_run data/wt/valid.wt12.run \
--model_out_dir models/vbert
But there is no document.tsv file, and instead of --train_pairs data/ws/train_pairs there are many train-pairs files, like train.wt12.pairs or train.wt13.pairs, from which I need to choose. The same question applies to --valid_run data/valid_run.
Thanks a lot for your help
I want to run this code with other datasets; how can I get .run and .pairs files similar to those in /data?
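For reference, the .run files follow the standard TREC run format (qid Q0 docid rank score run-name per line) and the qrels file the standard TREC qrels format (qid 0 docid relevance per line); the pairs format is specific to this repo, so check data.py for exactly what it expects. Illustrative lines with made-up ids:

301 Q0 FBIS3-10082 1 12.34 bm25
301 0 FBIS3-10082 1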
It's unclear whether the query title or the query description is used for the Table 1 results on Robust04.
First of all, thanks a lot for your interesting work on CEDR and for the code in this repository.
I downloaded the Vanilla BERT and CEDR-KNRM checkpoints from #18 and checked the query ids in the .run files contained in the downloaded archive. While the sets of query ids in cedrknrm-robust-f[1-5].run match those in data/robust/f[1-5].test.run, the sets of query ids in vbert-robust-f[1-5].run do not match those in data/robust/f[1-5].test.run (e.g. the set of query ids in vbert-robust-f1.run is different from the set of query ids in data/robust/f1.test.run, and also from that in cedrknrm-robust-f1.run).
Why are the folds for Vanilla BERT and CEDR-KNRM different? On which folds have the Vanilla BERT checkpoints been trained/validated? Given that the test folds of the Vanilla BERT and CEDR-KNRM checkpoints are different I assume that the provided Vanilla BERT checkpoints have not been used as initial weights for obtaining the provided CEDR-KNRM checkpoints. Is this assumption correct? If yes, which Vanilla BERT checkpoints have been used to initialize CEDR-KNRM training? Do you mind sharing these checkpoints too?
I'm currently investigating issues with reproducing the results published in the paper. More on that in a separate ticket...
Hello, have you ever run CEDR_KNRM on MSMARCO document ranking task?
I encountered some problems when training CEDR_KNRM initialized with the fine-tuned BERT (the performance barely increases, or even decreases). I wonder if this is because the training settings used for Robust are not suitable for MS MARCO?
Looking forward to some empirical guidance. Thank you.
Thanks for your wonderful work on CEDR; it helps a lot in understanding the document ranking task!
I tried to use this repo and followed the instructions to train vanilla_bert on fold 1 and fold 2 of the Robust04 dataset. However, I obtained these NDCG@20 results:
vanilla_bert_1fold: 0.40704, vanilla_bert_2fold: 0.45290
which are quite different from your released checkpoints in #18:
vanilla_bert_1fold: 0.42185, vanilla_bert_2fold: 0.47948
The hyperparameters are as follows:
Would you please share your hyperparameter settings for fine-tuning the Vanilla BERT?
result = []
for item in items:
    if len(item) < l:
        item = [1. for _ in item] + ([0.] * (l - len(item)))
    if len(item) >= l:
        item = [1. for _ in item[:l]]
    result.append(item)
This is from data.py, line 148.
Doesn't this code cause the masks to all be 1.0? After the first branch pads a short item to length l, the second condition (len(item) >= l) is also true, so the padding zeros get overwritten with ones. Is there a problem here?
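A minimal sketch of the kind of fix I'd expect (hypothetical helper name; it just mirrors the snippet above with the truncation branch turned into an else so the padding is not overwritten):

def build_mask(items, l):
    result = []
    for item in items:
        if len(item) < l:
            # ones for real tokens, zeros for padding up to length l
            item = [1. for _ in item] + [0.] * (l - len(item))
        else:
            # already at least l tokens: truncate to l ones
            item = [1. for _ in item[:l]]
        result.append(item)
    return result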
About running this instruction:
awk '{print $3}' data/robust/*.run | python extract_docs_from_index.py indri PATH_TO_INDRI_INDEX > data/robust/documents.tsv
I ran into a problem. What is PATH_TO_INDRI_INDEX? Should I modify any code in extract_docs_from_index.py?
I run this Python file and the error is:
Thanks for your great work!!
I couldn't find out how to build the Indri index. Could you tell me more about this? Thank you!
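In case it helps others, here is a hedged sketch of how an Indri index is typically built (paths are placeholders; whether to stem or remove stopwords should follow the repo's note about using an index with appropriate pre-processing). With Indri installed, create a parameter file, e.g. build_robust04.param:

<parameters>
  <index>/path/to/robust04-indri-index</index>
  <corpus>
    <path>/path/to/trec/disks45/documents</path>
    <class>trectext</class>
  </corpus>
</parameters>

and then run: IndriBuildIndex build_robust04.param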
This is a follow-up on #21. I tried to reproduce the results on Robust04 but failed to do so using the code in this repository. In the following I report my results on test fold f1, obtained in 3 experiments.

Experiment 1: Evaluating the provided .run files

When evaluating the provided cedrknrm-robust-f1.run file in #18 with:
bin/trec_eval -m P.20 data/robust/qrels cedrknrm-robust-f1.run
bin/gdeval.pl -k 20 data/robust/qrels cedrknrm-robust-f1.run
I'm getting P@20 = 0.4470 and nDCG@20 = 0.5177. When using a .run file generated with the provided weights cedrknrm-robust-f1.p:
python rerank.py --model cedr_knrm --datafiles data/robust/queries.tsv data/robust/documents.tsv \
--run data/robust/f1.test.run --model_weights cedrknrm-robust-f1.p --out_path cedrknrm-robust-f1.extra.run
bin/trec_eval -m P.20 data/robust/qrels cedrknrm-robust-f1.extra.run
bin/gdeval.pl -k 20 data/robust/qrels cedrknrm-robust-f1.extra.run
I'm getting P@20 = 0.4290 and nDCG@20 = 0.5038. I'd expect these metrics to be equal to those of the provided cedrknrm-robust-f1.run file. What is the reason for this difference?

Experiment 2: Training BERT and CEDR-KNRM as explained in the README

This is where I'm getting results that are far below the expected results (only for CEDR-KNRM, not for Vanilla BERT). I started by training and evaluating a Vanilla BERT ranker:
python train.py --model vanilla_bert --datafiles data/robust/queries.tsv data/robust/documents.tsv \
--qrels data/robust/qrels --train_pairs data/robust/f1.train.pairs --valid_run data/robust/f1.valid.run --model_out_dir trained_bert
python rerank.py --model vanilla_bert --datafiles data/robust/queries.tsv data/robust/documents.tsv \
--run data/robust/f1.test.run --model_weights trained_bert/weights.p --out_path trained_bert/test.run
bin/trec_eval -m P.20 data/robust/qrels trained_bert/test.run
bin/gdeval.pl -k 20 data/robust/qrels trained_bert/test.run
I'm getting P@20 = 0.3690 and nDCG@20 = 0.4231, which is consistent with evaluating the provided vbert-robust-f1.run file:
bin/trec_eval -m P.20 data/robust/qrels vbert-robust-f1.run
bin/gdeval.pl -k 20 data/robust/qrels vbert-robust-f1.run
This gives P@20 = 0.3550 and nDCG@20 = 0.4219, which comes quite close. I understand that here I simply ignored the inconsistencies reported in #21, but it is at least a coarse cross-check of model performance on a single fold. When training a CEDR-KNRM model with this BERT model as initialization:
python train.py --model cedr_knrm --datafiles data/robust/queries.tsv data/robust/documents.tsv \
--qrels data/robust/qrels --train_pairs data/robust/f1.train.pairs --valid_run data/robust/f1.valid.run \
--initial_bert_weights trained_bert/weights.p --model_out_dir trained_cedr
python rerank.py --model cedr_knrm --datafiles data/robust/queries.tsv data/robust/documents.tsv \
--run data/robust/f1.test.run --model_weights trained_cedr/weights.p --out_path trained_cedr/test.run
bin/trec_eval -m P.20 data/robust/qrels trained_cedr/test.run
bin/gdeval.pl -k 20 data/robust/qrels trained_cedr/test.run
I'm getting P@20 = 0.3790 and nDCG@20 = 0.4347. This is slightly better than a Vanilla BERT ranker but far below the performance obtained in Experiment 1. I also repeated Experiment 2 with f1.test.run, f1.valid.run and f1.train.pairs files that I generated myself from Anserini runs with a default BM25 configuration, and still get results very close to those above.
Has anyone been able to get results similar to those as in Experiment 1 by training a BERT and CEDR-KNRM model as explained in the project's README?
Experiment 3: vbert-robust-f1.p weights as initialization to CEDR-KNRM training

I made this experiment in an attempt to debug the performance gap found in the previous experiment. I'm fully aware that training and evaluating a CEDR-KNRM model on fold 1 (i.e. f1) with the provided vbert-robust-f1.p is invalid because of the inconsistencies reported in #21: the folds used for training/validating/testing vbert-robust-f1.p differ from those in data/robust/f[1-5]*.
In other words, validation and evaluation of the trained CEDR-KNRM model is done with queries that were used for training the provided vbert-robust-f1.p. So this setup partially uses training data for evaluation, which of course gives better evaluation results. I was surprised to see that with this invalid setup I'm able to reproduce the numbers obtained in Experiment 1, or at least come very close. Here's what I did:
python train.py --model cedr_knrm --datafiles data/robust/queries.tsv data/robust/documents.tsv \
--qrels data/robust/qrels --train_pairs data/robust/f1.train.pairs --valid_run data/robust/f1.valid.run \
--initial_bert_weights vbert-robust-f1.p --model_out_dir trained_cedr_invalid
python rerank.py --model cedr_knrm --datafiles data/robust/queries.tsv data/robust/documents.tsv \
--run data/robust/f1.test.run --model_weights trained_cedr_invalid/weights.p --out_path trained_cedr_invalid/test.run
bin/trec_eval -m P.20 data/robust/qrels trained_cedr_invalid/test.run
bin/gdeval.pl -k 20 data/robust/qrels trained_cedr_invalid/test.run
With this setup I'm getting a CEDR-KNRM performance of P@20 = 0.4400 and nDCG@20 = 0.5050. Given these results and the inconsistencies reported in #21, I wonder if the performance of the cedrknrm-robust-f[1-5].run checkpoints is the result of an invalid CEDR-KNRM training and evaluation setup or, more likely, if I did something wrong? Any hints appreciated!
Hi, I am interested in your work. Could you release the code for preprocessing the raw TREC Robust04 dataset? I want to reproduce the results in your paper, but I have failed so far; I think the problem must be in the data-preprocessing step. I'd appreciate your help.
Could we make this less specific? I'm pretty sure it works with torch 1.4.
Would you consider using pytrec_eval for validation purposes, instead of trec_eval?
I think there are several advantages of doing so:
You don't have to ship or rely on a trec_eval binary, which might not be compiled for the correct platform. In contrast, I think relying on pytrec_eval means that it's compiled as appropriate.
This also means that if we provide a "dataset" that contains labels as input, we don't need to have a separate qrels file.
You don't need to fork a separate process to run trec_eval, so it should be faster.
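A rough sketch of what validation with pytrec_eval could look like (the qrels/run dicts here are toy data; in trec_eval naming, P@20 is "P_20" and nDCG@20 is "ndcg_cut_20"):

import pytrec_eval

qrels = {'301': {'FBIS3-10082': 1, 'FBIS3-10169': 0}}     # toy judgments
run = {'301': {'FBIS3-10082': 2.3, 'FBIS3-10169': 1.1}}   # toy system scores

evaluator = pytrec_eval.RelevanceEvaluator(qrels, {'P_20', 'ndcg_cut_20'})
per_query = evaluator.evaluate(run)
mean_p20 = sum(q['P_20'] for q in per_query.values()) / len(per_query)
print(per_query, mean_p20)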
I try to run extract_docs_from_index.py with this command, where the index is the pre-built index provided by Pyserini:
awk '{print $3}' data/robust/*.run | python extract_docs_from_index.py lucene index-robust04-20191213/ > data/robust/documents.tsv
but I get an error:
I did not change any code in the file.
My Java version is:
openjdk version "1.8.0_282"
OpenJDK Runtime Environment (build 1.8.0_282-8u282-b08-0ubuntu1~20.04-b08)
OpenJDK 64-Bit Server VM (build 25.282-b08, mixed mode)
Do I have the correct Java version?
Could you give me some advice on this error?
Thanks a lot!
I indexed the Robust04 document files myself and ran extract_docs_from_index.py successfully!
Then I checked the documents.tsv file with pandas and found that there are 73855 records. I don't know how many there should be, so I would appreciate it if you could tell me the correct number of records!
Hi Sean,
I want to train cedr_pacrr using a BERT checkpoint with the following command:
python train.py \
--model cedr_pacrr \ # or cedr_knrm / cedr_drmm
--datafiles data/queries.tsv data/documents.tsv \
--qrels data/qrels \
--train_pairs data/train_pairs \
--valid_run data/valid_run \
--initial_bert_weights models/vbert/weights.p \
--model_out_dir models/cedrpacrr
As you know, in your data directory the train and validation data come in 5 folds. The BERT checkpoint weights also come in 5 folds.
Would you please guide me on what I should do with train_pairs, valid_run, and the weights?
Thanks in advance,
Kind Regards,
Zahra
data.iter_valid_records() doesn't yield anything if the validation set is smaller than batch_size.
Adding a final block as follows works:
if len(batch['query_id']) > 0:
    yield _pack_n_ship(batch)
The bug also means that the final (validation size % batch_size) documents are omitted from validation.
I suspect data.iter_train_pairs() has exactly the same issue.
I reproduced VanillaBERT but only get:
NDCG@20: 0.3889
P@20: 0.3180
Settings: optimizer AdamW, batch size 1, lr = 1e-5, trained with hinge loss, choosing positive/negative pairs at random from the provided f*.train.pairs lists.
This is far from the paper's:
NDCG@20: 0.4541
P@20: 0.4042
class VanillaBERT(nn.Module):
    def __init__(self):
        super(VanillaBERT, self).__init__()
        self.bert = BertModel.from_pretrained(pretrained_weights)  # e.g. 'bert-base-uncased'
        self.dropout = torch.nn.Dropout(0.1)
        self.Out_FC = nn.Linear(768, 1)

    def forward(self, _input_ids, _token_type_ids):
        outputs = self.bert(input_ids=_input_ids.squeeze(1), token_type_ids=_token_type_ids.squeeze(1))
        # print(np.shape(outputs[0][0][0]))
        cls_reps = outputs[0][0][0]  # [CLS] representation of the first sequence in the batch
        Pred_out = self.Out_FC(self.dropout(cls_reps))
        return Pred_out
As discussed: it would be great if I could declare a dependency on this repo, even if it doesn't make it to PyPI.
The guideline notes say to be sure to use an index that has appropriate pre-processing.
But I don't know how to build the index file, and I don't have the Robust04 or ClueWeb09 document dataset either.
I want to replicate the experiment, but I can't work it out because of this.
Can anyone help me?
Validation for large cutoffs and numbers of queries can be slower than training. Are there any optimisations that can be done? E.g. tokenising just once, rather than for each iteration?
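One possible optimisation, sketched here purely as an illustration (the helper name and call site are hypothetical, not the repo's current code): memoise tokenisation by document id so repeated validation passes reuse it.

_tok_cache = {}

def tokenize_cached(tokenizer, doc_id, text):
    # hypothetical helper: tokenise each document only once, keyed by its id
    if doc_id not in _tok_cache:
        _tok_cache[doc_id] = tokenizer.tokenize(text)
    return _tok_cache[doc_id]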
When I run the command
awk '{print $3}' data/robust/*.run | python extract_docs_from_index.py indri PATH_TO_INDRI_INDEX > data/robust/documents.tsv
I get an error like:
Traceback (most recent call last):
File "extract_docs_from_index.py", line 60, in <module>
main_cli()
File "extract_docs_from_index.py", line 48, in main_cli
doc_extractor = INDEX_MAP[args.index_type](args.index_path)
File "extract_docs_from_index.py", line 8, in indri_doc_extractor
index = pyndri.Index(path)
File "/root/anaconda3/lib/python3.6/site-packages/pyndri/__init__.py", line 52, in __init__
super(Index, self).__init__(*args, **kwargs)
OSError: ../src/Parameters.cpp(469): Couldn't open parameter file 'indri-5.14/manifest' for reading.
It seems I need to do some indexing first. How can I do that?
Hi, this link provides four pre-trained GloVe models. Which one did you use in your paper? I'd appreciate your answer.
What does PATH_TO_INDRI_INDEX mean for my own project?
awk '{print $3}' data/robust/*.run | python extract_docs_from_index.py indri PATH_TO_INDRI_INDEX > data/robust/documents.tsv
Hi, I'm confused about how to get the WebTrack 2012-2014 datasets. I would appreciate it if you could walk me through the specific process. Thanks a lot.
E.g. https://github.com/Georgetown-IR-Lab/cedr/blob/master/data.py#L70:
random.shuffle(pos_ids)
pos_id = pos_ids[0]
Why not
pos_id = random.choice(pos_ids)