georgetown-ir-lab / cedr
Code for CEDR: Contextualized Embeddings for Document Ranking, accepted at SIGIR 2019.
License: MIT License
I can't find anything about the mean average precision (MAP) of your new system (CEDR). Am I missing something, or did you really not measure it? Since it's the most common evaluation metric in IR, I wonder why you didn't even mention it in the paper.
Furthermore, these resources could be relevant to you:
I want to ask if it would be convenient to release the BERT baseline model checkpoint?
Thanks~
Sean, there are some peculiar tqdm setups in CEDR. Could we remove the leave=False and the ncols settings, and perhaps set the units and some descriptions?
If you agree, I can submit a PR.
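If helpful, here is a rough sketch of the kind of configuration I have in mind (the loop and names are illustrative, not CEDR's actual code):

from tqdm import tqdm

# illustrative only: drop leave=False/ncols and instead give the bar a description and unit
for pair in tqdm(range(1000), desc='train pairs', unit='pair'):
    pass  # training step would go here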
We'd like to make use of the more generic transformers library. There is some migration information at https://huggingface.co/transformers/migration.html
We're trying to upgrade a BertRanker:
class VanillaBertTransformerRanker(BertRanker):
    def __init__(self):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
        self.bert = AutoModel.from_pretrained('bert-base-uncased')
        self.dropout = torch.nn.Dropout(0.1)
        self.cls = torch.nn.Linear(self.BERT_SIZE, 1)

    <snip>

        for layer in result:
            cls_output = layer[:, 0]
            cls_result = []
            for i in range(cls_output.shape[0] // BATCH):
                cls_result.append(cls_output[i*BATCH:(i+1)*BATCH])
            cls_result = torch.stack(cls_result, dim=2).mean(dim=2)
            cls_results.append(cls_result)
We needed to do some casting in train.py, e.g.:
torch.tensor(record['query_tok']).to(torch.int64)
However, the shapes get a bit out of kilter in encode_bert: before the stack, we end up with size [4] rather than [4,768]
# vanilla_bert
# shape of first tensor: torch.Size([283, 768])
# shape of first result: torch.Size([8, 283, 768])
# shape of first cls_result: torch.Size([4, 768])
# shape of first cls_result before stack: torch.Size([4, 768])
# shape of cls_result before stack: 2
# shape of first cls_result after stack: torch.Size([768])
# shape of cls_result after stack: 4
# shape of first cls_result: torch.Size([4, 768])
# shape of first cls_result before stack: torch.Size([4, 768])
# shape of cls_result before stack: 2
# ...
# transformer bert
# shape of first tensor: torch.Size([283, 768])
# shape of first result: torch.Size([8, 283, 768])
# shape of first cls_result before stack: torch.Size([4, 768])
# shape of cls_result before stack: 2
# shape of first cls_result after stack: torch.Size([768])
# shape of cls_result after stack: 4
# shape of first cls_result before stack: torch.Size([4])
# shape of cls_result before stack: 2
# ---> error
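For what it's worth, a hedged guess at the cause: the old pytorch-pretrained-bert API returned a list of per-layer outputs, while transformers' AutoModel returns last_hidden_state plus (optionally) a tuple of hidden states, so iterating over the top-level output instead of the per-layer tensors would produce exactly this kind of shape mismatch. A minimal sketch of pulling out the per-layer states (assuming transformers >= 4.x; this is not the repo's actual encode_bert code):

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased', output_hidden_states=True)

enc = tokenizer('example query', 'example document text', return_tensors='pt')
with torch.no_grad():
    out = model(**enc)

layers = out.hidden_states            # tuple of 13 tensors: embeddings + 12 encoder layers
print(len(layers), layers[-1].shape)  # each tensor is [batch, seq_len, 768]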
Hi,
I use Indri and run:
awk '{print $3}' /data1/liushu/cedr-master/data/robust/*.run | python /data1/liushu/cedr-master/cedr/extract_docs_from_index.py indri /data1/liushu/cedr-master/index-robust04-20220531/ > /data1/liushu/cedr-master/data/robust/documents.tsv
But an error was reported:
Traceback (most recent call last):
File "/data1/liushu/cedr-master/cedr/extract_docs_from_index.py", line 61, in <module>
main_cli()
File "/data1/liushu/cedr-master/cedr/extract_docs_from_index.py", line 49, in main_cli
doc_extractor = INDEX_MAP[args.index_type](args.index_path)
File "/data1/liushu/cedr-master/cedr/extract_docs_from_index.py", line 8, in indri_doc_extractor
index = pyndri.Index(path)
File "/home/ps/anaconda3/envs/aiwin/lib/python3.6/site-packages/pyndri/__init__.py", line 52, in __init__
super(Index, self).__init__(*args, **kwargs)
OSError: ../src/Parameters.cpp(469): Couldn't open parameter file 'index-robust04-20220531/manifest' for reading.
How can I fix it?
Thank you very much!
I got the Robust04 dataset files as follows:
1.TREC-Disk-4.tar.gz
2.TREC-Disk-5.tar.gz
and after unzipping those files, I get a lot of files with names like "LAL010289", each containing many documents marked up with HTML-like tags.
Could you give me some advice on what I should do next?
Should I move the document files into the same folder and install the Indri engine to index them?
Thank you very much!
I found that in every training epoch, 16*32 training pairs are randomly selected from the training data. I wonder whether this random choice affects model learning, since the training data changes every epoch. I also wonder whether such a small amount of training data is enough for good performance on the validation and test sets.
Hi,
I have noticed that in rerank.py you call train.run_model passing 5 arguments. However, train.run_model is defined with only 4 arguments.
It seems like the out_path should be removed from this call.
Thanks for your work,
Kind regards.
I couldn't find some of the training pairs in the qrels.
Also, is the Robust evaluation data used as training data?
I am also wondering about the detailed settings for the models (PACRR, KNRM, DRMM) with non-fine-tuned BERT in Table 1 of https://arxiv.org/pdf/1904.07094.pdf.
Thanks for your great work!
Some of the code used for validation is a bit difficult to read due to the use of setdefault(). Would you accept a PR replacing this with defaultdict?
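For illustration, a hypothetical before/after (the dict structure here is made up, not the repo's actual validation code):

from collections import defaultdict

# current style with setdefault:
run = {}
run.setdefault('q1', {})['d1'] = 2.5

# proposed style with defaultdict:
run_dd = defaultdict(dict)
run_dd['q1']['d1'] = 2.5

assert run == dict(run_dd)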
Craig
When I try to train a Vanilla BERT model, I use something like this:
python3 train.py \
--model vanilla_bert \
--datafiles data/wt/queries.tsv data/wt/document.tsv \
--qrels data/wt/qrels \
--train_pairs data/wt/train.wt12.pairs \
--valid_run data/wt/valid.wt12.run \
--model_out_dir models/vbert
But there is no document.tsv file, and instead of --train_pairs data/ws/train_pairs there are many train-pairs files, like train.wt12.pairs or train.wt13.pairs, from which I need to choose. The same question applies to --valid_run data/valid_run.
Thanks a lot for your help
I want to run this code with other datasets; how can I get .run and .pairs files similar to those in /data?
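For reference, the .run files follow the standard TREC run format (qid Q0 docid rank score run-name per line) and the qrels file the standard TREC qrels format (qid 0 docid relevance per line); the pairs format is specific to this repo, so check data.py for exactly what it expects. Illustrative lines with made-up ids:

301 Q0 FBIS3-10082 1 12.34 bm25
301 0 FBIS3-10082 1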
It's unclear whether the query title or the query description is used for the Table 1 results on Robust04.
First of all, thanks a lot for your interesting work on CEDR and for the code in this repository.
I downloaded the Vanilla BERT and CEDR-KNRM checkpoints from #18 and checked the query ids in the .run files contained in the downloaded archive. While the sets of query ids in cedrknrm-robust-f[1-5].run match those in data/robust/f[1-5].test.run, the sets of query ids in vbert-robust-f[1-5].run do not match those in data/robust/f[1-5].test.run (e.g. the set of query ids in vbert-robust-f1.run is different from the set of query ids in data/robust/f1.test.run, and also from that in cedrknrm-robust-f1.run).
Why are the folds for Vanilla BERT and CEDR-KNRM different? On which folds have the Vanilla BERT checkpoints been trained/validated? Given that the test folds of the Vanilla BERT and CEDR-KNRM checkpoints are different I assume that the provided Vanilla BERT checkpoints have not been used as initial weights for obtaining the provided CEDR-KNRM checkpoints. Is this assumption correct? If yes, which Vanilla BERT checkpoints have been used to initialize CEDR-KNRM training? Do you mind sharing these checkpoints too?
I'm currently investigating issues with reproducing the results published in the paper. More on that in a separate ticket...
Hello, have you ever run CEDR_KNRM on MSMARCO document ranking task?
I encountered some problems when training CEDR_KNRM initialized with the fine-tuned BERT (the performance barely increases, or even decreases). I wonder if this is because the training settings used for Robust are not suitable for MS MARCO?
Looking forward to some empirical guidance. Thank you.
Thanks for your wonderful work on CEDR; it helps a lot in understanding the document ranking task!
I tried to use this repo and followed the instructions to train vanilla_bert on fold 1 and fold 2 of the Robust04 dataset. However, I obtained these NDCG@20 results:
vanilla_bert_1fold: 0.40704, vanilla_bert_2fold: 0.45290
which are quite different from your released checkpoints in #18:
vanilla_bert_1fold: 0.42185, vanilla_bert_2fold: 0.47948
The hyperparameters are as follows:
Would you please share your hyperparameter settings for fine-tuning the Vanilla BERT?
result = []
for item in items:
    if len(item) < l:
        item = [1. for _ in item] + ([0.] * (l - len(item)))
    if len(item) >= l:
        item = [1. for _ in item[:l]]
    result.append(item)
This is from data.py, line 148.
Doesn't this code cause the masks to all be 1.0? After the first branch pads a short item to length l, the second condition (len(item) >= l) is also true, so the padding zeros get overwritten with ones. Is there a problem here?
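A minimal sketch of the kind of fix I'd expect (hypothetical helper name; it just mirrors the snippet above with the truncation branch turned into an else so the padding is not overwritten):

def build_mask(items, l):
    result = []
    for item in items:
        if len(item) < l:
            # ones for real tokens, zeros for padding up to length l
            item = [1. for _ in item] + [0.] * (l - len(item))
        else:
            # already at least l tokens: truncate to l ones
            item = [1. for _ in item[:l]]
        result.append(item)
    return result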
About running this instruction:
awk '{print $3}' data/robust/*.run | python extract_docs_from_index.py indri PATH_TO_INDRI_INDEX > data/robust/documents.tsv
I ran into a problem. What is PATH_TO_INDRI_INDEX? Should I modify any code in extract_docs_from_index.py?
I run this Python file and the error is:
Thanks for your great work!!
I couldn't find out how to build the Indri index. Could you tell me more about this? Thank you!
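In case it helps others, here is a hedged sketch of how an Indri index is typically built (paths are placeholders; whether to stem or remove stopwords should follow the repo's note about using an index with appropriate pre-processing). With Indri installed, create a parameter file, e.g. build_robust04.param:

<parameters>
  <index>/path/to/robust04-indri-index</index>
  <corpus>
    <path>/path/to/trec/disks45/documents</path>
    <class>trectext</class>
  </corpus>
</parameters>

and then run: IndriBuildIndex build_robust04.param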
This is a follow-up on #21. I tried to reproduce the results on Robust04 but failed to do so using the code in this repository. In the following I report my results on test fold f1, obtained in 3 experiments.

Experiment 1: Evaluating the provided .run files

When evaluating the provided cedrknrm-robust-f1.run file in #18 with:
bin/trec_eval -m P.20 data/robust/qrels cedrknrm-robust-f1.run
bin/gdeval.pl -k 20 data/robust/qrels cedrknrm-robust-f1.run
I'm getting P@20 = 0.4470 and nDCG@20 = 0.5177. When using a .run file generated with the provided weights cedrknrm-robust-f1.p:
python rerank.py --model cedr_knrm --datafiles data/robust/queries.tsv data/robust/documents.tsv \
--run data/robust/f1.test.run --model_weights cedrknrm-robust-f1.p --out_path cedrknrm-robust-f1.extra.run
bin/trec_eval -m P.20 data/robust/qrels cedrknrm-robust-f1.extra.run
bin/gdeval.pl -k 20 data/robust/qrels cedrknrm-robust-f1.extra.run
I'm getting P@20 = 0.4290 and nDCG@20 = 0.5038. I'd expect these metrics to be equal to those of the provided cedrknrm-robust-f1.run file. What is the reason for this difference?

Experiment 2: Training BERT and CEDR-KNRM as explained in the README

This is where I'm getting results that are far below the expected results (only for CEDR-KNRM, not for Vanilla BERT). I started by training and evaluating a Vanilla BERT ranker:
python train.py --model vanilla_bert --datafiles data/robust/queries.tsv data/robust/documents.tsv \
--qrels data/robust/qrels --train_pairs data/robust/f1.train.pairs --valid_run data/robust/f1.valid.run --model_out_dir trained_bert
python rerank.py --model vanilla_bert --datafiles data/robust/queries.tsv data/robust/documents.tsv \
--run data/robust/f1.test.run --model_weights trained_bert/weights.p --out_path trained_bert/test.run
bin/trec_eval -m P.20 data/robust/qrels trained_bert/test.run
bin/gdeval.pl -k 20 data/robust/qrels trained_bert/test.run
I'm getting P@20 = 0.3690 and nDCG@20 = 0.4231, which is consistent with evaluating the provided vbert-robust-f1.run file:
bin/trec_eval -m P.20 data/robust/qrels vbert-robust-f1.run
bin/gdeval.pl -k 20 data/robust/qrels vbert-robust-f1.run
This gives P@20 = 0.3550 and nDCG@20 = 0.4219, which comes quite close. I understand that here I simply ignored the inconsistencies reported in #21, but it is at least a coarse cross-check of model performance on a single fold. When training a CEDR-KNRM model with this BERT model as initialization:
python train.py --model cedr_knrm --datafiles data/robust/queries.tsv data/robust/documents.tsv \
--qrels data/robust/qrels --train_pairs data/robust/f1.train.pairs --valid_run data/robust/f1.valid.run \
--initial_bert_weights trained_bert/weights.p --model_out_dir trained_cedr
python rerank.py --model cedr_knrm --datafiles data/robust/queries.tsv data/robust/documents.tsv \
--run data/robust/f1.test.run --model_weights trained_cedr/weights.p --out_path trained_cedr/test.run
bin/trec_eval -m P.20 data/robust/qrels trained_cedr/test.run
bin/gdeval.pl -k 20 data/robust/qrels trained_cedr/test.run
I'm getting P@20 = 0.3790 and nDCG@20 = 0.4347. This is slightly better than a Vanilla BERT ranker but far below the performance obtained in Experiment 1. I also repeated Experiment 2 with f1.test.run, f1.valid.run and f1.train.pairs files that I generated myself from Anserini runs with a default BM25 configuration, and still get results very close to those above.
Has anyone been able to get results similar to those as in Experiment 1 by training a BERT and CEDR-KNRM model as explained in the project's README?
Experiment 3: vbert-robust-f1.p weights as initialization to CEDR-KNRM training

I made this experiment in an attempt to debug the performance gap found in the previous experiment. I'm fully aware that training and evaluating a CEDR-KNRM model on fold 1 (i.e. f1) with the provided vbert-robust-f1.p is invalid because of the inconsistencies reported in #21: the folds used for training/validating/testing vbert-robust-f1.p differ from those in data/robust/f[1-5]*.
In other words, validation and evaluation of the trained CEDR-KNRM model is done with queries that were used for training the provided vbert-robust-f1.p. So this setup partially uses training data for evaluation, which of course gives better evaluation results. I was surprised to see that with this invalid setup I'm able to reproduce the numbers obtained in Experiment 1, or at least come very close. Here's what I did:
python train.py --model cedr_knrm --datafiles data/robust/queries.tsv data/robust/documents.tsv \
--qrels data/robust/qrels --train_pairs data/robust/f1.train.pairs --valid_run data/robust/f1.valid.run \
--initial_bert_weights vbert-robust-f1.p --model_out_dir trained_cedr_invalid
python rerank.py --model cedr_knrm --datafiles data/robust/queries.tsv data/robust/documents.tsv \
--run data/robust/f1.test.run --model_weights trained_cedr_invalid/weights.p --out_path trained_cedr_invalid/test.run
bin/trec_eval -m P.20 data/robust/qrels trained_cedr_invalid/test.run
bin/gdeval.pl -k 20 data/robust/qrels trained_cedr_invalid/test.run
With this setup I'm getting a CEDR-KNRM performance of P@20 = 0.4400 and nDCG@20 = 0.5050. Given these results and the inconsistencies reported in #21, I wonder if the performance of the cedrknrm-robust-f[1-5].run checkpoints is the result of an invalid CEDR-KNRM training and evaluation setup or, more likely, if I did something wrong? Any hints appreciated!
Hi, I am interested in your work. Could you release the code for preprocessing the raw TREC Robust04 dataset? I want to reproduce the results in your paper, but I have failed so far; I think the problem must be in the data-preprocessing step. I'd appreciate your help.
Could we make this less specific? I'm pretty sure it works with torch 1.4.
Would you consider using pytrec_eval for validation purposes, instead of trec_eval?
I think there are several advantages of doing so:
You don't have to ship or rely on a trec_eval binary, which might not be compiled for the correct platform. In contrast, I think relying on pytrec_eval means that it's compiled as appropriate.
This also means that if we provide a "dataset" that contains labels as input, we don't need to have a separate qrels file.
You don't need to fork a separate process to run trec_eval, so it should be faster.
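A rough sketch of what validation with pytrec_eval could look like (the qrels/run dicts here are toy data; in trec_eval naming, P@20 is "P_20" and nDCG@20 is "ndcg_cut_20"):

import pytrec_eval

qrels = {'301': {'FBIS3-10082': 1, 'FBIS3-10169': 0}}     # toy judgments
run = {'301': {'FBIS3-10082': 2.3, 'FBIS3-10169': 1.1}}   # toy system scores

evaluator = pytrec_eval.RelevanceEvaluator(qrels, {'P_20', 'ndcg_cut_20'})
per_query = evaluator.evaluate(run)
mean_p20 = sum(q['P_20'] for q in per_query.values()) / len(per_query)
print(per_query, mean_p20)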
I try to run extract_docs_from_index.py with this command, where the index is the pre-built index provided by Pyserini:
awk '{print $3}' data/robust/*.run | python extract_docs_from_index.py lucene index-robust04-20191213/ > data/robust/documents.tsv
but I get an error:
I did not change any code in the file.
My Java version is:
openjdk version "1.8.0_282"
OpenJDK Runtime Environment (build 1.8.0_282-8u282-b08-0ubuntu1~20.04-b08)
OpenJDK 64-Bit Server VM (build 25.282-b08, mixed mode)
Do I have the correct Java version?
Could you give me some advice on this error?
Thanks a lot!
I indexed the Robust04 document files myself and ran extract_docs_from_index.py successfully!
Then I checked the documents.tsv file with pandas and found that there are 73855 records. I don't know how many there should be, so I would appreciate it if you could tell me the correct number of records!
Hi Sean,
I want to train cedr_pacrr using a BERT checkpoint with the following command:
python train.py \
--model cedr_pacrr \ # or cedr_knrm / cedr_drmm
--datafiles data/queries.tsv data/documents.tsv \
--qrels data/qrels \
--train_pairs data/train_pairs \
--valid_run data/valid_run \
--initial_bert_weights models/vbert/weights.p \
--model_out_dir models/cedrpacrr
As you know, in your data directory the train and validation data come in 5 folds. The BERT checkpoint weights also come in 5 folds.
Would you please guide me on what I should do with train_pairs, valid_run, and the weights?
Thanks in advance,
Kind Regards,
Zahra
data.iter_valid_records() doesn't yield anything if the validation set is smaller than batch_size.
Adding a final block as follows works:
if len(batch['query_id']) > 0:
    yield _pack_n_ship(batch)
The bug also means that the final (validation size % batch_size) documents are omitted from validation.
I suspect data.iter_train_pairs() has exactly the same issue.
I reproduced VanillaBERT but only get:
NDCG@20: 0.3889
P@20: 0.3180
Settings: optimizer AdamW, batch size 1, lr = 1e-5, trained with hinge loss, choosing positive/negative pairs at random from the provided f*.train.pairs lists.
This is far from the paper's:
NDCG@20: 0.4541
P@20: 0.4042
class VanillaBERT(nn.Module):
    def __init__(self):
        super(VanillaBERT, self).__init__()
        self.bert = BertModel.from_pretrained(pretrained_weights)  # e.g. 'bert-base-uncased'
        self.dropout = torch.nn.Dropout(0.1)
        self.Out_FC = nn.Linear(768, 1)

    def forward(self, _input_ids, _token_type_ids):
        outputs = self.bert(input_ids=_input_ids.squeeze(1), token_type_ids=_token_type_ids.squeeze(1))
        # print(np.shape(outputs[0][0][0]))
        cls_reps = outputs[0][0][0]  # [CLS] representation of the first sequence in the batch
        Pred_out = self.Out_FC(self.dropout(cls_reps))
        return Pred_out
As discussed: it would be great if I could declare a dependency on this repo, even if it doesn't make it to PyPI.
The guideline notes say to be sure to use an index that has appropriate pre-processing.
But I don't know how to build the index file, and I don't have the Robust04 or ClueWeb09 document dataset either.
I want to replicate the experiment, but I can't work it out because of this.
Can anyone help me?
Validation for large cutoffs and numbers of queries can be slower than training. Are there any optimisations that can be done? E.g. tokenising just once, rather than for each iteration?
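One possible optimisation, sketched here purely as an illustration (the helper name and call site are hypothetical, not the repo's current code): memoise tokenisation by document id so repeated validation passes reuse it.

_tok_cache = {}

def tokenize_cached(tokenizer, doc_id, text):
    # hypothetical helper: tokenise each document only once, keyed by its id
    if doc_id not in _tok_cache:
        _tok_cache[doc_id] = tokenizer.tokenize(text)
    return _tok_cache[doc_id]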
When I run the command
awk '{print $3}' data/robust/*.run | python extract_docs_from_index.py indri PATH_TO_INDRI_INDEX > data/robust/documents.tsv
I get an error like:
Traceback (most recent call last):
File "extract_docs_from_index.py", line 60, in <module>
main_cli()
File "extract_docs_from_index.py", line 48, in main_cli
doc_extractor = INDEX_MAP[args.index_type](args.index_path)
File "extract_docs_from_index.py", line 8, in indri_doc_extractor
index = pyndri.Index(path)
File "/root/anaconda3/lib/python3.6/site-packages/pyndri/__init__.py", line 52, in __init__
super(Index, self).__init__(*args, **kwargs)
OSError: ../src/Parameters.cpp(469): Couldn't open parameter file 'indri-5.14/manifest' for reading.
It seems I need to do some indexing first. How can I do that?
Hi, this link provides four pre-trained GloVe models. Which one did you use in your paper? I'd appreciate your answer.
What does PATH_TO_INDRI_INDEX mean for my own project?
awk '{print $3}' data/robust/*.run | python extract_docs_from_index.py indri PATH_TO_INDRI_INDEX > data/robust/documents.tsv
Hi, I'm confused about how to get the WebTrack 2012-2014 datasets. I would appreciate it if you could walk me through the specific process. Thanks a lot.
E.g. https://github.com/Georgetown-IR-Lab/cedr/blob/master/data.py#L70:
random.shuffle(pos_ids)
pos_id = pos_ids[0]
Why not
pos_id = random.choice(pos_ids)