
pyterrier_dr's Issues

TypeError: from_file() missing 1 required positional argument: 'shared'

torch.version: '1.13.0'

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_69880/976215978.py in <module>
      1 retr_pipeline = model >> pyterrier_dr.TorchIndex('/nfstrecdl/msmarco-passage.tct-hnp')
----> 2 retr_pipeline.search('Hello Terrier')

/opt/conda/envs/dr/lib/python3.7/site-packages/pyterrier/transformer.py in search(self, query, qid, sort)
    214         import pandas as pd
    215         queryDf = pd.DataFrame([[qid, query]], columns=["qid", "query"])
--> 216         rtr = self.transform(queryDf)
    217         if "qid" in rtr.columns and "rank" in rtr.columns:
    218             rtr = rtr.sort_values(["qid", "rank"], ascending=[True,True])

/opt/conda/envs/dr/lib/python3.7/site-packages/pyterrier/transformer.py in transform(self, topics)
    885     def transform(self, topics):
    886         for m in self.models:
--> 887             topics = m.transform(topics)
    888         return topics
    889 

/opt/conda/envs/dr/lib/python3.7/site-packages/pyterrier_dr/indexes.py in transform(self, inp)
    657             query_vecs = query_vecs.half()
    658 
--> 659         self.docnos_and_data()
    660 
    661         step = self._cuda_data.shape[0]

/opt/conda/envs/dr/lib/python3.7/site-packages/pyterrier_dr/indexes.py in docnos_and_data(self)
    637                 'f2': (torch.HalfStorage,  torch.HalfTensor,  torch.cuda.HalfTensor,  2),
    638             }[self.dtype]
--> 639             self._cpu_data = TType(SType.from_file(str(self.index_path/'index.npy'), size=meta['count'] * meta['vec_size'])).reshape(meta['count'], meta['vec_size'])
    640             doc_batch_size = self.idx_mem//SIZE//meta['vec_size']
    641             if self.half:

TypeError: from_file() missing 1 required positional argument: 'shared'

apply over an index

In the FCRS use case (with @yashonwu), we need to take a (dense) index and apply a "model" to it to make a new index. I think it would make sense to replace our custom index and ranking code with pyterrier_dr.

Currently, that just means applying a torch.nn.Linear. This can be done batchwise on a GPU. More generically, it could be any function. What would be a reasonable API for PyTerrier DR?

index_new = index.downcast(model.img_transform, batchsize=K, device=torch.device('cuda'))

I wouldn't expect all indices to support this. I think some FAISS implementations could support it, but all I really care about is the numpy index implementation.
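
A minimal sketch of the idea above, assuming the index vectors can be viewed as a 2-D NumPy array. The name `apply_to_vectors` and its parameters are hypothetical and not part of pyterrier_dr; a plain matrix product stands in for torch.nn.Linear.

```python
import numpy as np

def apply_to_vectors(vectors, fn, batch_size=1024):
    """Apply fn to row batches of `vectors` and collect the results in a new array.

    Hypothetical helper, not a pyterrier_dr API.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if vectors.shape[0] == 0:
        raise ValueError("vectors must contain at least one row")
    out = None
    for start in range(0, vectors.shape[0], batch_size):
        batch = fn(vectors[start:start + batch_size])
        if out is None:
            # Allocate the output once the transformed width and dtype are known.
            out = np.empty((vectors.shape[0], batch.shape[1]), dtype=batch.dtype)
        out[start:start + batch.shape[0]] = batch
    return out

rng = np.random.default_rng(0)
old_vectors = rng.standard_normal((10, 8)).astype(np.float32)
weight = rng.standard_normal((8, 4)).astype(np.float32)  # stand-in for a linear layer

new_vectors = apply_to_vectors(old_vectors, lambda x: x @ weight, batch_size=3)
print(new_vectors.shape)  # (10, 4)
```

Because only one batch is transformed at a time, the same loop could read from a memory-mapped array and move each batch to a GPU inside `fn`.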

Save a model

Hi,

How do I save a trained model? I looked for a 'save' method but couldn't find one.

Thanks.

Indexing Robust

Hey,

I'm having problems indexing the Robust collection with NumpyIndex. I tried following the example given in the PyTerrier documentation:

index = pyterrier_dr.NumpyIndex('./indices/robust_MB')
files = pt.io.find_files("path/to/robust")
idx_pipeline = model >> index
idx_pipeline.index(files)

But it doesn't work. I think it's because I need to pass the fields and meta when indexing, but NumpyIndex doesn't accept these parameters.

Any help is welcome. Thanks.

TctColBert.reverse()'s function

Hey,

I'm a bit confused about what the method TctColBert.reverse() does. Is it used internally by any other method?
Also, in this line of code:
Q = Q[:, 4:, :].mean(dim=1) # remove the first 4 tokens (representing [CLS] [ Q ]), and average
It may seem a silly question, but why is [CLS] [ Q ] represented by 4 tokens and not 2?

Thank you!
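
To illustrate what the slice in that line does, here is a small NumPy sketch with made-up shapes; it is not the library's code. It assumes, without confirmation from the source, that the query text begins with "[CLS] [Q]" and that a WordPiece tokenizer splits "[Q]" into three pieces ("[", "q", "]"), which together with [CLS] would give 4 prefix tokens.

```python
import numpy as np

PREFIX_TOKENS = 4  # assumed: [CLS] plus three word pieces for "[Q]"

# Fake token embeddings with shape (batch, sequence_length, hidden_size).
Q = np.arange(2 * 6 * 3, dtype=np.float32).reshape(2, 6, 3)

# Drop the prefix positions, then average the remaining token vectors.
pooled = Q[:, PREFIX_TOKENS:, :].mean(axis=1)
print(pooled.shape)  # (2, 3)
```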

Memory requirement for TorchIndex

Hi,

NumpyIndex does not require the full index to be in memory, but can the same be said of TorchIndex?
I'm trying to run an experiment on the MS MARCO dev set with TorchIndex and I get an error saying CUDA is out of memory.

Thanks for any help!

Here's the code:
`#Training
model = pyterrier_dr.TctColBert(model_name='distilbert-base-uncased')
dataset = pt.get_dataset('irds:msmarco-passage/train/judged')
model.fit(dataset=dataset, steps=1000)

#Evaluation
dataset = pt.get_dataset('irds:msmarco-passage/dev/judged')
index = pyterrier_dr.TorchIndex('index.torch')
idx_pipeline = model >> index
idx_pipeline.index(dataset.get_corpus_iter())

retr_pipeline = model >> index
pt.Experiment(
[retr_pipeline],
dataset.get_topics(),
dataset.get_qrels(),
eval_metrics=["map", "recip_rank","ndcg", "ndcg_cut_10"]
)`

And I get this error:
`Traceback (most recent call last):
File "/XXX/DR_cuda/DR_cuda.py", line 22, in <module>
pt.Experiment(

File "/XXX/dr_env/lib/python3.8/site-packages/pyterrier/pipelines.py", line 450, in Experiment
time, evalMeasuresDict = _run_and_evaluate(

File "/XXX/dr_env/lib/python3.8/site-packages/pyterrier/pipelines.py", line 170, in _run_and_evaluate
res = system.transform(topics)

File "/XXX/dr_env/lib/python3.8/site-packages/pyterrier/ops.py", line 335, in transform
topics = m.transform(topics)

File "/XXX/dr_env/lib/python3.8/site-packages/pyterrier_dr/indexes.py", line 676, in transform
scores = query_vecs @ self._cuda_data[:bsize].T

RuntimeError: CUDA out of memory. Tried to allocate 33.70 GiB (GPU 0; 10.92 GiB total capacity; 1.63 GiB already allocated; 8.27 GiB free; 2.00 GiB reserved in total by PyTorch)

srun: error: gpu-nc06: task 0: Exited with exit code 1`
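
The failing line builds a score matrix with one row per query and one column per document in the current slice, so its size grows with the number of queries scored at once. A rough sketch of that arithmetic follows; the helper names and the example sizes are illustrative assumptions, not values taken from pyterrier_dr.

```python
GIB = 1024 ** 3

def score_matrix_bytes(n_queries, n_docs, bytes_per_score=4):
    """Size of an (n_queries x n_docs) score matrix, e.g. 4 bytes for float32."""
    return n_queries * n_docs * bytes_per_score

def max_queries_per_batch(free_bytes, n_docs, bytes_per_score=4):
    """Largest number of queries whose score matrix fits in free_bytes."""
    return free_bytes // (n_docs * bytes_per_score)

# Illustrative sizes only: 50,000 queries scored against 100,000 documents.
needed = score_matrix_bytes(50_000, 100_000)
print(f"{needed / GIB:.2f} GiB")  # 18.63 GiB

# With 8 GiB free, score at most this many queries per batch.
print(max_queries_per_batch(8 * GIB, 100_000))  # 21474
```

PyTerrier's Experiment accepts a batch_size argument for evaluating topics in batches, which may be one way to bound the number of queries per call; whether that resolves this particular case is not confirmed here.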

Error code 139

Hi,

While trying to index the Arguana collection and then run an experiment on it, I get exit code 139. From what I understand, it's a segmentation violation, but I don't understand where it is coming from.

Here's my code:
`
dataset = pt.get_dataset('irds:beir/arguana')
index = pyterrier_dr.NumpyIndex('index_arg.np')
idx_pipeline = model >> index
idx_pipeline.index(dataset.get_corpus_iter())

retr_pipeline = model >> index
pt.Experiment(
[retr_pipeline],
dataset.get_topics(),
dataset.get_qrels(),
eval_metrics=["map", "recip_rank",RR@10,"ndcg", "ndcg_cut_10"]
)
`

And I get this:
beir/arguana documents: 100%|██████████| 8674/8674 [15:44<00:00, 9.19it/s]srun: error: gpu-nc07: task 0: Exited with exit code 139

If anyone knows anything about this please let me know.
Thank you.
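
For context on the number itself: shells conventionally report 128 plus the signal number when a process is killed by a signal, so 139 corresponds to signal 11. The sketch below decodes such a status with the standard library; `describe_exit_code` is a hypothetical helper name.

```python
import signal

def describe_exit_code(code):
    """Return the signal name for a shell-style exit status above 128, else None."""
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            return None  # no signal with that number on this platform
    return None

print(describe_exit_code(139))  # SIGSEGV on Linux
```

Calling faulthandler.enable() at the top of the script (or running with python -X faulthandler) makes Python dump a traceback when it receives SIGSEGV, which can help locate the native call that crashed.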
