
cvangysel / pyndri


pyndri is a Python interface to the Indri search engine.

Home Page: http://ilps.science.uva.nl

License: MIT License

Python 50.29% C++ 49.71%
indri-search-engine information-retrieval python research

pyndri's Issues

Error reading the dictionary

Good morning,
To run my tests, I need to get all the words of a document (for example, all the words of document 1). I used the code snippet given in https://arxiv.org/pdf/1701.00749.pdf like this:
print([dictionary[token_id] for token_id in index.document(1)[1]])
I got the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
  File "~local/lib/python3.5/site-packages/pyndri/dictionary.py", line 32, in __getitem__
    return self.id2token[token_id]
KeyError: 0
Could you please explain how I can access all the words of a document?
Thanks
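One possible reading of the KeyError above is that the token sequence contains the ID 0 and the dictionary has no word for it. The sketch below assumes that ID 0 marks a position whose term is missing from the dictionary (for example, a stopword removed at indexing time). That assumption, the decode_tokens helper and the sample data are illustrative and are not taken from pyndri itself.

```python
def decode_tokens(token_ids, id2token, placeholder="<unk>"):
    """Map token IDs to words, using a placeholder for unmapped IDs."""
    return [id2token.get(token_id, placeholder) for token_id in token_ids]


# Hypothetical stand-ins for the dictionary and for index.document(1)[1].
id2token = {1: "hello", 2: "world"}
token_ids = (1, 0, 2)

print(decode_tokens(token_ids, id2token))
```

With a real index, the same lookup-with-default idea would apply to whatever mapping the dictionary object exposes.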

Running tests, I got these issues

======================================================================
FAIL: test_1empty (main.IndriTest)

Traceback (most recent call last):
File "/home/derrer/Documents/UniMaterials/distribSystemsInstall/pyndri/tests/pyndri_tests.py", line 121, in setUp
self.assertEqual(ret, 0)
AssertionError: -11 != 0

======================================================================
FAIL: test_document (main.IndriTest)

Traceback (most recent call last):
File "/home/derrer/Documents/UniMaterials/distribSystemsInstall/pyndri/tests/pyndri_tests.py", line 121, in setUp
self.assertEqual(ret, 0)
AssertionError: -11 != 0

======================================================================
FAIL: test_document_length (main.IndriTest)

Traceback (most recent call last):
File "/home/derrer/Documents/UniMaterials/distribSystemsInstall/pyndri/tests/pyndri_tests.py", line 121, in setUp
self.assertEqual(ret, 0)
AssertionError: -11 != 0

======================================================================
FAIL: test_iter_index (main.IndriTest)

Traceback (most recent call last):
File "/home/derrer/Documents/UniMaterials/distribSystemsInstall/pyndri/tests/pyndri_tests.py", line 121, in setUp
self.assertEqual(ret, 0)
AssertionError: -11 != 0

======================================================================
FAIL: test_meta (main.IndriTest)

Traceback (most recent call last):
File "/home/derrer/Documents/UniMaterials/distribSystemsInstall/pyndri/tests/pyndri_tests.py", line 121, in setUp
self.assertEqual(ret, 0)
AssertionError: -11 != 0

======================================================================
FAIL: test_okapi (main.IndriTest)

Traceback (most recent call last):
File "/home/derrer/Documents/UniMaterials/distribSystemsInstall/pyndri/tests/pyndri_tests.py", line 121, in setUp
self.assertEqual(ret, 0)
AssertionError: -11 != 0

======================================================================
FAIL: test_prf_query_environment (main.IndriTest)

Traceback (most recent call last):
File "/home/derrer/Documents/UniMaterials/distribSystemsInstall/pyndri/tests/pyndri_tests.py", line 121, in setUp
self.assertEqual(ret, 0)
AssertionError: -11 != 0

======================================================================
FAIL: test_process_term (main.IndriTest)

Traceback (most recent call last):
File "/home/derrer/Documents/UniMaterials/distribSystemsInstall/pyndri/tests/pyndri_tests.py", line 121, in setUp
self.assertEqual(ret, 0)
AssertionError: -11 != 0

======================================================================
FAIL: test_query_documentset (main.IndriTest)

Traceback (most recent call last):
File "/home/derrer/Documents/UniMaterials/distribSystemsInstall/pyndri/tests/pyndri_tests.py", line 121, in setUp
self.assertEqual(ret, 0)
AssertionError: -11 != 0

======================================================================
FAIL: test_query_environment (main.IndriTest)

Traceback (most recent call last):
File "/home/derrer/Documents/UniMaterials/distribSystemsInstall/pyndri/tests/pyndri_tests.py", line 121, in setUp
self.assertEqual(ret, 0)
AssertionError: -11 != 0

======================================================================
FAIL: test_query_expander (main.IndriTest)

Traceback (most recent call last):
File "/home/derrer/Documents/UniMaterials/distribSystemsInstall/pyndri/tests/pyndri_tests.py", line 121, in setUp
self.assertEqual(ret, 0)
AssertionError: -11 != 0

======================================================================
FAIL: test_query_results_requested (main.IndriTest)

Traceback (most recent call last):
File "/home/derrer/Documents/UniMaterials/distribSystemsInstall/pyndri/tests/pyndri_tests.py", line 121, in setUp
self.assertEqual(ret, 0)
AssertionError: -11 != 0

======================================================================
FAIL: test_query_snippets (main.IndriTest)

Traceback (most recent call last):
File "/home/derrer/Documents/UniMaterials/distribSystemsInstall/pyndri/tests/pyndri_tests.py", line 121, in setUp
self.assertEqual(ret, 0)
AssertionError: -11 != 0

======================================================================
FAIL: test_raw_dictionary (main.IndriTest)

Traceback (most recent call last):
File "/home/derrer/Documents/UniMaterials/distribSystemsInstall/pyndri/tests/pyndri_tests.py", line 121, in setUp
self.assertEqual(ret, 0)
AssertionError: -11 != 0

======================================================================
FAIL: test_repr (main.IndriTest)

Traceback (most recent call last):
File "/home/derrer/Documents/UniMaterials/distribSystemsInstall/pyndri/tests/pyndri_tests.py", line 121, in setUp
self.assertEqual(ret, 0)
AssertionError: -11 != 0

======================================================================
FAIL: test_simple_query (main.IndriTest)

Traceback (most recent call last):
File "/home/derrer/Documents/UniMaterials/distribSystemsInstall/pyndri/tests/pyndri_tests.py", line 121, in setUp
self.assertEqual(ret, 0)
AssertionError: -11 != 0

======================================================================
FAIL: test_tfidf (main.IndriTest)

Traceback (most recent call last):
File "/home/derrer/Documents/UniMaterials/distribSystemsInstall/pyndri/tests/pyndri_tests.py", line 121, in setUp
self.assertEqual(ret, 0)
AssertionError: -11 != 0

======================================================================
FAIL: test_tokenize (main.IndriTest)

Traceback (most recent call last):
File "/home/derrer/Documents/UniMaterials/distribSystemsInstall/pyndri/tests/pyndri_tests.py", line 121, in setUp
self.assertEqual(ret, 0)
AssertionError: -11 != 0

======================================================================
FAIL: test_with (main.IndriTest)

Traceback (most recent call last):
File "/home/derrer/Documents/UniMaterials/distribSystemsInstall/pyndri/tests/pyndri_tests.py", line 121, in setUp
self.assertEqual(ret, 0)
AssertionError: -11 != 0


Ran 23 tests in 5.854s

FAILED (failures=19)

Is an AssertionError critical? If so, how do I deal with it?
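A hedged note on the value -11: if `ret` in setUp is the return code of a child process started through Python's subprocess module (an assumption, since the test source is not shown here), then a negative value -N means the child was terminated by signal N on POSIX systems. The sketch below, with a hypothetical helper name, turns such a code into a readable description.

```python
import signal


def describe_returncode(ret):
    """Explain a subprocess-style return code in words."""
    if ret == 0:
        return "success"
    if ret < 0:
        # Negative codes mean "terminated by signal -ret" on POSIX.
        try:
            name = signal.Signals(-ret).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal {} ({})".format(-ret, name)
    return "exited with status {}".format(ret)


print(describe_returncode(-11))
```

Under that assumption, the assertion is only reporting that a helper program crashed during test setup, so the crash itself is the thing to investigate.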

installation in cygwin

Hi,
When I try to install the package in Cygwin, I get the following error:

                              ^
g++ -shared -Wl,--enable-auto-image-base build/temp.cygwin-2.10.0-x86_64-3.6/src/pyndri.o -Lusr/local/include/indri -L/usr/lib/python3.6/config -L/usr/lib -lindri -lz -lpthread -lm -lpython3.6m -o build/lib.cygwin-2.10.0-x86_64-3.6/pyndri_ext.cpython-36m-x86_64-cygwin.dll
/usr/lib/gcc/x86_64-pc-cygwin/7.3.0/../../../../x86_64-pc-cygwin/bin/ld: cannot find -lindri
collect2: error: ld returned 1 exit status
error: command 'g++' failed with exit status 1

I tried to set the environment variable 'LD_LIBRARY_PATH' to the indri local folder but it didn't help.

Is there any chance you can help me get it installed properly?
Thanks in advance
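Two details in the linker command above may matter: the search path is given as -Lusr/local/include/indri, which is a relative path (no leading slash) that points at an include directory, and LD_LIBRARY_PATH is read by the dynamic loader at run time, not by ld at link time (GCC consults LIBRARY_PATH when linking). As a first check, the sketch below looks for the Indri library file in a few candidate directories. The file names and the candidate list are assumptions to adjust for the actual install prefix.

```python
import os


def find_library_dirs(candidate_dirs, names=("libindri.a", "libindri.so")):
    """Return the candidate directories that contain one of the library files."""
    hits = []
    for directory in candidate_dirs:
        if any(os.path.isfile(os.path.join(directory, n)) for n in names):
            hits.append(directory)
    return hits


# Hypothetical candidate locations; replace with the real install prefix.
candidates = ["/usr/local/lib", "/usr/lib", os.path.expanduser("~/indri/lib")]
print(find_library_dirs(candidates))
```

If the list comes back empty, the library was probably not installed where the linker is looking, and the -L path needs to name the directory that does contain it.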

Install Pyndri

Hi,
I started installing pyndri. In the first step (sudo apt install g++ zlib1g-dev python3.5-dev python3-pip), I received these messages:

Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
g++ : Depends: cpp (>= 4:7.3.0-3ubuntu2.1) but 4:5.3.1-1ubuntu1 is to be installed
Depends: gcc (>= 4:7.3.0-3ubuntu2.1) but 4:5.3.1-1ubuntu1 is to be installed
Depends: g++-7 (>= 7.3.0-27~) but it is not going to be installed
Depends: gcc-7 (>= 7.3.0-27~) but it is not going to be installed
python3-pip : Depends: python3-distutils but it is not going to be installed
Recommends: python3-dev (>= 3.2) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

What should I do?

Regards

When running example code, python gets terminated.

When I run the example code, Python is terminated with "terminate called after throwing an instance of 'lemur::api::Exception'" and exit code 134 (interrupted by signal 6: SIGABRT).

After googling for a few hours, I haven't found a solution or the cause of the error.
This might not be a pyndri problem, but have you seen this error before by any chance?

setup error on Ubuntu system

Hi,
Many thanks for your contribution; it could be very useful. I cloned 'pyndri' to my computer and ran 'sudo python setup.py install', which reported an error:

  • system: Ubuntu

  • command: 'sudo python setup.py install'

  • error report: running install
    running build
    running build_py
    running build_ext
    building 'pyndri_ext' extension
    x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -D_GLIBCXX_USE_CXX11_ABI=0 -DP_NEEDS_GNU_CXX_NAMESPACE=1 -UNDEBUG -I/usr/include/python2.7 -c src/pyndri.cpp -o build/temp.linux-x86_64-2.7/src/pyndri.o
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
    src/pyndri.cpp:8:42: fatal error: antlr/NoViableAltException.hpp: No such file or directory
    #include <antlr/NoViableAltException.hpp>
    ^
    compilation terminated.
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

  • what I have tried: installing dependency packages like

sudo apt-get install build-essential autoconf libtool pkg-config python-opengl python-imaging python-pyrex python-pyside.qtopengl idle-python2.7 qt4-dev-tools qt4-designer libqtgui4 libqtcore4 libqt4-xml libqt4-test libqt4-script libqt4-network libqt4-dbus python-qt4 python-qt4-gl libgle3 python-dev

sudo easy_install greenlet

sudo easy_install gevent
sudo pip install lxml --upgrade

  • none of these seemed to work; the same error as above is still reported.

Setup on google colab

Hello! I am trying to set up pyndri on Colab, but I can't get through the first phase, where you need to install Indri.
I visited the link for the Indri project, but all I could find was 2 jar files (I couldn't find any file named indri-5.11.tar.gz).
Does anyone have any idea how to set up Indri and pyndri on Colab?
Thank you in advance!

Is it working?

I can't compile the project. It gives the following error:
src/pyndri.cpp:8:42: fatal error: indri/CompressedCollection.hpp: No such file or directory

IOError: Indri repository contain more than one index.

I have indexed a huge number of documents using IndriBuildIndex. I am able to run queries using IndriRunQuery on the same index, but when I try to open the index in pyndri I get the following error:

IOError: Indri repository contain more than one index.

Error Run sample code

I ran these commands:
sudo apt install g++ zlib1g-dev python3.5-dev python3-pip
sudo pip3 install setuptools

I also ran these:
./configure CXX="g++ -D_GLIBCXX_USE_CXX11_ABI=0"
make
sudo make install

Then I tried to run the example code from pyndri:

import pyndri

index = pyndri.Index('~/indri-5.11/runquery/alQuran/indeks')

for document_id in range(index.document_base(), index.maximum_document()):
    print(index.document(document_id))

and got this error:

Traceback (most recent call last):
  File "opik.py", line 3, in <module>
    index = pyndri.Index('/indri-5.11/runquery/alQuran/indeks')
  File "/usr/local/lib/python3.5/site-packages/pyndri/__init__.py", line 46, in __init__
    super(Index, self).__init__(*args, **kwargs)
OSError: ../src/Parameters.cpp(469): Couldn't open parameter file '/indri-5.11/runquery/alQuran/indeks/manifest' for reading.

What should I do? I am a newbie in Python and pyndri. Thanks for your answer.
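One thing worth ruling out, assuming the path is passed to pyndri.Index with a leading ~ as in the snippet above and that pyndri hands the string on unchanged: Python does not expand ~ by itself (shells do), so the path has to be expanded explicitly. A minimal sketch with a hypothetical helper name:

```python
import os


def resolve_index_path(path):
    """Expand a leading '~' and return an absolute, normalized path."""
    return os.path.abspath(os.path.expanduser(path))


index_path = resolve_index_path("~/indri-5.11/runquery/alQuran/indeks")
# Print the resolved path and whether it exists as a directory.
print(index_path, os.path.isdir(index_path))
```

The resolved path can then be passed to pyndri.Index. If the directory exists but still has no manifest file inside it, the path may point at the wrong folder or the index build may not have finished.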

How can I connect Indri to MongoDB

Hello everyone, I want to index a collection of data that I imported into a MongoDB database.
How can I connect Indri to index the MongoDB data?
Or how can I use pyndri to connect to MongoDB?

Get only id2token dictionary (memory problem)

Hi,
I'm working with Clueweb09 category A - a big index (about 2TB).
I need to extract the textual content of documents, and to do so I can extract the tokens-tuple using index.document(doc_id), but in order to "translate" it to text, I need id2token dictionary.

The problem is that, as far as I can see, I can get id2token only through index.get_dictionary(), which loads pretty much everything into memory. Even though my machine has over 100GB of RAM, the process gets killed.

Can I get only the id2token dictionary? (Hopefully that won't be too big.)
Do you have any other solution for my problem?

Thanks,
Avihay

Installation Error

src/pyndri.cpp:8:42: fatal error: indri/CompressedCollection.hpp: No such file or directory
#include <indri/CompressedCollection.hpp>
^
compilation terminated.
error: command 'x86_64-pc-linux-gnu-g++' failed with exit status 1

Installation Error

When I run the command sudo pip3 install pyndri I get this error:

The directory '/home/sars/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/sars/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting pyndri
  Downloading pyndri-0.1.tar.gz
Installing collected packages: pyndri
  Running setup.py install for pyndri ... error
    Complete output from command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-yc8t90wu/pyndri/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-xtj8tbrx-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.5
    creating build/lib.linux-x86_64-3.5/pyndri
    copying py/__init__.py -> build/lib.linux-x86_64-3.5/pyndri
    copying py/compat.py -> build/lib.linux-x86_64-3.5/pyndri
    copying py/dictionary.py -> build/lib.linux-x86_64-3.5/pyndri
    running build_ext
    building 'pyndri_ext' extension
    creating build/temp.linux-x86_64-3.5
    creating build/temp.linux-x86_64-3.5/src
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -D_GLIBCXX_USE_CXX11_ABI=0 -DP_NEEDS_GNU_CXX_NAMESPACE=1 -UNDEBUG -I/usr/include/python3.5m -c src/pyndri.cpp -o build/temp.linux-x86_64-3.5/src/pyndri.o
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    src/pyndri.cpp:8:42: fatal error: antlr/NoViableAltException.hpp: No such file or directory
    compilation terminated.
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
    
    ----------------------------------------
Command "/usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-yc8t90wu/pyndri/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-xtj8tbrx-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-yc8t90wu/pyndri/
You are using pip version 8.1.1, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

Error while running "pip3 install pyndri"

Hi there,
While running the command "pip3 install pyndri" I've got the following error:

cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
src/pyndri.cpp:9:42: fatal error: antlr/NoViableAltException.hpp: No such file or directory
#include <antlr/NoViableAltException.hpp>
^
compilation terminated.
error: command 'gcc' failed with exit status 1
Command ".../python3.5/bin/python3.5 -u -c "import setuptools, tokenize;file='/tmp/pip-build-xqx2sa5u/pyndri/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-0bhj1piv-record/install-record.txt --single-version-externally-managed --compile --install-headers .../python3.5/include/site/python3.5/pyndri" failed with error code 1 in /tmp/pip-build-xqx2sa5u/pyndri/

I've run the "install setuptools" command successfully.
Could you help me, please?
Best regards

Number of total docs is wrong (index.document_count())

Just a small detail:
The total number of documents returned by index.document_count() is one less than it should be.

For instance, if maximum_document is 2 and document_base is 1, then the total number of documents should be 2 instead of 1.
Hence, I believe the right formula is:
index.document_count() = index.maximum_document() - index.document_base() + 1
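Whether the proposed formula is right depends on whether maximum_document() is an inclusive or an exclusive upper bound, which this thread does not establish. Sample code elsewhere in this thread iterates over range(index.document_base(), index.maximum_document()), which treats the maximum as exclusive. The sketch below works through both readings with the numbers from the post; the function names are illustrative.

```python
def count_if_max_inclusive(document_base, maximum_document):
    """Number of IDs in [document_base, maximum_document]."""
    return maximum_document - document_base + 1


def count_if_max_exclusive(document_base, maximum_document):
    """Number of IDs in [document_base, maximum_document)."""
    return maximum_document - document_base


base, maximum = 1, 2
print(count_if_max_inclusive(base, maximum))  # 2
print(count_if_max_exclusive(base, maximum))  # 1
print(len(range(base, maximum)))              # 1, matches the exclusive reading
```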

How can I get the content of a document

I am trying to get the content (the text of a document that I indexed), but it fails:

import pyndri

index = pyndri.Index('/home/opiq/indri/runquery/alQuran/indeks')
dictionary = pyndri.extract_dictionary(index)
_, int_doc_id = index.document_ids([' QS_1:1 '])
print([dictionary[token_id]
       for token_id in index.document(int_doc_id)[1]])

but I get this error:

> Traceback (most recent call last):
>   File "contoh-dok.py", line 5, in <module>
>     _, int_doc_id = index.document_ids([' QS_1:1 '])
> ValueError: not enough values to unpack (expected 2, got 0)

Thanks for your answer.
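The message "expected 2, got 0" says that the call returned an empty sequence, meaning no document matched the given external ID. One possible cause, stated here as an assumption, is the spaces around ' QS_1:1 '. The sketch below also assumes the result is a sequence of (external ID, internal ID) pairs and shows a defensive way to read the first one. The helper and the sample values are hypothetical stand-ins for index.document_ids([...]).

```python
def first_internal_id(pairs):
    """Return the internal ID of the first (external, internal) pair, or None."""
    for _external_id, internal_id in pairs:
        return internal_id
    return None


# Hypothetical results standing in for index.document_ids([...]).
found = (("QS_1:1", 1),)
not_found = ()

print(first_internal_id(found))      # 1
print(first_internal_id(not_found))  # None
```

Checking for None before calling index.document(...) turns the unpacking error into an explicit "document not found" case.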

Pseudo relevance feedback

Is pseudo relevance feedback based retrieval possible in pyndri?
All the examples I have seen run without query expansion.

Thanks in advance

Access the BM25 implementation

Good morning;
For my experiments I need access to the implementation of the BM25 model, because I have to make some changes to it. How can I access the BM25 implementation, or call the function that computes BM25?
Thanks
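This thread does not show how Indri implements BM25, so the following is only a textbook-style Okapi BM25 term score in plain Python, not Indri's or pyndri's code. The function name, the defaults k1 = 1.2 and b = 0.75, and the smoothed IDF variant are assumptions chosen for illustration. A sketch like this can serve as a starting point for experimenting with changes to the formula.

```python
import math


def bm25_term_score(tf, df, doc_len, avg_doc_len, num_docs, k1=1.2, b=0.75):
    """Okapi BM25 contribution of one query term to one document.

    tf: term frequency in the document
    df: number of documents containing the term
    """
    # Smoothed IDF that stays non-negative even for very common terms.
    idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
    # Term-frequency saturation with document-length normalization.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)


# Illustrative numbers only.
print(bm25_term_score(tf=3, df=10, doc_len=120, avg_doc_len=100, num_docs=1000))
```

A document's score for a query is then the sum of this value over the query terms.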

Little Help

Hi,
I'm new to pyndri and I was wondering if I could use a little help here.
I want to extract from the index the word vectors of documents that contain a specific word (e.g. all the word vectors of the documents that contain the word "Asia"). Is there a convenient way to extract this information with pyndri?
In addition, I want to run an RM3 model. How can I do that?
Thanks a lot!

Installation Error

Hi,
I am trying to install Indri in my local folder on Linux (I have no sudo rights).
When I run the command "CC=gcc CXX=g++ pip3 install pyndri --user",
I get this error:
" g++ -pthread -shared build/temp.linux-x86_64-3.5/src/pyndri.o -lindri -lz -lpthread -lm -o build/lib.linux-x86_64-3.5/pyndri_ext.cpython-35m-x86_64-linux-gnu.so
/usr/bin/ld: cannot find -lindri
collect2: ld returned 1 exit status
error: command 'g++' failed with exit status 1

----------------------------------------

Command "/lv_local/home/ortal.as/python/bin/python3.5 -u -c "import setuptools, tokenize;file='/tmp/pip-build-_fastrzt/pyndri/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-qlf7fpqe-record/install-record.txt --single-version-externally-managed --compile --user --prefix=" failed with error code 1 in /tmp/pip-build-_fastrzt/pyndri/"

The installation log in the attached file:
Installation Log.txt

I would be grateful for any idea on how to solve the problem.
Thanks in advance,
Ortal
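When Indri lives under a user-writable prefix, the compiler and linker need to be told where its headers and library are. The sketch below builds an environment with CPPFLAGS and LDFLAGS pointing at such a prefix. The prefix ~/local and the helper name are hypothetical, and it is an assumption that the pyndri build picks these variables up, as distutils-based builds on Unix generally do.

```python
import os


def build_env_for_prefix(prefix, base_env=None):
    """Return a copy of the environment with flags for a custom install prefix."""
    env = dict(os.environ if base_env is None else base_env)
    include_dir = os.path.join(prefix, "include")
    lib_dir = os.path.join(prefix, "lib")
    # Prepend our paths, keeping any flags that were already set.
    env["CPPFLAGS"] = ("-I" + include_dir + " " + env.get("CPPFLAGS", "")).strip()
    env["LDFLAGS"] = ("-L" + lib_dir + " " + env.get("LDFLAGS", "")).strip()
    return env


# Hypothetical prefix; use the directory Indri was actually installed into.
env = build_env_for_prefix(os.path.expanduser("~/local"))
print(env["CPPFLAGS"])
print(env["LDFLAGS"])
```

The resulting dictionary can be passed as the env argument of subprocess.run when invoking pip, or the same two variables can be exported in the shell before running the install command.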

The stemmer options are not available

Good Morning,

I installed the latest pyndri today (5.2.18).
Options from the example stem.py (krovetz_stem, porter_stem) are not available. The only available option is "stem".

Thank you in advance,
Ortal

Segfault (SIGSEGV) in queries with non-ascii characters

I have an index that is built on Wikipedia articles. Whenever a query contains a non-ascii character, something (I can't exactly locate the error, but it's definitely below Index.query()) throws a segmentation fault. This happens for '–' and 'ō', but not 'é' (presumably because the latter is part of ascii-256?).
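A side note on the character ranges: ASCII covers only code points 0 to 127, so 'é' (U+00E9) is not ASCII, but it does fit in the single-byte Latin-1 range, while '–' (U+2013) and 'ō' (U+014D) do not. Whether this difference is what triggers the crash is not established here. The sketch below only shows how the three characters differ; the helper name is illustrative.

```python
def fits_in_latin1(text):
    """Return True if every character can be encoded as a single Latin-1 byte."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


for ch in ["e", "\u00e9", "\u014d", "\u2013"]:
    # Columns: character, is ASCII, fits in Latin-1, UTF-8 length in bytes.
    print(repr(ch), ord(ch) < 128, fits_in_latin1(ch), len(ch.encode("utf-8")))
```

A check like this could be used to filter or transliterate query strings before they reach Index.query() while the underlying crash is investigated.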

Undefined Symbol on import (not the one in the FAQ)

I built and installed Indri (latest) and installed pyndri for Python 3.5, but when I try to import pyndri it gives the following error:

In [1]: import pyndri
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-601147fcfd90> in <module>()
----> 1 import pyndri

/usr/local/lib/python3.5/dist-packages/pyndri-1.0-py3.5-linux-x86_64.egg/pyndri/__init__.py in <module>()
      1 from pyndri.dictionary import Dictionary, extract_dictionary
      2 
----> 3 from pyndri_ext import Index as __IndexBase
      4 from pyndri_ext import QueryEnvironment, stem, tokenize
      5 

ImportError: /usr/local/lib/python3.5/dist-packages/pyndri-1.0-py3.5-linux-x86_64.egg/pyndri_ext.cpython-35m-x86_64-linux-gnu.so: undefined symbol: _ZN5indri3api16QueryEnvironment8addIndexERKSs

I'm using Linux Mint 18 Sarah x64 with Python 3.5.2, and I built Indri and pyndri using the following commands.
For Indri, after tar xvzf:
./configure
make
sudo make install

For pyndri:
sudo python3 setup.py install

The error is produced directly after attempting to import pyndri in python3.
I can't make much of the error message. Any help, please?
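A possible lead, offered as a guess: in mangled C++ names, Ss abbreviates std::string under libstdc++'s older ABI, whereas names built against the C++11 ABI contain __cxx11. The missing symbol above ends in Ss, so the pyndri extension expects an Indri library built with the older ABI. Build logs elsewhere in this thread show pyndri compiled with -D_GLIBCXX_USE_CXX11_ABI=0, and another post configures Indri with CXX="g++ -D_GLIBCXX_USE_CXX11_ABI=0", while the commands here use a plain ./configure. The rough heuristic below, with a hypothetical function name, classifies a mangled symbol accordingly.

```python
def guess_string_abi(mangled_symbol):
    """Guess which libstdc++ std::string ABI a mangled symbol refers to.

    This is a substring heuristic, not a real demangler.
    """
    if "__cxx11" in mangled_symbol:
        return "new (C++11) ABI"
    if "Ss" in mangled_symbol:
        return "old (pre-C++11) ABI"
    return "no std::string detected"


symbol = "_ZN5indri3api16QueryEnvironment8addIndexERKSs"
print(guess_string_abi(symbol))  # old (pre-C++11) ABI
```

If the installed Indri library exports the __cxx11 variant of this symbol instead, rebuilding Indri with the same ABI flag as pyndri would be the thing to try.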
