Comments (7)

aryamancodes commented on June 20, 2024

Hi @mobehbooei, let me try to set up Solr and get back to you on that issue. In the meantime, you can build an index for the ACL Anthology collection using Pyserini, as described in this guide. A sample of building the index with Pyserini can also be seen in the "Steps to reproduce" section of this issue.

from anserini.

ygorg commented on June 20, 2024

I think this is because the README is not up to date: since Anserini moved to Lucene 9.3 on 02/08/22 (2725655), support for Elasticsearch and Solr has been dropped (#1951).
Following the instructions you used, you need to create the index with target/appassembler/bin/IndexCollection, without Solr being involved.
You may be able to clone Anserini from right before 02/08/2022 (or use anserini-0.14.4) while keeping the latest relevant source file for ACL indexing (see #2084). I have not tried that but was thinking of it, so please let me know if it works!

lintool commented on June 20, 2024

Hi @mobehbooei - yes, support for Solr has been dropped, so we should go back to direct Anserini indexing. The commands on this issue should work: #2069

i.e.,

python -m pyserini.index -collection AclAnthology -generator AclAnthologyGenerator -threads 8 -input build/data/ -index index/lucene-index-acl-paragraph -storePositions -storeDocvectors -storeContents -storeRaw -optimize

Can you please try it out and then update this page accordingly? https://github.com/castorini/anserini/blob/master/docs/acl-anthology.md

Send PR directly please.
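For anyone scripting this step, here is a minimal sketch that wraps the indexing command quoted above in Python. The flags and paths are copied from that command; the helper names build_index_command and run_index are hypothetical, and using sys.executable in place of a bare "python" is an assumption made so that the same interpreter (and its installed pyserini) is used.

```python
import subprocess
import sys


def build_index_command(input_dir, index_dir, threads=8):
    """Assemble the indexing command from this thread as an argument list.

    Hypothetical helper: only the flags come from the thread.
    """
    return [
        sys.executable, "-m", "pyserini.index",
        "-collection", "AclAnthology",
        "-generator", "AclAnthologyGenerator",
        "-threads", str(threads),
        "-input", input_dir,
        "-index", index_dir,
        "-storePositions", "-storeDocvectors", "-storeContents", "-storeRaw",
        "-optimize",
    ]


def run_index(input_dir, index_dir, threads=8):
    """Run the indexer; check=True raises CalledProcessError on a non-zero exit."""
    command = build_index_command(input_dir, index_dir, threads)
    return subprocess.run(command, check=True)


if __name__ == "__main__":
    # Paths taken from the command quoted in this thread.
    run_index("build/data/", "index/lucene-index-acl-paragraph")
```

Passing the arguments as a list avoids shell quoting problems if the input or index path contains spaces.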

mobehbooei commented on June 20, 2024

Hi everyone @ygorg @aryamancodes @lintool - Thanks for the responses. I tried the Pyserini approach, but I still run into the same issues as in #2069.
I am getting this error first:

2023-05-01 16:36:51,108 ERROR [main] collection.AclAnthology (AclAnthology.java:60) - Unable to open volumes.yaml

and then lots of this error:

2023-05-01 16:36:51,582 ERROR [pool-2-thread-6] index.IndexCollection$LocalIndexerThread (IndexCollection.java:348) - pool-2-thread-6: Unexpected Exception:
java.lang.NullPointerException: null
        at io.anserini.collection.AclAnthology$Document.<init>(AclAnthology.java:154) ~[anserini-0.21.0-fatjar.jar:?]
        at io.anserini.collection.AclAnthology$Segment.readNext(AclAnthology.java:115) ~[anserini-0.21.0-fatjar.jar:?]
        at io.anserini.collection.FileSegment$1.hasNext(FileSegment.java:136) ~[anserini-0.21.0-fatjar.jar:?]
        at io.anserini.index.IndexCollection$LocalIndexerThread.run(IndexCollection.java:287) [anserini-0.21.0-fatjar.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:829) [?:?]

I tried this and this to prevent YAML from creating aliases, which Anserini can't parse, but I still get the same error!
I am using WSL on Windows and installed the pyserini package following this detailed version.
@aryamancodes, as you mentioned here, it worked for you, so do you have any idea what my problem is? Thanks.
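When a run produces many repeated exceptions like the ones above, it can help to separate the first ERROR line from the flood that follows it. The sketch below is only an illustration: the regular expression is an assumption derived from the two log lines quoted in this comment, and summarize_errors is a hypothetical helper, not part of Anserini or Pyserini.

```python
import re
from collections import Counter

# Pattern assumed from the log lines quoted in this thread, e.g.
# "2023-05-01 16:36:51,108 ERROR [main] collection.AclAnthology (AclAnthology.java:60) - ..."
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) "
    r"\[(?P<thread>[^\]]+)\] "
    r"(?P<logger>\S+) "
    r"\((?P<location>[^)]+)\) - "
    r"(?P<message>.*)$"
)


def summarize_errors(lines):
    """Return (first_error, counts).

    first_error is a dict of fields for the earliest ERROR line (or None);
    counts maps each source location to its number of ERROR lines.
    """
    first_error = None
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None or match.group("level") != "ERROR":
            continue  # stack-trace frames and non-ERROR lines are skipped
        if first_error is None:
            first_error = match.groupdict()
        counts[match.group("location")] += 1
    return first_error, counts


if __name__ == "__main__":
    sample = [
        "2023-05-01 16:36:51,108 ERROR [main] collection.AclAnthology (AclAnthology.java:60) - Unable to open volumes.yaml",
        "2023-05-01 16:36:51,582 ERROR [pool-2-thread-6] index.IndexCollection$LocalIndexerThread (IndexCollection.java:348) - pool-2-thread-6: Unexpected Exception:",
        "java.lang.NullPointerException: null",
    ]
    first, counts = summarize_errors(sample)
    print(first["location"], first["message"])
    print(counts.most_common())
```

In the log above, the earliest error is the failure to open volumes.yaml, so that is the line worth investigating before the NullPointerException traces.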

ygorg commented on June 20, 2024

You might not have the latest version of the AclAnthology.java file, because in the latest version the error message is more verbose. Try updating the file or cloning the latest version of Anserini.

mobehbooei commented on June 20, 2024

Thanks @ygorg. That worked. I needed the Development Installation of Pyserini to get the latest versions.

lintool commented on June 20, 2024

Closing - ref: #2126 and castorini/pyserini#1537
