Comments (7)
Hi @mobehbooei, let me try to set up Solr and get back to you on that issue. In the meantime, you can build an index for the ACL Anthology collection using Pyserini as described in this guide. A sample of building the index using Pyserini can also be seen in the "Steps to reproduce" section of this issue.
from anserini.
I think this is because the README is not up to date: since Anserini moved to Lucene 9.3 on 02/08/22 (2725655), support for Elasticsearch and Solr has been dropped (#1951).
Following the instructions you used, you need to create the index with target/appassembler/bin/IndexCollection, without Solr being involved.
You could maybe clone Anserini from right before 02/08/2023 (or anserini-0.14.4) while keeping the latest relevant source file for ACL indexing (see #2084). I have not tried that but was thinking of it; please let me know if it works!
from anserini.
Hi @mobehbooei - yes, support for Solr has been dropped, so we should go back to direct Anserini indexing. The commands on this issue should work: #2069
i.e.,
python -m pyserini.index -collection AclAnthology -generator AclAnthologyGenerator -threads 8 -input build/data/ -index index/lucene-index-acl-paragraph -storePositions -storeDocvectors -storeContents -storeRaw -optimize
Can you please try it out and then update this page accordingly? https://github.com/castorini/anserini/blob/master/docs/acl-anthology.md
Send PR directly please.
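For anyone scripting this step, the command above can be assembled as an argument list in Python. This is only a sketch: the helper name build_index_command is hypothetical, and the flags are copied verbatim from the command in this thread rather than checked against Pyserini's documentation.

```python
import shlex
import sys


def build_index_command(input_dir, index_dir, threads=8):
    """Assemble the indexing command quoted in this thread as an argument list.

    build_index_command is a hypothetical helper; every flag below is copied
    from the command posted above, not from Pyserini's documentation.
    """
    return [
        sys.executable, "-m", "pyserini.index",
        "-collection", "AclAnthology",
        "-generator", "AclAnthologyGenerator",
        "-threads", str(threads),
        "-input", input_dir,
        "-index", index_dir,
        "-storePositions", "-storeDocvectors",
        "-storeContents", "-storeRaw", "-optimize",
    ]


cmd = build_index_command("build/data/", "index/lucene-index-acl-paragraph")
# Print the command for inspection; nothing is executed here.
print(shlex.join(cmd))
```

To actually run the indexer, the list can be passed to subprocess.run(cmd, check=True), which raises an exception if the process exits with a non-zero status.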
from anserini.
Hi everyone @ygorg @aryamancodes @lintool - Thanks for the responses. I tried the Pyserini approach, but I still hit the same issues as in #2069.
I am getting this error first:
2023-05-01 16:36:51,108 ERROR [main] collection.AclAnthology (AclAnthology.java:60) - Unable to open volumes.yaml
and then lots of this error:
2023-05-01 16:36:51,582 ERROR [pool-2-thread-6] index.IndexCollection$LocalIndexerThread (IndexCollection.java:348) - pool-2-thread-6: Unexpected Exception:
java.lang.NullPointerException: null
    at io.anserini.collection.AclAnthology$Document.<init>(AclAnthology.java:154) ~[anserini-0.21.0-fatjar.jar:?]
    at io.anserini.collection.AclAnthology$Segment.readNext(AclAnthology.java:115) ~[anserini-0.21.0-fatjar.jar:?]
    at io.anserini.collection.FileSegment$1.hasNext(FileSegment.java:136) ~[anserini-0.21.0-fatjar.jar:?]
    at io.anserini.index.IndexCollection$LocalIndexerThread.run(IndexCollection.java:287) [anserini-0.21.0-fatjar.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
I tried this and this to prevent YAML from creating aliases, which Anserini cannot parse, but I still get the same error!
I am using WSL on Windows and installed the pyserini package according to this detailed version.
@aryamancodes, you mentioned here that it worked for you, so do you have any idea what my problem is? Thanks.
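Since the first error says volumes.yaml could not be opened, a quick check before indexing can at least confirm whether the file is where you expect it. The sketch below is an assumption-heavy illustration: the helper name missing_metadata_files is hypothetical, and this thread does not say which directory the collection reader looks in, so the directory passed here is a guess to adjust.

```python
from pathlib import Path


def missing_metadata_files(data_dir, required=("volumes.yaml",)):
    """Return the names in `required` that are not regular files in data_dir.

    Hypothetical helper: the directory to check is an assumption, since the
    thread does not state where the reader expects volumes.yaml.
    """
    root = Path(data_dir)
    return [name for name in required if not (root / name).is_file()]


missing = missing_metadata_files("build/data/")
if missing:
    print("Missing before indexing:", ", ".join(missing))
else:
    print("All expected metadata files are present.")
```

An empty result only shows that the file exists at that location; it does not show that the reader can parse it.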
from anserini.
You might not have the latest version of the AclAnthology.java file, because in the latest version the error message is more verbose. Try updating the file or cloning the latest version of Anserini.
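One way to see which Anserini build actually ran is to read the jar name out of the stack trace. The snippet below is a minimal sketch: the function name anserini_versions is hypothetical, and the pattern assumes jar names shaped like the one in the log above (anserini-<version>-fatjar.jar).

```python
import re

# Matches the jar tag at the end of a stack frame, e.g. "[anserini-0.21.0-fatjar.jar:?]".
# Assumption: the jar is always named anserini-<dotted version>-fatjar.jar.
JAR_VERSION = re.compile(r"anserini-(\d+(?:\.\d+)*)-fatjar\.jar")


def anserini_versions(log_text):
    """Return the sorted, de-duplicated Anserini jar versions found in log_text."""
    return sorted(set(JAR_VERSION.findall(log_text)))


frame = ("at io.anserini.collection.AclAnthology$Document.<init>"
         "(AclAnthology.java:154) ~[anserini-0.21.0-fatjar.jar:?]")
print(anserini_versions(frame))  # prints ['0.21.0']
```

Comparing that version with the one containing the updated AclAnthology.java tells you whether the installed package is behind.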
from anserini.
Thanks @ygorg. That worked. It needed the development installation of Pyserini to get the latest versions.
from anserini.
Closing - ref: #2126 and castorini/pyserini#1537
from anserini.