cvangysel / spud
This project forked from ronancummins/spud
A lucene implementation of the SPUD language model for document retrieval
This set of classes is a Lucene implementation of the SPUD retrieval model that appears in "A Polya Urn Document Language Model for Improved Information Retrieval" by Ronan Cummins, Jiaul Hoque Paik, and Yuanhua Lv.

The classes depend on the following publicly available jar files:

lucene-core-5.0.0.jar
lucene-queryparser-5.0.0.jar
lucene-analyzers-common-5.0.0.jar
lucene-queries-5.0.0.jar
commons-math3-3.3.jar
jsoup-1.7.3.jar

To build the classes, create a "classes" directory at the same level as "src":

>mkdir classes

Then run:

>make all

Included in this download is the Cranfield collection (modified to the TREC format). The three important files for the modified Cranfield collection are:

cran.all.1400.trec-format (the documents)
cran.qry.trec-format (the queries)
cran.qrels.trec-format (the qrels)

The only two classes with main methods are:

indexing.LuceneTRECIndexer
searching.QuerySearch

To index the Cranfield collection, create an index file containing the full paths of the files that you wish to index. There should be only one line in the index file for the Cranfield collection. E.g.

././cran.all.1400.trec-format

Then from the classes directory run:

>java -cp .:../lib/* indexing.LuceneTRECIndexer ../cranfield-collection/lucene_index ../cranfield-collection/index-file 1 0

This will create the index in the "lucene_index" directory.

You can then run the queries on the collection from the classes directory as follows:

>java -cp .:../lib/* searching.QuerySearch ../cranfield-collection/lucene_index ../cranfield-collection/cran.qry.trec-format ../cranfield-collection/cran.qrels.trec-format

This should run the basic SPUD model using the queries and also calculate some effectiveness metrics for the queries.

Copyright © 2015 Ronan Cummins

This work is free. It comes without any warranty to the extent permitted by applicable law. You can redistribute it and/or modify it under the terms of the Do What The Fuck You Want To Public License, Version 2, as published by Sam Hocevar. See http://www.wtfpl.net/ for more details.
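
The indexing step above expects an index file that lists the full path of each file to index, one per line. A minimal sketch of a helper that writes such a file is given below. The class name IndexFileWriter and its argument order are hypothetical and not part of this project, and the sketch assumes that "full paths" means absolute paths. It uses only the Java standard library.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the SPUD sources): writes an index file
// that lists one absolute path per line for the files to be indexed.
class IndexFileWriter {

    // Converts each input path to a normalized absolute path string.
    // Assumption: the indexer wants absolute paths in the index file.
    static List<String> indexLines(List<Path> files) {
        List<String> lines = new ArrayList<String>();
        for (Path file : files) {
            lines.add(file.toAbsolutePath().normalize().toString());
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: IndexFileWriter <index-file> <file-to-index>...");
            System.exit(1);
        }
        // First argument: where to write the index file.
        Path indexFile = Paths.get(args[0]);
        // Remaining arguments: the collection files to list in it.
        List<Path> files = new ArrayList<Path>();
        for (int i = 1; i < args.length; i++) {
            files.add(Paths.get(args[i]));
        }
        Files.write(indexFile, indexLines(files), StandardCharsets.UTF_8);
    }
}
```

For the Cranfield collection this would be called with the index file as the first argument and cran.all.1400.trec-format as the only file to index, which yields the single-line index file described above.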