spud's Introduction

This set of classes is a Lucene implementation of the SPUD retrieval 
model described in "A Polya Urn Document Language Model for 
Improved Information Retrieval" by Ronan Cummins, Jiaul Hoque Paik, 
and Yuanhua Lv.



The classes depend on the following publicly available jar files:

lucene-core-5.0.0.jar
lucene-queryparser-5.0.0.jar
lucene-analyzers-common-5.0.0.jar
lucene-queries-5.0.0.jar
commons-math3-3.3.jar
jsoup-1.7.3.jar



To build the classes, create a "classes" directory at the same level as "src". 

>mkdir classes

Then run

>make all

This download includes the Cranfield collection, converted to the TREC format. 
The three important files in the modified Cranfield collection are:

cran.all.1400.trec-format (the documents)
cran.qry.trec-format (the queries)
cran.qrels.trec-format (the qrels)


The only two classes with main methods are:
indexing.LuceneTRECIndexer
searching.QuerySearch


To index the Cranfield collection, create an index file listing the full path of each file that you wish to index, one per line.
For the Cranfield collection the index file should contain only one line, e.g.
././cran.all.1400.trec-format
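
As a minimal sketch, the index file could be created from the shell as follows. This assumes the command is run from inside the cranfield-collection directory and that the file is called index-file, the name passed to the indexer in the command further down. Building the absolute path with pwd is only an illustration, not part of the original instructions.

```shell
# Assumption: the current directory is cranfield-collection.
# Write the absolute path of the collection file as the only line of index-file.
printf '%s\n' "$(pwd)/cran.all.1400.trec-format" > index-file
```

Any other way of writing a single line with the path to cran.all.1400.trec-format into the file should work equally well.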


Then from the classes directory run:
>java -cp .:../lib/* indexing.LuceneTRECIndexer ../cranfield-collection/lucene_index ../cranfield-collection/index-file 1 0 

This will create the index in the "lucene_index" directory.


You can then run the queries on the collection from the classes directory as follows:
>java -cp .:../lib/* searching.QuerySearch ../cranfield-collection/lucene_index ../cranfield-collection/cran.qry.trec-format ../cranfield-collection/cran.qrels.trec-format

This should run the basic SPUD model on the queries and also calculate some effectiveness metrics for them. 



Copyright © 2015 Ronan Cummins
This work is free. It comes without any warranty to the extent 
permitted by applicable law. You can redistribute it and/or modify it 
under the terms of the Do What The Fuck You Want To Public License, Version 2,
as published by Sam Hocevar. See http://www.wtfpl.net/ for more details.





