
cps842's Introduction

To build this project, install Golang and dep, then run build.sh on macOS or Linux.

Golang: https://golang.org/doc/install

Dep: https://github.com/golang/dep#installation

### Running the program

Run the version of the binary that matches your operating system. The output folder contains three files:

  1. cps842-linux
  2. cps842-mac
  3. cps842-win

#### Run invert

Program help, using the -h flag

./cps842 invert -h
Take a collection of documents and generate its inverted index.

Usage:
  cps842 invert [flags]

Flags:
  -f, --file string            The location of CACM collection (required)
  -h, --help                   help for invert
  -o, --output-folder string   The location to output the final files (required)
  -p, --porter                 enable Porter's Stemming algorithm
  -s, --stop-word string       add stop word removal

To run the program without Porter stemming or stop-word removal, assuming the input file is cacm.all:

./cps842 invert -f cacm.all -o "./data"

With stop-word removal, assuming the stop-word file is common_words:

./cps842 invert -f cacm.all -o "./data" -s common_words

To enable Porter's stemming algorithm, add the -p flag. To lowercase every word, add the -l flag.

Example with all the flags:

./cps842 invert -f cacm.all -o "./data" -s common_words -l -p
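As a rough illustration of what the invert step produces, the following is a minimal sketch of building an inverted index with optional lowercasing and stop-word removal. It is not the project's actual code: the `Posting` type, `tokenize`, and `BuildIndex` are hypothetical names, the tokenizer is a simplification, and Porter stemming is left out.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// Posting records how often a term occurs in one document.
// (Hypothetical type, not taken from the project.)
type Posting struct {
	DocID int
	Freq  int
}

// tokenize splits text on anything that is not a letter or a digit.
func tokenize(text string) []string {
	return strings.FieldsFunc(text, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// BuildIndex maps each term to its postings, ordered by document ID.
// stopWords may be nil; lower enables case folding. Stop words are
// matched after case folding, so they should be given in lower case
// when lower is true.
func BuildIndex(docs map[int]string, stopWords map[string]bool, lower bool) map[string][]Posting {
	// Visit documents in ID order so every postings list comes out sorted.
	ids := make([]int, 0, len(docs))
	for id := range docs {
		ids = append(ids, id)
	}
	sort.Ints(ids)

	index := make(map[string][]Posting)
	for _, id := range ids {
		// Count term frequencies within this document.
		counts := make(map[string]int)
		for _, tok := range tokenize(docs[id]) {
			if lower {
				tok = strings.ToLower(tok)
			}
			if stopWords[tok] {
				continue
			}
			counts[tok]++
		}
		for term, n := range counts {
			index[term] = append(index[term], Posting{DocID: id, Freq: n})
		}
	}
	return index
}

func main() {
	docs := map[int]string{
		1: "The inverted index maps terms to documents",
		2: "An index of terms speeds up search",
	}
	stop := map[string]bool{"the": true, "an": true, "of": true, "to": true, "up": true}
	index := BuildIndex(docs, stop, true)
	fmt.Println(index["index"]) // postings for "index" in documents 1 and 2
}
```

Each term maps to a list of (document, frequency) pairs, which is the information a search step needs to score documents against a query.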

#### Run search

Program help, using the -h flag

./cps842 search -h
Search for the best documents match using cos-sim and output the best match

Usage:
  cps842 search [flags]

Flags:
  -f, --folder string   Folder location where posting/doc files are (required)
  -h, --help            help for search
  -i, --interact-mode   Interact with the user
  -s, --search string   What to search for

Assuming the posting file is in the folder data, run the following command to execute the program:

 ./cps842 search -f ./data -s "<term/sent>"

Or, to use interactive mode:

 ./cps842 search -f ./data -i
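The search command ranks documents by cosine similarity. The sketch below shows one common way to compute it over sparse term-weight vectors. The function name `CosineSimilarity` and the use of `map[string]float64` for vectors are assumptions for illustration; how the project actually weights terms (for example tf-idf) is not stated here.

```go
package main

import (
	"fmt"
	"math"
)

// CosineSimilarity returns the cosine of the angle between two sparse
// term-weight vectors, or 0 if either vector has zero length.
// (Hypothetical helper, not taken from the project.)
func CosineSimilarity(a, b map[string]float64) float64 {
	var dot, normA, normB float64
	for term, wa := range a {
		dot += wa * b[term] // terms missing from b contribute 0
		normA += wa * wa
	}
	for _, wb := range b {
		normB += wb * wb
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	// Illustrative weights only; real weights would come from the index.
	query := map[string]float64{"inverted": 1, "index": 1}
	doc := map[string]float64{"inverted": 2, "index": 1, "search": 1}
	fmt.Printf("%.3f\n", CosineSimilarity(query, doc))
}
```

Scoring every candidate document against the query vector this way and sorting by score in descending order gives a ranked result list.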

#### Run eval

Program help, using the -h flag

./cps842 eval -h
Evaluate the performance of the IR system, output will be the average MAP and R-Precision values over all queries

Usage:
  cps842 eval [flags]

Flags:
  -f, --folder string   Folder location where the posting/doc files are (required)
  -h, --help            help for eval
  -r, --qrels string    The qrels.text file
  -q, --query string    The query file
  -c, --cosSimW float   The cosine similarity score weight (w1)
  -p, --pageRankW float The PageRank score weight (w2)
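
The -c and -p flags set the weights w1 and w2 for the cosine similarity and PageRank scores. A plausible reading is a weighted sum of the two scores; the sketch below assumes that linear form, which the help text does not confirm, and `CombinedScore` is a hypothetical name.

```go
package main

import "fmt"

// CombinedScore blends a cosine similarity score and a PageRank score
// using the weights w1 and w2. The linear form is an assumption, not
// something the help text states.
func CombinedScore(cosSim, pageRank, w1, w2 float64) float64 {
	return w1*cosSim + w2*pageRank
}

func main() {
	// Illustrative scores and weights only.
	fmt.Println(CombinedScore(0.8, 0.2, 0.5, 0.5))
}
```

Under this reading, a larger w1 favours documents that match the query text, while a larger w2 favours documents with a high PageRank.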

Assuming the input folder is data, the qrels file is ./input/qrels.text, and the query file is ./input/query.text, run the following command to execute the program:

./cps842 eval -f data -r ./input/qrels.text -q ./input/query.text
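
The eval command reports MAP and R-Precision averaged over all queries. The sketch below shows the standard definitions of both metrics for a single query; MAP is then the mean of the average precision values across queries. The function names are hypothetical, and the sketch assumes the ranked list has no duplicate document IDs and that the `relevant` map only holds `true` entries.

```go
package main

import "fmt"

// AveragePrecision computes average precision for one ranked result
// list: the mean of precision@k over every rank k that holds a relevant
// document, divided by the total number of relevant documents.
func AveragePrecision(ranked []int, relevant map[int]bool) float64 {
	if len(relevant) == 0 {
		return 0
	}
	hits := 0
	sum := 0.0
	for i, doc := range ranked {
		if relevant[doc] {
			hits++
			sum += float64(hits) / float64(i+1)
		}
	}
	return sum / float64(len(relevant))
}

// RPrecision is the precision at rank R, where R is the number of
// relevant documents for the query.
func RPrecision(ranked []int, relevant map[int]bool) float64 {
	r := len(relevant)
	if r == 0 {
		return 0
	}
	hits := 0
	for i := 0; i < r && i < len(ranked); i++ {
		if relevant[ranked[i]] {
			hits++
		}
	}
	return float64(hits) / float64(r)
}

func main() {
	// Illustrative data: documents 1 and 5 are relevant to the query.
	ranked := []int{3, 1, 7, 5}
	relevant := map[int]bool{1: true, 5: true}
	fmt.Println(AveragePrecision(ranked, relevant), RPrecision(ranked, relevant))
}
```

In this example the relevant documents appear at ranks 2 and 4, so average precision is (1/2 + 2/4) / 2 = 0.5, and one of the top two results is relevant, so R-Precision is also 0.5.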
