
Note: To run YoGosling, you need Java and Maven installed, plus a twitter4j.properties file.
If you run into a problem or find a bug, please send an email to [email protected]
with a snippet of the log showing where the exception or error occurs, so that we can help or fix it. :D

YoGosling

###Build with Maven

mvn clean package appassembler:assemble

YoGosling is a branch of the Anserini[https://github.com/lintool/Anserini] project. As with Anserini, to run YoGosling you must save your Twitter API OAuth credentials in a file named twitter4j.properties in the YoGosling root directory (your current working directory). See this page for more information about Twitter4j configurations. The file should contain the following (replace the ********** instances with your information):

oauth.consumerKey=**********
oauth.consumerSecret=**********
oauth.accessToken=**********
oauth.accessTokenSecret=**********
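Before starting YoGosling it can help to confirm that the file is complete. The following is a minimal Python sketch, not part of YoGosling itself: the function names are hypothetical, and it assumes the file uses plain key=value lines as shown above.

```python
# The four keys listed in this README.
REQUIRED_KEYS = (
    "oauth.consumerKey",
    "oauth.consumerSecret",
    "oauth.accessToken",
    "oauth.accessTokenSecret",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_credentials(text):
    """Return the required keys that are absent, empty, or still '*' placeholders."""
    props = parse_properties(text)
    missing = []
    for key in REQUIRED_KEYS:
        value = props.get(key, "")
        if not value or set(value) == {"*"}:
            missing.append(key)
    return missing
```

For example, `missing_credentials(open("twitter4j.properties").read())` returns an empty list when all four values have been filled in.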

###Index and search

sh target/appassembler/bin/TRECSearcher -groupid <groupid> -index <index_name>  -host <host> -port <port> 

The -host and -port options are the host and port of the open RTS evaluation broker. The -groupid option is the group ID obtained from the RTS organizers. Details on getting group IDs and connecting to the RTS broker can be found in one of the discussions on the mailing list.

###Connect evaluation broker REST(ful) API

POST /register/system

https://github.com/YoGosling/Anserini/blob/master/src/main/java/io/anserini/rts/Registrar.java#L26

GET /topics/:clientid

https://github.com/YoGosling/Anserini/blob/master/src/main/java/io/anserini/rts/TopicPoller.java#L33

POST /tweet/:topid/:tweetid/:clientid

https://github.com/YoGosling/Anserini/blob/master/src/main/java/io/anserini/rts/TRECScenarioRunnable.java#L168
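To make the three routes concrete, here is a small Python sketch that only assembles the URLs from the -host and -port values. It is an illustration rather than the code in the linked Java classes: the function name, the example host, and the http scheme are assumptions, and request bodies are omitted because this README does not describe them.

```python
def broker_urls(host, port, client_id=None, topic_id=None, tweet_id=None, scheme="http"):
    """Build (method, url) pairs for the broker routes listed in this README.

    The scheme defaults to http, which is an assumption; check the broker
    announcement for the real one.
    """
    base = f"{scheme}://{host}:{port}"
    urls = {"register": ("POST", f"{base}/register/system")}
    if client_id is not None:
        # GET /topics/:clientid
        urls["topics"] = ("GET", f"{base}/topics/{client_id}")
        if topic_id is not None and tweet_id is not None:
            # POST /tweet/:topid/:tweetid/:clientid
            urls["push"] = ("POST", f"{base}/tweet/{topic_id}/{tweet_id}/{client_id}")
    return urls


if __name__ == "__main__":
    # Hypothetical host, port, and client ID, for illustration only.
    for name, (method, url) in broker_urls(
        "broker.example.org", 80, "myclient", "MB256", "746351277148372992"
    ).items():
        print(name, method, url)
```

The client ID has to come from a successful registration before the other two routes can be used, which is why the sketch only builds them when one is supplied.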

To avoid wading through the overwhelming log output, there is a separate log for checking whether YoGosling did the right thing: pushing seemingly "relevant" tweets! Under the root directory,

cd src/main/java/io/anserini/rts/scenarioLog
vi scenarioALog 

There you will probably see something like this:

Scenario A      24 Jun 2016 14:36:35 GMT        1466778995738   MB256   746351277148372992
Scenario A      24 Jun 2016 14:38:35 GMT        1466779115319   MB415   746351738509271040
Scenario A      24 Jun 2016 14:39:40 GMT        1466779180099   MB415   746352040503349249

Likewise for scenario B,

cd src/main/java/io/anserini/rts/scenarioLog
vi scenarioBLog 
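The scenario A lines shown above have five fields: scenario, GMT timestamp, epoch milliseconds, topic ID, and tweet ID. A minimal Python sketch for reading them follows. The function name and regular expression are hypothetical, and it assumes scenarioBLog uses the same layout, which this README does not show.

```python
import re
from datetime import datetime, timezone

# Field layout inferred from the sample scenario A lines in this README.
LINE_RE = re.compile(
    r"^Scenario (?P<scenario>[AB])\s+"
    r"(?P<date>\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2}) GMT\s+"
    r"(?P<epoch_ms>\d+)\s+(?P<topic>\S+)\s+(?P<tweet_id>\d+)\s*$"
)


def parse_log_line(line):
    """Return a dict for one scenario log line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    epoch_ms = int(m.group("epoch_ms"))
    return {
        "scenario": m.group("scenario"),
        "topic": m.group("topic"),
        "tweet_id": m.group("tweet_id"),
        "epoch_ms": epoch_ms,
        # Whole seconds only, so this can be compared with the printed GMT time.
        "pushed_at": datetime.fromtimestamp(epoch_ms // 1000, tz=timezone.utc),
    }


if __name__ == "__main__":
    sample = "Scenario A      24 Jun 2016 14:36:35 GMT        1466778995738   MB256   746351277148372992"
    print(parse_log_line(sample))
```

Grouping the parsed records by topic is a quick way to see how many tweets were pushed per interest profile.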

###Algorithm

YoGosling is a modified version of the best-performing automatic system in TREC 2015. For algorithm details, please refer to the paper "Simple Dynamic Emission Strategies for Microblog Filtering".

####Relevance Scoring Example: Star Wars

Document #298: { coins } ll 2016 Niue $2 1 oz. Proof Silver Star Wars Classics Series - Han Solo | GEM Proof (Original Mint ... link

Interest Profile: Star Wars

titleQuery: text:star text:wars 

titleCoordSimilarity = 2/2 = 1.0 

titleExpansionQuery: +(text:star^3.0 text:wars^3.0) #epoch:[1464847032 TO 1464847092]

titleExpansionSimilarity = 6.0 (see the explanation in the log snippet below)

finalSimilarityScore = titleCoordSimilarity * titleExpansionSimilarity = 1.0 * 6.0 = 6.0

YoGosling log snippet

2016-06-02 13:58:12,592 INFO  [Timer-2] rts.TRECScenarioRunnable (TRECScenarioRunnable.java:305) - 6.0 = sum of:
  3.0 = weight(text:star^3.0 in 298) [TitleExpansionSimilarity], result of:
    3.0 = score(doc=298,freq=1.0), product of:
      3.0 = queryWeight, product of:
        3.0 = boost
        1.0 = idf(docFreq=1, maxDocs=445)
        1.0 = queryNorm
      1.0 = fieldWeight in 298, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=1, maxDocs=445)
        1.0 = fieldNorm(doc=298)
  3.0 = weight(text:wars^3.0 in 298) [TitleExpansionSimilarity], result of:
    3.0 = score(doc=298,freq=1.0), product of:
      3.0 = queryWeight, product of:
        3.0 = boost
        1.0 = idf(docFreq=1, maxDocs=445)
        1.0 = queryNorm
      1.0 = fieldWeight in 298, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=1, maxDocs=445)
        1.0 = fieldNorm(doc=298)

2016-06-02 13:58:12,592 INFO  [Timer-2] rts.TRECScenarioRunnable (TRECScenarioRunnable.java:306) - Multiplied by 1.0 Final score 6.0
2016-06-02 13:58:12,592 INFO  [Timer-2] rts.TRECScenarioRunnable (TRECScenarioRunnable.java:308) - Raw text{ coins } ll 2016 Niue $2 1 oz. Proof Silver Star Wars Classics Series - Han Solo | GEM Proof (Original Mint ... https://t.co/6pQTdwW9Iw 2
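The arithmetic in the explanation above can be reproduced in a few lines. This Python sketch is only a worked example using the factor values printed in the log; it is not the Java similarity code. The helper names are hypothetical, and the lowercase whitespace split is a stand-in for whatever analyzer YoGosling actually uses.

```python
def term_score(boost, idf, query_norm, tf, field_norm):
    """Mirror the explanation tree: score = queryWeight * fieldWeight."""
    query_weight = boost * idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight


def title_coord_similarity(query_terms, doc_tokens):
    """Fraction of title terms that occur in the document."""
    matched = sum(1 for term in query_terms if term in doc_tokens)
    return matched / len(query_terms)


raw_text = (
    "{ coins } ll 2016 Niue $2 1 oz. Proof Silver Star Wars Classics Series "
    "- Han Solo | GEM Proof (Original Mint ..."
)
# Assumption: lowercase + whitespace split approximates the real tokenization.
doc_tokens = set(raw_text.lower().split())
query_terms = ["star", "wars"]

coord = title_coord_similarity(query_terms, doc_tokens)  # 2/2
# Each term: boost 3.0, and idf, queryNorm, tf, fieldNorm all 1.0 as in the log.
expansion = sum(term_score(3.0, 1.0, 1.0, 1.0, 1.0) for _ in query_terms)
final_score = coord * expansion

print(coord, expansion, final_score)  # prints: 1.0 6.0 6.0
```

Because every factor other than the boost is 1.0 in this log, the expansion score is simply the sum of the boosts of the matched title terms.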

Anserini

Twitter (Near) Real-Time Search

To get access to the Twitter public stream, you need a developer account to obtain OAuth credentials. After creating an account on the Twitter developer site, you can obtain these credentials by creating an "application". After you've created an application, create an access token by clicking on the button "Create my access token".

To run the Twitter (near) real-time search demo, you must save your Twitter API OAuth credentials in a file named twitter4j.properties in your current working directory. See this page for more information about Twitter4j configurations. The file should contain the following (replace the ********** instances with your information):

oauth.consumerKey=**********
oauth.consumerSecret=**********
oauth.accessToken=**********
oauth.accessTokenSecret=**********

Once you've done that, fire up the demo with:

sh target/appassembler/bin/TweetSearcher -index twitter-index

The demo starts up an HTTP server on port 8080, but this can be changed with the -port option. Query via a web browser at http://localhost:8080/search?query=query. Try birthday, as there are always birthdays being celebrated.

You can change the maximum number of hits returned with the top parameter, for example http://localhost:8080/search?query=birthday&top=15. The default number of hits is 20.
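If you prefer to query the demo from a script rather than a browser, the search URL can be assembled as in the sketch below. It is a small Python illustration with a hypothetical function name; only the /search path and the query and top parameters come from this README, and the response format is not assumed.

```python
from urllib.parse import urlencode


def build_search_url(query, top=None, host="localhost", port=8080):
    """Build the demo's search URL; omit 'top' to use the server default of 20 hits."""
    params = {"query": query}
    if top is not None:
        params["top"] = top
    return f"http://{host}:{port}/search?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("birthday", top=15))
```

With the demo running, the resulting URL can be opened with `urllib.request.urlopen` or pasted into a browser.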
