Giter Site home page Giter Site logo

pombredanne / elasticsearch-river-twitter Goto Github PK

View Code? Open in Web Editor NEW

This project forked from elastic/elasticsearch-river-twitter

0.0 1.0 0.0 210 KB

Twitter River Plugin for ElasticSearch

Home Page: http://www.elasticsearch.org

License: Apache License 2.0

elasticsearch-river-twitter's Introduction

Twitter River Plugin for ElasticSearch

The Twitter River plugin allows index twitter stream using Elasticsearch rivers feature.

In order to install the plugin, simply run: bin/plugin -install elasticsearch/elasticsearch-river-twitter/1.4.0.

---------------------------------------------------
| Twitter Plugin          | ElasticSearch         |
---------------------------------------------------
| 1.5.0-SNAPSHOT (master) | 0.20-0.90 -> master   |
---------------------------------------------------
| 1.4.0                   | 0.20-0.90 -> master   |
---------------------------------------------------
| 1.3.0                   | 0.20-0.90 -> master   |
---------------------------------------------------
| 1.2.0                   | 0.19                  |
---------------------------------------------------
| 1.1.0                   | 0.19                  |
---------------------------------------------------
| 1.0.0                   | 0.18                  |
---------------------------------------------------

The twitter river indexes the public twitter stream, aka the hose, and makes it searchable.

Breaking news: Twitter does not support anymore HTTP user/password authentication. You now must use OAuth API.

Prerequisites

You need to get an OAuth token in order to use Twitter river. Please follow Twitter documentation, basically:

  • Login to: https://dev.twitter.com/apps/
  • Create a new Twitter application (let's say elasticsearch): https://dev.twitter.com/apps/new You don't need a callback URL.
  • When done, click on Create my access token.
  • Open OAuth tool tab and note Consumer key, Consumer secret, Access token and Access token secret.

Create river

Creating the twitter river can be done using:

curl -XPUT localhost:9200/_river/my_twitter_river/_meta -d '
{
    "type" : "twitter",
    "twitter" : {
        "oauth" : {
            "consumer_key" : "*** YOUR Consumer key HERE ***",
            "consumer_secret" : "*** YOUR Consumer secret HERE ***",
            "access_token" : "*** YOUR Access token HERE ***",
            "access_token_secret" : "*** YOUR Access token secret HERE ***"
        }
    },
    "index" : {
        "index" : "my_twitter_river",
        "type" : "status",
        "bulk_size" : 100
    }
}
'

The above lists all the options controlling the creation of a twitter river.

By default, the twitter river will read a small random of all public statuses using sample API.

But, you can define statuses type you want to read:

For example:

curl -XPUT localhost:9200/_river/my_twitter_river/_meta -d '
{
    "type" : "twitter",
    "twitter" : {
        "oauth" : {
            "consumer_key" : "*** YOUR Consumer key HERE ***",
            "consumer_secret" : "*** YOUR Consumer secret HERE ***",
            "access_token" : "*** YOUR Access token HERE ***",
            "access_token_secret" : "*** YOUR Access token secret HERE ***"
        },
        "type" : "firehose"
    }
}
'

Note that if you define a filter (see next section), type will be automatically set to filter.

Tweets will be indexed once a bulk_size of them have been accumulated.

Filtered Stream

Filtered stream can also be supported (as per the twitter stream API). Filter stream can be configured to support tracks, follow, and locations. The configuration is the same as the twitter API (a single comma separated string value, or using json arrays). Here is an example:

{
    "type" : "twitter",
    "twitter" : {
        "oauth" : {
            "consumer_key" : "*** YOUR Consumer key HERE ***",
            "consumer_secret" : "*** YOUR Consumer secret HERE ***",
            "access_token" : "*** YOUR Access token HERE ***",
            "access_token_secret" : "*** YOUR Access token secret HERE ***"
        },
        "filter" : {
            "tracks" : "test,something,please",
            "follow" : "111,222,333",
            "locations" : "-122.75,36.8,-121.75,37.8,-74,40,-73,41"
        }
    }
}

Here is an array based configuration example:

{
    "type" : "twitter",
    "twitter" : {
        "oauth" : {
            "consumer_key" : "*** YOUR Consumer key HERE ***",
            "consumer_secret" : "*** YOUR Consumer secret HERE ***",
            "access_token" : "*** YOUR Access token HERE ***",
            "access_token_secret" : "*** YOUR Access token secret HERE ***"
        },
        "filter" : {
            "tracks" : ["test", "something"],
            "follow" : [111, 222, 333],
            "locations" : [ [-122.75,36.8], [-121.75,37.8], [-74,40], [-73,41]]
        }
    }
}

Indexing RAW Twitter stream

By default, elasticsearch twitter river will convert tweets to an equivalent representation in elasticsearch. If you want to index RAW twitter JSON content without any transformation, you can set raw to true:

curl -XPUT localhost:9200/_river/my_twitter_river/_meta -d '
{
    "type" : "twitter",
    "twitter" : {
        "oauth" : {
            "consumer_key" : "*** YOUR Consumer key HERE ***",
            "consumer_secret" : "*** YOUR Consumer secret HERE ***",
            "access_token" : "*** YOUR Access token HERE ***",
            "access_token_secret" : "*** YOUR Access token secret HERE ***"
        },
        "raw" : true
    }
}
'

Note that you should think of creating a mapping first for your tweets. See Twitter documentation on raw Tweet format:

curl -XPUT localhost:9200/my_twitter_river/status/_mapping -d '
{
    "status" : {
        "properties" : {
            "text" : {"type" : "string", "analyzer" : "standard"}
        }
    }
}
'

Ignoring Retweets

If you don't want to index retweets (aka RT), just set ignore_retweet to true (default to false):

curl -XPUT localhost:9200/_river/my_twitter_river/_meta -d '
{
    "type" : "twitter",
    "twitter" : {
        "oauth" : {
            "consumer_key" : "*** YOUR Consumer key HERE ***",
            "consumer_secret" : "*** YOUR Consumer secret HERE ***",
            "access_token" : "*** YOUR Access token HERE ***",
            "access_token_secret" : "*** YOUR Access token secret HERE ***"
        },
        "ignore_retweet" : true
    }
}
'

Remove the river

If you need to stop the Twitter river, you have to remove it:

curl -XDELETE http://localhost:9200/_river/my_twitter_river/

License

This software is licensed under the Apache 2 license, quoted below.

Copyright 2009-2013 Shay Banon and ElasticSearch <http://www.elasticsearch.org>

Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.

elasticsearch-river-twitter's People

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.