
kjahan / twitter_mining


Twitter Mining in Java

Home Page: http://www.kazemjahanbakhsh.com/codes/election.html

Java 100.00%
twitter mining nlp sentiment-analysis naive-bayes-classifier latent-dirichlet-allocation


Twitter Mining Project

This project is an ML/NLP library in Java for analyzing tweets and building predictive models. The models are meant to help election, advertising, and marketing campaigns dig into social media conversations (public opinion) and gain insights for making informed decisions.

The project consists of four main packages and a resource directory:

  1. The Algorithms package contains implementations of several ML/NLP algorithms for analyzing tweet content.
  2. The Twitter package wraps Twitter data regardless of the persistence layer used to store and retrieve tweets.
  3. The Runanalysis package provides the entry points for running the ML/NLP algorithms.
  4. The Utilities package provides a collection of helper classes for different analyses.
  5. The Resources directory includes data sources used for tweet analysis, such as stop words and training data for sentiment analysis.

Package Details:

Algorithms Package:

  1. LDA Algorithm: an implementation of the Latent Dirichlet Allocation algorithm, used for topic modeling.
  2. NaiveBayes Classifier: a customized version of Naive Bayes classifier for running sentiment analysis on tweets.
  3. TextAnalysis: a class for performing various text analysis such as computing word frequencies.
  4. TweetsStatistics: provides functionality for computing basic statistics from tweets.
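To illustrate the kind of model the NaiveBayes Classifier entry refers to, here is a minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing. It is not the project's customized classifier: the class name SketchNaiveBayes, its train/classify methods, and the assumption that tweets arrive already tokenized are all hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical minimal multinomial Naive Bayes sentiment classifier with
 * add-one (Laplace) smoothing. Not the project's NaiveBayes class.
 */
class SketchNaiveBayes {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Adds one labeled, already-tokenized tweet to the training counts. */
    void train(String label, List<String> tokens) {
        totalDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(t);
        }
    }

    /** Returns the label with the highest log-posterior for the given tokens. */
    String classify(List<String> tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        int v = vocabulary.size();
        for (String label : docCounts.keySet()) {
            // log P(label): class prior estimated from document counts
            double score = Math.log((double) docCounts.get(label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int n = totalWords.getOrDefault(label, 0);
            for (String t : tokens) {
                // log P(token | label) with add-one smoothing over the vocabulary
                int c = counts.getOrDefault(t, 0);
                score += Math.log((c + 1.0) / (n + v));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

Scores are accumulated as log-probabilities so that long tweets do not underflow to zero, and the +1 smoothing gives a word never seen with a label a small nonzero probability instead of zeroing out the whole score.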

Twitter Package:

  1. Tweet: a representative class for tweets.
  2. TweetDate: a class for dealing with date ranges. This allows tweets to be analyzed within a given time range.
  3. TweetsConstants: a class for constants and configuration parameters.
  4. TwitterDataSource: an interface designed to deal with different persistent layers.
  5. TwitterFileDataSource: an implementation of the TwitterDataSource interface for when the persistence layer is raw files.
  6. TwitterMySqlDataSource: an implementation of the TwitterDataSource interface for when the persistence layer is a MySQL database.
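The point of a TwitterDataSource-style interface is that analysis code depends only on an abstraction, so the persistence layer can change without touching the algorithms. The sketch below shows that pattern with a date-range query in the spirit of TweetDate. Every name in it (SketchTweet, SketchTweetSource, tweetsBetween, InMemoryTweetSource, TweetCounter) is hypothetical and does not reflect the project's actual method signatures.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tweet record; fields loosely follow the tweets table schema. */
class SketchTweet {
    final long id;
    final String author;
    final String text;
    final Instant createdAt;

    SketchTweet(long id, String author, String text, Instant createdAt) {
        this.id = id;
        this.author = author;
        this.text = text;
        this.createdAt = createdAt;
    }
}

/** Hypothetical data-source abstraction: callers depend only on this interface. */
interface SketchTweetSource {
    /** Returns tweets created in the half-open range [from, to). */
    List<SketchTweet> tweetsBetween(Instant from, Instant to);
}

/** In-memory implementation standing in for a file- or MySQL-backed source. */
class InMemoryTweetSource implements SketchTweetSource {
    private final List<SketchTweet> tweets = new ArrayList<>();

    void add(SketchTweet t) {
        tweets.add(t);
    }

    @Override
    public List<SketchTweet> tweetsBetween(Instant from, Instant to) {
        List<SketchTweet> result = new ArrayList<>();
        for (SketchTweet t : tweets) {
            // Inclusive lower bound, exclusive upper bound
            if (!t.createdAt.isBefore(from) && t.createdAt.isBefore(to)) {
                result.add(t);
            }
        }
        return result;
    }
}

/** Analysis code written against the interface works with any persistence layer. */
class TweetCounter {
    static int countInRange(SketchTweetSource source, Instant from, Instant to) {
        return source.tweetsBetween(from, to).size();
    }
}
```

A file- or MySQL-backed class would implement the same interface, and TweetCounter would work with it unchanged.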

Runanalysis Package:

  1. RunBayes: runs sentiment analysis on tweets using NaiveBayes class.
  2. RunLDA: runs topic modeling on tweets using LDA class.
  3. RunStatistics: runs basic statistics on tweets using TweetsStatistics class.
  4. RunTextAnalysis: runs text analysis on tweets using TextAnalysis class.
  5. ThreadPool & WorkerThread: multi-threaded code for running analyses in parallel.
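ThreadPool and WorkerThread run analyses concurrently. A comparable effect can be sketched with the standard java.util.concurrent executor framework, which is swapped in here in place of a hand-written pool; the class ParallelAnalysis and its per-day keyword count are hypothetical stand-ins for a real analysis task.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: java.util.concurrent stands in for the project's ThreadPool/WorkerThread.
class ParallelAnalysis {
    /** Counts, per day, the tweets mentioning a keyword; one pooled task per day (hypothetical task). */
    static Map<String, Integer> countMentions(Map<String, List<String>> tweetsByDay,
                                              String keyword, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        String needle = keyword.toLowerCase(Locale.ROOT);
        try {
            Map<String, Future<Integer>> futures = new TreeMap<>();
            for (Map.Entry<String, List<String>> e : tweetsByDay.entrySet()) {
                List<String> dayTweets = e.getValue();
                // Each day is an independent task that shares no mutable state with the others.
                futures.put(e.getKey(), pool.submit(() -> {
                    int count = 0;
                    for (String tweet : dayTweets) {
                        if (tweet.toLowerCase(Locale.ROOT).contains(needle)) {
                            count++;
                        }
                    }
                    return count;
                }));
            }
            Map<String, Integer> results = new TreeMap<>();
            for (Map.Entry<String, Future<Integer>> e : futures.entrySet()) {
                results.put(e.getKey(), e.getValue().get()); // blocks until that day's task is done
            }
            return results;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for results", ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException("analysis task failed", ex.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each day is an independent task, no locking is needed; keeping the futures in a TreeMap means results come back in day order regardless of which thread finishes first.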

Utilities Package:

  1. DayIntervals: a class for reading day interval files and generating a list of day pairs.
  2. GenerateCsv: a class for generating a CSV file for post-processing and visualization steps.
  3. MapUtil: a class for printing TreeMap data.
  4. Pair: a class for defining pair objects.
  5. SentimentLabel: sentiment labels.
  6. StopWords: a class for building stop words for NLP analysis.
  7. TimeZone: time zone class.
  8. TweetUtils: a helper class with functionality for cleaning and normalizing tweets.
  9. ValueComparator: a comparator class.
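The cleaning, stop-word, and sort-by-value helpers listed above can be approximated as follows. This is a sketch under assumptions: the regular expressions, the tiny stop-word set, and the names TweetTextSketch, normalize, and topWords are illustrative only. The project's StopWords class presumably builds its list from the stop-word data in the Resources directory, and its exact cleaning rules may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helpers approximating TweetUtils, StopWords and a value-based comparator. */
class TweetTextSketch {
    // Tiny illustrative stop-word list (assumption); a real list would be loaded from a resource file.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "is", "to", "and", "rt"));

    /** Lowercases, strips URLs, @mentions and punctuation (keeping #), then drops stop words. */
    static List<String> normalize(String tweet) {
        String cleaned = tweet.toLowerCase(Locale.ROOT)
                .replaceAll("https?://\\S+", " ")   // URLs
                .replaceAll("@\\w+", " ")           // @mentions
                .replaceAll("[^a-z0-9#\\s]", " ");  // punctuation; hashtags survive
        List<String> tokens = new ArrayList<>();
        for (String tok : cleaned.trim().split("\\s+")) {
            if (!tok.isEmpty() && !STOP_WORDS.contains(tok)) {
                tokens.add(tok);
            }
        }
        return tokens;
    }

    /** Counts token frequencies and returns entries sorted by descending count, then by token. */
    static List<Map.Entry<String, Integer>> topWords(List<String> tweets) {
        Map<String, Integer> freq = new HashMap<>();
        for (String t : tweets) {
            for (String tok : normalize(t)) {
                freq.merge(tok, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(freq.entrySet());
        // Sort by value (descending), breaking ties alphabetically by key.
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries;
    }
}
```

For example, normalize("RT @someone: The debate is ON! http://t.co/xyz #election") yields the tokens debate, on, and #election.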

Tweets Data Schema:

This library requires your Twitter data to be stored in a MySQL database table (e.g. politics/tweets). The schema of the tweets table is shown below:

Field        Type
id           int(10) unsigned, PRI
timestamp    int(10) unsigned
source       varchar(40)
author       varchar(20)
lat          decimal(10,8)
lng          decimal(11,8)
text         varchar(140)
created_at   datetime
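A minimal JDBC sketch for reading tweets in a time window from this table follows. It assumes the timestamp field stores Unix epoch seconds and that the table is addressed as politics.tweets (MySQL's database.table form of the example above); the class TweetsTableReader and its methods are hypothetical. Only java.sql is needed to compile it; actually running textsBetween additionally requires a MySQL JDBC driver such as Connector/J on the classpath.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical JDBC reader for the tweets table.
 * Assumption: `timestamp` holds Unix epoch seconds.
 */
class TweetsTableReader {
    /** Builds a parameterized query for tweets in the half-open range [from, to). */
    static String rangeQuery(String table) {
        // The table name is concatenated because JDBC parameters cannot bind identifiers;
        // it must come from trusted configuration, never from user input.
        return "SELECT id, author, text FROM " + table
                + " WHERE `timestamp` >= ? AND `timestamp` < ? ORDER BY `timestamp`";
    }

    /** Runs the range query on an open connection and returns the tweet texts. */
    static List<String> textsBetween(Connection conn, String table, long fromEpoch, long toEpoch)
            throws SQLException {
        List<String> texts = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(rangeQuery(table))) {
            ps.setLong(1, fromEpoch);
            ps.setLong(2, toEpoch);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    texts.add(rs.getString("text"));
                }
            }
        }
        return texts;
    }
}
```

With Connector/J, a connection could be opened with DriverManager.getConnection("jdbc:mysql://localhost:3306/politics", user, password); the host, port, and credentials here are placeholders.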

To read more about this project, see the "Barack Obama or Mitt Romney: that's the question!" web page. You can also read our published paper, which uses this ML/NLP framework: "The Predictive Power of Social Media: On the Predictability of U.S. Presidential Elections using Twitter."

If you have any questions about the code, contact me at kDOTjahanbakhshATgmailDOTcom

Licence

Copyright (c) 2013 Black Square Media Ltd. All rights reserved.
(The MIT License)

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
'Software'), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
