
ir-project's Introduction

This is the readme page of our IR project.

You can modify this page by editing the README.md file found in the root directory of the project.

ir-project's People

Contributors

croo, asieraduriz


ir-project's Issues

Spell check: Soundex

Tweets contain many words that are spelled the way they sound rather than the way they appear in the vocabulary. Using ONLY the Levenshtein algorithm might therefore not work best: character-level edits will in many cases not produce a valid word, or will produce a valid word that was not the intended one. The vocabulary is large and includes complex, infrequently used words, whereas the misspelled word may simply be a normal, everyday word with a drastically different spelling.
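
A phonetic code such as Soundex, named in the issue title, groups words by how they sound instead of how they are spelled. The block below is a minimal sketch of standard American Soundex in Python. The function name `soundex` and the sample words are illustrative assumptions and are not taken from the project's code.

```python
# Minimal sketch of American Soundex (illustrative, not the project's code).
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex code of `word`, or "" if it has no letters."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    encoded = first.upper()
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated code after it counts
    return (encoded + "000")[:4]


# A phonetic misspelling gets the same code as the vocabulary word.
print(soundex("tomoro"), soundex("tomorrow"))
```

One possible design is to collect the vocabulary words that share a Soundex code with the unknown word and only then rank those candidates by Levenshtein distance.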

Dealing with words that appear with a 'not': negations

For tweets like "..is not funny", "..such non sense", "..he is no good",
we switch the positive and negative values of the word next to the 'not-word' (in these cases: not, non, no).
So if the original positive/negative values of "good" are 0.8/0.3, then in this case they become 0.3/0.8.
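
The swap described above can be sketched as follows. This is a minimal illustration that assumes the scores live in a dictionary mapping each word to a `(positive, negative)` pair and that only the word immediately after a negation is affected. The names `NEGATIONS` and `apply_negation` are hypothetical.

```python
# Hypothetical sketch: swap positive/negative scores after a negation word.
NEGATIONS = {"not", "non", "no"}


def apply_negation(tokens, scores):
    """Return (word, positive, negative) for every known word in `tokens`.

    Assumption: only the word directly following a negation is swapped.
    """
    result = []
    negate = False
    for tok in tokens:
        word = tok.lower()
        if word in NEGATIONS:
            negate = True
            continue
        if word in scores:
            pos, neg = scores[word]
            if negate:
                pos, neg = neg, pos  # switch the two values
            result.append((word, pos, neg))
        negate = False  # the negation applies to the next word only
    return result


print(apply_negation(["he", "is", "no", "good"], {"good": (0.8, 0.3)}))
```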

Classifiers

  1. Simple linear classifier: Uses the weights of the words derived from (SentiWordNet + Synonyms) to classify the tweets as positive or negative.
    Example:
    tweet: happy cake
    It averages the positive values of "happy" and "cake", and does the same for their negative values. It then determines how positive or negative the tweet is as a whole.
    If the test data contains words that are not in the vocabulary, the tweet is still given an overall positive/negative score, provided that the majority of its words are already in the vocabulary.
  2. Naive Bayes classifier: Trained on N tweets that have been classified manually. The new words present in these tweets are then processed individually and each is assigned a probability. In this way new words are learned and added to the vocabulary.
  3. Both together: An ensemble of the two classifiers. When their results agree, the tweet is classified directly and no learning takes place. When their results conflict, iteration and learning happen.
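
The averaging step of the simple linear classifier can be sketched as below. This is only an illustration under stated assumptions: the scores are held in a dictionary of `(positive, negative)` pairs, the values for "happy" and "cake" are made up, ties count as positive, and the function name `classify_tweet` is hypothetical.

```python
# Hypothetical sketch of the averaging classifier; score values are made up.
def classify_tweet(tokens, scores):
    """Average the per-word scores and label the tweet.

    Returns None unless a strict majority of the tokens is in the vocabulary.
    """
    known = [scores[t] for t in tokens if t in scores]
    if not tokens or len(known) * 2 <= len(tokens):
        return None  # too few known words to score the tweet
    pos = sum(p for p, _ in known) / len(known)
    neg = sum(n for _, n in known) / len(known)
    label = "positive" if pos >= neg else "negative"  # assumption: ties are positive
    return label, pos, neg


scores = {"happy": (0.75, 0.125), "cake": (0.25, 0.125)}
print(classify_tweet(["happy", "cake"], scores))
```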

Which of the three classification methods performs best? (Compare by precision and recall.)
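
To compare the methods, precision and recall for the positive class could be computed as in the sketch below. The function name `precision_recall` and the sample labels are illustrative assumptions, and returning 0.0 for an empty denominator is a convention chosen for this example.

```python
# Hypothetical sketch: precision and recall for one class label.
def precision_recall(y_true, y_pred, positive="positive"):
    """Return (precision, recall) of `y_pred` against `y_true` for `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # convention for empty case
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = ["positive", "positive", "negative", "negative"]
y_pred = ["positive", "negative", "positive", "negative"]
print(precision_recall(y_true, y_pred))
```

Running this once per classifier on the same manually labelled tweets would give directly comparable numbers.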
