
Is This a Joke? Humor Detection in Spanish Tweets

License: Apache License 2.0



pgHumor: Humor detection in Spanish tweets

This thesis is about deciding if a tweet written in Spanish is humorous or not, applying Supervised Machine Learning. It was carried out by Matías Cubero and Santiago Castro, and supervised by Guillermo Moncecchi and Diego Garat. For detailed information, see the final report.

Abstract

Looking at this tweet:

— Yesterday, when leaving work I ran over a unicorn.

— No way, you've got a job?

which is the translated version of this one:

— Ayer, al salir del trabajo atropellé a un unicornio.

— No jodas, ¿tenés trabajo?

makes us think: what makes it funny? What is humor? What generates laughter? This project tries to approach these questions. Theories do exist, but none manages to be completely accurate.

16,488 tweets were fetched from humorous accounts and 22,875 from non-humorous ones (news, philosophical phrases and interesting facts). A web app and an Android app were made so people could give their opinion about which tweets are really humorous. 33,531 votes were received from early September to the end of October 2014 (thanks!). There turned out to be little humor in the humorous accounts:

[Figure: Humor ratio according to the people]

The classifier is built on features that look for informality, certain kinds of formatting, and topics that cause psychological tension, among others. It uses techniques such as SVM, kNN, Decision Trees and Naïve Bayes. It achieves a precision of 83.6% and a recall of 68.9% on the created corpus.
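As a reminder of what the two reported figures measure, here is a minimal sketch of how precision and recall are derived from classification counts. The function names and the example counts are illustrative assumptions, not taken from the project's code or corpus; only the 83.6% and 68.9% figures come from the text above.

```python
from __future__ import division


def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for the positive class ("humorous")."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    # Precision: of the tweets labelled humorous, how many really are.
    precision = true_positives / predicted_positive if predicted_positive else 0.0
    # Recall: of the really humorous tweets, how many were found.
    recall = true_positives / actual_positive if actual_positive else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Made-up counts, only to show the arithmetic.
example_precision, example_recall = precision_recall(8, 2, 4)

# Harmonic mean of the two figures reported above.
reported_f1 = f1_score(0.836, 0.689)
```

With the reported precision and recall, the harmonic mean comes out at roughly 0.755. A high precision with a lower recall means the classifier is rarely wrong when it says a tweet is humorous, but it misses a fair share of the humorous ones.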

A demo was also developed to show the obtained results.

Additional work

We want to thank Diego Serra and Ignacio Acuña, who based their High Performance Computing course project on this work, supervised by Sergio Nesmachnow, with the aim of improving the performance of the algorithms that compute the feature values. Their project can be seen in the hpc-entrega tag. The continuation of their line of work is in the hpc branch.

Installation

The main dependencies are:

  • Python 2.7 (and some libraries; please see the code)
  • MySQL
  • Freeling (SVN revision number 2588)

Setup

The corpus.sql and chistesdotcom.sql dumps must be loaded into the database.

In the file clasificador/config/environment.py, write the Twitter API credentials and the database-related information. An example of this file is the following:

# coding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

import os

# Twitter API credentials
os.environ['CONSUMER_KEY'] = '--CONSUMER KEY--'
os.environ['CONSUMER_SECRET'] = '--CONSUMER SECRET--'
os.environ['ACCESS_KEY'] = '--ACCESS KEY--'
os.environ['ACCESS_SECRET'] = '--ACCESS SECRET--'

os.environ['DB_HOST'] = 'localhost'
os.environ['DB_USER'] = 'pghumor'
os.environ['DB_PASS'] = '--PASSWORD--'
os.environ['DB_NAME'] = 'corpus'
os.environ['DB_NAME_CHISTES_DOT_COM'] = 'chistesdotcom'
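
Since this file only fills in os.environ, any other module can read the values back once it has been imported. A minimal sketch of that, assuming a hypothetical helper named load_db_config (it is not part of the project, and the keys of the returned dictionary are arbitrary):

```python
import os


def load_db_config(environ=None):
    """Collect the database settings defined in environment.py.

    Hypothetical helper, for illustration only. The variable names
    (DB_HOST, DB_USER, DB_PASS, DB_NAME) are the ones set in the
    example file above.
    """
    if environ is None:
        environ = os.environ
    return {
        'host': environ.get('DB_HOST', 'localhost'),
        'user': environ['DB_USER'],      # raises KeyError if missing
        'password': environ['DB_PASS'],  # raises KeyError if missing
        'database': environ.get('DB_NAME', 'corpus'),
    }
```

Failing with a KeyError on a missing user or password makes an incomplete environment.py visible right away instead of at the first query.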

Export the Freeling environment variable as well, and save it in your shell profile:

FREELINGSHARE=/usr/local/share/freeling
echo "export FREELINGSHARE=$FREELINGSHARE" >> ~/.bashrc

Run

Start Freeling servers (to compute the features):

./freeling.sh start

And then execute:

clasificador/main.py

To stop the Freeling servers:

./freeling.sh stop

Help

clasificador/main.py --help

Server mode

clasificador/main.py --servidor

To test it:

curl --data-urlencode texto="This is a test" localhost:5000/evaluar
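
The same request can be sent from Python. This is a minimal sketch using only the standard library; it is written for Python 3 (unlike the project itself, which targets Python 2.7), the function names are made up for this example, and since the response format is not described here the body is returned as raw text:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(text, base_url='http://localhost:5000'):
    """Build a form-encoded POST to /evaluar with the tweet in `texto`."""
    body = urlencode({'texto': text}).encode('utf-8')
    # Passing `data` makes urllib send the request as a POST.
    return Request(base_url + '/evaluar', data=body)


def evaluate(text, base_url='http://localhost:5000'):
    """Send the text to a running server and return the raw response body."""
    with urlopen(build_request(text, base_url)) as response:
        return response.read().decode('utf-8')
```

Calling evaluate("This is a test") requires the server started with --servidor to be listening on port 5000.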

Run tests

./tests.sh

Citation

If you use this work in research, please cite us:

@inproceedings{castro2016joke,
  title={Is This a Joke? Detecting Humor in Spanish Tweets},
  author={Castro, Santiago and Cubero, Mat{\'\i}as and Garat, Diego and Moncecchi, Guillermo},
  booktitle={Ibero-American Conference on Artificial Intelligence},
  pages={139--150},
  year={2016},
  organization={Springer}
}
