This project forked from ynloveacg/foursqtweets

FoursqTweets

For people who check in on Foursquare at a particular place, are there any common interests between them (e.g., the types of accounts they follow on Twitter)? Can I find patterns and characteristics for a place?

How it works:

Four Python scripts collect and manipulate the data. After all four have run, the result is a txt file that can be used to generate a tag-cloud image on Tagul.com. The steps of the workflow are described below.

Step1: Choose one category to search and define the search criteria (latitude and longitude, intent: browse, radius, category) on the Foursquare API "Search Venues" page, then copy the resulting URL into the script. In the modified version, you can choose three different categories.
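
As a rough sketch of Step1, the search URL could also be assembled in code rather than copied by hand. The endpoint and the parameter names (`ll`, `intent`, `radius`, `categoryId`, `v`) are my understanding of the legacy Foursquare v2 "Search Venues" API, not something taken from the project's scripts; the coordinates, category ID, and credentials are placeholders.

```python
from urllib.parse import urlencode

# Assumption: endpoint of the legacy Foursquare v2 venue search.
SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(lat, lng, radius_m, category_id,
                     client_id, client_secret, version="20130101"):
    """Build a venue search URL from the criteria listed in Step1."""
    params = {
        "ll": f"{lat},{lng}",        # latitude and longitude
        "intent": "browse",          # intent used in the README
        "radius": radius_m,          # search radius in meters
        "categoryId": category_id,   # the chosen category
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,                # API version date (placeholder value)
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


# All values below are hypothetical placeholders.
url = build_search_url(42.2808, -83.743, 2000, "HYPOTHETICAL_CATEGORY_ID",
                       "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
```

For the modified version with three categories, the same helper could be called once per category ID.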

Step2: Run the script "si601 project_yni.py" to find the top 3 venues by check-in count in the chosen category. The script creates a database called "si601-project_yni.db" with three tables, one per venue. For each venue, the script finds the mayor and at most 20 users who left a tip there (the Foursquare API shows at most 20 tips per venue), looks up each user's Twitter account, and then writes data for at most 10 users into that venue's table: full name, Foursquare ID, and Twitter account name.
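
The storage part of Step2 might look roughly like the sketch below. It uses an in-memory SQLite database and made-up user records; the table name `venue1`, the column names, and the helper `save_venue_users` are hypothetical, and the real script's schema may differ.

```python
import sqlite3

MAX_USERS_PER_VENUE = 10  # limit stated in Step2


def save_venue_users(conn, table, users):
    """Create a venue table and store at most 10 users who have a Twitter account."""
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" '
        "(full_name TEXT, foursquare_id TEXT, twitter TEXT)"
    )
    # Keep only users for whom a Twitter account was found.
    with_twitter = [u for u in users if u.get("twitter")]
    rows = [
        (u["full_name"], u["foursquare_id"], u["twitter"])
        for u in with_twitter[:MAX_USERS_PER_VENUE]
    ]
    conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?, ?)', rows)
    conn.commit()
    return len(rows)


# Made-up sample: 30 users, every second one has a Twitter account.
sample = [
    {
        "full_name": f"User {i}",
        "foursquare_id": str(i),
        "twitter": f"user{i}" if i % 2 == 0 else None,
    }
    for i in range(30)
]

conn = sqlite3.connect(":memory:")  # the project writes to a .db file instead
stored = save_venue_users(conn, "venue1", sample)
print(stored)
```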

Step3: Run the script "si601project_yni_twitter.py". The script reads the selected table in the database, pulls out the users' Twitter account names, gets the descriptions of the users and of at most 10 of the friends they most recently began following, and writes them into a .txt file named in the format "top_venue_(venue number)_(twitter account).txt". If a Twitter account is invalid, the script prints the error message and does not create a .txt file for that user.
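
The file-writing part of Step3 could be sketched as follows. The Twitter lookup itself is left out; the descriptions are passed in as plain strings. The helper names and the one-description-per-line layout are assumptions, and only the file name format comes from the description above.

```python
import os
import tempfile

MAX_FRIENDS = 10  # limit stated in Step3


def description_filename(venue_number, twitter_account):
    """File name in the format given in Step3."""
    return f"top_venue_{venue_number}_{twitter_account}.txt"


def write_descriptions(out_dir, venue_number, twitter_account,
                       user_description, friend_descriptions):
    """Write the user's description and at most 10 friends' descriptions, one per line."""
    path = os.path.join(out_dir, description_filename(venue_number, twitter_account))
    with open(path, "w", encoding="utf-8") as f:
        f.write(user_description + "\n")
        for description in friend_descriptions[:MAX_FRIENDS]:
            f.write(description + "\n")
    return path


# Made-up data: one user with 12 friends; only the first 10 are written.
with tempfile.TemporaryDirectory() as tmp:
    path = write_descriptions(tmp, 1, "example_account", "Coffee lover",
                              [f"friend {i}" for i in range(12)])
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()

print(os.path.basename(path), len(lines))
```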

Step4: Run the script "si601project_yni_wordcount.py". The script reads the txt files of Twitter user descriptions, counts each word matching the regular expression "[\w]+", sorts the words by frequency, creates a database for each venue (venue1/2/3.db), creates tables named after the Twitter accounts, and writes each word and its count into the corresponding table. In this step I ran into a problem with MRjob, since I had to read and write every txt file and needed to remove the "'" from each word. I finally solved the problem by replacing MRjob with another Python script that automatically reads every txt file created in the prior steps and writes the results into the database.
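
A minimal sketch of the counting in Step4, using `collections.Counter` in place of MRjob. The lowercasing, the table layout, and the helper names are assumptions of mine. Note that "[\w]+" never matches an apostrophe, so a word like "Don't" is split into "don" and "t", and the parameterized `INSERT` means stray quote characters cannot break the SQL.

```python
import re
import sqlite3
from collections import Counter

WORD_RE = re.compile(r"[\w]+")  # the regular expression named in Step4


def count_words(text):
    """Count every match of [\\w]+ in the text (lowercased, an assumption)."""
    return Counter(WORD_RE.findall(text.lower()))


def save_counts(conn, account, counts):
    """Write (word, count) rows, most frequent first, into a table named after the account."""
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{account}" (word TEXT, count INTEGER)')
    conn.executemany(f'INSERT INTO "{account}" VALUES (?, ?)', counts.most_common())
    conn.commit()


conn = sqlite3.connect(":memory:")  # the project uses venue1/2/3.db files
counts = count_words("Football fan. Don't miss football season!")
save_counts(conn, "example_account", counts)

top_row = conn.execute(
    'SELECT word, count FROM "example_account" ORDER BY rowid LIMIT 1'
).fetchone()
print(top_row)
```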

Step5: Run the script "si601project_yni_analysis.py". This script reads the venue database, builds a list of the unique words in each user's descriptions, filters out meaningless words, and counts how many times each remaining word occurs across the venue. I did not count every word directly from all the descriptions, in order to avoid bias caused by a particular user. For example, if the word "football" appears 15 times in the descriptions collected for one Twitter user, it could be misread as a sign that the 10 users for a venue share an interest in football. Counting only the unique words in each user's descriptions and then looking at the overlap across the 10 users better reflects whether their interests actually overlap. The script generates two txt files for each venue: one for creating a tag cloud on Tagul, with each word repeated according to its frequency in the analysis, and one for plain reading, listing each word with its frequency.
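
The unique-word idea in Step5 can be illustrated with a small sketch. The stopword list, the sample data, and the exact text layouts are assumptions; the point is only that each user contributes a given word at most once, so one user repeating "football" 15 times still counts as 1.

```python
from collections import Counter

# Illustrative stopword list; the project's actual filter is not shown in the README.
STOPWORDS = {"the", "and", "a", "of", "to", "in", "i"}


def venue_word_counts(user_words):
    """Count, per word, how many users of a venue mention it at least once."""
    totals = Counter()
    for words in user_words.values():
        unique = {w.lower() for w in words} - STOPWORDS
        totals.update(unique)  # each user adds at most 1 per word
    return totals


def tagcloud_text(totals):
    """Words repeated according to their frequency, for pasting into a tag-cloud tool."""
    return " ".join(" ".join([word] * n) for word, n in totals.most_common())


def readable_text(totals):
    """One 'word<TAB>count' line per word, for plain reading."""
    return "\n".join(f"{word}\t{n}" for word, n in totals.most_common())


# Made-up sample: user_a repeats "football" 15 times.
users = {
    "user_a": ["football"] * 15 + ["coffee", "the"],
    "user_b": ["coffee", "music"],
    "user_c": ["coffee", "and", "music"],
}
totals = venue_word_counts(users)
print(tagcloud_text(totals))
print(readable_text(totals))
```

In this sample "coffee" comes out on top with a count of 3, while "football" counts only once despite its 15 repetitions.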
