fjordan / mortar-examples

This project forked from sourcec0de/mortar-examples

Mortar Project with examples for several different public data sets and data types/formats

Home Page: http://help.mortardata.com/

License: Apache License 2.0

Python 38.49% PigLatin 59.93% R 1.58%

Welcome to Mortar!

Mortar is a platform-as-a-service for Hadoop. With Mortar, you can run jobs on Hadoop using Apache Pig and Python without any special training.

Getting Started

To get started, follow the Mortar Example Tutorial.

Help

For lots more help and tutorials on running Mortar, check out the Mortar Help site.

Examples

airline_travel: CSV data from the Bureau of Transportation Statistics

The airline_travel pigscript takes data from the Bureau of Transportation Statistics and uses it to find out how airlines perform when we normalize for the airports they fly from and to.
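The README does not say how the normalization works. The sketch below shows one plausible approach in plain Python. It is an assumption, not the pigscript's method. It compares each flight's delay with the mean delay on its origin-destination route, then averages those differences per airline. The tuple layout (carrier, origin, dest, delay_minutes) and the function name `route_normalized_delay` are hypothetical and do not reflect the BTS column schema.

```python
from collections import defaultdict


def route_normalized_delay(flights):
    """Average per-carrier delay relative to each route's mean delay.

    flights: iterable of (carrier, origin, dest, delay_minutes) tuples.
    This layout is hypothetical, not the BTS column schema.
    """
    flights = list(flights)  # two passes over the data are needed

    # Pass 1: mean delay per (origin, dest) route across all carriers.
    route_sum = defaultdict(float)
    route_count = defaultdict(int)
    for _, origin, dest, delay in flights:
        route_sum[(origin, dest)] += delay
        route_count[(origin, dest)] += 1
    route_mean = {r: route_sum[r] / route_count[r] for r in route_sum}

    # Pass 2: average each carrier's delay minus the mean of the route it flew.
    carrier_sum = defaultdict(float)
    carrier_count = defaultdict(int)
    for carrier, origin, dest, delay in flights:
        carrier_sum[carrier] += delay - route_mean[(origin, dest)]
        carrier_count[carrier] += 1
    return {c: carrier_sum[c] / carrier_count[c] for c in carrier_sum}
```

A negative score means the carrier ran earlier than the average flight on the same routes. Comparing within routes keeps an airline from being penalized simply for serving congested airports.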

coffee_tweets: JSON data from Twitter

The coffee_tweets pigscript answers the question "Which US state contains the highest concentration of coffee snobs?" It analyzes and aggregates Twitter data from the twitter-gardenhose, looking for telltale signs of coffee snobbery in tweets.
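"Concentration" can be illustrated as a ratio: each state's snobby tweets divided by its total coffee tweets. This way populous states are not favored just for tweeting more. The keyword set `SNOB_WORDS`, the (state, text) input shape, and both function names are hypothetical. The pigscript's actual signals of snobbery may be quite different.

```python
from collections import Counter

# Hypothetical "snob" vocabulary; the real pigscript's signals may differ.
SNOB_WORDS = {"pourover", "espresso", "aeropress", "macchiato", "single-origin"}


def is_snobby(text):
    """True if any whitespace-separated word of the tweet is in SNOB_WORDS."""
    words = {w.strip(".,!?#@").lower() for w in text.split()}
    return not SNOB_WORDS.isdisjoint(words)


def snob_concentration(tweets):
    """Fraction of each state's coffee tweets that look snobby.

    tweets: iterable of (state, text) pairs, assumed to be coffee tweets
    already geolocated to a US state.
    """
    total = Counter()
    snobby = Counter()
    for state, text in tweets:
        total[state] += 1
        if is_snobby(text):
            snobby[state] += 1
    return {state: snobby[state] / total[state] for state in total}
```

The state with the highest concentration is then `max(result, key=result.get)`.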

common_crawl_trending_topics: Dataset of technology news webpages taken from the Common Crawl

The common_crawl_trending_topics pigscript finds single-word trending topics by month from a corpus of technology news webpages (TechCrunch, GigaOM, and AllThingsD). It does this by calculating the frequency of each word in each month, finding the "frequency velocity" from month to month, and selecting the words with the highest frequency velocity in each month.
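The three steps described above can be sketched in plain Python: per-month word frequency, month-over-month velocity, and top words per month. Here, velocity is assumed to be the difference between a word's relative frequency in one month and its relative frequency in the previous month. The pigscript's exact formula may differ, and the function name and input shape are illustrative.

```python
from collections import Counter


def trending_topics(docs_by_month, top_n=1):
    """Words with the highest "frequency velocity" in each month.

    docs_by_month: dict mapping a sortable month key (e.g. "2012-03") to a
    list of document strings. Velocity is assumed to be the change in a
    word's relative frequency since the previous month.
    """
    months = sorted(docs_by_month)

    # Step 1: relative frequency of each word within each month.
    freqs = {}
    for month in months:
        counts = Counter(w.lower() for doc in docs_by_month[month] for w in doc.split())
        total = sum(counts.values())
        freqs[month] = {w: c / total for w, c in counts.items()} if total else {}

    # Steps 2 and 3: velocity versus the previous month, then the top words.
    # The first month has no predecessor, so it gets no entry.
    trending = {}
    for prev, cur in zip(months, months[1:]):
        velocity = {w: f - freqs[prev].get(w, 0.0) for w, f in freqs[cur].items()}
        ranked = sorted(velocity, key=lambda w: (-velocity[w], w))
        trending[cur] = ranked[:top_n]
    return trending
```

Using relative rather than raw frequency keeps a month with more crawled pages from inflating every word's velocity.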

excite: Search log data from the excite! search engine

The excite pigscript shows an example of loading search engine logs from the excite! search engine and joining them to a users table. This is a common pattern for web log analysis.
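The logs-to-users join can be sketched as an in-memory hash join: index the users table by id, then stream the log records past it. This matches the inner-join semantics of Pig's default `JOIN`. The record layouts and the function name are hypothetical, and the sketch assumes user ids are unique in the users table.

```python
def join_logs_to_users(search_logs, users):
    """Inner-join search log records to a users table on user id.

    search_logs: iterable of (user_id, timestamp, query) tuples.
    users: iterable of (user_id, attrs_dict) pairs with unique user ids.
    Both layouts are hypothetical stand-ins for the example's data files.
    Log rows whose user id has no match are dropped, as in an inner join.
    """
    # Build side: index the (usually smaller) users table by key.
    user_index = {user_id: attrs for user_id, attrs in users}
    # Probe side: stream the logs and look each user up.
    return [
        (user_id, ts, query, user_index[user_id])
        for user_id, ts, query in search_logs
        if user_id in user_index
    ]
```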

millionsong: Million song dataset

Two pigscripts explore the publicly available Million Song Dataset.

The first, top_density_songs, finds the songs with the most beats per second in the Million Song Dataset. Code to REALLY FAST music!

The second, hottest_song_of_the_decade, figures out which song is the hottest for each decade of data in the Million Song Dataset.
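Both computations reduce to per-record arithmetic plus a ranking or grouping step. This minimal sketch assumes each song is a dict with the hypothetical keys `title`, `year`, `hotness`, `num_beats`, and `duration` (in seconds). The real dataset's field names and the pigscripts' exact definitions of "density" and "hottest" may differ. A `year` of 0 is treated here as unknown and skipped.

```python
def top_density_songs(songs, n=10):
    """Titles of the n songs with the most beats per second."""
    scored = [
        (s["num_beats"] / s["duration"], s["title"])
        for s in songs
        if s["duration"] > 0  # avoid dividing by a zero-length track
    ]
    scored.sort(reverse=True)
    return [title for _, title in scored[:n]]


def hottest_song_by_decade(songs):
    """Map each decade (e.g. 1990) to the title of its highest-hotness song."""
    best = {}
    for s in songs:
        if not s["year"]:  # year 0 treated as unknown (assumption)
            continue
        decade = s["year"] // 10 * 10
        if decade not in best or s["hotness"] > best[decade]["hotness"]:
            best[decade] = s
    return {decade: s["title"] for decade, s in best.items()}
```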

nasa_logs: Apache logs from NASA

The nasa_logs pigscript is an example of parsing Apache logs to find the most-served resources by date. It takes a sample of two months' worth of logs from the NASA Kennedy Space Center web server in 1995 and finds, for each date, the number of requests served, the number of bytes served, and the top 10 resources served (images are filtered out, since most of the requests are just for icons). It takes a parameter ORDERING equal to either 'num_requests', to rank resources by the number of requests served, or 'num_bytes', to rank resources by the number of bytes served.
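Here is a plain-Python sketch of the same per-date aggregation, assuming the logs are in Apache Common Log Format. Several details are assumptions rather than the pigscript's logic: the image-extension list, the choice to exclude images only from the top-resource ranking (not from the daily totals), and the function name `daily_stats`. The `ordering` argument plays the role of the ORDERING parameter.

```python
import re
from collections import defaultdict

# Common Log Format: host ident authuser [date:time zone] "method path proto" status bytes
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[(\d{2}/\w{3}/\d{4}):[^\]]*\] '
    r'"(?:\S+ )?(\S+)[^"]*" \d{3} (\S+)'
)
# Assumed image filter; the pigscript's exact list may differ.
IMAGE_EXTS = (".gif", ".jpg", ".jpeg", ".png", ".xbm")


def daily_stats(lines, ordering="num_requests", top_n=10):
    """Per-date request and byte totals plus the top resources by `ordering`."""
    if ordering not in ("num_requests", "num_bytes"):
        raise ValueError("ordering must be 'num_requests' or 'num_bytes'")
    rank_idx = 0 if ordering == "num_requests" else 1

    totals = defaultdict(lambda: [0, 0])  # date -> [requests, bytes]
    # date -> path -> [requests, bytes]
    per_resource = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip malformed lines
        date, path, size = m.groups()
        nbytes = int(size) if size.isdigit() else 0  # "-" means no bytes sent
        totals[date][0] += 1
        totals[date][1] += nbytes
        # Images are excluded from the ranking only (assumption), not the totals.
        if path.lower().endswith(IMAGE_EXTS):
            continue
        per_resource[date][path][0] += 1
        per_resource[date][path][1] += nbytes

    result = {}
    for date, (num_requests, num_bytes) in totals.items():
        resources = per_resource.get(date, {})
        ranked = sorted(resources, key=lambda p: (-resources[p][rank_idx], p))
        result[date] = {
            "num_requests": num_requests,
            "num_bytes": num_bytes,
            "top_resources": ranked[:top_n],
        }
    return result
```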

twitter_sentiment: JSON data from Twitter

The twitter_sentiment pigscript finds which words are most likely to appear in tweets expressing a "positive sentiment" and which words are most likely to appear in tweets expressing a "negative sentiment". It calculates these likelihoods by dividing the frequency of a word in the corpus of positive (or negative) tweets by the frequency of that word in the corpus of all processed tweets. The words that cause tweets to be classified as positive or negative in the first place (e.g., "awesome", "disappointing") are excluded from the associations, so you can see what caused the sentiments instead of the sentiments themselves. The tweets are taken from the twitter-gardenhose.
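The likelihood score described above can be sketched as a ratio of relative frequencies. The sketch makes two simplifying assumptions, neither taken from the pigscript. First, tweets arrive already labeled: `positive`, `negative`, or another label such as `neutral`, which still counts toward the all-tweets corpus. Second, tokenization is a lowercase whitespace split. The function name is hypothetical.

```python
from collections import Counter, defaultdict


def sentiment_associations(tweets, sentiment_words, top_n=10):
    """Words most associated with positive and with negative tweets.

    tweets: iterable of (text, label) pairs, already classified.
    sentiment_words: lowercase words used for classification; excluded so the
    ranking shows what co-occurs with a sentiment, not the sentiment itself.
    score(w) = (w's relative frequency in the label's tweets)
               / (w's relative frequency in all tweets)
    """
    counts = defaultdict(Counter)  # label -> word counts
    for text, label in tweets:
        words = [w.lower() for w in text.split()]
        counts[label].update(w for w in words if w not in sentiment_words)
    all_counts = sum(counts.values(), Counter())
    total_all = sum(all_counts.values())

    result = {}
    for label in ("positive", "negative"):
        c = counts[label]
        total = sum(c.values())
        scores = {w: (n / total) / (all_counts[w] / total_all) for w, n in c.items()}
        # Highest ratio first; ties broken alphabetically for stable output.
        result[label] = sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
    return result
```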

Advanced Examples

Twitter Pagerank

A separate Mortar project, twitter-pagerank, shows how to embed Pig in a Jython controlscript. This project runs PageRank, an algorithm that takes several iteration steps, on a subset of the Twitter follower graph. The result is a list of the accounts that influential Twitter users tend to follow. A tutorial on the Mortar help site walks through the twitter-pagerank project.
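The project drives its iterations from a Jython controlscript around Pig. The update that each iteration performs can be sketched in plain Python. Three details are conventional PageRank choices assumed here, not confirmed by the project: the damping factor 0.85, the fixed iteration count, and spreading the rank of users who follow nobody evenly across all users.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Iterative PageRank over a follower graph.

    edges: iterable of (follower, followee) pairs; rank flows from each
    user to the accounts they follow. Returns {user: rank}, summing to 1.
    """
    edges = list(edges)
    nodes = {u for edge in edges for u in edge}
    if not nodes:
        return {}
    out = {u: [] for u in nodes}
    for src, dst in edges:
        out[src].append(dst)

    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Users who follow nobody ("dangling" nodes) spread their rank evenly.
        dangling = sum(rank[u] for u in nodes if not out[u])
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        # Each user splits its damped rank equally among the accounts it follows.
        for u in nodes:
            if out[u]:
                share = damping * rank[u] / len(out[u])
                for v in out[u]:
                    new_rank[v] += share
        rank = new_rank
    return rank
```

`sorted(ranks, key=ranks.get, reverse=True)[:10]` then lists the ten highest-ranked accounts.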

Contributors

chrisngan24, gkarora, jspacker, markroddy, redcat9
