playfair-projects's Introduction

Playfair Projects

This is where we'll be organizing our pitching/writing/etc process for all of the Lede Program's Playfair projects.

Introduction

Playfair is a way for both current and previous students to stay on top of analysis, visualization, and all the arcane corners of data. Our goals are:

  • Create quality portfolio pieces through a process that involves feedback and editorial oversight.
  • Give and receive feedback to make our projects collectively better, along with making sure we're functional members of society.
  • Create documentation of our processes so that others can learn from what we've created (and the really intense can reproduce and check our work)

Sounds good?

The Pitching/Submitting Process

See an example user directory with a project inside in the examples folder

You can read detailed instructions in PITCHING.md and SUBMISSION.md, but a summary is below.

  1. Submit a pitch with [Pitch] in the issue title.
  2. Receive and respond to editorial feedback. Example
  3. Get your pitch approved. Example
  4. Create a story issue.
  5. Post updates and get feedback and post more updates
  6. When you're done with your content, send a pull request that includes your story issue number (mine was #2)
  7. If everything looks good, your pull request will be approved and the story issue closed

You must submit and complete a checklist as the first step to pitching/story submission/pull request-ing.

Story submission format

You can read detailed instructions in SUBMISSION.md. After you fork the repository, create your directories and complete your story, be sure to include the following in your pull request.

  • STORY.yml, which includes your title and summary
  • STORY.md, your actual content (or links to it)
  • README.md, summary of how you did your project
  • DIARY.md, a disorganized mess keeping track of your process
  • any images or code you used in the process of writing your story/building your visual

playfair-projects's People

Contributors

barjacks, bisaha, djlee0202, gcgruen, jowang0319, jsoma, junebugseo, kbennion, kromreig, mercybenzaquen, mercye, miaomiaorepo, oargueso, paolorivas, radhikapc, raschuetz, snajmabadi, spm2164, sz2472, thisss

playfair-projects's Issues

NYPD stop-and-frisk numbers drop, racial bias remains

NYPD stop-and-frisk [PITCH]

I want to look at how stop-and-frisk numbers have changed and whether racial bias has decreased as well.

Data source: NYPD, 2015

Got caught up in formatting data, that's why the graph is still so ugly.

ugly_graph-960px

Questions:

  • How many total stops are there? How do they distribute over boroughs?
  • How do stop-and-frisks evolve over time?
  • Does race relate to being stopped?
  • Do officers explain the reason for stopping someone?
  • Does race relate to being frisked?
  • Was an arrest made?
  • Was the person summonsed?
  • What was the offense the suspect was summonsed for?
  • What type of physical police force is most abundant?
  • How do the three most common types of police force relate to race?

Headlines:

  • People of color get searched more often by NYPD
  • NYPD stop-and-frisk data screwed: No scheme to gather offense data

Remarks/Problems:

  • Can you make multiples in matplotlib?
  • How to make a pie chart 2D instead of 3D?
  • Unfortunately you only realize how uninteresting your data is, once you are done analyzing it...
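
On the small-multiples question above: one common way to do it in matplotlib is `plt.subplots` with shared axes. The sketch below is only an illustration, not the author's code. The series names and values are placeholders rather than real NYPD figures, `small_multiples` is a hypothetical helper name, and the line color is the Purple Navy from the design choices below.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so this runs without a display
import matplotlib.pyplot as plt

# Placeholder values only, NOT real stop-and-frisk numbers.
years = [2011, 2012, 2013, 2014, 2015]
series = {
    "Group A": [5, 4, 3, 2, 1],
    "Group B": [4, 4, 2, 1, 1],
    "Group C": [3, 2, 2, 1, 1],
    "Group D": [2, 2, 1, 1, 1],
}


def small_multiples(years, series, ncols=2):
    """Draw one small panel per series on a shared x and y scale."""
    nrows = -(-len(series) // ncols)  # ceiling division
    fig, axes = plt.subplots(
        nrows, ncols,
        sharex=True, sharey=True,
        figsize=(4 * ncols, 3 * nrows),
        squeeze=False,  # always return a 2D array of axes
    )
    flat = axes.ravel()
    for ax, (name, values) in zip(flat, series.items()):
        ax.plot(years, values, color="#404e7c")
        ax.set_title(name)
    # Hide any leftover panels when the grid has more cells than series.
    for ax in flat[len(series):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, axes


fig, axes = small_multiples(years, series)
```

Calling `fig.savefig("multiples.png")` afterwards would export the grid. On the pie chart question, matplotlib's `Axes.pie` draws a flat, two-dimensional pie by default.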

Design choices

  • Axis + Axis Labels: Myriad Pro, 14 pt
  • Text:
    • Font: Shree Devanagari 714
    • Headline: Bold, 24 pt, 85% greyscale
    • Descriptive Text: Regular, 12-18 pt, 65% greyscale
    • Annotation + Source: 10-12 pt, 65% greyscale
  • Color:
    • Purple Navy: #404e7c
    • Sunset Orange: #fe5f55
    • Turquoise: #4ce0d2
    • Sandstorm: #e8d245
    • Soap: #c6caed

Pitch issue checklist

  • My pitch issue has [Pitch] in the title
  • My pitch issue has [Data request] in the title if I need additional data
  • My pitch issue links to my data set
  • My pitch issue explains what I'd like to explore in the data set
  • My pitch issue includes multiple possible headlines
  • My pitch issue includes any links/images I'm using as inspiration
  • My pitch issue includes appropriate labels in the title such as [Pitch] and maybe [Data Request]
  • I have received two comments of peer feedback on my pitch issue
  • I have received editorial feedback on my pitch issue
  • My pitch has been approved

A better NBA contracts evaluation, maybe

Since points per game is barbarous and terrible, I decided to do some charts evaluating NBA contracts using better, more advanced stats. In this one I used VORP, which means value over replacement player. A replacement player is so bad that they are worth -2, and are actively making their team lose. I then highlighted the top VORP players by name and top contracts by dollar amount to see where they intersected, and where the really bad outliers were. Here it is:
nba_player_values_vorp_v1_2_1080px

The data sets I used are from basketball-reference.com, the contracts being here and the advanced stats being here

Testing out checkboxes

Story Issue

  • My pitch has been approved (see PITCHING.md)
  • I have created a story issue
  • My story issue links to my pitch issue
  • I have included an update of my visualization/story in a comment
  • I have received two comments of peer feedback

Directory Format

  • There is a folder in /projects/ that is my full name
  • Inside of that folder I have another folder just for this project

Visualization

  • My visualization has a title
  • I have cited my data source on the visualization
  • My visualization has my name/web site/twitter handle on it (only needs one)
  • I have exported my visualization as a png file

STORY.md

  • I have included a STORY.md
  • I have included a front matter section with a title and summary
  • I have properly linked my image, with a format like ![](image.png)
  • If necessary, I have written text and such

Other files to include

  • README.md, linking to my data set (if possible) and giving a brief overview of how I accomplished my task
  • DIARY.md, the notes you were keeping for yourself while working on the projects (problems, solutions)
  • Any IPython/Jupyter Notebook files
  • Any PDF files you exported along the way
  • Any images or other files from your process

Pull request

  • My pull request links to my story issue

The Football Scraper and Transfer Visualizer

I have written a scraper that takes player data from the Swiss football league from Transfermarkt.com and makes four graphs on how the new players of the various teams compare to the players the teams sold: Experience, Size, Goals, Assists.

The charts look like this:

image

And the code looks like this:
https://github.com/barjacks/fussball/blob/master/Super_League_Transfers.ipynb

Now I want to expand the scraper to the German, maybe also English football leagues. And add two more graphs: Points per game avg and age.

UPDATE:

Adapting my Football Scraper to the Bundesliga

Most active Agents, most Successful Shoes? 10 Infographs on Bundesliga

1 Graph of total market value of the Bundesliga

  • Graph of Markt

5 Graphs on players

  • Graphs of new vs. old players. (1) Games, (2) Age, (3) Goals, (4) Assists, (5) Height.

2 Graphs on Shoes

  • Graph by number of games of transfers
  • Graph of number of goals scored

2 Graphs on Most Active Agents

  • Graph by number of players
  • Graph by total player value

Source: www.transfermarkt.com

Sports Misery Index

The inspiration for this piece comes from the fact that I'm from Cleveland, and Cleveland just this year won its first major league championship since 1964. Also, from articles like this from ESPN or this from some rando at Forbes where they attempt to track the same thing. They basically assign points to a city based on consecutive team seasons spent without winning their respective championships, and then reset the count to zero whenever any team from a city wins a championship.

My deal with this is that

  1. Not all seasons are created equal. A season in which your team goes to the playoffs is better than a season where your team is in last place. And
  2. Cities with multiple teams do not necessarily have overlapping fan bases. Just because the Cleveland basketball team won the championship doesn't mean that all misery is dispelled. Football fans who do not care about basketball still have nothing since 1964. This is especially apparent in a city like Chicago with multiple teams in one sport. The Cubs haven't won the World Series in over 100 years, but the White Sox last won one in 2005, and Cubs fans are almost never White Sox fans. The total sports misery of the city should still reflect that reality.

My methodology will be similar to ESPN, assigning points to seasons in which teams do not win championships and grouping them by city, and then attempting to increase the sophistication of the results. First, taking into account the contribution of each team in a city to the total city misery, and then adjusting that amount by qualitative measures like made it to playoffs, bottom of the league, etc.
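
The methodology above could be sketched roughly as follows. This is only an illustration under stated assumptions: the point values in `SEASON_POINTS`, the outcome labels, and the team and city names are all hypothetical, since the pitch does not fix any of them. The one design choice taken from the text is that a championship resets only the winning team's count, not the whole city's.

```python
from collections import defaultdict

# Hypothetical weights: worse seasons add more misery.
SEASON_POINTS = {
    "playoffs": 0.5,    # made the playoffs but did not win it all
    "missed": 1.0,      # ordinary season without a playoff berth
    "last_place": 1.5,  # bottom of the league
}


def team_misery(outcomes):
    """Sum misery points since the team's most recent championship.

    `outcomes` is a chronological list of season labels.
    """
    total = 0.0
    for outcome in outcomes:
        if outcome == "champion":
            total = 0.0  # a title wipes out this team's misery only
        else:
            total += SEASON_POINTS[outcome]
    return total


def city_misery(seasons_by_team, team_city):
    """Add up each team's misery into a total for its city."""
    totals = defaultdict(float)
    for team, outcomes in seasons_by_team.items():
        totals[team_city[team]] += team_misery(outcomes)
    return dict(totals)


# Made-up example: Team X just won, Team Y has not.
seasons = {
    "Team X": ["missed", "playoffs", "champion"],
    "Team Y": ["last_place", "missed", "playoffs"],
}
cities = {"Team X": "City 1", "Team Y": "City 1"}
result = city_misery(seasons, cities)
```

In this toy case Team X's title clears its own misery while Team Y's three seasons still count toward the city total.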

Those who sweat together, eat together

I was trying to figure out a way to use the Harmonized European Time Survey that Soma sent out on Wednesday. Because eating was one of the categories that seemed to vary the most between countries, that's the one I looked at. At first, I just graphed them all, but then I noticed there were some cultural patterns between countries. I tried to see whether the length of the work week, the latitude, or the longitude could work as a proxy for those cultural differences, but it turned out that average temperature worked pretty well. I used the average high temperatures (since that would have the biggest impact on lunchtime behavior, which was the biggest spike on the graphs) from Weatherbase.

So I believe it's more of a correlation than anything, but it still made for an interesting graph!

Here's a preview of what I've been doing:
eating-3-3600

DJL: Shoppers Clicking Behavior regarding eShopping Banner

Interesting facts of eShopping Banners

Banner ads are a big opportunity to interact with online shoppers on their path to purchase.
Understanding the efficiency of banners is therefore critical for online retailers, since different banners attract different numbers of visitors. I am going to study website banner ads.

Sample Website Banner Composition

The website is providing 12 banners at Front Page, Search Page, and Category Page.
mainpage
category

Data Set

Following Steve's advice (thank you, Steve), I was able to get sample data from an online shopping website company simply by emailing them. They provided the results of 8 advertisement campaigns, with no additional information about the products. The data was provided for academic purposes only.

Based on the data provided by the mall, I am planning to analyze the relationship between clicking rate, banner type as well as time. The following is the data which I got from the shopping mall.
image

Data Dictionary

As shown above, there are 7 columns in the dataset.
The following explains each data column.

  1. Project: the advertisement of the product
  2. Banner number: the location of the banner (explained above)
  3. Date passed: the number of days passed since the ad was posted
  4. Impression: the number of times the ad was shown to customers
  5. Click: the number of clicks on the banner
  6. CTR : Click / impression (i.e. Efficiency)
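
As a minimal sketch of the CTR column defined above, the snippet below computes clicks divided by impressions per banner. The row layout, the function names, and the numbers are illustrative assumptions, not the actual data set. Summing clicks and impressions before dividing avoids giving low-traffic days the same weight as busy ones.

```python
def click_through_rate(clicks, impressions):
    """CTR = clicks / impressions; None when the ad was never shown."""
    if impressions == 0:
        return None
    return clicks / impressions


def ctr_by_banner(rows):
    """Aggregate clicks and impressions per banner, then compute CTR."""
    totals = {}
    for row in rows:
        clicks, impressions = totals.get(row["banner"], (0, 0))
        totals[row["banner"]] = (
            clicks + row["clicks"],
            impressions + row["impressions"],
        )
    return {
        banner: click_through_rate(clicks, impressions)
        for banner, (clicks, impressions) in totals.items()
    }


# Made-up rows for illustration only.
rows = [
    {"banner": 1, "impressions": 1000, "clicks": 30},
    {"banner": 1, "impressions": 1000, "clicks": 10},
    {"banner": 2, "impressions": 500, "clicks": 25},
]
rates = ctr_by_banner(rows)
```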

Pitch issue checklist

  • My pitch issue has [Pitch] in the title
  • My pitch issue has [Data request] in the title if I need additional data
  • My pitch issue links to my data set
  • My pitch issue explains what I'd like to explore in the data set
  • My pitch issue includes multiple possible headlines
  • My pitch issue includes appropriate labels in the title such as [Pitch] and maybe [Data Request]
  • I have received two comments of peer feedback on my pitch issue
  • I have received editorial feedback on my pitch issue
  • My pitch has been approved

GDP, Life Expectancy, Population

I have access to a dataset of the GDP, life expectancy, and population of all countries in the world from the 1950s to 2013. I would like to make a small multiples version of the dataset, with each multiple comparing one of the parameters (GDP, life expectancy, population) for a country against the continent and the world.

Data Source : The Country-GDP-Timeseries.csv

Inspirations :

Story issue checklist

My pitch was #13

  • My pitch has been approved (see PITCHING.md)
  • My story issue links to my pitch issue
  • I link to my finalized (ish) data source(s)
  • I've included a brief summary of my story
  • I've included some possible headlines or findings
  • I've included some links or images as inspiration (if you have any)
  • I have received two comments of peer feedback
  • I've included an update of my visualization/story in a comment
  • I have received two comments of peer feedback after posting an update
  • I have received editorial feedback

DJL: Large Residential Tax, Is it right?

1. Summary.

The purpose of this analysis is to understand the statistics regarding the size (sq. meter) of apartments in South Korea's metropolitan cities using Python.

Inspired link : https://www.washingtonpost.com/world/after-decades-of-economic-growth-south-korea-is-the-land-of-apartments/2013/09/15/9bd841f8-1c55-11e3-8685-5021e0c41964_story.html

2. Data Analysis

There are 8 metropolitan cities in South Korea. There are apartments in other locations/provinces; however, most apartments are concentrated in the metropolitan cities, and because the amount of data for the other areas is relatively small, they were left out of the analysis. (Kyungi province was later added.)

According to the Korean Government, apartments larger than 914 sq ft are considered "large residential property" for tax purposes. (The calculation of the tax is more complex, but they charge property tax 0.15% more or less.) I decided to figure out the number of such large apartments in our data set and their ratio to the entire market.

(1) The ratio of the large residential property in South Korea

![output-graph_01](https://cloud.githubusercontent.com/assets/19519829/17150940/663b13f8-533f-11e6-97be-15053abdbfc4.png)

output-graph_01_01

(2) Factors Correlated to the Large Apartment Ratio
I tested how two factors, city budget and average city GDP, correlate with the large apartment ratio.

The ratio of large apartments was relatively consistent regardless of the city. The number of large apartments correlated positively with the city budget, but only a very weak relationship was found between city GDP and the number of large apartments.

![output-graph_02_modified](https://cloud.githubusercontent.com/assets/19519829/17112883/f5faee02-5275-11e6-838f-c7f9896b4018.png)

Data location, Study Material Location, and Origin of Codes
All rights reserved. Sources, data, and coding methods are credited to the original locations linked below.

Story issue checklist

My pitch was (use the number): #19

  • My pitch has been approved (see PITCHING.md)
  • My story issue links to my pitch issue
  • I have included an update of my visualization/story in a comment
  • I have received two comments of peer feedback
  • I have received editorial feedback

Rents Increase VS. Airbnb in NYC

I was inspired by this interactive report mapping the relationship between rising rents in Berlin and Airbnb's business expansion: http://www.airbnbvsberlin.de/

I want to do the same for New York City. It might include: the number of Airbnb listings in NYC, the distribution of prices of NYC Airbnb offers, the top 10 Airbnb providers, the top 10 streets, and the number of apartment rental advertisements......

I will begin by graphing the Airbnb data, then the rent changes, then try to find a relationship between them.

Here is the data I may use:
rents data from streeteasy.com or zillow.com or .......
airbnb: http://insideairbnb.com/new-york-city/
http://insideairbnb.com/get-the-data.html

Datasets I have:
image

Create a fan map for the local soccer and icehockey teams

Is it true that football is an urban thing while ice hockey is the country bumpkin's game?

I recently received the ZIP-codes for every long term card for the games of Bern's biggest sport clubs "YB" (football) and "SCB" (ice hockey). I will harmonize this data, put it on a map and analyse it further.

Possible stories:

  • Your surroundings tell you who to root for
  • Portrait of a fan who spends his money on football and ice hockey.
  • Reportage from the community with the highest subscription per capita

Social Media Popularity Over Time

Data Source: NYT Articles API

Track the popularity of several social media websites over time (popularity is based on the number of mentions in the NYT)

Potentially would like to look at:

  • Facebook
  • Instagram
  • Twitter
  • Tumblr
  • MySpace (May be interesting to include in order to show a point of transition)
  • Snapchat
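
The "popularity as number of mentions" idea above boils down to counting matching articles per year for each site. A minimal sketch of that counting step follows. It assumes you already have each article's publication date as an ISO-style string beginning with the year; the function name and the sample dates are made up for illustration, and fetching results from the NYT API is deliberately left out.

```python
from collections import Counter


def mentions_per_year(pub_dates):
    """Count articles per year from date strings that start with 'YYYY'."""
    return Counter(int(date[:4]) for date in pub_dates)


# Made-up publication dates for a single search term.
sample_dates = [
    "2009-07-30T00:00:00Z",
    "2009-01-15T00:00:00Z",
    "2016-07-21T00:00:00Z",
]
counts = mentions_per_year(sample_dates)
```

Running this once per site gives one yearly series per site, which can then be plotted together to show the point of transition between sites.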

Inspiration: http://blog.blprnt.com/blog/blprnt/is-twitter-the-new-internet
http://firstlook.blogs.nytimes.com/2009/07/30/visualizations-the-art-of-times-apis/?_r=0

3256480403_1bf499ae5b_b

Mass shooting in US

A visualization about the mass shooting incidents around the US from 2013-2016

Energy transition in Europe

Germany is hyped for its energy transition: the aim to meet 80% of its energy demand with renewables and cut primary energy demand by 50%, both by 2050.
Moreover, by 2022 the government wants to abandon nuclear energy. Countries like Japan, the US and Britain look at Germany's progress and there's the general perception of 'If the Germans can make it, we can, too!'

However, if you look at other European countries, Germany is not leading the top 10 in renewable energy; countries like Norway and Sweden are doing far better. They already far exceed the goal the EU set (20% of energy consumption from renewables by 2020), using 80% and 50% renewable energy, respectively. Why aren't they the role models?

Idea for graphic series: Shed light on how European countries compare on renewable energy production and to see whether Germany is or is not a good role model

Background: This goes well with my actual work -- the magazine I work for will have energy transition as the next edition's issue. As the editorial team, we figured we know vaguely about how Germany compares to others, but not in detail, which is why we planned on digging into it.

Questions:

  • What is each country's total energy demand? (total)
  • What is household energy consumption? (per capita + share of total)
  • How do countries compare in renewables vs non-renewables as energy source?
    [two sided bar graph]
  • How has usage of renewables changed over time? (compare 2000 and latest data)
    [slope graph]
  • How has the energy price changed over time?
  • How does energy production distribute over energy sources?
    [multiples?]
  • Does energy demand relate to GDP and use of renewable energy? (maybe include non-EU-countries)
    [scatterplot, x = GDP, y = energy demand, bubble size = share of renewable energy in energy mix]
  • How does energy production/use relate to subsidies? (select 5-10 interesting countries based on previous analysis)
    [colored scatterplot, one color per energy source, one markertype per country]

Headlines:

  • For a role model on renewables, look at Norway, not Germany
  • The difference that you NOT make -- why saving energy per household doesn't matter
  • third headline yet to be found...

Data Sources:

Further links:

Remarks:

  • I find it difficult to come up with headline suggestions, without having analyzed the data

Pitch issue checklist

  • My pitch issue has [Pitch] in the title
  • My pitch issue has [Data request] in the title if I need additional data
  • My pitch issue links to my data set
  • My pitch issue explains what I'd like to explore in the data set
  • My pitch issue includes multiple possible headlines
  • My pitch issue includes appropriate labels in the title such as [Pitch] and maybe [Data Request]
  • I have received two comments of peer feedback on my pitch issue
  • I have received editorial feedback on my pitch issue
  • My pitch has been approved

Visualization of Gun Violence

CONTENT:
I want to make a graphic related to gun violence. Which gun-related incidents do we already know about? What about those we don't? How often do people talk about gun control? Comparing by state, by country... will gun control really solve the safety problem?

DATASET:

IDEA:

  • Thinking about using the Twitter API to get all tweets with the "gunControl" tag and see their relationship with mass shootings. Not really sure whether I can get the dataset or not!

SKETCH:
wechat_1469455590

No Cold War, No Space Race ! Dramatic Reduction in the Number of New NASA Research Facilities in the US

The world despised the Cold War era. But let's look at the brighter side. One is the presence of two balancing poles of power. The other, no doubt, is the space race. The launch of Sputnik spawned a new era of competition in spaceflight capabilities, which led both parties to initiate lunar missions. The US found success, whereas the USSR floundered despite repeated attempts. The rivalry for supremacy intensified, and so did the need for space research facilities.

Growing up in the Cold War era, closely watching the USSR and the US compete for the number 1 position in the space race, I have always been curious about the expansion of NASA. For that reason, I picked up the NASA Laboratory facilities data set from Data.gov: http://catalog.data.gov/dataset/agency-data-on-user-facilities.

In the past, I had analyzed the data using pandas and created basic graphs. This time, in addition to reworking the graphs that I had created earlier, I have added more graphs to visually represent the distribution and growth trend of NASA Laboratory facilities in the USA. What I have discovered is astounding. The number of new facilities has dropped drastically from the 60s to now. In the 60s, around 152 new labs started operating, whereas in 2010-2016 the number came down to just a few.

An Insight Into the Growth of NASA Laboratory Facilities

  • NASA runs a total of 397 lab facilities, NASA Intelsat runs 17, the Department of Defense (DOD) runs 7 labs, the Department of Energy runs 12 labs, Raytheon runs 5, and Orbital Sciences Corporation (OSC) runs only one NASA lab facility.
  • The state with the highest number of labs: Alabama
  • The state with the lowest number of labs: Arizona
  • The period in which the highest number of labs started: 1960-69
  • The number of facilities started during this period: 152
  • The agency that has the highest number of labs: NASA 2 (I guess it stands for NASA Glenn Research Facilities. I will confirm)
  • During 2010-2016 only two new lab facilities were started. They are DYNAVAC THERMAL VACUUM CHAMBER and General Vibration Lab (GVL).
  • Roslin Hicks handles the highest number of facilities. Under him there are 136 labs.
  • Marshall Space Flight Center has the highest number of lab facilities. The number is again 136.
  • There are 388 active NASA lab facilities in the US.
  • There are 22 inactive lab facilities in the US.

Preview (Earlier Versions)

capture
fin

My files are at https://github.com/radhikapc/DataStudio/tree/master/homework2/nasa-set

Revised Version of the Graphics

nasa-fund-bar

Story Issue Checklist

My pitch was (use the number): #14
( 7/30/16 I am toying around with the idea of changing my pitch ! :-) )

  • My pitch has been approved (see PITCHING.md)
  • My story issue links to my pitch issue
  • I link to my finalized (ish) data source(s)
  • I've included a brief summary of my story
  • I've included some possible headlines or findings
  • I've included some links or images as inspiration (if you have any)
  • I have received two comments of peer feedback
  • I've included an update of my visualization/story in a comment
  • I have received two comments of peer feedback after posting an update
  • I have received editorial feedback

Bearable L Train Pain - Williamsburg’s Rental Price Reaction to L Train Shutdown Announcement

pitch #46

Williamsburg’s rental price reaction to the L train shutdown announcement was not so dramatic, according to StreetEasy data.

Median asking rents for Brooklyn as a whole and for Williamsburg have moved in pretty much the same direction over the past six years. After the announcement was made in January 2016, rent started to decrease in Williamsburg while that of Brooklyn as a whole rose slightly.

There are many factors other than L train service that might affect rental prices in Williamsburg. Moreover, it is hard to capture meaningful market reactions since the actual repairs will start far in the future.

homework3_ver1

Does a Military Coup affect Economic Growth of a Country?

I discovered a dataset on military coups across the world, thanks to Jeremy Singer-Vine's Data is Plural. Two political science professors at the University of Kentucky are compiling a dataset of coup attempts. So far, the dataset covers both successful and unsuccessful attempts from 1950 to late 2015.

The dataset defines coups as "illegal and overt attempts by the military or other elites within the state apparatus to unseat the sitting executive," and successes as episodes in which the perpetrators control power for at least 7 days.

During those 65+ years, coup plotters have been foiled about half the time, with 236 victories and 238 failures.

The impact of these coups on economic growth has not been studied much. We could try seeing the impact of this on a parameter such as GDP to see if they do have an impact or not.

I will annotate the GDP data I prepared for my earlier story with this data on successful coups from here. The idea is to produce small multiples showing successful and unsuccessful coups and their impact on economic growth. These visuals can show whether coups have an impact on economic growth, or whether sluggish economic growth is what causes coups to occur in the first place.

How people run a 7,323 km marathon

Top marathon runners run at the amazing speed of 21 km/h. But how are the other runners doing, and what does their age have to do with their performance? I wanted to plot their age and speed to find out. Ages range from 19 to 61 years old.

homework2-960px

Story issue checklist

My pitch was (use the number): NaN

  • My pitch has been approved (see PITCHING.md)
  • My story issue links to my pitch issue
  • I have included an update of my visualization/story in a comment
  • I have received two comments of peer feedback
  • I have received editorial feedback

It doesn't matter how terrible your movies are...

I used data from Box Office Mojo to look for the 50 highest grossing movies worldwide and ratings from Rotten Tomatoes to see if movies have to be rated highly to make a whole lot of money. This story is inspired by fivethirtyeight's Adam Sandler piece.

Sorry the chart looks terrible - tips on how to beautify warmly welcome.

movies_illustrator

NBA salary charts [Assistance Request]

Approved from Pitch #1

I'll be building several static scatterplot visualizations about player performance in the NBA (pandas + Illustrator). I'll see how it looks with points along with a few other statistics - blocks, turnovers, etc. Will look kind of like a nicer version of this:

screen shot 2016-07-21 at 4 26 08 pm

Data sets:

If anyone wants to help out let me know, I think there are some crazy math-y things we could do with the stats that I don't quite get.

Tips for the story part:

  • Be sure to mention your pitch in the first line - by typing a "#" it will automatically provide you with a list of issues you can link to. My original pitch was at #1, so I can type #1 to automatically link to my pitch.
  • Now that you're approved, you can go hunting for people to help you out

The Epochal Phenomenon of Pokemon Go: Visualization of Search and Download Trends

The GitHub repo: http://localhost:8888/notebooks/Downloads/Lede/20Julylede/02-classwork/homework1/inspirational/homework_1_1_radhika.ipynb

In the past week, all my social media feeds, from friends, pages I subscribe to, and news, were about the legendary phenomenon of Pokemon Go. I am neither an app nor a gaming enthusiast. However, it's not surprising that someone naturally becomes curious when the entire world begins to talk about something unprecedented. So I set out to grab datasets from all the available sources, such as Google Trends and GitHub.

First, I decided to emulate the graphs given at https://www.inverse.com/article/18272-4-pokemon-go-graphs-that-show-just-how-big-the-game-is. This link was given in the critiques.pdf in the 02-classwork.zip folder.

pokemon_trend.pdf
subreddit.pdf

Then, I picked up some of the trendy search terms, such as 'clinton', 'trump', 'islamic state', 'got' and ran the code to discover what's going on.
google_trend.pdf

Additionally, I have tried to discover the fluctuations in download trend for your favourite apps: Whatsapp, Facebook, Tinder, and now Pokemon Go.

app_trend.pdf

My pitch was (use the number): #14

  • My pitch has been approved (see PITCHING.md)
  • My story issue links to my pitch issue
  • I link to my finalized (ish) data source(s)
  • I've included a brief summary of my story
  • I've included some possible headlines or findings
  • I've included some links or images as inspiration (if you have any)
  • I have received two comments of peer feedback
  • I've included an update of my visualization/story in a comment
  • I have received two comments of peer feedback after posting an update
  • I have received editorial feedback

[DJL] The Size of Residential Apartments in South Korea Cities

The purpose of this analysis is to understand interesting statistical facts regarding the size of apartments in South Korean cities. According to the Korean Government, residential properties larger than 914 sq ft are considered "large residential property" for tax purposes. I decided to figure out the number of such large apartments in the data set and their ratio to the entire market.

Large Apartment Ratio =
(Number of Large residential properties larger than 914 sq ft )/(Number of Entire Residential Property)
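
The ratio above can be computed in a few lines. This is only a sketch: it assumes apartment sizes are available as a plain list in square feet, and the function name and sample sizes are illustrative rather than taken from the data set. An apartment at exactly the threshold is not counted, following the "larger than" wording.

```python
LARGE_THRESHOLD_SQFT = 914  # "large residential property" cutoff from the text


def large_apartment_ratio(sizes_sqft, threshold=LARGE_THRESHOLD_SQFT):
    """Share of properties strictly larger than the threshold."""
    if not sizes_sqft:
        return None  # ratio is undefined for an empty market
    large = sum(1 for size in sizes_sqft if size > threshold)
    return large / len(sizes_sqft)


# Made-up sizes for illustration only.
ratio = large_apartment_ratio([600, 914, 915, 1200])
```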

Inspired link : https://www.washingtonpost.com/world/after-decades-of-economic-growth-south-korea-is-the-land-of-apartments/2013/09/15/9bd841f8-1c55-11e3-8685-5021e0c41964_story.html

South Korea consists of 10 provinces & 8 metropolitan cities. Most apartments are concentrated in Metro Cities and Kyungi, a Province Area near Seoul.

image

8 Metropolitan Cities

Seoul (1), Busan (2), Daegu (3), Incheon (4), Gwangju (5), Daejeon (6), Ulsan (7), Saejong (8)

Suburban province containing majority of Apartments in South Korea:

Kyungi (A)

Which movie stars are worth what they are paid?

Box Office Mojo has data on the top grossing movies of all time, as well as the top-earning actors at the box office. Using data from their site, I'd like to write a story that calculates the actual value of each actor, using data on their salaries for each film versus how much each film makes.

In the absence of this data, I will use Forbes' list of the highest-paid actors and see which actors have consistently produced good movies (using Rotten Tomatoes ratings) and which actors are overrated.

Frozen Four Team Comparison and Predictions for Upcoming Season

I want to examine the Frozen Four (Division I college hockey) teams and compare and contrast things like their top lines' scoring averages, goalie save percentages, penalty kills, special teams, coaching staff, league comparisons, how many players were drafted to the NHL and where they were drafted. Then I want to take that data and look at the returning players on each roster to predict how good each of the four teams will be entering the upcoming 2016-17 season.

I have already created a data set combining each team's 2015-16 stats and have begun analyzing it.

Frozen Four Stats.xlsx

Possible Headlines: Frozen Four Top Lines Analysis
Frozen Four Top Lines Showdown
Frozen Four Top Lines Closer Look

Minimum wages and the cost of food

Story #85
When I went back home 2 months ago, I felt like a 4-year-old. I had completely lost my sense of the cost of food. And not because I had spent almost 4 years using US currency, but because my country is now on the verge of hyperinflation. What I would have paid for an apple back in high school can't even buy a bite of an apple nowadays.

I thought it would be interesting to plot this in a graph: how has the price of an apple (I am just using an apple as an example) changed over the last 5 or 10 years? I could compare the cost of an apple with the minimum wage throughout the years (which has increased over the years for all the wrong reasons), or even compare the cost of food and the minimum wage with those of other countries.

I have been looking for data, but haven't found anything worth using yet. I have emailed some of my HS professors to ask for help. I also contacted the owner of a supermarket to ask if they have a log of food prices that they can give me. He said he will get back to me.

So far I have no specific data set, but I'm in the process of getting one.

Rise and fall of music genres

It's always fun to see what genres are in the top hits. I think it'd be interesting to chart the rise and fall of music genres and annotate the people who were there at the beginning of a genre's popularity. Has technology made today's tastes more diverse? The LA Times has one, but I don't think it's the best (you can't see it all at once and it only goes through 2010).

Billboard has an archive of the top 100 lists, so that could help find the popular music (there's also an unofficial API I haven't taken a look at), and I know Spotify lists genres. It would be cool if I could find something with liner notes too, so I could annotate popular songwriters/producers who contributed to the rise of a genre.

Pitch issue checklist

  • My pitch issue has [Pitch] in the title
  • My pitch issue has [Data request] in the title if I need additional data
  • My pitch issue links to my data set
  • My pitch issue explains what I'd like to explore in the data set
  • My pitch issue includes multiple possible headlines
  • My pitch issue includes appropriate labels in the title such as [Pitch] and maybe [Data Request]
  • I have received two comments of peer feedback on my pitch issue
  • I have received editorial feedback on my pitch issue
  • My pitch has been approved

Visualization of NBA points per game vs. salary

I found NBA player salaries on ESPN's site, and I'd like to link it with some performance data. I'm certain Stephen Curry is underpaid for how many points he scores per game.

It would be like this chart (but MUCH NICER) but with salary on the y-axis and points on the x-axis

screen shot 2016-07-21 at 4 26 08 pm

I need data for NBA player performance though.

Tips about how to pitch:

  • I put [Pitch] and [Data request] in the title to suggest the label "Pitch" and "Data request" - submitters can't label issues, only owners can.
  • By adding "Data request" people know that I'm looking for data, and they might want to hunt for it.
  • Providing an example of what you'd like to do is great. I uploaded this photo by dragging it onto the text area. Easy-peasy.

The geography of terrorist attacks since January, 2016

I want to make a chart comparing the death tolls of attacks in Europe and the U.S. to the deadly attacks that struck North America and Central Asia. The goal is to show hotbeds of terrorist activity and the brutal consequences for civilians.
I hope to accomplish something like the WP's graphic:
http://postgraphics.tumblr.com/post/147505803793/how-terrorism-in-the-west-compares-to-terrorism
Data source: https://www.ihs.com/products/janes-terrorism-insurgency-intelligence-centre.html

Sleepy Tourists

Story issue checklist

My pitch was (use the number): #4

  • My pitch has been approved (see PITCHING.md)
  • My story issue links to my pitch issue
  • I have included an update of my visualization/story in a comment
  • I have received two comments of peer feedback
  • I have received editorial feedback

I wanted to look at the countries of origin of tourists staying in hotels in and around my hometown, Bern. The finding: most guests come from Switzerland. Germany is second.

europroblem_bearbeitet

Dead Trees in East New York

I want to talk about how many dead trees there are in East New York. The median number of dead trees in a neighborhood is 43, while East New York has nearly 300, almost double the count in the neighborhood with the next-highest number of dead trees.

However, I'd like to look at a few other parameters before going forward with an "EAST NEW YORK HAS SO MANY DEAD TREES" story, such as tree species diversity and recent history of disease, and perhaps compare the 2015 tree census data to the 1995 data to see where things stand.

Data from: https://data.cityofnewyork.us/Environment/2015-Street-Tree-Census-Tree-Data/uvpi-gqnh

dead trees median: 43.0
dead trees mean: 53.7616580311
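
A summary like the one above (median, mean, and how far the top neighborhood sits above the runner-up) could be computed along these lines. This is a minimal sketch using Python's standard `statistics` module; the function name and the neighborhood counts are made up for illustration and are not taken from the tree census.

```python
from statistics import mean, median


def summarize_dead_trees(counts):
    """Summarize dead-tree counts per neighborhood (needs at least two entries)."""
    # Rank neighborhoods from most to fewest dead trees.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    (top_name, top_count), (_, runner_up_count) = ranked[0], ranked[1]
    values = list(counts.values())
    return {
        "median": median(values),
        "mean": mean(values),
        "top": top_name,
        # How many times larger the top count is than the second-highest count.
        "top_to_runner_up": top_count / runner_up_count,
    }


# Hypothetical counts for illustration only, not real census values.
sample_counts = {
    "East New York": 298,
    "Neighborhood B": 155,
    "Neighborhood C": 43,
    "Neighborhood D": 20,
}
print(summarize_dead_trees(sample_counts))
```

Comparing the median with the mean is a quick check on how much a single outlier neighborhood pulls the average upward.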

dead_trees_eny

Pitch issue checklist

  • My pitch issue has [Pitch] in the title
  • My pitch issue has [Data request] in the title if I need additional data
  • My pitch issue links to my data set
  • My pitch issue explains what I'd like to explore in the data set
  • My pitch issue includes multiple possible headlines
  • My pitch issue includes appropriate labels in the title such as [Pitch] and maybe [Data Request]
  • I have received two comments of peer feedback on my pitch issue
  • I have received editorial feedback on my pitch issue
  • My pitch has been approved

Evolution of marathon runners

I'd like to plot the evolution of marathon runners in a specific race called “Course de l'Escalade” over the last three years. Maybe I could try to extend it to 5 or 10 years, but the data isn't easy to handle (missing data for 2011 and 2012 and changes in the data format over time).

If I can gather enough data, I'd like to see whether these marathon runners improve over time, and whether some decline after a certain age.

I'd like to see similar graphs about how racers evolve. Maybe you know some examples?

Pitch issue checklist

  • My pitch issue has [Pitch] in the title
  • My pitch issue has [Data request] in the title if I need additional data
  • My pitch issue links to my data set
  • My pitch issue explains what I'd like to explore in the data set
  • My pitch issue includes multiple possible headlines
  • My pitch issue includes appropriate labels in the title such as [Pitch] and maybe [Data Request]
  • I have received two comments of peer feedback on my pitch issue
  • I have received editorial feedback on my pitch issue
  • My pitch has been approved

How accessible is Madrid's subway network?

Based on data from the Madrid subway's website, my aim is to visualize how easy or difficult it is for people with mobility problems (e.g., people in wheelchairs, elderly people) or other kinds of needs (parents pushing babies in strollers or travelers with heavy luggage) to make their way through the subway network of Spain's capital. My inspiration is this map of the NYC subway's accessible stations (source: https://subwayrecord.wordpress.com/2015/03/26/the-mtas-accessibility-gap/), which I would like to reproduce for Madrid:
subway-wheelchair-map-1433769981

Since the data are not directly available and scraping Metro de Madrid's website is not possible, I just built the database for the 275 stations manually, including information about the kind of equipment they have (escalators, elevators).

Exoplanets

I want to do a graphic showing all of the exoplanets discovered to date! An exoplanet is a planet that exists outside of our own solar system, and these days we know about hundreds of them. I want to make a graphic you can mouse over to highlight groups of planets that are similar in various ways, e.g. Earth-like or Jupiter-like or even super-mega planets. Then, if you click on a planet, that planet and the rest of the planets in its system, along with its star, will snap together to form a solar system, with stats and everything! I drew inspiration from this:
exoplanets_large
And actually from this: Theories of Everything, Mapped from one of our Friday sessions.
My main dataset would be the NASA exoplanet archive, but I would probably also need various other things from the Kepler stellar archive to make sense of the scales and colors of things.

Also I will need to know js, so I might have to table this for another week and come up with a new pitch for this week. But this is what I was thinking about already.

Williamsburg rent price reaction to L train shutdown announcement

This January, the MTA announced that the L train will be closed for 18 months starting in 2019.
I want to see if the rental housing market reacted to this announcement.
I specifically want to look at Williamsburg, since 75% of the population in the area rent their apartments and a large portion of them commute to Manhattan on the L train.

Data used: (http://streeteasy.com/blog/download-data/)
Streeteasy, Median asking rent for studios / Median asking rent for 1 bedroom apartment
(I'm not sure which represents the dominant rental housing type in Williamsburg)

homework2

GDP and Population Over The World

A picture of the most popular temperature for biking in the Bay Area.

image

Story issue checklist

My pitch was (use the number): #26

  • My pitch has been approved (see PITCHING.md)
  • My story issue links to my pitch issue
  • I link to my finalized (ish) data source(s)
  • I've included a brief summary of my story
  • I've included some possible headlines or findings
  • I've included some links or images as inspiration (if you have any)
  • I have received two comments of peer feedback
  • I've included an update of my visualization/story in a comment
  • I have received two comments of peer feedback after posting an update
  • I have received editorial feedback

Visualizing Michael Jackson's Earnings 1979-2015

Inspiration:
http://www.forbes.com/sites/kevinmurnane/2016/03/08/brilliant-data-visualization-brings-the-history-of-hip-hop-to-life/#7f48ca5a2213

http://blogs-images.forbes.com/kevinmurnane/files/2016/03/2pac-song-list.jpg

http://dandenney.com/tinkerings/spotify-top-100
http://poly-graph.co/billboard/
Data source:

Billboard API no longer exists :( https://www.quora.com/unanswered/Why-did-Billboard-com-close-its-developer-platform

https://github.com/guoguo12/billboard-charts may work to access data

I need help finding historical data, which Spotify does not seem to store.

Pitch issue checklist

  • My pitch issue has [Pitch] in the title
  • My pitch issue has [Data request] in the title if I need additional data
  • My pitch issue links to my data set
  • My pitch issue explains what I'd like to explore in the data set
  • My pitch issue includes multiple possible headlines
  • My pitch issue includes any links/images I'm using as inspiration
  • I have received two comments of peer feedback on my pitch issue
