Solving Statistics Problems with R
Problem statement:
Statistical Computing Org published flight arrival and departure records for US flights from 1987 to 2008. The data set is more than 12 GB, too large to load into memory at once for processing. The main idea is to explore the data and extract valuable insights. http://stat-computing.org/dataexpo/2009/the-data.html
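Because the full data set does not fit in memory, one common approach is to read one year's file at a time, keep only the columns a question needs, and carry small running totals forward. The sketch below assumes one CSV per year with the Data Expo column names. The directory and the two tiny stand-in files are hypothetical and exist only so the code runs. It accumulates the count, sum, and sum of squares of AA distances and turns them into a sample standard deviation at the end.

```r
# Sketch: standard deviation of AA distances computed one year at a time,
# so the full data set never has to be in memory.
# Assumption: one CSV per year with Data Expo column names. The two tiny
# stand-in files below are hypothetical and only make the example runnable.
data_dir <- tempfile("flights")
dir.create(data_dir)
writeLines(c("Year,UniqueCarrier,ArrDelay,Distance",
             "2007,AA,2,733", "2007,WN,15,399", "2007,AA,-8,2475"),
           file.path(data_dir, "2007.csv"))
writeLines(c("Year,UniqueCarrier,ArrDelay,Distance",
             "2008,AA,35,1389", "2008,UA,50,1846", "2008,AA,10,1235"),
           file.path(data_dir, "2008.csv"))

wanted <- c("UniqueCarrier", "Distance")  # only the columns this question needs
n  <- 0  # running count of AA flights
s  <- 0  # running sum of distances
ss <- 0  # running sum of squared distances

for (path in list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)) {
  header <- names(read.csv(path, nrows = 1))
  # In colClasses, "NULL" skips a column and NA lets read.csv pick the type.
  classes <- ifelse(header %in% wanted, NA, "NULL")
  chunk <- read.csv(path, colClasses = classes)
  d <- chunk$Distance[chunk$UniqueCarrier %in% "AA"]  # %in% treats NA carriers as no match
  d <- d[!is.na(d)]
  n  <- n + length(d)
  s  <- s + sum(d)
  ss <- ss + sum(d^2)
  rm(chunk)  # drop this year's rows before reading the next file
}

# Sample standard deviation with the n - 1 denominator, matching sd().
sd_aa <- sqrt((ss - s^2 / n) / (n - 1))
sd_aa
```

With hundreds of millions of rows the plain sum-of-squares formula can lose precision; Welford's online algorithm is a numerically safer way to maintain the same running statistics.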
Although this course focused on statistics, I studied R in parallel so that I could combine knowledge of both to come up with solutions.
The following questions need to be answered using R:
- Standard deviation: Compute the standard deviation of the distance travelled by American Airlines (AA) flights.
- Plot using R: Draw a boxplot of distance for each unique carrier (UniqueCarrier).
- Plot using R: Determine the direction of the relationship between ArrDelay and DepDelay by drawing a scatter plot.
- Probability: What is the probability that a flight landing or taking off is a “WN” flight?
- Probability: What is the joint probability that a flight is cancelled and was scheduled to travel less than 2000 miles, given that it is an “AA” flight?
- Prediction: Suppose arrival delays of “AA” flights are normally distributed with a mean of 15 minutes and a standard deviation of 3 minutes. If “AA” plans to announce a scheme that gives 50% cash back whenever a flight is delayed by 20 minutes or more, on what percentage of its trips is “AA” expected to pay out this money?
- Prediction: Assume that 65% of flights are diverted due to bad weather through the Weather System. What is the probability that, in a random sample of 10 flights, 6 are diverted through the Weather System?
- Linear regression: Perform a simple linear regression of Arrival Delay (ArrDelay) on Departure Delay (DepDelay).
- Multiple linear regression: Perform a multiple linear regression of Arrival Delay on Departure Delay and the Distance travelled by the flights.
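A minimal sketch of the standard deviation and the two plots in base R, run on a hand-made data frame whose values are invented for illustration. With the real data, `flights` would instead hold the columns read from the Data Expo files. It uses `sd()`, the formula interfaces of `boxplot()` and `plot()`, and `cor()` as a numeric check on the direction seen in the scatter plot.

```r
# Toy stand-in for a few Data Expo rows (values are made up for illustration).
flights <- data.frame(
  UniqueCarrier = c("AA", "AA", "AA", "WN", "WN", "UA", "AA", "WN"),
  Distance      = c(733, 1389, 2475, 399, 628, 1846, 1235, 487),
  DepDelay      = c(5, 30, -3, 12, 0, 45, 8, 20),
  ArrDelay      = c(2, 35, -8, 15, -4, 50, 10, 18)
)

# Standard deviation of distance for American Airlines (AA) flights;
# na.rm = TRUE guards against missing values in the real files.
aa_sd <- sd(flights$Distance[flights$UniqueCarrier == "AA"], na.rm = TRUE)
aa_sd

# Boxplot of Distance for each UniqueCarrier.
boxplot(Distance ~ UniqueCarrier, data = flights,
        xlab = "Carrier", ylab = "Distance (miles)")

# Scatter plot of ArrDelay against DepDelay; an upward-sloping cloud
# indicates a positive relationship.
plot(ArrDelay ~ DepDelay, data = flights,
     xlab = "Departure delay (min)", ylab = "Arrival delay (min)")

# Pearson correlation as a numeric check on the direction (sign of r).
r <- cor(flights$DepDelay, flights$ArrDelay, use = "complete.obs")
r
```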
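The two probability questions can be estimated as relative frequencies, treating each row of the data as one flight. This sketch assumes `Cancelled` is coded 1/0 as in the Data Expo files and uses made-up rows. Conditioning on AA is done by subsetting to AA rows first, which is equivalent to dividing P(cancelled, distance < 2000, AA) by P(AA).

```r
# Toy stand-in rows (made-up values); Cancelled is 1 for cancelled, 0 otherwise.
flights <- data.frame(
  UniqueCarrier = c("AA", "AA", "AA", "WN", "WN", "UA", "AA", "WN", "AA", "UA"),
  Distance      = c(733, 1389, 2475, 399, 628, 1846, 1235, 487, 1744, 2565),
  Cancelled     = c(0, 0, 1, 0, 0, 1, 1, 0, 0, 0)
)

# P(carrier is WN): share of all flights operated by WN.
p_wn <- mean(flights$UniqueCarrier == "WN")

# P(cancelled AND distance < 2000 | carrier is AA):
# restrict to AA rows, then take the share that meets both conditions.
aa <- flights[flights$UniqueCarrier == "AA", ]
p_cancel_short_given_aa <- mean(aa$Cancelled == 1 & aa$Distance < 2000)

c(p_wn = p_wn, p_cancel_short_given_aa = p_cancel_short_given_aa)
```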
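The cash-back question needs no flight data: under the stated Normal(15, 3) assumption, the share of AA trips that qualify is the upper tail P(X ≥ 20), which `pnorm()` returns directly. Reading "delayed by 20 minutes" as "20 minutes or more" is an interpretation; for a continuous distribution P(X ≥ 20) and P(X > 20) are equal, so the answer is the same either way.

```r
# AA arrival delay modelled as Normal(mean = 15, sd = 3), as the problem states.
# Share of trips delayed by 20 minutes or more: P(X >= 20).
p_payout <- pnorm(20, mean = 15, sd = 3, lower.tail = FALSE)
round(100 * p_payout, 2)  # as a percentage, about 4.78
```

Equivalently, z = (20 - 15) / 3 ≈ 1.67, and the area above that z-score is just under 5% of trips.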
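The diversion question is a binomial count: each of the 10 sampled flights is diverted with probability 0.65, assuming the flights are independent (the problem leaves this implicit). `dbinom()` gives the probability of exactly 6; if "6" is meant as "at least 6", the upper tail from `pbinom()` applies instead.

```r
# Number of weather-diverted flights among 10 modelled as Binomial(10, 0.65).
p_exactly_6 <- dbinom(6, size = 10, prob = 0.65)
p_exactly_6   # about 0.2377

# Alternative reading, "at least 6": P(X >= 6) = 1 - P(X <= 5).
p_at_least_6 <- pbinom(5, size = 10, prob = 0.65, lower.tail = FALSE)
p_at_least_6  # about 0.7515
```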
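Both regressions can be fitted with `lm()` and a model formula. Because the real records are not loaded here, this sketch simulates data under made-up coefficients (the 0.95 and -0.002 are arbitrary assumptions, not findings from the flight data). With the real data, rows with missing delays, such as cancelled flights, are dropped by `lm()`'s default `na.action`.

```r
# Simulated stand-in data; coefficients are arbitrary, not real estimates.
set.seed(42)
n <- 1000
DepDelay <- rexp(n, rate = 1 / 15) - 5         # hypothetical departure delays (min)
Distance <- runif(n, min = 100, max = 2500)    # hypothetical distances (miles)
ArrDelay <- 1 + 0.95 * DepDelay - 0.002 * Distance + rnorm(n, sd = 5)
flights <- data.frame(ArrDelay, DepDelay, Distance)

# Simple linear regression: ArrDelay explained by DepDelay alone.
fit_simple <- lm(ArrDelay ~ DepDelay, data = flights)
summary(fit_simple)

# Multiple linear regression: add Distance as a second predictor.
fit_multi <- lm(ArrDelay ~ DepDelay + Distance, data = flights)
summary(fit_multi)
coef(fit_multi)
```

On the full data set a single `lm()` call may not fit in memory; the `biglm` package is one option for fitting the same linear model incrementally, chunk by chunk.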