Use the attached "Adult" data set (http://archive.ics.uci.edu/ml/datasets/Census+Income), census data collected to predict income, for the following steps.
The basic idea is to use the apply() function (Chapter 9) to clean the data, and the split-apply-combine pattern (Chapter 10) to analyze it.
-
Similar to last week, replace '-' with spaces, where appropriate, using the apply() function.
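
A minimal sketch of this step, assuming pandas and a tiny made-up frame in place of the real file. The column names, the sample values, and the choice of which columns count as "appropriate" are illustrative assumptions, not part of the assignment:

```python
import pandas as pd

# Toy rows standing in for the Adult data; names and values are illustrative only.
df = pd.DataFrame({
    "marital_status": ["Never-married", "Married-civ-spouse"],
    "occupation": ["Adm-clerical", "Exec-managerial"],
    "age": [39, 50],
})

# Assumption: these are the text columns where a hyphen is only a word separator.
text_cols = ["marital_status", "occupation"]

def hyphens_to_spaces(column):
    """Replace every '-' with a space in one text column (a Series)."""
    return column.str.replace("-", " ", regex=False)

cleaned = df.copy()
# DataFrame.apply calls the function once per selected column.
cleaned[text_cols] = df[text_cols].apply(hyphens_to_spaces)
print(cleaned)
```

Listing the columns explicitly keeps numeric columns untouched and forces a decision about where a hyphen carries meaning and should stay.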
-
Determine how to deal with missing values (if any) and use apply() to make the changes.
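
One possible way to approach this, sketched on made-up rows. It assumes missing entries appear as a "?" string (possibly padded with spaces), which should be verified against the actual file. Filling with "Unknown" is just one policy; dropping rows or imputing are equally valid choices to reason about:

```python
import numpy as np
import pandas as pd

# Toy rows; the "?" placeholder is an assumption to check against the real data.
df = pd.DataFrame({
    "workclass": ["Private", " ?", "State-gov"],
    "occupation": [" ?", "Sales", "Tech-support"],
})
text_cols = ["workclass", "occupation"]

def mark_missing(value):
    """Return NaN for the assumed '?' placeholder, else the stripped string."""
    if isinstance(value, str):
        value = value.strip()
        if value == "?":
            return np.nan
    return value

# Outer apply walks the columns; inner apply walks the values of each column.
df[text_cols] = df[text_cols].apply(lambda col: col.apply(mark_missing))

# Example policy: label missing categories explicitly instead of dropping rows.
filled = df[text_cols].apply(lambda col: col.fillna("Unknown"))
print(filled)
```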
-
Use apply() with User-Defined Functions (UDFs) to analyze missing values, similar to page 178 (if appropriate).
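
As a rough illustration of UDFs passed to apply(), the sketch below counts missing values per column and per row. The function names and the toy data are hypothetical, and it assumes missing values have already been converted to NaN:

```python
import numpy as np
import pandas as pd

# Toy rows with NaN already marking the missing entries.
df = pd.DataFrame({
    "workclass": ["Private", np.nan, "State-gov", np.nan],
    "occupation": [np.nan, "Sales", "Tech-support", "Sales"],
    "age": [39, 50, 38, 53],
})

def count_missing(values):
    """Number of missing entries in a Series (one column or one row)."""
    return int(values.isna().sum())

def prop_missing(values):
    """Fraction of entries in a Series that are missing."""
    return count_missing(values) / len(values)

missing_per_column = df.apply(count_missing)        # default axis=0: one value per column
share_per_column = df.apply(prop_missing)
missing_per_row = df.apply(count_missing, axis=1)   # axis=1: one value per row

print(missing_per_column)
print(share_per_column)
print(missing_per_row)
```

The per-row counts can help decide whether a few rows carry most of the gaps, which in turn informs the choice between dropping and filling.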
-
Use the grouping and aggregation methods in Chapter 10 to analyze data vs. income in several different ways.
FOR EXAMPLE: education vs. income, job vs. income, job & education vs. income, etc. (This is NOT an exhaustive list; I expect you to do more.)
Remember to document your steps and reasoning using markdown cells.
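
For the grouping and aggregation step, a split-apply-combine analysis might look like the sketch below. The rows are made up, and the column names and the ">50K" / "<=50K" labels are assumptions to check against the real file:

```python
import pandas as pd

# Toy rows; labels and column names are illustrative assumptions.
df = pd.DataFrame({
    "education": ["Bachelors", "Bachelors", "HS-grad", "HS-grad", "HS-grad", "Masters"],
    "occupation": ["Sales", "Tech", "Sales", "Sales", "Tech", "Tech"],
    "income": [">50K", "<=50K", "<=50K", "<=50K", ">50K", ">50K"],
})

# A boolean flag lets mean() be read as "share of rows earning more than 50K".
df["high_income"] = df["income"] == ">50K"

# Split by one key, apply two aggregations, combine into one table.
by_education = df.groupby("education")["high_income"].agg(["mean", "count"])

# Split by two keys at once (job & education vs. income).
by_both = df.groupby(["education", "occupation"])["high_income"].mean()

# Raw counts per education level and income bracket, one column per bracket.
counts = df.groupby(["education", "income"]).size().unstack(fill_value=0)

print(by_education)
print(by_both)
print(counts)
```

Reporting the group size next to each share makes it easier to spot groups too small to support a conclusion.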