Project for Coursera Getting and Cleaning Data
README: Human Activity Recognition Using Smartphones Dataset
Author: [Name withheld during grading process]
Date: June 16, 2014
Data Source:
Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz, Human Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine, International Workshop of Ambient Assisted Living (IWAAL 2012). Vitoria-Gasteiz, Spain, Dec 2012
Data Retrieval:
https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
Project Objective:
Create an R script that transforms the retrieved data for subsequent analysis, in two steps:
STEP 1
- Merge the "training" and the "test" datasets
- Extract only the mean and standard deviation values for each "measurement"
- Give descriptive names to each of the "activities"
- Provide descriptive variable names
- Give descriptive labels to the dataset(s)
STEP 2
Create a "tidy dataset" containing only the "average" of each measurement for each activity and each "subject"
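As an illustration of Step 2, here is a minimal sketch on a toy data frame. The column names and values are made up for the example, and base R's aggregate is used here as a stand-in; it is not necessarily how the project script does it.

```r
# Toy stand-in for the merged data: 2 subjects x 2 activities, 2 rows each.
# All names and values below are hypothetical.
toy <- data.frame(
  subject  = c(1, 1, 1, 1, 2, 2, 2, 2),
  activity = c("WALKING", "WALKING", "SITTING", "SITTING",
               "WALKING", "WALKING", "SITTING", "SITTING"),
  tBodyAcc.mean.X = c(0.2, 0.4, 0.1, 0.3, 0.6, 0.8, 0.5, 0.7),
  tBodyAcc.std.X  = c(-0.9, -0.7, -0.5, -0.3, -0.8, -0.6, -0.4, -0.2)
)

# Average every measurement column within each (subject, activity) pair.
toy_averages <- aggregate(. ~ subject + activity, data = toy, FUN = mean)

# Sort so that each subject's rows sit together.
toy_averages <- toy_averages[order(toy_averages$subject,
                                   toy_averages$activity), ]
print(toy_averages)
```

The result has one row per subject and activity pair. With 30 subjects and 6 activities, the same idea gives the 180 rows expected of the full tidy dataset.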
Definition of Terms:
"training dataset" – the observations from 70% of the subjects (21 of 30), selected at random, used to build model(s)
Subjects (n1 = 21): # 1 3 5 6 7 8 11 14 15 16 17 19 21 22 23 25 26 27 28 29 30
"test dataset" – the observations from the remaining 30% of the subjects (9 of 30), withheld for subsequent model testing
Subjects (n2 = 9): # 2 4 9 10 12 13 18 20 24
"subject" – one of 30 volunteers, aged 19-48, each wearing a smartphone on the waist
"activities" – Six physical activities are recorded:
- WALKING
- WALKING_UPSTAIRS
- WALKING_DOWNSTAIRS
- SITTING
- STANDING
- LAYING
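A minimal sketch of attaching these names to numeric activity codes. It assumes the codes 1 to 6 follow the order listed above, and the sample codes are made up. Base R's factor is used here for brevity; the project script itself uses plyr for this step.

```r
# Activity names in the order listed above (assumed to match codes 1 to 6).
activity_labels <- c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                     "SITTING", "STANDING", "LAYING")

# Hypothetical activity codes, as they might appear in a label file.
codes <- c(5, 1, 6, 3)

# Convert the codes to a factor whose levels carry the descriptive names.
activity <- factor(codes, levels = 1:6, labels = activity_labels)
print(as.character(activity))
```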
"measurements" – A 561-feature vector with time and frequency domain variables for each subject and each activity with respect to:
- Triaxial acceleration from the accelerometer (total acceleration)
- Estimated body acceleration
- Triaxial angular velocity from the gyroscope
"tidy dataset" – one that adheres to the principles established in Hadley Wickham's article,
"Tidy Data", retrieved from http://vita.had.co.nz/papers/tidy-data.pdf on June 16, 2014
run_analysis.RData: How the Script Works
My R working directory is /Users/jerry/Desktop/JHU DS Certif/UCI HAR Dataset; it contains all necessary files.
STEPS
- To download the data, I changed https:// to http://
- Read in 3 test files, 3 training files, a features file and an activity labels file
- Used cbind and rbind to create a data table: data, 10299 x 563
- Used grep to reduce to a data table with mean or std: moments_data, 10299 x 81
- Removed the 13 columns with Freq: mean_std_data, 10299 x 68
- Reshape2 bombed; used reshape package to melt and cast: tidy_averages, 180 x 68
- Applied plyr package to revalue activity levels
- Randomly sampled tidy_averages to check that all is well
- Extended max.print to 999999 to ensure the output file would not be cut short
- Used capture.output to print tidy_averages to the file UCI.HAR.JP.tidy.averages.txt
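The combine and filter steps above can be sketched on toy data as follows. The feature names, the make_part helper and all values are hypothetical; only the object names data, moments_data and mean_std_data come from the steps above. The last lines show write.table as one possible alternative to capture.output, since it writes every row without touching max.print.

```r
# Hypothetical feature names in the style of the features file.
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                   "tBodyAcc-max()-X", "fBodyAcc-meanFreq()-X",
                   "fBodyAcc-mean()-X")

# Hypothetical helper: build one part (training or test) by column-binding
# subject ids, activity codes and a block of made-up measurements.
make_part <- function(subjects, activities) {
  n <- length(subjects)
  x <- as.data.frame(matrix(seq_len(n * length(feature_names)), nrow = n))
  names(x) <- feature_names
  cbind(subject = subjects, activity = activities, x)
}

train <- make_part(c(1, 3), c(1, 4))
test  <- make_part(2, 6)

# Stack the two parts row-wise into one table.
data <- rbind(train, test)

# Keep subject, activity and every column whose name mentions mean or std.
keep <- grep("mean|std", names(data))
moments_data <- data[, c(1, 2, keep)]

# Drop the meanFreq columns, leaving only plain mean and std measurements.
mean_std_data <- moments_data[, !grepl("Freq", names(moments_data))]

# Write the table to a temporary file and read it back as a check.
out_file <- tempfile(fileext = ".txt")
write.table(mean_std_data, out_file, row.names = FALSE)
round_trip <- read.table(out_file, header = TRUE, check.names = FALSE)
print(dim(round_trip))
```

On the toy data the table shrinks from 7 columns to 6 and then to 5, mirroring the 563 to 81 to 68 reduction in the real data.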