The run_analysis.R script does the following:
- Merges the training and the test sets to create one data set.
- Extracts only the measurements on the mean and standard deviation for each measurement.
- Uses descriptive activity names to name the activities in the data set.
- Appropriately labels the data set with descriptive variable names.
- Creates a second, independent tidy data set with the average of each variable for each activity and each subject.
Assumptions when running this R script:
- The "UCI HAR Dataset" directory is already present.
- The relevant data files are present within this directory.
- The "reshape", "reshape2", and "plyr" packages are installed.
- If they are not, install them before running the script.
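The assumptions above can be checked before running the script. This is a minimal sketch, not part of run_analysis.R: the variable names `data_dir`, `required_pkgs`, and `missing_pkgs` are hypothetical, and the directory is assumed to sit in the current working directory.

```r
# Hypothetical pre-flight check; names are illustrative, not from run_analysis.R.
data_dir <- "UCI HAR Dataset"          # assumed to be in the working directory
required_pkgs <- c("reshape", "reshape2", "plyr")

# Packages that cannot be loaded from the current library paths
missing_pkgs <- required_pkgs[!vapply(required_pkgs, requireNamespace,
                                      logical(1), quietly = TRUE)]

if (!dir.exists(data_dir)) {
  message("Directory not found: ", data_dir)
}
if (length(missing_pkgs) > 0) {
  message("Install before running: ", paste(missing_pkgs, collapse = ", "))
}
```

Any package reported here can be installed with `install.packages()`.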
The script performs the following steps:
- Import the test data
- Import the training data
- Merge the training and the test sets to create one data set
- From the merged data, extract only the measurements that contain the mean or standard deviation
- Use descriptive activity names to name the activities in the data set AND appropriately label the data set with descriptive variable names
- Read in the Activity Labels file
- Include the Activity Names as a column in the main data frame
- Include the subject IDs as a column in the main data frame
- The main data frame now contains:
  - 1st column: subject IDs
  - 2nd column: activity names
  - Remaining 81 columns: all measures containing mean or standard deviation values
- Create an independent tidy data set with the average of each variable for each activity and each subject
- Create an output file for writing out the tidy data output
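The steps above can be sketched on toy data. This is an illustration under stated assumptions, not the code in run_analysis.R: it uses base R only (`rbind`, `grepl`, `match`, `aggregate`) rather than the reshape2 and plyr packages, the values are made up, and the `"mean|std"` pattern is an assumed way of selecting columns.

```r
# Toy stand-ins for the test and training sets (made-up values).
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
test  <- data.frame(subject = c(1, 1), activity = c(1, 2),
                    m = c(0.1, 0.2), s = c(0.3, 0.4), x = c(0.5, 0.6))
train <- data.frame(subject = c(2, 2, 2), activity = c(1, 1, 2),
                    m = c(1, 3, 5), s = c(2, 4, 6), x = c(7, 8, 9))
names(test)[3:5]  <- feature_names
names(train)[3:5] <- feature_names

# Merge the two sets by stacking their rows
merged <- rbind(test, train)

# Keep subject, activity, and the mean/std measurements (assumed pattern)
keep <- grepl("mean|std", names(merged))
merged <- merged[, c("subject", "activity", names(merged)[keep])]

# Replace activity codes with descriptive names (toy label table)
activity_labels <- data.frame(id = c(1, 2), name = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)
merged$activity <- activity_labels$name[match(merged$activity,
                                              activity_labels$id)]

# Average every remaining variable for each subject/activity pair
tidy <- aggregate(merged[, -(1:2)],
                  by = list(subject = merged$subject,
                            activity = merged$activity),
                  FUN = mean)
```

With this toy input, `tidy` has one row per subject/activity pair and drops the `max()` column, which matches neither `mean` nor `std`.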
The resulting clean dataset is named tidy_data.txt. It contains one row for each subject/activity pair and columns for subject, activity, and each feature that was a mean or standard deviation from the original dataset.
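As a rough illustration of the output shape, the sketch below writes a toy tidy table and reads it back. The file format (space-separated with a header row), the temporary path, and the `avg_feature` column are assumptions made for this example; the script's actual write options are not stated here.

```r
# Toy tidy table; the real file has one column per mean/std feature.
tidy <- data.frame(subject = c(1, 1, 2),
                   activity = c("WALKING", "SITTING", "WALKING"),
                   avg_feature = c(0.1, 0.2, 0.3),  # hypothetical feature column
                   stringsAsFactors = FALSE)

# Write to a temporary location so no real output file is overwritten
out_file <- file.path(tempdir(), "tidy_data.txt")
write.table(tidy, out_file, row.names = FALSE)

# Read the file back with its header row
tidy_back <- read.table(out_file, header = TRUE, stringsAsFactors = FALSE)

# One row per subject/activity pair means no pair appears twice
pairs_unique <- !any(duplicated(tidy_back[, c("subject", "activity")]))
```

The same `read.table(..., header = TRUE)` call should load the real tidy_data.txt if it was written in this format.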