- Uses R version 3.0.3
- Requires the reshape2 package to be installed
- To read the final tidy data set into R, use `read.table("final.txt", header = TRUE, sep = "\t")`
- Full Experiment here
- Data can also be found here
- Experiment [README](https://github.com/QuasiGuru/Samsung-Activities/blob/master/UCI%20HAR%20Dataset/README.txt)
- README.MD -- this file
- CodeBook.MD -- data dictionary for the cleaned-up final data
- final.txt -- cleaned-up tidy data set
- run_analysis.R -- script used to create the tidy data set
- UCI HAR Dataset/README.txt -- description of the original experiment and files
- UCI HAR Dataset/features_info.txt -- description of the feature variables of the original experiment
- UCI HAR Dataset/features.txt -- list of all features
- UCI HAR Dataset/activity_labels.txt -- list of activities
- UCI HAR Dataset/train/X_train.txt -- training set
- UCI HAR Dataset/train/y_train.txt -- training labels
- UCI HAR Dataset/train/subject_train.txt -- list of subjects for the training set
- UCI HAR Dataset/test/X_test.txt -- testing set
- UCI HAR Dataset/test/y_test.txt -- testing labels
- UCI HAR Dataset/test/subject_test.txt -- list of subjects for the testing set
- Download data from one of the links above and extract the files as shown above
- Place run_analysis.R in the working directory (as suggested above)
- Run run_analysis.R
- Load the features (561 rows) from features.txt into a data frame
- Retrieve a list of the features that measure a mean or standard deviation
- Clean up the feature names to make them more readable and descriptive
- Load the activity labels (6 rows) from activity_labels.txt into a data frame
- Load the training set (7352 rows of 561 columns) from X_train.txt into a data frame
- Rename the columns to the names from the features list
- Keep only the columns that measure a mean or standard deviation, based upon the list retrieved above
- Load the subject ids (7352 rows) from subject_train.txt into a data frame and add that column to the training data frame
- Load the training activity ids (7352 rows) from y_train.txt into a data frame and add that column to the training data frame
- Since the training and testing sets are treated the same way, the same process is applied to the files X_test.txt, subject_test.txt, and y_test.txt (all 2947 rows)
- Merge the training and testing data frames
- Add the appropriate activity name based upon the activity id
- Delete the activity id column (unneeded)
- Calculate the average values of the means/standard deviations for each activity and each subject
- Write the resulting data frame to a tab-delimited file
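
The processing steps above can be sketched on a tiny in-memory example. This is a minimal illustration, not the contents of run_analysis.R: the toy values, the `build_split` helper, the name-cleaning rules, and the output file name `final_sketch.txt` are all assumptions made for the example. It also uses base R `aggregate` for the averaging step so that it runs without extra packages; the actual script may do this step with reshape2 instead.

```r
# Toy stand-ins for features.txt and activity_labels.txt (made-up values).
features <- data.frame(
  id = 1:4,
  name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
           "tBodyAcc-max()-X", "fBodyGyro-mean()-Z"),
  stringsAsFactors = FALSE
)
activity_labels <- data.frame(
  activity_id = 1:2,
  activity = c("WALKING", "SITTING"),
  stringsAsFactors = FALSE
)

# Indices of the features that measure a mean or a standard deviation.
keep <- grep("mean\\(\\)|std\\(\\)", features$name)

# Assumed cleaning rule: drop "()" and replace "-" with ".".
clean_names <- gsub("-", ".", gsub("\\(\\)", "", features$name[keep]))

# Hypothetical helper: assemble one split (train or test) from its three parts.
build_split <- function(x, subject, activity_id) {
  names(x) <- features$name          # name columns after the features list
  x <- x[, keep, drop = FALSE]       # keep only mean/std columns
  names(x) <- clean_names
  cbind(subject = subject, activity_id = activity_id, x)
}

# Toy stand-ins for X_*.txt, subject_*.txt and y_*.txt.
train <- build_split(
  x = data.frame(V1 = c(1, 3, 5), V2 = c(2, 4, 6),
                 V3 = c(9, 9, 9), V4 = c(0, 2, 4)),
  subject = c(1, 1, 2),
  activity_id = c(1, 1, 2)
)
test <- build_split(
  x = data.frame(V1 = c(7, 9), V2 = c(8, 10),
                 V3 = c(9, 9), V4 = c(6, 8)),
  subject = c(3, 3),
  activity_id = c(2, 2)
)

# Merge the training and testing data frames.
combined <- rbind(train, test)

# Add the activity name for each activity id, then drop the id column.
combined <- merge(combined, activity_labels, by = "activity_id")
combined$activity_id <- NULL

# Average every remaining column for each subject/activity pair.
tidy <- aggregate(. ~ subject + activity, data = combined, FUN = mean)

# Write the result as a tab-delimited file (here into a temporary directory).
out_file <- file.path(tempdir(), "final_sketch.txt")
write.table(tidy, out_file, sep = "\t", row.names = FALSE)
```

With the real data, the toy data frames would be replaced by `read.table` calls on the files listed above, and the written file can be read back with the same `read.table(..., header = TRUE, sep = "\t")` call shown at the top of this README.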