carollei926-GettingAndCleaningData

GettingAndCleaningData

## Repo for the Coursera course Getting and Cleaning Data

This ReadMe file explains the content of the course project for the course "Getting and Cleaning Data" and how the other files in this repo relate to each other.

Course Project Content

Data source: https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip

Full description of the dataset can be obtained here: http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones

You should create one R script called run_analysis.R that does the following:

1. Merges the training and the test sets to create one data set.
2. Extracts only the measurements on the mean and standard deviation for each measurement.
3. Uses descriptive activity names to name the activities in the data set.
4. Appropriately labels the data set with descriptive variable names.
5. From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.

After you have finished step 5: Please upload the tidy data set created in step 5 of the instructions to the course project page. Please upload your data set as a txt file.

Process in run_analysis.R

Preparation: download and unzip the files from the URL above and save them to the working directory.
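
The preparation step could be sketched as below. The helper name fetch_har_data, the local zip file name, and the extracted folder name are assumptions for illustration and are not taken from run_analysis.R; only the URL comes from this ReadMe.

```r
# Hypothetical helper: download the archive once and unzip it into dest_dir.
fetch_har_data <- function(dest_dir = getwd()) {
  url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
  zip_path <- file.path(dest_dir, "UCI_HAR_Dataset.zip")  # assumed local file name

  # Skip the download if the archive is already present.
  if (!file.exists(zip_path)) {
    download.file(url, destfile = zip_path, mode = "wb")  # "wb" keeps the zip intact on Windows
  }

  unzip(zip_path, exdir = dest_dir)

  # Assumed name of the top-level folder inside the archive.
  invisible(file.path(dest_dir, "UCI HAR Dataset"))
}
```

Calling fetch_har_data() with no arguments would place the files in the current working directory.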
Merges the training and the test sets to create one data set.

Merge the X_test and X_train data by row to get a dataset with all subjects' test and train data.
Merge the Y_test and Y_train data by row to get a dataset with all subjects' test and train activities.
Create a group_flag variable for both Subject_test and Subject_train, with a value of "test" or "train" to indicate the research group the subject belongs to. Merge the Subject_test and Subject_train data by row to get a dataset with all subjects' IDs and groups.
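
A minimal sketch of this row-wise merge, using small made-up data frames in place of the files read from disk. The object names X, Y, and S follow the naming used in this ReadMe; the values and column names are invented for illustration.

```r
# Toy stand-ins for the tables read with read.table() (values are made up).
X_test  <- data.frame(V1 = c(0.1, 0.2), V2 = c(1.1, 1.2))
X_train <- data.frame(V1 = c(0.3, 0.4, 0.5), V2 = c(1.3, 1.4, 1.5))
Y_test  <- data.frame(V1 = c(1, 2))
Y_train <- data.frame(V1 = c(3, 1, 2))
Subject_test  <- data.frame(V1 = c(2, 2))
Subject_train <- data.frame(V1 = c(1, 1, 3))

# Flag each subject table with its research group before stacking.
Subject_test$group_flag  <- "test"
Subject_train$group_flag <- "train"

# Stack by row, always test first, so rows of X, Y, and S stay aligned.
X <- rbind(X_test, X_train)
Y <- rbind(Y_test, Y_train)
S <- rbind(Subject_test, Subject_train)
```

Keeping the same test-then-train order in all three rbind() calls is what allows the tables to be joined column-wise later without a key.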

Extracts only the measurements on the mean and standard deviation for each measurement.

Use grep() to get the indices of the feature names that contain "mean" or "std" (using general formatting).
Apply the indices to the test and train data (X) to keep only the data related to mean or standard deviation.
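
The selection could look roughly like the sketch below. The exact pattern used in run_analysis.R is not stated here, so "mean|std" is an assumption, and the feature names are a hand-picked sample in the style of features.txt.

```r
# A few feature names in the style of features.txt (hand-picked sample).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-mad()-X",
              "tBodyAcc-max()-X", "fBodyAcc-meanFreq()-X")

# Indices of names containing "mean" or "std" (assumed pattern).
# Note that this pattern also matches meanFreq() features.
keep_idx <- grep("mean|std", features)

# Toy measurement table with one column per feature (values are made up).
X <- as.data.frame(matrix(seq_len(10), nrow = 2))

# Keep only the matching columns.
X_sub <- X[, keep_idx]
```

A stricter pattern such as "mean\\(\\)|std\\(\\)" would exclude the meanFreq() columns; which variant fits depends on how "mean" is interpreted in the assignment.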

Uses descriptive activity names to name the activities in the data set

Rename the columns of X with meaningful feature names, using the index of subsetted feature names obtained with grep() in step 2.
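
As an illustration, the renaming can reuse the same index that selected the columns. This is a sketch on made-up data; the variable names keep_idx and X_sub are hypothetical.

```r
# Sample feature names and the index of those containing "mean" or "std".
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-mad()-X")
keep_idx <- grep("mean|std", features)

# Toy table that has already been reduced to the selected columns.
X_sub <- data.frame(V1 = c(0.1, 0.2), V2 = c(1.1, 1.2))

# Replace the default V1, V2, ... names with the matching feature names.
names(X_sub) <- features[keep_idx]
```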

Appropriately labels the data set with descriptive variable names.

Rename the variables in the Subject (S) and Activity (Y) datasets with meaningful names.
Column-bind the Subject (S), Activity (Y), and measurement (X) datasets into one "TidyDataSet", in which each row represents one activity of one subject.
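
A small sketch of the renaming and column binding. The column names subject and activity are assumed for illustration, and the data are made up.

```r
# Toy tables with matching row order (values are made up).
S <- data.frame(V1 = c(1, 1, 2), group_flag = c("train", "train", "test"))
Y <- data.frame(V1 = c(1, 2, 1))
X <- data.frame(mean_x = c(0.1, 0.2, 0.3), std_x = c(1.1, 1.2, 1.3))

# Give the subject and activity columns meaningful names (assumed names).
names(S) <- c("subject", "group_flag")
names(Y) <- "activity"

# Join the three tables side by side; this relies on identical row order.
TidyDataSet <- cbind(S, Y, X)
```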

From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.

Subset a "partial" dataset from "TidyDataSet" that contains only the numeric measurement data (originally from dataset X), in preparation for computing the averages.
Use aggregate() to get the mean of each variable at the subject, group, and activity level. The original course project requires the mean of each variable for each activity and each subject only. Each subject belongs to exactly one group, so adding the group level does not change the result, and I included it in the list() to retain the group information in the output dataset. The output dataset is named "TidyDataSet_avg".
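
The averaging step might be sketched as follows on a toy "TidyDataSet". Column names and values are invented; only the use of aggregate() with a list() of grouping variables comes from the description above.

```r
# Toy tidy data: subject 1 has two rows for activity 1 (values are made up).
TidyDataSet <- data.frame(
  subject    = c(1, 1, 1, 2),
  group_flag = c("train", "train", "train", "test"),
  activity   = c(1, 1, 2, 1),
  mean_x     = c(0.2, 0.4, 0.6, 0.8),
  std_x      = c(1.0, 2.0, 3.0, 4.0)
)

# Keep only the numeric measurement columns.
partial <- TidyDataSet[, c("mean_x", "std_x")]

# Average every measurement per subject, group, and activity.
TidyDataSet_avg <- aggregate(
  partial,
  by = list(subject    = TidyDataSet$subject,
            group_flag = TidyDataSet$group_flag,
            activity   = TidyDataSet$activity),
  FUN = mean
)

# Grouping without group_flag yields the same number of rows,
# because each subject belongs to exactly one group.
avg_no_group <- aggregate(
  partial,
  by = list(subject = TidyDataSet$subject, activity = TidyDataSet$activity),
  FUN = mean
)
```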

After-steps

Create a "Deliverable" folder in the working directory
Use write.table() and row.names = FALSE to save a copy of "TidyDataSet_avg" (in .txt format) to [working directory]/Deliverable folder.
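
A sketch of the export step. The output file name is an assumption, and tempdir() is used here only so that the example leaves the working directory untouched; the script itself writes under the working directory as described above.

```r
# Toy averaged data set (values are made up).
TidyDataSet_avg <- data.frame(subject  = c(1, 2),
                              activity = c(1, 1),
                              mean_x   = c(0.3, 0.8))

# Create the output folder if it does not exist yet.
out_dir <- file.path(tempdir(), "Deliverable")  # stand-in for [working directory]/Deliverable
dir.create(out_dir, showWarnings = FALSE)

# Write the table as plain text without row names.
out_file <- file.path(out_dir, "TidyDataSet_avg.txt")  # assumed file name
write.table(TidyDataSet_avg, file = out_file, row.names = FALSE)

# Read it back to confirm the file round-trips.
check <- read.table(out_file, header = TRUE)
```

Reading the file back with header = TRUE is also how a reviewer would load the submitted txt file.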

Description of Codebook

Codebook name: Codebook_TidyDataSet_avg.pdf
The codebook lists the following:

Source of original dataset
Data manipulation/analysis program: run_analysis.R
Summary of target dataset (TidyDataSet.txt)
List of variable names, types, lengths, and descriptions
