UCI_HAR_Project

Project for Coursera Getting and Cleaning Data

README: Human Activity Recognition Using Smartphones Dataset

Author: _[Name withheld during grading process]_ Date: June 16, 2014

Data Source:

Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz, Human Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine, International Workshop of Ambient Assisted Living (IWAAL 2012). Vitoria-Gasteiz, Spain, Dec 2012

Data Retrieval:

https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
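A minimal sketch of the retrieval step, assuming the URL above and the scheme change mentioned in the script notes (https:// to http://). The helper name `fetch_har` and the local file name are hypothetical, and whether the plain http:// address still resolves is not confirmed by this README.

```r
# URL as given in this README.
data_url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"

# Swap the scheme, as described in the script notes (https:// to http://).
http_url <- sub("^https://", "http://", data_url)

# Hypothetical helper: download the archive once and unpack it into `exdir`.
fetch_har <- function(url = http_url,
                      destfile = "UCI_HAR_Dataset.zip",  # assumed local file name
                      exdir = ".") {
  if (!file.exists(destfile)) {
    download.file(url, destfile, mode = "wb")  # binary mode keeps the zip intact
  }
  unzip(destfile, exdir = exdir)  # returns the extracted file paths
}
```

Calling `fetch_har()` from the working directory would then place the unzipped files where the script expects to read them.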

Project Objective:

Create an R script that transforms the retrieved data for subsequent analysis, in two steps:

STEP 1

Merge the "training" and "test" datasets
Extract only the mean and standard deviation values for each "measurement"
Give descriptive names to each of the "activities"
Provide descriptive variable names
Give descriptive labels to the dataset(s)

STEP 2

Create a "tidy dataset" containing only the average of each variable for each activity and each "subject"

Definition of Terms:

"training dataset" – the 70% of volunteers, selected at random, whose observations are used to build model(s)

       Subjects _(n1=21):_ # 1 3 5 6 7 8 11 14 15 16 17 19 21 22 23 25 26 27 28 29 30

"test dataset" – the remaining 30% of volunteers, whose observations are withheld for subsequent model testing

       Subjects _(n2=9):_ # 2 4 9 10 12 13 18 20 24

"subject" – one of 30 volunteers, aged 19-48, each wearing a smartphone on the waist

"activities" – Six physical activities are recorded:

WALKING
WALKING_UPSTAIRS
WALKING_DOWNSTAIRS
SITTING
STANDING
LAYING
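As an illustration of giving the activities descriptive names, the sketch below maps integer codes to the six labels with a base R factor. It assumes the codes 1 to 6 follow the order listed above; the `codes` vector is made-up example data, and this is a base R alternative rather than the plyr-based approach the script itself uses.

```r
# Activity names in the order listed above; codes 1..6 are assumed to follow it.
activity_names <- c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                    "SITTING", "STANDING", "LAYING")

# Hypothetical vector of integer codes, standing in for the label files.
codes <- c(5, 1, 6, 3, 1)

# factor() attaches the descriptive name to each code.
activity <- factor(codes, levels = 1:6, labels = activity_names)

as.character(activity)
# "STANDING" "WALKING" "LAYING" "WALKING_DOWNSTAIRS" "WALKING"
```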

"measurements" – A 561-feature vector with time and frequency domain variables for each subject and each activity with respect to:

Triaxial acceleration from the accelerometer (total acceleration)
Estimated body acceleration
Triaxial angular velocity from the gyroscope

"tidy dataset" – one that adheres to the principles established in Hadley Wickham's article "Tidy Data", retrieved from http://vita.had.co.nz/papers/tidy-data.pdf on June 16, 2014

run_analysis.RData --- How the Script Works

My R working directory is /Users/jerry/Desktop/JHU DS Certif/UCI HAR Dataset; it contains all necessary files.

STEPS

To download the data, I changed https:// to http://
Read in 3 test files, 3 training files, a features file and an activity labels file
Used cbind and rbind to create a data table: data, 10299 x 563
Used grep to reduce to a data table with mean or std columns: moments_data, 10299 x 81
Removed the 13 columns with Freq: mean_std_data, 10299 x 68
reshape2 failed, so I used the reshape package to melt and cast: tidy_averages, 180 x 68
Applied the plyr package to revalue activity levels
Randomly sampled tidy_averages to check that all is well
Extended max.print to 999999 to ensure the output file would not be cut short
Used capture.output to print tidy_averages to a file: UCI.HAR.JP.tidy.averages.txt
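The dimensions in the steps above are consistent with one another: 561 features plus a subject column and an activity column give 563 columns, 81 minus the 13 Freq columns gives 68, and 30 subjects times 6 activities gives 180 rows. The sketch below walks through the same sequence on a small made-up dataset. The feature names, the `make_set` helper, and the toy sizes are all hypothetical, and base R `aggregate()` is swapped in for the melt/cast step so that no extra package is needed.

```r
set.seed(1)  # reproducible toy data

# Hypothetical feature names imitating the naming pattern of the features file.
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
              "fBodyAcc-meanFreq()-X", "tBodyAcc-max()-X")

# Toy stand-in for one set of files: subject ids, activity codes, measurements.
make_set <- function(subjects, n_each = 4) {
  grid <- expand.grid(rep = seq_len(n_each), activity = 1:2, subject = subjects)
  x <- matrix(rnorm(nrow(grid) * length(features)), ncol = length(features),
              dimnames = list(NULL, features))
  # cbind joins the id columns and the measurements column-wise
  cbind(data.frame(subject = grid$subject, activity = grid$activity),
        as.data.frame(x))
}

train <- make_set(c(1, 3))   # toy "training" subjects
test  <- make_set(2)         # toy "test" subject
data  <- rbind(train, test)  # rbind stacks the two sets row-wise

# Keep the id columns plus any feature whose name contains mean or std ...
keep <- grep("subject|activity|mean|std", names(data))
moments_data <- data[, keep]

# ... then drop the columns whose name contains Freq.
mean_std_data <- moments_data[, !grepl("Freq", names(moments_data))]

# Average every remaining variable per subject and activity.
id_cols <- c("subject", "activity")
tidy_averages <- aggregate(
  mean_std_data[, setdiff(names(mean_std_data), id_cols)],
  by = mean_std_data[, id_cols],
  FUN = mean
)

# Write the printed table to a file, as the last step describes.
options(max.print = 999999)
out_file <- tempfile(fileext = ".txt")  # hypothetical output path
capture.output(print(tidy_averages), file = out_file)

dim(tidy_averages)  # one row per subject/activity pair: 3 x 2 = 6 rows, 4 columns
```

On the real data the same shape logic applies, with one row per subject and activity pair.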
