Giter Site home page Giter Site logo

gettingclosingdata's Introduction

Getting and Cleaning Course Project

Background

One of the most exciting areas in all of data science right now is wearable computing - see for example this article . Companies like Fitbit, Nike, and Jawbone Up are racing to develop the most advanced algorithms to attract new users. The data linked to from the course website represent data collected from the accelerometers from the Samsung Galaxy S smartphone.

A full description is available at the site where the data was obtained:

http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones

Purpose:

The purpose of this project is to demonstrate the ability to collect, work with, and clean a data set. The goal is to prepare tidy data that can be used for later analysis.

Here are the data for the project:

https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip

Data Explanation

The features selected for this database come from the accelerometer and gyroscope 3-axial raw signals tAcc-XYZ and tGyro-XYZ. These time domain signals (prefix 't' to denote time) were captured at a constant rate of 50 Hz. Then they were filtered using a median filter and a 3rd order low pass Butterworth filter with a corner frequency of 20 Hz to remove noise. Similarly, the acceleration signal was then separated into body and gravity acceleration signals (tBodyAcc-XYZ and tGravityAcc-XYZ) using another low pass Butterworth filter with a corner frequency of 0.3 Hz.

Subsequently, the body linear acceleration and angular velocity were derived in time to obtain Jerk signals (tBodyAccJerk-XYZ and tBodyGyroJerk-XYZ). Also the magnitude of these three-dimensional signals were calculated using the Euclidean norm (tBodyAccMag, tGravityAccMag, tBodyAccJerkMag, tBodyGyroMag, tBodyGyroJerkMag).

Finally a Fast Fourier Transform (FFT) was applied to some of these signals producing fBodyAcc-XYZ, fBodyAccJerk-XYZ, fBodyGyro-XYZ, fBodyAccJerkMag, fBodyGyroMag, fBodyGyroJerkMag. (Note the 'f' to indicate frequency domain signals).

These signals were used to estimate variables of the feature vector for each pattern:
'-XYZ' is used to denote 3-axial signals in the X, Y and Z directions.

  • tBodyAcc-XYZ
  • tGravityAcc-XYZ
  • tBodyAccJerk-XYZ
  • tBodyGyro-XYZ
  • tBodyGyroJerk-XYZ
  • tBodyAccMag
  • tGravityAccMag
  • tBodyAccJerkMag
  • tBodyGyroMag
  • tBodyGyroJerkMag
  • fBodyAcc-XYZ
  • fBodyAccJerk-XYZ
  • fBodyGyro-XYZ
  • fBodyAccMag
  • fBodyAccJerkMag
  • fBodyGyroMag
  • fBodyGyroJerkMag
The set of variables that were estimated from these signals are:
  • mean(): Mean value
  • std(): Standard deviation
  • mad(): Median absolute deviation
  • max(): Largest value in array
  • min(): Smallest value in array
  • sma(): Signal magnitude area
  • energy(): Energy measure. Sum of the squares divided by the number of values.
  • iqr(): Interquartile range
  • entropy(): Signal entropy
  • arCoeff(): Autorregresion coefficients with Burg order equal to 4
  • correlation(): correlation coefficient between two signals
  • maxInds(): index of the frequency component with largest magnitude
  • meanFreq(): Weighted average of the frequency components to obtain a mean frequency
  • skewness(): skewness of the frequency domain signal
  • kurtosis(): kurtosis of the frequency domain signal
  • bandsEnergy(): Energy of a frequency interval within the 64 bins of the FFT of each window.
  • angle(): Angle between to vectors.
Additional vectors obtained by averaging the signals in a signal window sample. These are used on the angle() variable:
  • gravityMean
  • tBodyAccMean
  • tBodyAccJerkMean
  • tBodyGyroMean
  • tBodyGyroJerkMean
  • The complete list of variables of each feature vector is available in 'features.txt'

    Script Overview

    The script called run_analysis.R peforms the following:

    1. Downloads the data the data from the URL provided above and extracts to the working directory set in the ‘dirData’

    2. Merge the training and the test sets to create one data set.

      1.subject_train.txt, Y_train.txt and X_train.txt found in the train folder are loaded to tables and are merged by column.

      2 subject_test.txt, Y_test.txt and X_test.txt found in the test folder are loaded to tables and are merged by column.

      3 both of the data tables are then merged by row.

    3. Extract only the measurements on the mean and standard deviation for each measurement.

      1 Load the column names from the features.txt included in the data set

      2 Determine Required Columns with strings “mean()” or “std()” included in their names

      3 Remove unnecessary columns from the Total data frame

    4. Use descriptive activity names to name the activities in the data set

      1 Load the activity labels from the data set activity_labels.txt

      2 Replace the Activity Factors with the corresponding labels.

    5. Appropriately labels the data set with descriptive variable names.

      1 Remove "()" from the original column names

      2 Replace original variable labels with the new one below

    6. From the data set in step 4, create a second, independent tidy data set with the average of each variable for each activity and each subject.

      1 Melt the Data Set with the Subject and Activity as ID

      2 Recast with the mean calculated for each group

      3 Save the result to tidy.txt in the declared working directory “dirResults”

gettingclosingdata's People

Contributors

acabaya avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.