# GCD-ProgAss

Programming assignment for the Coursera course "Getting and Cleaning Data"

This repo contains the files for the programming assignment of the Getting and Cleaning Data course (from the Johns Hopkins Bloomberg School of Public Health).

The files requested for the assignment are:

- run_analysis.R, which produces the required file tidy.txt
- codebook.txt, which describes the contents of tidy.txt

The web page with the assignment specifications is at: https://class.coursera.org/getdata-006/human_grading/index

For reproducibility (in case the web page becomes unavailable in the future), you can find:

- in the folder "specs": the content of the web page above
- in the folder "data": a .zip file with the input data for run_analysis.R

The folder "docs" contains a zip file with various documentation I found on the web about the topics of this course (and about R in general): feel free to grab it if you're interested.

# NOTES about run_analysis.R

## CONFIGURATION

The script was written with RStudio 0.98.977 on a Windows PC. To run, it requires:

- the package "reshape2"
- an R working directory that contains:
  - the files features.txt and activity_labels.txt
  - the subfolder "test", with the files X_test.txt, Y_test.txt, subject_test.txt
  - the subfolder "train", with the files X_train.txt, Y_train.txt, subject_train.txt

Also, to run on non-Windows machines, I suppose the backslashes ("\") in the file paths must be changed to forward slashes ("/"); forward slashes also work in R on Windows.
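A minimal pre-flight check, assuming the directory layout listed above (the helper name `check_inputs` is hypothetical and is not part of run_analysis.R). It builds every path with base R's `file.path()`, which joins components with "/" on all platforms, so the same code should need no path edits when moved between Windows and other systems.

```r
# Hypothetical helper (not part of run_analysis.R): return the input files that
# the script expects but that are missing from directory `dir`.
check_inputs <- function(dir = ".") {
  # file.path() joins components with "/", which R also accepts on Windows
  required <- c(
    file.path(dir, "features.txt"),
    file.path(dir, "activity_labels.txt"),
    file.path(dir, "test", c("X_test.txt", "Y_test.txt", "subject_test.txt")),
    file.path(dir, "train", c("X_train.txt", "Y_train.txt", "subject_train.txt"))
  )
  # character(0) means every required file is present
  required[!file.exists(required)]
}

# requireNamespace() returns FALSE (instead of failing) when reshape2 is not installed
has_reshape2 <- requireNamespace("reshape2", quietly = TRUE)
if (!has_reshape2) message('Install it with install.packages("reshape2")')

# Report anything missing from the current working directory
absent <- check_inputs(getwd())
if (length(absent) > 0) message("Missing input files:\n", paste(absent, collapse = "\n"))
```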

## LOGIC

This picture is worth more than 1000 words... (Thanks to David Hood!) :-)

## OPERATION

The program does the following (in this order):

1. read the file features.txt (it contains the measure labels)
2. prepare the column names for the measures tables (substituting parentheses with underscores)
3. read the two measures tables (X_test.txt and X_train.txt), using the column names prepared in step 2
4. extract only the measures that have 'mean' or 'std' in their column names
5. read the activity codes (Y_test.txt and Y_train.txt) related to the measures
6. read the subjects (subject_test.txt and subject_train.txt) related to the measures
7. put Subject and Activity together with the measures
8. put together the test and training data
9. read the activity labels (activity_labels.txt) and add them to the measures
10. compute the average of every measure column for each combination of the "heading" columns (SubjectID, ActivityID, Activity)
11. write the result to the file tidy.txt (in the current working directory)
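The steps above can be sketched on a tiny in-memory example. This is not the author's run_analysis.R: the toy inputs stand in for the files (for instance, `features` here is just the vector of measure names, i.e. the second column of features.txt), the function name `tidy_means` is hypothetical, the exact parenthesis substitution is an assumption, and step 10 uses base R's `aggregate()` instead of reshape2 so the sketch has no package dependency.

```r
# Hypothetical sketch of steps 2-11 on toy data (not the original run_analysis.R).
# features: character vector of measure names; x_*: data frames of measures;
# y_*: activity codes; s_*: subject IDs; labels: data frame (ActivityID, Activity).
tidy_means <- function(features, x_test, y_test, s_test,
                       x_train, y_train, s_train, labels, out_file) {
  # Step 2: build column names, replacing "(" and ")" with "_" (assumed rule)
  col_names <- gsub("[()]", "_", features)
  # Step 3: apply the names to both measures tables
  names(x_test) <- col_names
  names(x_train) <- col_names
  # Step 4: keep only the columns whose name contains "mean" or "std"
  keep <- grepl("mean|std", col_names)
  x_test <- x_test[, keep, drop = FALSE]
  x_train <- x_train[, keep, drop = FALSE]
  # Steps 5-7: put subject and activity code in front of the measures
  # (cbind on data frames keeps non-syntactic names such as "tBodyAcc-mean__-X")
  test <- cbind(SubjectID = s_test, ActivityID = y_test, x_test)
  train <- cbind(SubjectID = s_train, ActivityID = y_train, x_train)
  # Step 8: stack test and training data (rbind matches columns by name)
  all_data <- rbind(test, train)
  # Step 9: add the descriptive activity label (merge re-sorts rows by ActivityID)
  all_data <- merge(labels, all_data, by = "ActivityID")
  # Step 10: average every measure for each SubjectID/ActivityID/Activity group
  heading <- c("SubjectID", "ActivityID", "Activity")
  measure_cols <- setdiff(names(all_data), heading)
  tidy <- aggregate(all_data[measure_cols], by = all_data[heading], FUN = mean)
  # Step 11: write a space-separated text file with a header row
  write.table(tidy, out_file, row.names = FALSE)
  tidy
}
```

Called with toy data frames, it returns one row per subject/activity pair with each retained measure averaged; the row order follows `aggregate()`'s grouping, not the input order.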

You can easily look at the file tidy.txt with any ASCII text editor (not Notepad on Windows), or in R with the statement: `yourvar <- read.table("tidy.txt", header=TRUE)`
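A small illustration of reading the file back, using a made-up stand-in for tidy.txt written to a temporary directory. If the column names contain characters such as "-" (assumed here only for the example), `read.table()` turns them into syntactic names by default (`check.names = TRUE`), replacing each invalid character with "."; passing `check.names = FALSE` keeps them as written.

```r
# Write a tiny, made-up stand-in for tidy.txt (the real file is created by
# run_analysis.R in the working directory); the layout mimics write.table() output.
path <- file.path(tempdir(), "tidy.txt")
writeLines(c('"SubjectID" "ActivityID" "Activity" "tBodyAcc-mean__-X"',
             '1 1 "WALKING" 0.5',
             '2 2 "SITTING" 0.25'),
           path)

# Default read: the header row is used, but "-" in column names becomes "."
yourvar <- read.table(path, header = TRUE)
str(yourvar)

# Keep the column names exactly as they appear in the file
yourvar_raw <- read.table(path, header = TRUE, check.names = FALSE)
names(yourvar_raw)
```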

This URL helped me a lot in writing this markdown file: https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet

mbranco2 / gcd-progass
