getting-and-cleaning-data-project

Coursera Getting and Cleaning Data project

In this final project, our purpose is to create a report from the data downloaded from https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip

The script performs the following tasks:

  1. Merges the training and test sets into a single dataset.
  2. Extracts part of the dataset (the columns whose names contain 'mean' or 'std').
  3. Replaces the activity numbers with their descriptive labels.
  4. Renames the data labels to clarify their meaning.
  5. Creates a new dataset, grouped by factors such as the activity, with the average of each variable.

Variables used:

  • local_path --> for the original dataset
  • data_set_folder --> base folder from which the other paths are built
  • test --> path for test data
  • train --> path for training data
  • data_test --> loaded test data
  • data_train --> loaded training data
  • merged_data --> contains merged training and test data
  • headers --> path for the headers file; later reused to hold the selected headers
  • merged_data --> reassigned to contain only the data for the selected columns
  • data_test_ac --> contains test activities
  • data_train_ac --> contains training activities
  • cool_labels --> contains the labels for the activities
  • merged_activities --> contains the union of data_test_ac and data_train_ac
  • subjects_test_path --> contains test subjects
  • subjects_train_path --> contains training subjects
  • merged_subjects --> contains the training and test subjects combination
  • final_data --> is the subjects, activities and data of interest in the same data frame
  • final_data$Activity --> activity factor
  • final_data$Subject_ID --> subject factor
  • data_melted --> data melted for the report
  • report --> contains the dataset using the factors and the average of each variable.

Code explanation

First, I check whether the download has already been done. If not, I download the data and unzip it. Then I build the paths to the original datasets, load them, and merge the training and test sets.
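The download check and the merge could look roughly like the sketch below. The helper name `fetch_dataset` is hypothetical, the row order in `rbind` (training first) is an assumption, and the two small data frames are toy stand-ins for the loaded files so that the example runs without network access.

```r
# URL taken from the introduction of this README.
data_url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"

# Hypothetical helper: download and unzip only when the archive is missing.
fetch_dataset <- function(url, local_path, exdir = ".") {
  if (!file.exists(local_path)) {
    download.file(url, destfile = local_path, mode = "wb")
    unzip(local_path, exdir = exdir)
  }
  invisible(local_path)
}

# Toy stand-ins for the loaded test and training tables
# (the real script would read them from the unzipped folder).
data_test  <- data.frame(V1 = c(0.1, 0.2), V2 = c(1.1, 1.2))
data_train <- data.frame(V1 = c(0.3, 0.4, 0.5), V2 = c(1.3, 1.4, 1.5))

# Stack both sets row-wise; the order (training, then test) is an assumption
# and must match the order used later for activities and subjects.
merged_data <- rbind(data_train, data_test)
```

Calling `fetch_dataset(data_url, local_path)` a second time does nothing, because the archive already exists on disk.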

Once this is done, it is time to load the headers and select the data of interest (the columns whose names contain the words 'std' and 'mean'). The headers must also be merged, and the data must be limited to these headers; the rest is discarded.
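A minimal sketch of the column selection, assuming the headers are held in a two-column table (index and name). The column names `index` and `name` and the toy data are assumptions for illustration. Note that a plain substring match on 'mean' also keeps names such as `fBodyAcc-meanFreq()-X`.

```r
# Toy stand-in for the headers table: column index and feature name.
headers <- data.frame(
  index = 1:5,
  name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-mad()-X",
           "tBodyAcc-max()-X", "fBodyAcc-meanFreq()-X"),
  stringsAsFactors = FALSE
)

# Toy merged data with one column per header (V1..V5).
merged_data <- as.data.frame(matrix(seq_len(10), nrow = 2))

# Keep only the columns whose names contain 'mean' or 'std'.
keep <- grepl("mean|std", headers$name)
merged_data <- merged_data[, headers$index[keep], drop = FALSE]

# Apply the selected header names to the remaining columns.
names(merged_data) <- headers$name[keep]
```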

Next, I load the activity files (again, test and training). The activities and the headers either have no names, only numbers, or have names that are hard to read. The next step is to solve this with replacements. The activity_labels.txt file provides the translation from activity number to name.
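The replacements could be sketched as follows. The label table mirrors the contents of activity_labels.txt. The `gsub` rules for the headers are hypothetical examples of a cleanup, not necessarily the ones the script applies.

```r
# Number-to-name table, as found in activity_labels.txt.
cool_labels <- data.frame(
  id = 1:6,
  label = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
            "SITTING", "STANDING", "LAYING"),
  stringsAsFactors = FALSE
)

# Toy activity codes standing in for the merged test and training activities.
merged_activities <- c(5, 1, 6, 1)

# Turn the numeric codes into a factor with descriptive labels.
activity <- factor(merged_activities,
                   levels = cool_labels$id,
                   labels = cool_labels$label)

# Hypothetical header cleanup: drop "()", expand the t/f prefix, replace dashes.
raw_names <- c("tBodyAcc-mean()-X", "fBodyGyro-std()-Z")
clean_names <- gsub("\\(\\)", "", raw_names)
clean_names <- gsub("^t", "Time", clean_names)
clean_names <- gsub("^f", "Freq", clean_names)
clean_names <- gsub("-", "_", clean_names)
```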

In the last step, we create a new dataset based on the dataset built in the previous steps. This new dataset contains the average of each variable for each activity and each subject. Finally, we save the new dataset in a file called TIDY.csv.
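As an illustration of this last step, the sketch below computes the per-subject, per-activity averages with base R `aggregate`. This is a swapped-in technique: the variable `data_melted` suggests the script itself melts the data first. The measurement columns `mean_x` and `std_x` are toy names, and the file is written to a temporary folder here only to keep the example side-effect free.

```r
# Toy stand-in for final_data: subject, activity, and two measurements.
final_data <- data.frame(
  Subject_ID = factor(c(1, 1, 1, 2, 2, 2)),
  Activity = factor(c("WALKING", "WALKING", "SITTING",
                      "WALKING", "SITTING", "SITTING")),
  mean_x = c(1, 3, 5, 2, 4, 6),
  std_x = c(0.1, 0.3, 0.5, 0.2, 0.4, 0.8)
)

# Average every remaining variable for each subject/activity pair.
report <- aggregate(. ~ Subject_ID + Activity, data = final_data, FUN = mean)

# Save the result; the real script writes TIDY.csv to its working folder.
out_file <- file.path(tempdir(), "TIDY.csv")
write.csv(report, out_file, row.names = FALSE)
```

The resulting `report` has one row per subject and activity combination and one column per averaged variable.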

Contributors

angel-fernandez
