Getting and Cleaning Data Project

Content

This README walks through each step of my code. The R script is broken up by a number of headings, and the sections below follow those headings in order and explain what each step does. For the names of variables and what they mean, please check the code book given in the repo. The content is laid out as follows:

Test data frame

Train data frame

Put the data frames together

Renaming activity values

Simplifying the data

Finally, writing our tidy data set

Test Data Frame

This section starts by reading the data from the text files in the "UCI HAR Dataset" folder. This folder (which must be downloaded for the assignment) has to be in your working directory for the script to run properly. Once it is in place, the files can be read into R.

Next I made the vector feature_labels, which holds the names of all the variables in the newly made X_test data frame. I also pasted the numbers 1 through 561 onto the column names (THIS IS IMPORTANT) because the features data set contains repeated column names. Oddly, the columns that share a name do not hold the same data. To work around this possible mistake by the original makers of the data, I appended each column's number to its name so that every column name is distinct.

After this I named the subject and activity columns, and finally built the data frame for the test variables.
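The numbering step can be sketched on a toy stand-in for the features file. The four feature names and the "_" separator below are assumptions chosen for illustration; the actual script may join the name and number differently.

```r
# Toy stand-in for the features file (the real one has 561 rows).
features <- data.frame(
  index = 1:4,
  name = c("fBodyAcc-bandsEnergy()-1,8",
           "fBodyAcc-bandsEnergy()-1,8",  # same name, but a different column of data
           "tBodyAcc-mean()-X",
           "tBodyAcc-std()-X"),
  stringsAsFactors = FALSE
)

# Before numbering, at least one name is repeated.
has_duplicates_before <- any(duplicated(features$name))

# Append each column's position so that every label is distinct.
# The "_" separator is an assumption made for this sketch.
feature_labels <- paste(features$name, seq_len(nrow(features)), sep = "_")

# After numbering, no label is repeated.
has_duplicates_after <- any(duplicated(feature_labels))

print(feature_labels)
```

Base R's make.unique() is another way to get distinct names, though it only modifies the repeated ones and so would not give every column a number.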

Train Data Frame

This section is almost identical to the "Test Data Frame" section, except that it works with the training data. Nothing needs to be treated differently in the code, because the training data uses the same variables and layout.
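Because both splits share one layout, the same reading code can serve either of them. The sketch below is not how the script is written: read_split() is a hypothetical helper, and the tiny files it reads are written to a temporary folder that imitates the dataset's test/ and train/ subfolders (X_<split>.txt, y_<split>.txt, subject_<split>.txt).

```r
# Hypothetical helper: read one split ("test" or "train") into a data frame.
read_split <- function(root, split, feature_labels) {
  path <- function(prefix) {
    file.path(root, split, paste0(prefix, "_", split, ".txt"))
  }
  x <- read.table(path("X"))        # measurements, one column per feature
  y <- read.table(path("y"))        # activity codes
  s <- read.table(path("subject"))  # subject ids
  names(x) <- feature_labels
  # check.names = FALSE keeps characters such as "(" and "-" in the names.
  data.frame(subject = s[[1]], activity = y[[1]], x, check.names = FALSE)
}

# Build a toy folder so the sketch runs without the real download.
root <- file.path(tempdir(), "toy_har")
for (split in c("test", "train")) {
  dir.create(file.path(root, split), recursive = TRUE, showWarnings = FALSE)
  writeLines(c("0.1 0.2", "0.3 0.4"),
             file.path(root, split, paste0("X_", split, ".txt")))
  writeLines(c("1", "5"),
             file.path(root, split, paste0("y_", split, ".txt")))
  writeLines(c("2", "2"),
             file.path(root, split, paste0("subject_", split, ".txt")))
}

# Illustrative labels only; the real vector has 561 entries.
labels <- c("tBodyAcc-mean()-X_1", "tBodyAcc-mean()-Y_2")

test_df  <- read_split(root, "test", labels)
train_df <- read_split(root, "train", labels)

# Both data frames end up with the same columns in the same order.
same_layout <- identical(names(test_df), names(train_df))
```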

Put the Data Frames Together

This section is simple. The rbind() function stacks the test and train data frames by rows into a single data frame.
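A minimal sketch of the row-wise stacking, using two small made-up data frames in place of the real ones (the values and the single measurement column are illustrative only):

```r
# Toy stand-ins for the test and train data frames.
test_df <- data.frame(
  subject = c(2, 2),
  activity = c(5, 4),
  `tBodyAcc-mean()-X_1` = c(0.25, 0.27),
  check.names = FALSE
)
train_df <- data.frame(
  subject = c(1, 1, 3),
  activity = c(1, 2, 6),
  `tBodyAcc-mean()-X_1` = c(0.28, 0.29, 0.22),
  check.names = FALSE
)

# rbind() on data frames matches columns by name and appends the rows.
combined <- rbind(test_df, train_df)

print(dim(combined))
```

rbind() stops with an error if the column names of the two data frames do not match, which is one reason the test and train sections need to use identical labels.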

Renaming Activity Values

This section replaces the activity values, which were previously stored as numbers, with character strings that name the activity. This makes the data frame more descriptive. The mapvalues() function (provided by the plyr package) works perfectly for this purpose.
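The relabelling can be sketched as below. The six activity names are assumed to follow the dataset's activity labels file, and the input vector is made up. The base R branch is a fallback added only so the sketch runs when plyr is not installed; it is not part of the script.

```r
# Toy activity codes as they appear before relabelling.
activity <- c(5, 1, 6, 3)

# Activity names in code order 1..6 (assumed from the dataset's label file).
activity_names <- c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                    "SITTING", "STANDING", "LAYING")

if (requireNamespace("plyr", quietly = TRUE)) {
  # mapvalues() swaps each value found in `from` for the matching one in `to`.
  # warn_missing = FALSE silences the note about codes absent from the toy data.
  labelled <- plyr::mapvalues(activity, from = 1:6, to = activity_names,
                              warn_missing = FALSE)
} else {
  # Fallback (not the script's method): index the names by the numeric code.
  labelled <- activity_names[activity]
}

print(labelled)
```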

Simplifying the Data

This section requires the dplyr package to be installed. First, using the select() function, I dropped every column that was not the mean or standard deviation of a variable. Next, using the summarise() function, I took the mean of each remaining variable for each combination of subject and activity, which makes our data set clean and concise.
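One way the two dplyr steps could look on a toy data frame is sketched below. The "mean|std" name pattern, the use of group_by() and across(), and the sample values are all assumptions made for illustration (across() needs dplyr 1.0 or later); the script itself may select and group differently.

```r
library(dplyr)

# Toy stand-in for the combined data frame.
combined <- data.frame(
  subject = c(1, 1, 1, 2, 2),
  activity = c("WALKING", "WALKING", "SITTING", "WALKING", "WALKING"),
  `tBodyAcc-mean()-X_1` = c(0.2, 0.4, 0.1, 0.3, 0.5),
  `tBodyAcc-std()-X_4` = c(-0.9, -0.7, -0.8, -0.6, -0.4),
  `tBodyAcc-max()-X_10` = c(1, 2, 3, 4, 5),
  check.names = FALSE
)

tidy <- combined %>%
  # Keep the id columns plus any column whose name mentions mean or std.
  select(subject, activity, matches("mean|std")) %>%
  # One group per subject/activity pair.
  group_by(subject, activity) %>%
  # Average every remaining measurement column within each group.
  summarise(across(everything(), mean), .groups = "drop")

print(tidy)
```

The result has one row per subject and activity pair, and the max column is gone because its name matches neither mean nor std.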

Finally, Writing Our Tidy Data Set

This section almost explains itself. Using write.table(), I wrote the final data frame to a text file. This is our tidy data set, which R can read back in easily for future work. To read it into R again, just use the code:

data <- read.table("./run_analysis.txt", header = TRUE)

View(data)
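As a rough sketch of the full round trip, the example below writes a small made-up data frame and reads it back. The row.names = FALSE argument and the temporary output path are assumptions, not taken from the script. It also shows that read.table() rewrites column names containing characters such as "(" or "-" unless check.names = FALSE is passed.

```r
# Toy stand-in for the final tidy data frame.
tidy <- data.frame(
  subject = c(1, 1, 2),
  activity = c("SITTING", "WALKING", "WALKING"),
  `tBodyAcc-mean()-X_1` = c(0.1, 0.3, 0.4),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Write to a temporary location so the sketch does not touch the real file.
out_file <- file.path(tempdir(), "run_analysis.txt")
write.table(tidy, out_file, row.names = FALSE)

# Default read: column names are passed through make.names().
plain <- read.table(out_file, header = TRUE)

# check.names = FALSE keeps the names exactly as they were written.
exact <- read.table(out_file, header = TRUE, check.names = FALSE)

print(names(plain))
print(names(exact))
```

Either form reads the same values. The only difference is whether the column names match the ones listed in the code book character for character.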
