get-data: assignment for course in getting and cleaning data

Background to the analysis

There is a substantial body of research on making computing functionality ready to hand in everyday life and work, variously labelled "pervasive computing" or "ubiquitous computing". One part of this movement is building software and devices that are "context aware", whether physically, emotionally or socially. One strand of that work infers physical activity states from movement sensors. For example, is the user driving, or asleep? If so, what is the best way to handle incoming messages or calls?

Focus of data preparation

Prepare data to facilitate the recognition of physical activity states from smartphone accelerometers and gyroscopes (which measure angular velocity, from which angular acceleration can be estimated). The recognition work could be attempted using machine learning techniques, for example.

Source data

The source data is given by the assignment.

The structure of the source data is not completely explicit. By comparing contents and data sizes I have inferred:

  • activity_labels.txt contains an encoding of activities (walking, etc.) into integer IDs, which are used in the observation files. One row per activity.

  • features.txt contains an ordered list of features, that is, (derived) sensor measures. The data files list sensor values in the same order. One feature per row.

  • test and train directories contain the test and training data respectively. These folders have the same structure. For train:

    • subject_train.txt gives the participant numeric ID (1-30) for each observation, one row per observation.

    • y_train.txt gives the activity numeric ID (see above) for each observation, one row per observation.

    • X_train.txt gives the values of features, for each observation. One row per observation, features space-separated and in the order listed in features.txt (see above).
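
The inferred layout can be sketched in code. The analysis itself was done in R; the following is only a standard-library Python illustration of how the three per-observation files line up row by row. It assumes that each row of activity_labels.txt has the form "<integer ID> <name>", and the function names, the `root` path and the `n_features` parameter are hypothetical.

```python
# Sketch only: the original analysis is an R script. This standard-library
# Python version illustrates the inferred file layout, nothing more.
from pathlib import Path


def parse_lookup(text):
    """Parse rows of the assumed form '<integer ID> <name>' into a dict."""
    lookup = {}
    for line in text.splitlines():
        if line.strip():
            key, name = line.split(maxsplit=1)
            lookup[int(key)] = name.strip()
    return lookup


def combine(subject_text, y_text, x_text, activities, n_features):
    """Zip the three per-observation files together, row by row."""
    subjects = subject_text.split()
    labels = y_text.split()
    rows = [r for r in x_text.splitlines() if r.strip()]
    if not (len(subjects) == len(labels) == len(rows)):
        raise ValueError("the three files must have one row per observation")
    observations = []
    for subject, label, row in zip(subjects, labels, rows):
        values = [float(v) for v in row.split()]  # split() tolerates repeated spaces
        if len(values) != n_features:
            raise ValueError("row width does not match features.txt")
        observations.append({
            "participant": int(subject),
            "activity": activities[int(label)],  # decode the integer activity ID
            "values": values,  # same order as features.txt
        })
    return observations


def load_split(root, split, activities, n_features):
    """Read one split ('train' or 'test') from disk; root is a hypothetical path."""
    folder = Path(root) / split
    return combine(
        (folder / f"subject_{split}.txt").read_text(),
        (folder / f"y_{split}.txt").read_text(),
        (folder / f"X_{split}.txt").read_text(),
        activities,
        n_features,
    )
```

Under these assumptions, merging the training and test data amounts to concatenating the two lists returned by `load_split(root, "train", ...)` and `load_split(root, "test", ...)`.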

Output

The publishers of the data have already processed the raw sensor output, and I have chosen to tidy only the processed data.

We are instructed to focus on the sensor measures for means and standard deviations. I have chosen to focus on the "first order" means and standard deviations, and to leave out the angle measures that are based on them.
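
As a rough illustration of this selection rule, here is a standard-library Python sketch (not the R code used for the assignment). It assumes that mean and standard-deviation features carry the substrings "mean()" and "std()" in their names and that angle measures start with "angle("; the sample names are illustrative, and `select_first_order` is a hypothetical helper.

```python
def select_first_order(feature_names):
    """Return (position, name) pairs for plain mean and standard-deviation features."""
    selected = []
    for position, name in enumerate(feature_names):
        if name.startswith("angle("):
            continue  # angle measures are derived from the means, so skip them
        if "mean()" in name or "std()" in name:
            selected.append((position, name))
    return selected


# Illustrative names in the assumed style; not a full copy of features.txt.
names = [
    "tBodyAcc-mean()-X",
    "tBodyAcc-std()-X",
    "tBodyAcc-max()-X",
    "angle(tBodyAccMean,gravity)",
]
print(select_first_order(names))
```

Keeping the position alongside each name makes it possible to pick the matching columns out of each observation row, since the rows follow the order of features.txt.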

The script outputs:

  • tidy.txt - the subsetted measures for merged training and testing data.

  • meltedmeans.txt - mean values of the subsetted measures, by participant, activity and variable name. This is the file delivered as part of the assignment.
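
The step from tidy.txt to meltedmeans.txt can be pictured as a "melt" to long form followed by a grouped mean. The sketch below uses standard-library Python rather than the plyr and reshape2 calls of the actual script; the row layout, the sample values and the `melted_means` helper are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import fmean


def melted_means(rows, id_keys=("participant", "activity")):
    """Melt wide rows to long form, then average each variable within each ID group."""
    groups = defaultdict(list)
    for row in rows:
        ids = tuple(row[k] for k in id_keys)
        for variable, value in row.items():
            if variable not in id_keys:
                # One long-form record per (participant, activity, variable).
                groups[ids + (variable,)].append(value)
    return [key + (fmean(values),) for key, values in sorted(groups.items())]


# Made-up observations in the assumed wide layout of the tidy data.
tidy = [
    {"participant": 1, "activity": "WALKING", "tBodyAcc-mean()-X": 0.25, "tBodyAcc-std()-X": -0.5},
    {"participant": 1, "activity": "WALKING", "tBodyAcc-mean()-X": 0.75, "tBodyAcc-std()-X": -1.0},
    {"participant": 2, "activity": "SITTING", "tBodyAcc-mean()-X": 0.5, "tBodyAcc-std()-X": -1.0},
]
for record in melted_means(tidy):
    print(*record)
```

Each output record holds a participant, an activity, a variable name and the mean of that variable over the matching observations, which is the shape described for meltedmeans.txt.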

Here is the code book for meltedmeans.txt.

Use of tools

  • R v3.1.1 and RStudio v0.98 were used.

  • plyr and reshape2 were used to tidy the data.

Contributors

philallen117
