data_mining_group_assignment

Introduction

This repo contains public resources for the Data Mining group assignment.

Because the data provided for this assignment is confidential, no data sets, images, or models are included in this repo.

InfluxDB and Grafana are included in the Docker stack for database storage and visualization purposes.

Postgres and Adminer are also included for those who are unfamiliar with Influx.

If you're using Docker, execute build.sh to get started.

Data

The provided data set should be placed in the data directory and renamed to initial_dataset.csv.

The initial_processing script performs some initial processing on the data. It renames the columns, drops unneeded columns, converts channels and sites to factors, and factorizes the channel ID-key pairs.
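The steps above might look roughly like the following in Python. This is a minimal sketch using only the standard library, not the repo's actual initial_processing script. Because the real data set is confidential, the column names (`Channel ID`, `Channel Key`, `Site`, `Unused`), the rename mapping, and the helper names `factorize` and `process` are all hypothetical.

```python
import csv
import io

# Hypothetical column names; the real data set is confidential.
RENAME = {"Channel ID": "channel_id", "Channel Key": "channel_key", "Site": "site"}
DROP = {"Unused"}


def factorize(values):
    """Map each distinct value to an integer code, in order of first appearance."""
    codes = {}
    encoded = [codes.setdefault(v, len(codes)) for v in values]
    return encoded, list(codes)


def process(rows):
    """Rename columns, drop unneeded ones, and encode channel ID-key pairs."""
    cleaned = [
        {RENAME.get(k, k): v for k, v in row.items() if k not in DROP}
        for row in rows
    ]
    # Treat each (ID, key) pair as one categorical level.
    pairs = [(r["channel_id"], r["channel_key"]) for r in cleaned]
    codes, _levels = factorize(pairs)
    for row, code in zip(cleaned, codes):
        row["channel"] = code
    return cleaned


# Tiny made-up sample standing in for data/initial_dataset.csv.
sample = io.StringIO(
    "Channel ID,Channel Key,Site,Unused\n"
    "1,temp,A,x\n"
    "1,humidity,A,y\n"
    "1,temp,B,z\n"
)
rows = process(list(csv.DictReader(sample)))
for row in rows:
    print(row)
```

Rows that share the same ID-key pair receive the same integer code, which is the closest plain-Python analogue to an R factor.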

DataScience

The datascience container offers both R and Python packages. A list of R and Python dependencies can be found in the Dockerfile. See requirements.txt for the list of Python packages and their respective versions. See packages.txt for the list of R packages and their respective versions.

As the docker-compose.yml file shows, this repo extends the rocker/tidyverse image, which already includes the tidyverse collection and RStudio Server.

If you opt to use Docker, you can view the Makefile for relevant Docker commands. The make run command will allow users to execute shell commands within the datascience container. The make enter command will allow users to directly enter the container by starting a new shell.

InfluxDB

InfluxDB is a time series database. For those who are unfamiliar, more information can be found at influxdata.com. InfluxDB can be combined with Grafana to analyze and visualize the data. View the .env.example file to configure your InfluxDB & Grafana versions and ports.

CSV files can be easily imported into your InfluxDB instance using the csv-to-influxdb package.
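To illustrate what such an import involves, the sketch below converts CSV rows into InfluxDB line protocol, the text format InfluxDB accepts for writes. It is not how the csv-to-influxdb package works internally. The measurement name `readings`, the CSV columns, and the helper names `escape_tag` and `to_line` are assumptions made for the example, and all field values are assumed to be floats.

```python
import csv
import io


def escape_tag(value):
    """Escape commas, equals signs, and spaces (tag keys, tag values, field keys)."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line(measurement, tags, fields, timestamp_ns):
    """Build one line-protocol record: measurement,tags fields timestamp."""
    # Measurement names only need commas and spaces escaped.
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    tag_part = "".join(
        f",{escape_tag(k)}={escape_tag(v)}" for k, v in sorted(tags.items())
    )
    # Assumption: every field is numeric and written as a float.
    field_part = ",".join(
        f"{escape_tag(k)}={float(v)}" for k, v in fields.items()
    )
    return f"{name}{tag_part} {field_part} {timestamp_ns}"


# Made-up sample; column names are hypothetical.
sample = io.StringIO(
    "time_ns,site,channel,value\n"
    "1556813561098000000,A,temp,21.5\n"
)
lines = [
    to_line(
        "readings",
        {"site": r["site"], "channel": r["channel"]},
        {"value": r["value"]},
        int(r["time_ns"]),
    )
    for r in csv.DictReader(sample)
]
print("\n".join(lines))
```

Tags are indexed strings and fields hold the measured values, so site and channel are written as tags here while the reading itself is a field.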

PostgreSQL

Postgres can be used instead of Influx if required.

Database dumps can be imported and exported using make dbimp and make dbexp respectively. Dumps can be found in the data directory.

make csvimp will import CSV files into the database. Ensure that the CSV data is placed in the data directory. Supply the f argument to specify the file to use. E.g. make csvimp f=processed_dataset.

The following settings can be configured in your .env file:

Name              Default Value
DB_NAME           project
DB_USER           user
DB_PASSWORD       pass
DB_ROOT_PASSWORD  password
DB_HOST           postgres
DB_PORT           5432
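If you want to connect to the database from your own Python code, the settings above can be read from the environment with the listed defaults as fallbacks. The following is a minimal sketch under that assumption. The helper names `db_settings` and `postgres_url` are hypothetical and not part of this repo, and the sketch assumes the .env values have already been exported into the process environment.

```python
import os
from urllib.parse import quote

# Defaults taken from the settings listed above.
DEFAULTS = {
    "DB_NAME": "project",
    "DB_USER": "user",
    "DB_PASSWORD": "pass",
    "DB_HOST": "postgres",
    "DB_PORT": "5432",
}


def db_settings(env=None):
    """Return the DB_* settings, falling back to the defaults."""
    env = os.environ if env is None else env
    return {key: env.get(key, default) for key, default in DEFAULTS.items()}


def postgres_url(settings):
    """Build a postgresql:// connection URI from the settings."""
    # Percent-encode credentials so special characters stay valid in the URI.
    user = quote(settings["DB_USER"], safe="")
    password = quote(settings["DB_PASSWORD"], safe="")
    host = settings["DB_HOST"]
    port = settings["DB_PORT"]
    name = settings["DB_NAME"]
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


# Passing an empty mapping forces the defaults for demonstration.
print(postgres_url(db_settings({})))
```

Note that the default host, postgres, is likely only resolvable from inside the Docker network; from the host machine you would probably override DB_HOST.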

Contributors

jbris
