DSGO 2018 - Machine Learning With R + H2O Workshop

Get ready to learn how to predict credit defaults with R + H2O!

Program

  • Data is Credit Loan Applications to a Bank.

  • Objective is to assess Risk Of Default, prevent bad loans, save bank lots of $$$

  • The best Kagglers got 0.80 AUC with hundreds of hours of work, feature engineering, and combining more data sets

  • We'll get 0.74 AUC in 30 minutes of coding (+1.5 hours of explaining)
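
For readers new to the metric: AUC is the probability that a randomly chosen defaulter receives a higher predicted risk than a randomly chosen non-defaulter, so 0.5 is random guessing and 1.0 is a perfect ranking. Below is a minimal base R sketch of that calculation. The helper name `auc_rank` and the toy scores are made up for illustration and are not workshop data or workshop code.

```r
# AUC via the rank-sum (Mann-Whitney) identity, base R only.
# `auc_rank` is a hypothetical helper name used for this sketch.
auc_rank <- function(actual, score) {
  stopifnot(length(actual) == length(score), all(actual %in% c(0, 1)))
  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)
  r <- rank(score)  # ties receive average ranks
  (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example: 1 = default, 0 = repaid (made-up values)
actual <- c(0, 0, 1, 0, 1, 1)
score  <- c(0.10, 0.40, 0.35, 0.20, 0.80, 0.70)
auc_rank(actual, score)  # 8 of the 9 defaulter/non-defaulter pairs are ordered correctly
```

In the toy data only one pair is ranked the wrong way round (the defaulter scored 0.35 sits below the non-defaulter scored 0.40), which gives 8/9.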

Data

  • Kaggle Competition: Home Credit Default Risk

  • Data is large (166MB unzipped, 308K rows, 122 columns)

  • We'll work with a 20% sample of the data to keep it manageable
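
One way to draw such a sample is sketched below in base R. This is an assumption about the approach, not the workshop's actual sampling code: the toy data frame stands in for the Kaggle file, and the column names `id` and `TARGET` are placeholders.

```r
# Simple random 20% sample of the rows, base R only.
# `applications` is a toy stand-in for the real loan application data.
set.seed(123)  # make the sample reproducible
applications <- data.frame(
  id     = 1:1000,
  TARGET = rbinom(1000, size = 1, prob = 0.08)  # made-up default flag
)

sample_frac <- 0.20
idx <- sample(nrow(applications), size = floor(sample_frac * nrow(applications)))
applications_sampled <- applications[idx, ]

nrow(applications_sampled)  # 200 rows, i.e. 20% of 1000
```

A simple random sample keeps the share of defaults roughly the same as in the full data. If the share must be preserved exactly, a stratified split would be the alternative.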

Machine Learning With H2O

The goal of Machine Learning with H2O is to get you experience with:

  1. The R programming language

  2. h2o for machine learning

  3. lime for feature explanation

  4. recipes for preprocessing

Becoming A Data Science Rockstar

  • This 3-hour workshop will teach you some of the latest tools & techniques for Machine Learning in business

  • That said, on real projects you will spend 5% of your time on modeling (machine learning) & 95% of your time on:

    • Managing projects
    • Collecting & working with data (manipulating, combining, cleaning)
    • Visualizing information - showing the size of problems and what is likely contributing
    • Communicating results in terms the business cares about
    • Recommending actions that improve the business
  • Further, your organization will be keenly aware of what you contribute financially. You need to show them Return on Investment (ROI). They are making an investment in having a data science team. They expect tangible results.

  • Important Actions:

    • Attend my talk on the Business Science Problem Framework tomorrow. The BSPF is the essential system that enables driving ROI with data science.

    • Take my DS4B 201-R course. This teaches you a 10-Week Program that has cut data science project time in half for consultants and has progressed data scientists more than any other course they've taken. You will get 20% OFF (expires after DSGO conference).


Installation Instructions

Option 1: RStudio IDE Desktop + Install R Packages

Step 1: Install R and RStudio IDE
Step 2: Open RStudio and run the following script
pkgs <- c("h2o", "tidyverse", "rsample", "recipes", "lime")
install.packages(pkgs)
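
If some of these packages are already installed, you may prefer to install only the missing ones. Below is a minimal base R sketch. The helper name `missing_packages` is hypothetical, and the install call is left commented out so that nothing is downloaded until you choose to run it.

```r
# Return the subset of `pkgs` that cannot be loaded in this R library.
# `missing_packages` is a hypothetical helper name used for this sketch.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

pkgs <- c("h2o", "tidyverse", "rsample", "recipes", "lime")
to_install <- missing_packages(pkgs)
print(to_install)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```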

Test H2O. You may need the Java Development Kit (JDK).

library(h2o)
h2o.init()

If H2O cannot connect, you probably need to install Java.
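
A quick way to check from R whether Java is visible is sketched below. It only looks for a `java` executable on the PATH of the current R session, so treat a negative result as a hint rather than proof that Java is missing. The helper name `java_on_path` is hypothetical.

```r
# TRUE when a `java` executable is found on the PATH that R sees.
# `java_on_path` is a hypothetical helper name used for this sketch.
java_on_path <- function() {
  unname(nzchar(Sys.which("java")))
}

if (java_on_path()) {
  message("Java found at: ", Sys.which("java"))
} else {
  message("Java not found on PATH. Install Java, restart R, then retry h2o.init().")
}
```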

Step 3: Load the Project From GitHub

Wait for instructions from Matt.

The URL for the GitHub project is:

https://github.com/business-science/workshop_2018_dsgo

Option 2: If You Have Docker Installed

Step 0: Docker Installation (Takes Time)

Skip this step if you already have Docker Community Edition installed

Docker Community Edition Installation Instructions

Step 1: Run the DSGO Workshop Docker Image

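In a terminal / command line, run the following command to download the workshop image and start the container. It publishes RStudio Server on port 8787 and mounts your current directory at /home/rstudio/working. The first run will take a few minutes while the image downloads.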

docker run -d -p 8787:8787 -v "`pwd`":/home/rstudio/working -e PASSWORD=rstudio -e ROOT=TRUE mdancho/workshop_2018_dsgo
Step 2: Fire Up RStudio IDE in Your Browser

Open your favorite browser (I'll be using Chrome) and enter the following in the address bar.

localhost:8787
Step 3: Log into RStudio Server

Use the following credentials.

  • User Name: rstudio
  • Password: rstudio
Step 4: Load the Project From GitHub

Wait for instructions from Matt.

The URL for the GitHub project is:

https://github.com/business-science/workshop_2018_dsgo


Further Resources
