DSCI 571: Supervised Learning I

Welcome to DSCI 571, an introductory supervised machine learning course! In this course, we will focus on basic machine learning concepts such as data splitting, cross-validation, generalization error, overfitting, the fundamental trade-off, the golden rule, and data preprocessing. You will also be introduced to common machine learning algorithms such as decision trees, K-nearest neighbours, SVMs, naive Bayes, and logistic regression, all using the scikit-learn framework.
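
As a rough illustration of how data splitting, cross-validation, and the golden rule fit together, here is a minimal sketch using scikit-learn. The synthetic dataset, the hyperparameter values, and all variable names are assumptions made for this example; they are not taken from the course material.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real dataset (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=123)

# Golden rule: set the test split aside and do not let it influence training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

models = {
    "baseline": DummyClassifier(strategy="most_frequent"),
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=123),
}

# Compare models with 5-fold cross-validation on the training split only.
results = {}
for name, model in models.items():
    scores = cross_validate(model, X_train, y_train, cv=5, return_train_score=True)
    results[name] = {
        "train": scores["train_score"].mean(),
        "validation": scores["test_score"].mean(),
    }
    print(
        f"{name}: train={results[name]['train']:.3f}, "
        f"validation={results[name]['validation']:.3f}"
    )

# Only once model selection is finished, score the chosen model on the test split.
final_model = models["decision tree"].fit(X_train, y_train)
test_score = final_model.score(X_test, y_test)
print(f"test accuracy: {test_score:.3f}")
```

A large gap between the training and validation scores is one sign of overfitting, and the baseline gives a reference point for judging whether a model has learned anything useful.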

2020-21 instructor: Varada Kolhatkar

Course Learning Outcomes

By the end of the course, students are expected to be able to:

  • describe supervised learning and identify what kind of tasks it is suitable for;
  • explain common machine learning concepts such as classification and regression, data splitting, overfitting, parameters and hyperparameters, and the golden rule;
  • identify when and why to apply data pre-processing techniques such as imputation, scaling, and one-hot encoding;
  • describe at a high level how common machine learning algorithms work, including decision trees, K-nearest neighbours, and naive Bayes;
  • use Python and the scikit-learn package to responsibly develop end-to-end supervised machine learning pipelines on real-world datasets and to interpret the results carefully.
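
To make the last outcomes more concrete, the following is a minimal sketch of an end-to-end scikit-learn pipeline that combines imputation, scaling, and one-hot encoding with a K-nearest neighbours classifier. The toy data frame, its column names, and the target definition are hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two numeric features and one categorical feature.
rng = np.random.default_rng(0)
n = 60
X = pd.DataFrame(
    {
        "age": rng.integers(18, 70, size=n).astype(float),
        "hours_per_week": rng.integers(10, 60, size=n).astype(float),
        "occupation": rng.choice(["clerical", "sales", "tech"], size=n),
    }
)
# Hypothetical binary target, defined before missing values are injected.
y = (X["hours_per_week"] > 35).astype(int)
X.loc[::10, "age"] = np.nan  # inject missing values so imputation has work to do

numeric_features = ["age", "hours_per_week"]
categorical_features = ["occupation"]

# Numeric columns: fill missing values with the median, then standardize.
numeric_transformer = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
# Categorical columns: one-hot encode, ignoring categories unseen during fit.
categorical_transformer = OneHotEncoder(handle_unknown="ignore")

preprocessor = make_column_transformer(
    (numeric_transformer, numeric_features),
    (categorical_transformer, categorical_features),
)

# Putting preprocessing inside the pipeline means it is re-fit on the training
# portion of every fold, so validation data never leaks into the transformers.
pipe = make_pipeline(preprocessor, KNeighborsClassifier(n_neighbors=5))

scores = cross_val_score(pipe, X, y, cv=5)
print(f"mean cross-validation accuracy: {scores.mean():.3f}")
```

Keeping the transformers inside the pipeline, rather than applying them to the whole dataset up front, is one way to respect the golden rule during cross-validation.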

Deliverables

The following deliverables will determine your course grade:

| Assessment | Weight |
|---|---|
| Lab Assignment 1 | 15% |
| Lab Assignment 2 | 15% |
| Lab Assignment 3 | 15% |
| Lab Assignment 4 | 15% |
| Quiz 1 | 20% |
| Quiz 2 | 20% |

Class Meetings

We will meet three times a week: twice for lectures and once for the lab.

Lecture format

Lectures in this course combine pre-recorded videos with class discussions and activities. You are expected to watch the videos before each lecture; we will spend the lecture time on group activities and Q&A sessions.

Lecture Schedule

| Lecture | Topic | Datasets | Resources and optional readings |
|---|---|---|---|
| | Motivation and course information | Indian Liver Patient Records; House Sales in King County; IMDB movie reviews | |
| 1 | Terminology, baselines, decision trees | House Sales in King County; Canada US cities toy dataset | |
| 2 | ML fundamentals | Canada US cities toy dataset | |
| 3 | kNNs, SVM RBF | Canada US cities toy dataset; Spotify Song Attributes | |
| 4 | Preprocessing and pipelines | Spotify Song Attributes; California Housing | |
| 5 | Categorical features and text features | The adult census dataset | |
| 6 | Hyperparameter optimization, optimization bias | The adult census dataset | |
| 7 | Naive Bayes | SMS Spam Collection Dataset | Conditional probability visualization |
| 8 | Logistic Regression, multi-class classification | SMS Spam Collection Dataset | |

Installation

We provide a conda environment file (env-dsci-571.yaml) for the course. Download this file, then create and activate the course environment as follows:

    conda env create -f env-dsci-571.yaml
    conda activate 571
    

To use this environment in Jupyter, install nb_conda_kernels in the environment where Jupyter is installed (typically the base environment). You will then be able to select the new environment in Jupyter.

Note that the environment file is not a complete list of the packages we'll be using in the course; you may need to install a few more with conda install later on. It is enough to get you started.
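
Once the environment is selected in Jupyter, a quick check like the one below can confirm which interpreter the notebook is running and whether a few packages are importable. This is only a sketch: the package list is an assumption for illustration, since the contents of the environment file are not shown here.

```python
import importlib.util
import sys

# Path of the Python interpreter behind the current kernel. After selecting the
# course environment, this should normally point inside that environment.
print("Interpreter:", sys.executable)

# Hypothetical list of import names to check; adjust it to match the
# packages actually listed in the environment file.
expected_packages = ["numpy", "pandas", "sklearn"]

missing = [
    name for name in expected_packages if importlib.util.find_spec(name) is None
]

if missing:
    print("Not importable in this environment:", ", ".join(missing))
else:
    print("All checked packages are importable.")
```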

Reference Material

Books

Online courses

Misc

Policies

Please see the general MDS policies.
