DS 3001: Foundations of Machine Learning

Home Page: https://uvads.github.io/DS-3001/


Your Foundations of Machine Learning Tour Guides:

Brian's office hours: in person in my office (Elson 165) or virtual on Discord, Monday 1-3pm

Ariful's office hours: TBD

Course Materials: Foundations of Machine Learning Repo

Subject Area and Catalog Number: Data Science, DS 3001

Year, Term and Time: 2023, Fall, T/Th 3:30-4:45

Class Title: Foundations of Machine Learning

Level: Undergraduate

Credit Type: Grade (A-F)


A Little Bit About the Course

What is Data Science, why is it becoming so important, and what is needed to be successful in this field? We will explore these questions throughout the course through a variety of topics, with data always at the center. In all things, we focus on creative thinking, not blind implementation. If you cannot explain why you are doing something, you will not only fail to discover new knowledge but also create new problems rather than solve them.

The course centers on lab-based work and employs a team-based pedagogy, meaning much of the work in the course can and should be completed in collaboration with your classmates. Though very applied, we also include theoretical content and will have discussion sessions depending on the topic for any given week.

I am also aware that students come from different backgrounds and bring a variety of skill sets to the class. There will be plenty of opportunities for those who would like extra support: additional discussions during office hours, TA support sessions, and team-based work designed so that we can all learn from each other.

Throughout the course, we will endeavor to “live the life of a data scientist,” so that you are not only taught directly but also gain a sense of what it is like to work as a data scientist. You’ll be asked to discover new knowledge, share it with your peers, and learn how to use the larger ecosystem of data science to your benefit.

What you’ll learn along the way

Data Science is incredibly broad and dynamic. The topics below are designed to reinforce this perspective and help you understand the field’s core tenets and what is demanded of practicing data scientists. The key is for you to gain a sense of the scope of Data Science and what is needed to contribute to the community, and to feel comfortable incorporating these techniques into your work moving forward. Specific learning objectives are below:

Be able to describe the field of Data Science and its emerging sub-fields
Gain experience working in teams to solve Data Science problems
Gain experience communicating Data Science products
Articulate the advantages and disadvantages of selected ML approaches
Be able to select appropriate ML models given problems and data types
Understand the importance of and methods for evaluating ML models
Understand the negative outcomes associated with ML/AI bias and how they can be avoided

The course will move rather quickly and can be demanding at times. However, if we all work together to support each other, you’ll be amazed at how much you have learned by the end of the semester!

How You’ll Know You Are Learning

In any given week, the course will require reviewing short video lectures and completing readings before coming to class. The material from these lectures and readings will then be applied in the lab portion of the course, which is conducted during the scheduled class period. Lab sessions will include a variety of activities but will mostly center on team-oriented coding assignments. Students can also use lab sessions to work on mid-term and final projects when needed.

Ethical data scientist reflection (10%) – Based on in-class discussion and readings,
students will write a personal reflection on how ethics and data science interact and
how we can all think about these concerns moving forward. This is critical to the work of
a data scientist: the models we build have the potential to impact the lives of thousands
of people, so we must use caution and restraint whenever possible.

Quizzes (15%) – Short, occasional quizzes (5 or 6 in total) will be given to ensure we are all
meeting the learning objectives from week to week. They will be auto-graded, so you will get
instant feedback. You will be allowed as many attempts as needed to complete each quiz, and
quizzes are open note; however, students are to work independently.

Labs (60%) – In most weeks we will have in-class labs/assignments. These are designed to allow you
to practice the skills being presented in class. While labs should be submitted individually, you are
encouraged to work with your peers, as much of the best learning comes from them.
You’ll need to create publishable markdown documents for every lab and submit them along
with the raw code file and link each week (Groups).

Final project (15%) – The course will culminate in a final project that will involve
working with a dataset of your choice, giving a presentation, and submitting well-annotated
code, including summary information in report form. This is an open-ended project
designed to allow groups to choose a topic of interest from the semester to
explore more deeply and share with the class.

Tech Stack

VS Code/Google Colab/Rivanna - You'll need to have the software loaded and ready to go on day one, but we will help if needed.
Miniconda/Conda - [Mini-Recommended](https://docs.conda.io/en/latest/miniconda.html)
Zoom - Virtual Option for Office Hours
GitHub - Almost all course materials (will post on Canvas as needed)
Canvas - Submission of assignments and class-wide communications
Discord - Low latency comms for groups and class.

Discord Invite Click

Overview for Installing Miniconda and VS Code

Materials That Will Aid in Your Learning:

The books below are essentially a starter Machine Learning Library. I will use all of these references at different points during the class, but will try to use free options. Unfortunately, the main book for the class, Python Machine Learning with PyTorch and Scikit-Learn, is not free. Everything else is either free or can be found for around 15 dollars. There are also references to Python style guides and tutorials.


* A. [Weapons of Math Destruction](https://www.amazon.com/Weapons-Math-Destruction-Increases-Inequality/dp/0553418815)
* B. [Evaluating Machine Learning Models – O’Reilly Digital via UVA Library](https://www.oreilly.com/library/view/temporary-access/)
* C. [Python Machine Learning with PyTorch and Scikit-Learn, 4th Edition](https://sebastianraschka.com/blog/2022/ml-pytorch-book.html)
* D. [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/)
* E. [Python Tutorial](https://docs.python.org/3/tutorial/index.html)
* F. [Python Style Guide](https://peps.python.org/pep-0008/)
* G. [Applied Predictive Modeling, Kuhn and Johnson](http://appliedpredictivemodeling.com/toc) - Expensive but very good/Also R based
* H. [Machine Learning Engineering](http://www.mlebook.com/wiki/doku.php) - Free PDF Version
* I. [Mathematics for Machine Learning](https://mml-book.github.io/) - Free PDF Version


## Schedule of Topics 

***NOTE:*** Depending on student interest, the syllabus can be adjusted to accommodate additional topics.

| Week 	| Theme 	| Topics 	| Lab 	| Reading/Repo (Prior to Class) 	|
|:---:	|:---:	|:---:	|:---:	|:---:	|
| Week 1 Aug 20th | What is this “Data Science” that you speak of and tech stack | - Assessment - Videos: DS Overview and History | - Find DS Dream Job - Create your first project, load the dataset, and visualize it using the code provided: what questions could this data answer? | Synchronous: Short Lab |
| Week 2 Aug 27th | Getting back up to “coding speed” | 'Dataframing' with pandas functions | [Group Case Study - Questions + Pseudocode + Code + Functions = High Quality Data Science](02_R_function_basics/02_Lecture_Python.ipynb) | D. Chpts 2 and 3 |
| Week 3 Sep 3rd	| How to share nicely  	| Using Quarto to Create HTML Docs	|  (03_knitr_Comms) 	|  [Documentation](https://quarto.org/docs/output-formats/html-basics.html) 	|
| Week 4 Sep 10th	| Introduction to ML Concepts I	|Language of ML	| [Case Studies](https://github.com/UVADS/DS-3001/tree/main/04_ML_Concepts_I_Foundations)|C. Chpt 1 |
| Week 5 Sep 17th | Introduction to ML Concepts II | Data Preparation: kNN | [ML Concepts](https://github.com/UVADS/DS-3001/tree/main/05_ML_Concepts_II_Data_Prep) | C. Chpt 4 and pages 98-103 |
| Week 6 Sep 24th | Introduction to ML Concepts III | Machine Learning Process: kNN | [ML Concepts](https://github.com/UVADS/DS-3001/tree/main/06_ML_Concepts_II_KNN) | C. pages 98-103 |
| Week 7 Oct 1st | Fall Break, no Tuesday class | | | |
| Week 8 Oct 8th | Introduction to ML Concepts IV	| Evaluation	| [Evaluation Lab](https://github.com/UVADS/DS-3001/tree/main/07_ML_Eval_Metrics) | All of B. and G.- Chapter 11 	|
| Week 9 Oct 15th | Nature's Perfect ML analogy: Trees Part I | Classification: Decision Trees | [Decision Trees](https://github.com/UVADS/DS-3001/tree/main/08_DT_Class) | TBD and G. Chapter 14.1-14.3 |
| Week 10 Oct 22nd | Nature's Perfect ML analogy: Trees Part II  	| Regression: Decision Trees  	| [Predicting Income for Big Brother]	| F. Chapter 5 and G. Chapter 8 	|
| Week 11 Oct 29th |Extra Decision Tree Week	 | | | TBD	|
| Week 12 Nov 5th | Let's gather together... but separately | k-means | NBA Scout for the worst team in the league | |
| Week 13 Nov 12th |  Wisdom of the Crowd		|Ensemble Methods: Random Forest	| |  TBD  |
| Week 14 Nov 19th | Do the next right thing… ethics | - Bias in AI Discussion - Simple methods for identifying bias - Protected Classes | [Fairness Overview & Ethical Reflections](https://github.com/UVADS/DS-3001/tree/main/14_ML_Bias) | Weapons of Math Destruction |
| Week 15 Nov 26th | Final Project Prep	|[Final Project Overview](https://github.com/UVADS/DS-3001/blob/main/final_project_overview.md)	| | Ethical Reflection Due |
| Week 16 - Final TBD | Final Project Presentations | [Final Project Overview](https://github.com/UVADS/DS-3001/blob/main/final_project_overview.md) | | |

## A few Policies that will Govern the Class

Grading Policies: Courses carrying a Data Science subject area use the following grading system: A, A-; B+, B, B-; C+, C, C-; D+, D, D-; F.  The symbol W is used when a student officially drops a course before its completion or if the student withdraws from an academic program of the University.

Grading Scale: 

 - 93-100 A
 - 90-92 A- 
 - 87-89 B+
 - 83-86 B 
 - 80-82 B- 
 - 77-79 C+ 
 - 73-76 C 
 - 70-72 C- 
 - <70 F
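
To see how the assessment weights and the grading scale fit together, here is a minimal sketch in Python. It assumes the weights from the assessment list (reflection 10%, quizzes 15%, labs 60%, final project 15%) and the letter bands listed above. The function names `weighted_score` and `letter_grade` are hypothetical, and the sketch assumes no rounding before the letter lookup, which the syllabus does not specify. It is an illustration, not the official grade calculation.

```python
# Assumed weights, taken from the assessment list in this syllabus.
WEIGHTS = {
    "reflection": 0.10,
    "quizzes": 0.15,
    "labs": 0.60,
    "final_project": 0.15,
}

# Lower bound of each letter band, from the grading scale above.
# The scale lists everything below 70 as F, so that is the fallback.
LETTER_FLOORS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
]


def weighted_score(scores):
    """Combine per-component percentages (0-100) into one course percentage."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(score):
    """Map a course percentage to a letter using the bands above (no rounding assumed)."""
    for floor, letter in LETTER_FLOORS:
        if score >= floor:
            return letter
    return "F"


if __name__ == "__main__":
    # Hypothetical component scores for illustration only.
    example = {"reflection": 90, "quizzes": 100, "labs": 85, "final_project": 80}
    total = weighted_score(example)
    print(f"{total:.1f} -> {letter_grade(total)}")
```

Because labs carry 60% of the weight, a change of ten points in the lab average moves the course percentage by six points, more than any other component.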

University of Virginia Honor System: All work should be pledged in the spirit of the Honor System at the University of Virginia. The instructor will indicate which assignments and activities are to be done individually and which permit collaboration. The following pledge should be written out at the end of all quizzes, examinations, individual assignments, and papers:  “I pledge that I have neither given nor received help on this examination (quiz, assignment, etc.)”.  The pledge must be signed by the student. For more information, visit www.virginia.edu/honor.


Special Needs:  The University of Virginia accommodates students with disabilities. Any SCPS student with a disability who needs accommodation (e.g., in arrangements for seating, extended time for examinations, or note-taking, etc.), should contact the Student Disability Access Center (SDAC) and provide them with appropriate medical or psychological documentation of his/her condition. Once accommodations are approved, just follow up with me concerning any logistics and implementation of accommodations.  Please try to make accommodations for test-taking at least 14 business days in advance of the date of the test(s). Students with disabilities are encouraged to contact the SDAC: 434-243-5180/Voice, 434-465-6579/Video Phone, 434-243-5188/Fax. Further policies and statements are available at www.virginia.edu/studenthealth/sdac/sdac.html

Technical Support Contacts

    Login/Password: [email protected]
    UVaCollab: [email protected]
    BbCollaborate Support: http://www.tinyurl.com/uvabbc
