Course materials for General Assembly's Data Science course in Washington, DC (3/18/15 - 6/3/15).
Instructors: Kevin Markham and Brandon Burroughs
Monday | Wednesday |
---|---|
| | 3/18: Introduction and Python |
3/23: Git and Command Line | 3/25: Exploratory Data Analysis |
3/30: Visualization and APIs | 4/1: Machine Learning and KNN |
4/6: Bias-Variance and Train/Test Split | 4/8: Kaggle Titanic (Part 1) |
4/13: Web Scraping, Tidy Data, Reproducibility | 4/15: Linear Regression |
4/20: Logistic Regression and Confusion Matrix | 4/22: ROC and Cross-Validation |
4/27: Project Presentation #1 | 4/29: Kaggle Titanic (Part 2) |
5/4: Naive Bayes | 5/6: Natural Language Processing |
5/11: Decision Trees | 5/13: Ensembles |
5/18: Clustering and Regularization | 5/20: Advanced scikit-learn |
5/25: No Class | 5/27: Databases and SQL |
6/1: Course Review | 6/3: Project Presentation #2 |
- 3/30: Deadline for discussing your project idea(s) with an instructor
- 4/6: Project question and dataset (write-up)
- 4/27: Project presentation #1 (slides, code, visualizations)
- 5/18: First draft due (draft of project paper, code, visualizations)
- 5/25: Peer review due
- 6/3: Project presentation #2 (project paper, slides, code, visualizations, data, data dictionary)
- Course project requirements
- Public data sources
- Kaggle competitions
- Examples of student projects
- Peer review guidelines
- Office hours will take place every Saturday and Sunday.
- Homework will be assigned every Wednesday and due on Monday, and you'll receive feedback by Wednesday.
- Our primary tool for out-of-class communication will be a private chat room through Slack.
- Homework submission form (also for project submissions)
- Gist is an easy way to put your homework online
- Feedback submission form (at the end of every class)
- Install the Anaconda distribution of Python 2.7.x.
- Install Git and create a GitHub account.
- Once you receive an email invitation from Slack, join our "DAT5 team" and add your photo.
- Choose a Python workshop to attend, depending upon your current skill level:
    - Beginner: Saturday 3/7 10am-2pm or Thursday 3/12 6:30pm-9pm
    - Intermediate: Saturday 3/14 10am-2pm
- Practice your Python using the resources below.
    - Codecademy's Python course: Good beginner material, including tons of in-browser exercises.
    - DataQuest: Similar interface to Codecademy, but focused on teaching Python in the context of data science.
    - Google's Python Class: Slightly more advanced, including hours of useful lecture videos and downloadable exercises (with solutions).
    - A Crash Course in Python for Scientists: Read through the Overview section for a quick introduction to Python.
    - Python for Informatics: A very beginner-oriented book, with associated slides and videos.
    - Code from our beginner and intermediate workshops: Useful for review and reference.
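
Once Anaconda is installed, a short script like the one below may help you confirm that Python and a few common data science packages are available. This is only a minimal sketch and not an official course check: the function name `check_packages` and the package list are assumptions chosen for illustration, so adjust them to whatever the class ends up using.

```python
from __future__ import print_function  # keeps print() working on Python 2.7

import importlib
import sys


def check_packages(names):
    """Return a dict mapping each package name to True if it can be imported."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


if __name__ == "__main__":
    # Show the interpreter version (the course asks for Python 2.7.x).
    print("Python version:", sys.version.split()[0])

    # Hypothetical package list, assumed here for illustration only.
    packages = ["numpy", "pandas", "matplotlib", "sklearn"]
    for name, ok in sorted(check_packages(packages).items()):
        print(name, "OK" if ok else "MISSING")
```

If any package is reported as missing, that is worth raising with an instructor before the first class.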
- Introduction to General Assembly
- Course overview (slides)
- Brief tour of Slack
- Checking the setup of your laptop
- Python lesson with airline safety data (code)
Homework:
- Python exercises with Chipotle order data (listed at bottom of code file)
- Work through GA's excellent introductory command line tutorial and then take this brief quiz.
- Read through the course project requirements and start thinking about your own project!
Optional:
- If we discovered any setup issues with your laptop, please resolve them before Monday.
- If you're not feeling comfortable in Python, keep practicing using the resources above!
Homework:
- Command line exercises with SMS Spam Data (listed at the bottom of Introduction to the Command Line)
- Note: This homework is not due until Monday. You might want to create a GitHub repo for your homework instead of using Gist!
Optional:
- Browse through some example student projects to stimulate your thinking and give you a sense of project scope.
Resources:
- This Command Line Primer goes a bit more into command line scripting.
- Read the first two chapters of Pro Git to gain a much deeper understanding of version control and basic Git commands.
- Watch Introduction to Git and GitHub (36 minutes) for a quick review of a lot of today's material.
- GitRef is an excellent reference guide for Git commands, and Git quick reference for beginners is a shorter guide with commands grouped by workflow.
- The Markdown Cheatsheet covers standard Markdown and a bit of "GitHub Flavored Markdown."