scikit-learn-tips's Introduction

🤖⚡ Daily scikit-learn tips

New tips are posted on LinkedIn, Twitter, and Facebook every weekday!

👉 Sign up to receive 5 tips by email every week 👈

List of all tips

Click to view the Jupyter notebook for a tip, or click to discuss the tip on LinkedIn:

# Description Links
1 Use ColumnTransformer to apply different preprocessing to different columns
2 Seven ways to select columns using ColumnTransformer
3 What is the difference between "fit" and "transform"?
4 Use "fit_transform" on training data, but "transform" (only) on testing/new data
5 Four reasons to use scikit-learn (not pandas) for ML preprocessing
6 Encode categorical features using OneHotEncoder or OrdinalEncoder
7 Handle unknown categories with OneHotEncoder by encoding them as zeros
8 Use Pipeline to chain together multiple steps
9 Add a missing indicator to encode "missingness" as a feature
10 Set a "random_state" to make your code reproducible
11 Impute missing values using KNNImputer or IterativeImputer
12 What is the difference between Pipeline and make_pipeline?
13 Examine the intermediate steps in a Pipeline
14 HistGradientBoostingClassifier natively supports missing values
15 Three reasons not to use drop='first' with OneHotEncoder
16 Use cross_val_score and GridSearchCV on a Pipeline
17 Try RandomizedSearchCV if GridSearchCV is taking too long
18 Display GridSearchCV or RandomizedSearchCV results in a DataFrame
19 Important tuning parameters for LogisticRegression

You can interact with all of these notebooks online using Binder:

Note: Some of the tips do not include any code, and can only be viewed on LinkedIn.
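
To illustrate how several of the tips listed above fit together (tips 1, 7, 8, 9, and 16), here is a minimal sketch. It is not taken from the notebooks: the toy data, the column names (`embarked`, `fare`, `survived`), and the step names are hypothetical and chosen only for illustration. It assumes pandas and a reasonably recent scikit-learn are installed.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data (illustrative only, not from the notebooks)
train = pd.DataFrame({
    "embarked": ["S", "C", "S", "Q", "S", "C", "S", "Q"],
    "fare": [7.25, 71.28, 7.92, 8.46, None, 30.07, 8.05, 51.86],
    "survived": [0, 1, 1, 0, 0, 1, 0, 1],
})
# New data with a category ("X") that never appears in training
new = pd.DataFrame({"embarked": ["S", "X"], "fare": [10.0, None]})

X_train = train[["embarked", "fare"]]
y_train = train["survived"]

# Tip 1: different preprocessing for different columns
preprocessor = ColumnTransformer(
    [
        # Tip 7: unknown categories are encoded as all zeros
        ("ohe", OneHotEncoder(handle_unknown="ignore"), ["embarked"]),
        # Tip 9: impute and add a column flagging which values were missing
        ("imp", SimpleImputer(strategy="median", add_indicator=True), ["fare"]),
    ],
    sparse_threshold=0,  # always return a dense array, easier to inspect
)

# Tip 8: chain preprocessing and the model into a single object
pipe = Pipeline([
    ("preprocess", preprocessor),
    ("model", LogisticRegression(max_iter=1000)),
])

# Tip 16: cross-validate the whole Pipeline so that preprocessing
# is re-fit on the training portion of each fold
scores = cross_val_score(pipe, X_train, y_train, cv=2, scoring="accuracy")

# Fit on all training data, then predict on new data
pipe.fit(X_train, y_train)
predictions = pipe.predict(new)

# Inspect how the fitted preprocessing step encodes the new data
encoded_new = pipe.named_steps["preprocess"].transform(new)
print(scores)
print(predictions)
print(encoded_new)
```

Because the preprocessing lives inside the Pipeline, calling `pipe.predict(new)` only transforms the new data with what was learned from the training data, which is the behavior tip 4 recommends.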

Who creates these tips?

Hi! I'm Kevin Markham, the founder of Data School. I've been teaching data science in Python since 2014. I create these tips because I love using scikit-learn and I want to help others use it more effectively.

How can I learn scikit-learn from scratch?

Watch my free video series, Introduction to Machine Learning in Python with scikit-learn. There are 10 videos totaling 4.5 hours, and each video has a corresponding Jupyter notebook. Here's the detailed list of topics that I cover in the series.

Due to changes in the scikit-learn API, a small percentage of the code shown in the videos is out-of-date. However, the code in the Jupyter notebooks is all up-to-date.

How can I get better at scikit-learn?

Take my online course, Machine Learning with Text in Python. It includes 14 hours of video lessons, detailed lesson notebooks, homework assignments with included solutions, access to a Slack team, and more. Here's the detailed list of topics that I cover in the course.

The course is not free, but you can preview a small portion of the course by watching my PyCon 2016 tutorial.

Do you have any other tips?

Yes! In 2019, I posted 100 pandas tricks. I also created a video featuring my top 25 pandas tricks.

© 2020 Data School. All rights reserved.
