Giter Site home page Giter Site logo

dsc-xgboost-dc-ds-071519's Introduction

XGBoost

Introduction

Now that you are familiar with gradient boosting, you'll learn about the top gradient boosting algorithm currently in use -- XGBoost!

Objectives

You will be able to:

  • Compare XGBoost to other boosting algorithms

What is XGBoost?

Gradient boosting is one of the most powerful concepts in machine learning right now. As you've seen, the term gradient boosting refers to a class of algorithms, rather than any single one. The version with the highest performance right now is XGBoost, which is short for eXtreme Gradient Boosting.

XGBoost is a stand-alone library that implements popular gradient boosting algorithms in the fastest, most performant way possible. There are many under-the-hood optimizations that allow XGBoost to train more quickly than any other library implementations of gradient boosting algorithms. For instance, XGBoost is configured in such a way that it parallelizes the construction of trees across all your computer's CPU cores during the training phase. It also allows for more advanced use cases, such as distributing training across a cluster of computers, which is often a technique used to speed up computation. The algorithm even automatically handles missing values!

Installing XGBoost

XGBoost is an independent library that provides implementations in C++, Python, and other languages. Luckily, the open-source community has had the good sense to make the Python API for XGBoost mirror that of sklearn, so using XGBoost feels no different than using any other supervised learning algorithm from sklearn. The only downside is that it does not come packaged with sklearn, so we must install it ourselves. conda makes this quite easy.

All you need to do is run the command conda install py-xgboost in your terminal, and conda will take care of the rest!

Use cases

XGBoost has risen to prominence by being the go-to algorithm for winning competitions on Kaggle, a competitive data science platform. It is so common to see XGBoost cited as an algorithm used by the winners of Kaggle competitions that it has become a bit of a running joke in the community. This page contains an (incomplete) list of all the recent competitions with place winners that used XGBoost for their solution!

XGBoost is a great choice for classification tasks. It provides best-in-class performance compared to other classification algorithms (with the exception of Deep Learning, which we'll learn more about soon).

Takeaways

When approaching a supervised learning problem, you should always use multiple algorithms, and compare the performances of the various models. There will always be use cases where some classes of models tend to outperform others. However, there are some models that generally outperform all the others -- XGBoost is at the top of this list! Make sure that this is an algorithm you're familiar with, as there are many situations where you'll find it quite useful!

You can find the full documentation for XGBoost here.

Summary

In this lesson, we learned about what XGBoost is, and why it is so powerful and useful to Data Scientists!

dsc-xgboost-dc-ds-071519's People

Contributors

mike-kane avatar sumedh10 avatar loredirick avatar

Watchers

James Cloos avatar Kevin McAlear avatar  avatar Mohawk Greene avatar Victoria Thevenot avatar Belinda Black avatar Bernard Mordan avatar raza jafri avatar  avatar Joe Cardarelli avatar The Learn Team avatar Sophie DeBenedetto avatar  avatar  avatar Matt avatar Antoin avatar Alex Griffith avatar  avatar Amanda D'Avria avatar  avatar Nicole Kroese  avatar Kaeland Chatman avatar Lisa Jiang avatar Vicki Aubin avatar Maxwell Benton avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.