seaneldrin1 / automated-manual-comparison

This project forked from alteryx/automated-manual-comparison

Automated vs Manual Feature Engineering Comparison. Implemented using Featuretools.

Home Page: https://towardsdatascience.com/why-automated-feature-engineering-will-change-the-way-you-do-machine-learning-5c15bf188b96

License: BSD 3-Clause "New" or "Revised" License

Jupyter Notebook 98.95% Python 0.63% HTML 0.42% Shell 0.01%

Manual vs Automated Feature Engineering Comparison

The traditional process of manual feature engineering requires building one feature at a time by hand, informed by domain knowledge. This is tedious, time-consuming, error-prone, and, perhaps most importantly, specific to each dataset, which means the code has to be rewritten for each problem.

Automated feature engineering with Featuretools allows one to create thousands of features automatically from a set of related tables using a framework that can be easily applied to any problem.
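To make the idea concrete, here is a minimal plain-Python sketch of what creating features "from a set of related tables" can look like: a fixed set of aggregation functions is applied to every numeric column of a child table, grouped by its parent. This is only an illustration of the concept and does not use the Featuretools API; the tables, column names, and the `build_features` helper are hypothetical.

```python
from statistics import mean

# Hypothetical toy data: a parent table (clients) and a child table (loans).
clients = [{"client_id": 1}, {"client_id": 2}]
loans = [
    {"client_id": 1, "amount": 100.0},
    {"client_id": 1, "amount": 300.0},
    {"client_id": 2, "amount": 50.0},
]

# Reusable aggregation functions applied to every numeric child column.
PRIMITIVES = {"SUM": sum, "MEAN": mean, "MAX": max, "COUNT": len}


def build_features(parents, children, key, child_name, numeric_columns):
    """Return one feature dict per parent row by aggregating its child rows."""
    rows = []
    for parent in parents:
        related = [child for child in children if child[key] == parent[key]]
        features = {key: parent[key]}
        for column in numeric_columns:
            values = [child[column] for child in related]
            for name, func in PRIMITIVES.items():
                # Parents without child rows get None instead of an error.
                feature_name = f"{name}({child_name}.{column})"
                features[feature_name] = func(values) if values else None
        rows.append(features)
    return rows


feature_matrix = build_features(clients, loans, "client_id", "loans", ["amount"])
for row in feature_matrix:
    print(row)
```

Adding another numeric column or another aggregation function multiplies the number of generated features without any new hand-written code, which is the property the paragraph above describes.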

Highlights

Featuretools offers us the following benefits:

  1. Up to 10x reduction in development time
  2. Better predictive performance
  3. Interpretable features with real-world significance
  4. Fits into existing machine learning pipelines
  5. Ensures data is valid in time-series problems

Automated feature engineering will change the way you do machine learning by allowing you to develop better predictive models in a fraction of the time the traditional approach requires.

Article

For the highlights of the project, check out "Why Automated Feature Engineering Will Change the Way You Do Machine Learning" on Towards Data Science.

Results

Each of the 3 projects in this repository demonstrates a different benefit of using automated feature engineering.

  1. Loan Repayment Prediction: Build Better Models Faster

Given a dataset of 58 million rows spread across 7 tables and the task of predicting whether a client will default on a loan, Featuretools delivered a better predictive model in a fraction of the time required by manual feature engineering. The features built by Featuretools are also human-interpretable and can give us insight into the problem.

  2. Retail Spending Prediction: Ensure Models Use Valid Data

When we have time-series data, we traditionally have to be extremely careful to make sure our model trains only on valid data. Often, a model will work in development only to fail completely in deployment because the training data was not properly filtered by time. Featuretools can take care of time filters automatically, allowing us to focus on other aspects of the machine learning pipeline and delivering better overall predictive models.

  3. Engine Life Prediction: Automatically Create Meaningful Features

In this problem of predicting how long an engine will run until it fails, Featuretools creates meaningful features that can inform our thinking about real-world problems, as seen in the most important features.
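The time filtering mentioned for the retail spending project can be illustrated with a short plain-Python sketch: each customer gets a cutoff time, and only purchases made at or before that time contribute to the feature. This shows the general idea and is not the Featuretools API; the data, the `cutoff_times` mapping, and the `total_spent_before_cutoff` helper are assumptions made for the example.

```python
from datetime import datetime

# Hypothetical purchase log: (customer_id, purchase_time, amount).
purchases = [
    (1, datetime(2011, 1, 5), 20.0),
    (1, datetime(2011, 2, 10), 35.0),
    (1, datetime(2011, 3, 15), 50.0),  # after the cutoff, must be ignored
    (2, datetime(2011, 1, 20), 10.0),
]

# One cutoff time per customer: the moment at which the prediction is made.
cutoff_times = {1: datetime(2011, 3, 1), 2: datetime(2011, 3, 1)}


def total_spent_before_cutoff(purchases, cutoff_times):
    """Sum each customer's purchases made at or before their cutoff time."""
    totals = {customer_id: 0.0 for customer_id in cutoff_times}
    for customer_id, purchase_time, amount in purchases:
        cutoff = cutoff_times.get(customer_id)
        # Skip customers without a cutoff and any data from the "future".
        if cutoff is not None and purchase_time <= cutoff:
            totals[customer_id] += amount
    return totals


features = total_spent_before_cutoff(purchases, cutoff_times)
print(features)
```

Forgetting the `purchase_time <= cutoff` check is exactly the kind of leak that lets a model look good in development and fail in deployment, so it helps to have the filter applied in one place rather than repeated for every feature.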

Scaling with Dask

For an example of how Featuretools can scale - either on a single machine or a cluster - see the Featuretools on Dask notebook.

Feature Labs

Featuretools is an open source project created by Feature Labs. To see the other open source projects we're working on, visit Feature Labs Open Source. If building impactful data science pipelines is important to you or your business, please get in touch.

Contact

Any questions can be directed to [email protected]

