
predict_salary's Introduction

title: Show me how to make more than 60k a year
author: Weining Hu
date: November 27, 2016
output: html_document

Everyone wishes to have a decent job with a good salary, but which factors actually influence our future income? Is it your occupation, where you live, your age, or your marital status? In this report, I briefly walk through the rough steps I took when I participated in the Microsoft machine learning competition, and I try to learn from the dataset which factors matter most in determining your future annual income.

Data Exploration

Most data science tasks start with data exploration, where we make visualizations or run statistical tests to build some intuition about the dataset. Let's start with some factors that I personally think would affect income.

First, we could assume that as a person becomes more experienced, he or she is more likely to earn more. The following graph shows the age group distributions for each value of the response variable (morethan60kyr). As we can see from the graph, in the 0-20 age group significantly more people make less than 60k a year, while the opposite holds for the 35-55 age group. This agrees with my earlier assumption.
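As a rough illustration of the comparison described above, the following sketch counts people per age group and income label. The rows are made-up toy values rather than the competition data. The function names and the bin edges outside 0-20 and 35-55 are assumptions introduced for this example.

```python
from collections import Counter

# Assumed age bins (inclusive bounds); only 0-20 and 35-55 are named in the text.
AGE_BINS = [(0, 20), (21, 34), (35, 55), (56, 120)]


def age_group(age):
    """Return a label such as '35-55' for the bin that contains `age`."""
    for low, high in AGE_BINS:
        if low <= age <= high:
            return f"{low}-{high}"
    raise ValueError(f"age out of range: {age}")


def count_by_age_group(rows):
    """Count rows per (age group, morethan60kyr) pair."""
    return Counter((age_group(age), label) for age, label in rows)


# Made-up toy rows of (age, morethan60kyr); not the competition data.
toy_rows = [(18, False), (19, False), (40, True), (45, True), (50, False), (30, True)]

counts = count_by_age_group(toy_rows)
for (group, label), n in sorted(counts.items()):
    print(group, label, n)
```

Plotting these counts as grouped bars, one bar per label within each age group, would give a chart of the kind the paragraph describes.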

[Figure: age group distribution by morethan60kyr]

So what could we get out of this image? We could see that the distribution

[Figure]

Preprocessing

In this step, I mainly apply one-hot encoding to transform the categorical data into numeric data. Unlike ordinal mapping, which may introduce an ordering among categories that the data does not contain, one-hot encoding preserves the original information.
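The difference between the two encodings can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used in the competition. The helper names `one_hot_encode` and `ordinal_encode` and the sample marital-status column are hypothetical.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 vector with one position per category."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        encoded.append(row)
    return categories, encoded


def ordinal_encode(values):
    """Map each category to an integer; this implies an order the data may not have."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    return categories, [index[value] for value in values]


# Hypothetical marital-status column; the values are illustrative only.
marital_status = ["single", "married", "divorced", "married"]

categories, one_hot = one_hot_encode(marital_status)
_, ordinal = ordinal_encode(marital_status)
print(categories)
print(one_hot)
print(ordinal)
```

With the ordinal codes, a model may read "single" as greater than "divorced" simply because of the integer assigned to it, while the one-hot columns carry no such order. In practice a library routine such as pandas `get_dummies` would likely do the same job on a full data frame.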

Ensemble Selection

I finally decided to use XGBoost for model training because of its strong accuracy, efficiency, and robustness. The importance output gave us the key factors that gradient tree boosting relied on when it initially made the splits. We could see from the graph below that

[Figure: XGBoost feature importance]

What could we get?
