Giter Site home page Giter Site logo

dsc-1-11-03-multiple-linear-regression-nyc-career-ds-062518's Introduction

Multiple Linear Regression

Introduction

In this lecture, you'll be introduced to the multiple linear regression model. We'll start with an introductory example using linear regression, which you've seen before, to act as a segue into multiple linear regression. We'll conclude this lecture by looking at a real data example.

Objectives

You will be able to:

  • Understand and explain the use cases for linear and multiple regression analyses
  • Describe multiple regression with >1 predictors

From simple to multiple linear regression

You have previously learned about linear regression models. In these models, what you try to do is fit a linear relationship between two variables. Let's refresh our memory with the example below. Here, we are trying to find a relationship between seniority and monthly income. It is definitely reasonable to assume that, on average, people with more seniority have a higher income than people who are newer to the job market and have less experience. A potential relationship could look like the plot below. The monthly income is shown in 1000 USD.

#import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1234)
sen = np.random.uniform(18, 65, 100)
income = np.random.normal((sen/10), 0.5)
sen = sen.reshape(-1,1)

fig = plt.figure(figsize=(7,5))
fig.suptitle('seniority vs. income', fontsize=16)
plt.scatter(sen, income)
plt.plot(sen, sen/10, c = "black")
plt.xlabel("seniority", fontsize=14)
plt.ylabel("monthly income", fontsize=14)
plt.show();

png

Here, seniority is the independent variable, and monthly income is the target variable.

Of course, you know that seniority is not the only factor that drives income. Other factors that can play a role are, among others, number of hours worked, years of education, and city of employment. Now You could create simple linear regression models for each of these factors and monthly outcome as a target, but more interestingly, you can also create a model where all these potential predictors serve as independent variables at once. How does this work? Let's start with an example with two predictors. Let's start by refreshing the expression for our linear model. As a general expression for our linear model, we had that $\hat Y = bX + a$. Applied to this example, this would boil down to:

$\text{estimated monthly income} = slope * seniority + intercept $

adding in years of education as a predictor, you can extend this model to:

$\text{estimated monthly income} = slope_s * seniority + slope_ed * years_of_education + intercept $

What exactly does that look like?

Drawing

As we have two predictors here, the simple line is replaced by a plane. Our $slope_s$ represents the slope in the direction of the axis associated with seniority, our $slope_ed$ represents the slope in the direction of the axis associated with years of education.

And it obviously doesn't stop here! We can add as many predictors as we like. What is important to note, however, is that for models with more than 2 predictors representing a multiple linear model becomes very difficult and even impossible! Still, it is done all the time, as linear models of all sorts are extensively used in many fields!

When thinking of lines and slopes statistically, slope parameters associated with a particular predictor $x_i$ are often denoted by $\beta_i$. Extending this example mathematically, you would write a multiple linear regression model as follows:

$$ \hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 +\ldots + \hat\beta_n x_n $$

where $n$ is the number of predictors, $\beta_0$ is the intercept, and $\hat y$ is the so-called "fitted line" or the predicted value associated with the dependent variable.

Summary

Congratulations! You have gained an initial understanding for a multinomial linear regression model. To illustrate the usage of the multiple linear regression model, we'll be using the auto-mpg dataset which contains information on the technical specifications of a variety of cars. In the labs, you'll be practicing your newly gained knowledge using the Boston Housing data again.

dsc-1-11-03-multiple-linear-regression-nyc-career-ds-062518's People

Contributors

loredirick avatar peterbell avatar sik-flow avatar

Watchers

Kevin Ernest Long avatar James Cloos avatar Kevin McAlear avatar  avatar Victoria Thevenot avatar Belinda Black avatar  avatar Joe Cardarelli avatar Sam Birk avatar Sara Tibbetts avatar The Learn Team avatar Sophie DeBenedetto avatar  avatar Jaichitra (JC) Balakrishnan avatar Antoin avatar Alex Griffith avatar  avatar Amanda D'Avria avatar  avatar Nicole Kroese  avatar  avatar  avatar Lisa Jiang avatar Vicki Aubin avatar Maxwell Benton avatar  avatar

Forkers

adomkay slieb74

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.