Ridge and Lasso
At this point we've seen a number of criteria and algorithms for fitting regression models to data. We've seen simple linear regression using ordinary least squares, and its generalization to polynomial regression. We've also seen how we can overfit models to data almost arbitrarily using kernel methods or feature engineering. With all of that, we began to explore tools for analyzing this general problem of overfitting versus underfitting, including train and test splits, bias and variance, and cross validation.
Now we're going to take a look at another way to tune our models. These methods all modify the mean squared error function that we have been optimizing. The modifications add a penalty for large coefficient weights in the resulting model. If we think back to our case of feature engineering, we can see how this penalty helps: it counteracts our ability to keep improving the fit on the training data simply by adding more features.
In general, all of these penalties are known as $L^p$ norms.
$L^p$ norm of x
In order to help account for underfitting and overfitting, we often use what are called $L^p$ norms.
The $L^p$ norm of x is defined as:
$||x||_p = \big(\sum_{i} |x_i|^p\big)^{\frac{1}{p}}$
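To make the definition concrete, here is a minimal sketch that computes the norm directly with NumPy. The helper name `lp_norm` and the example vector are our own choices for illustration, and the sketch assumes p is at least 1.

```python
import numpy as np

def lp_norm(x, p):
    """Return the L^p norm of a 1-D array-like x (assumes p >= 1)."""
    x = np.asarray(x, dtype=float)
    # Sum the absolute values raised to the p-th power, then take the p-th root.
    return float(np.sum(np.abs(x) ** p) ** (1.0 / p))

w = np.array([3.0, -4.0])  # hypothetical coefficient vector
print('L1 norm:', lp_norm(w, 1))  # |3| + |-4|
print('L2 norm:', lp_norm(w, 2))  # sqrt(3^2 + (-4)^2)
```

For p = 1 and p = 2 this should agree with `np.linalg.norm(w, 1)` and `np.linalg.norm(w, 2)`.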
1. Ridge (L2)
One common form of regularization is called Ridge Regression and uses the (squared) $L^2$ norm of the coefficient weights as its penalty.
The ridge coefficients minimize a penalized residual sum of squares:
$\sum_i (\hat{y}_i - y_i)^2 + \lambda \sum_j w_j^2$
Write this loss function for performing ridge regression.
import numpy as np
def ridge_loss(y, y_hat, coeff_weights, lam=0.8):
    #Your code here
    return None
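If you get stuck, one possible way to write the ridge loss is sketched below. It is only one reasonable reading of the formula: we assume `coeff_weights` is a 1-D collection of the model's coefficients (intercept excluded), and we use the name `ridge_loss_sketch` so it does not overwrite your own function.

```python
import numpy as np

def ridge_loss_sketch(y, y_hat, coeff_weights, lam=0.8):
    """Residual sum of squares plus lam times the sum of squared weights."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    w = np.asarray(coeff_weights, dtype=float)
    rss = np.sum((y_hat - y) ** 2)   # unpenalized error term
    penalty = lam * np.sum(w ** 2)   # squared L2 penalty on the weights
    return float(rss + penalty)

# Small made-up example: RSS = 0.5, penalty = 0.5 * (4 + 1) = 2.5
ridge_example = ridge_loss_sketch([1, 2, 3], [1.5, 2, 2.5], [2, -1], lam=0.5)
print(ridge_example)
```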
2. Lasso (L1)
Another common form of regularization is called Lasso Regression and uses the $L^1$ norm of the coefficient weights as its penalty.
The lasso coefficients minimize a penalized residual sum of squares:
$\sum_i (\hat{y}_i - y_i)^2 + \lambda \sum_j |w_j|$
Write this loss function for performing lasso regression.
def lasso_loss(y, y_hat, coeff_weights, lam=0.8):
    #Your code here
    return None
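As with ridge, here is one possible sketch of the lasso loss for comparison with your own answer. The same assumptions apply: `coeff_weights` is taken to be a 1-D collection of coefficients without the intercept, and the name `lasso_loss_sketch` is ours.

```python
import numpy as np

def lasso_loss_sketch(y, y_hat, coeff_weights, lam=0.8):
    """Residual sum of squares plus lam times the sum of absolute weights."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    w = np.asarray(coeff_weights, dtype=float)
    rss = np.sum((y_hat - y) ** 2)       # unpenalized error term
    penalty = lam * np.sum(np.abs(w))    # L1 penalty on the weights
    return float(rss + penalty)

# Small made-up example: RSS = 0.5, penalty = 0.5 * (2 + 1) = 1.5
lasso_example = lasso_loss_sketch([1, 2, 3], [1.5, 2, 2.5], [2, -1], lam=0.5)
print(lasso_example)
```

The only difference from the ridge version is the penalty term: absolute values instead of squares.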
3. Run + Compare your Results
Run ridge, lasso, and unpenalized regressions on the dataset below. While we have practiced writing the precursors to a full ridge regression, we'll import the models from a package for now. Then, answer the following questions:
- Which model do you think created better results overall?
- How do the coefficients of the resulting models differ?
import pandas as pd
df = pd.read_excel('movie_data_detailed.xlsx')
df.head()
| | budget | domgross | title | Response_Json | Year | imdbRating | Metascore | imdbVotes |
|---|---|---|---|---|---|---|---|---|
| 0 | 13000000 | 25682380 | 21 & Over | 0 | 2008 | 6.8 | 48 | 206513 |
| 1 | 45658735 | 13414714 | Dredd 3D | 0 | 2012 | 0.0 | 0 | 0 |
| 2 | 20000000 | 53107035 | 12 Years a Slave | 0 | 2013 | 8.1 | 96 | 537525 |
| 3 | 61000000 | 75612460 | 2 Guns | 0 | 2013 | 6.7 | 55 | 173726 |
| 4 | 40000000 | 95020213 | 42 | 0 | 2013 | 7.5 | 62 | 74170 |
from sklearn.linear_model import Lasso, Ridge, LinearRegression
from sklearn.model_selection import train_test_split
#Perform test train split
#Create Regression Models
#Fill in the appropriate RSS Train and Test Equations below.
# print('Train Error Ridge Model', #RSS Ridge Train)
# print('Test Error Ridge Model', #RSS Ridge Test)
# print('\n')
# print('Train Error Lasso Model', #RSS Lasso Train)
# print('Test Error Lasso Model', #RSS Lasso Test)
# print('\n')
# print('Train Error Unpenalized Linear Model', #RSS Unpenalized Train)
# print('Test Error Unpenalized Linear Model', #RSS Unpenalized Test)
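For reference, the overall workflow (split, fit three models, compare train and test RSS) might look roughly like the sketch below. To keep it self-contained it uses synthetic stand-in data rather than the movie dataset, so the numbers it prints say nothing about the movie data. The `rss` helper, the alpha values, and the random seeds are all our own assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the movie dataset): 5 features, 2 of them irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([3.0, 0.0, -2.0, 0.0, 0.5])
y = X @ true_w + rng.normal(scale=1.0, size=200)

# Perform test train split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

def rss(model, X, y):
    """Residual sum of squares of a fitted model on (X, y)."""
    return float(np.sum((model.predict(X) - y) ** 2))

# Create regression models (alpha values chosen arbitrarily for illustration)
models = {
    'Ridge': Ridge(alpha=1.0),
    'Lasso': Lasso(alpha=0.1),
    'Unpenalized Linear': LinearRegression(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = (rss(model, X_train, y_train), rss(model, X_test, y_test))
    print('Train Error {} Model'.format(name), results[name][0])
    print('Test Error {} Model'.format(name), results[name][1])
    print('Coefficients:', np.round(model.coef_, 3))
    print('\n')
```

On the training set the unpenalized model should never have a higher RSS than the penalized ones, since ordinary least squares minimizes exactly that quantity. The interesting comparison is on the test set and in the coefficients.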
Altering Alpha
Remember that we can also change our regularization coefficient, alpha (the $\lambda$ in the loss functions above), to adjust the strength of the regularization. Iterate over the set np.linspace(start=0.1, stop=2.5, num=13) in order to find an optimal alpha.
import numpy as np
min_test_error_ridge = []
min_test_error_lasso = []
optimal_ridge_alpha = 0
optimal_lasso_alpha = 0
#**********Your code here****************
print('Minimum Ridge Test RSS: {}, Best alpha: {}'.format(min_test_error_ridge, optimal_ridge_alpha))
print('Minimum Lasso Test RSS: {}, Best alpha: {}'.format(min_test_error_lasso, optimal_lasso_alpha))
Minimum Ridge Test RSS: [], Best alpha: 0
Minimum Lasso Test RSS: [], Best alpha: 0
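One way the search over alpha could be organized is sketched below. Again, this runs on synthetic stand-in data rather than the movie dataset, and the `sweep` helper is a hypothetical name of ours. It starts the running minimum at infinity instead of an empty list so that the comparison works on the first iteration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the movie dataset).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 0.5]) + rng.normal(size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1)

def sweep(model_cls, alphas):
    """Fit model_cls at each alpha; return (lowest test RSS, alpha achieving it)."""
    best_rss, best_alpha = float('inf'), None
    for alpha in alphas:
        model = model_cls(alpha=alpha).fit(X_train, y_train)
        test_rss = float(np.sum((model.predict(X_test) - y_test) ** 2))
        if test_rss < best_rss:
            best_rss, best_alpha = test_rss, alpha
    return best_rss, best_alpha

alphas = np.linspace(start=0.1, stop=2.5, num=13)
min_test_error_ridge, optimal_ridge_alpha = sweep(Ridge, alphas)
min_test_error_lasso, optimal_lasso_alpha = sweep(Lasso, alphas)

print('Minimum Ridge Test RSS: {}, Best alpha: {}'.format(min_test_error_ridge, optimal_ridge_alpha))
print('Minimum Lasso Test RSS: {}, Best alpha: {}'.format(min_test_error_lasso, optimal_lasso_alpha))
```

Choosing alpha by test error, as the exercise asks, means the test set is no longer an unbiased estimate of performance. A separate validation split or cross validation would be the safer way to pick alpha in practice.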