A Comparative Study of Ridge, LASSO, and Principal Components Regression for Dealing with Multicollinearity
Linear regression is one of the most widely used statistical techniques, with applications in nearly every field. Its goal is to explain the variation in one or more response variables by relating it to changes in one or more explanatory variables. The explanatory variables are said to be orthogonal if there is no linear relationship among them. When they are not orthogonal, several of them tend to vary in very similar ways. This problem, known as multicollinearity, arises frequently in regression analysis. When two or more explanatory variables are highly (but not perfectly) correlated, it becomes difficult to interpret the strength of each variable's effect, because the ordinary least squares (OLS) estimators have large variances and the coefficients are therefore estimated imprecisely. In the first part of this paper, we discuss the multicollinearity problem in the linear regression model, present techniques for detecting it, and examine its causes and consequences. We then explore methods for handling multicollinearity, namely ridge regression (RR), LASSO regression (LAS), and principal components regression (PCR), and discuss the theory behind them. Finally, we present a case study in which we apply these methods and compare OLS, RR, LAS, and PCR as alternatives for fitting a model in the presence of multicollinearity. Using MSE, RMSE, and R-squared as comparison criteria, the results show that RR, LAS, and PCR achieve a lower mean squared error than OLS, and that RR and LAS perform better than PCR.
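As a rough illustration of the kind of comparison summarized above, the following minimal sketch detects multicollinearity with variance inflation factors and then fits OLS, RR, LAS, and PCR on a hold-out split. Everything in it is an assumption made for illustration and is not taken from the case study: the synthetic data, the penalty values `lam`, the number of retained components `k`, the helper names, and the simple coordinate-descent LASSO solver. It uses NumPy only.

```python
import numpy as np

def vif(X):
    """Variance inflation factor per column: 1 / (1 - R_j^2)."""
    Xc = X - X.mean(axis=0)
    out = []
    for j in range(Xc.shape[1]):
        others = np.delete(Xc, j, axis=1)
        coef, *_ = np.linalg.lstsq(others, Xc[:, j], rcond=None)
        resid = Xc[:, j] - others @ coef
        # SST / SSE of column j regressed on the rest equals 1 / (1 - R_j^2)
        out.append((Xc[:, j] @ Xc[:, j]) / (resid @ resid))
    return np.array(out)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ridge(X, y, lam):
    # Closed form: (X'X + lam * I)^{-1} X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def lasso(X, y, lam, n_sweeps=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.
    Fixed number of sweeps, no convergence check (illustrative only)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            partial_resid = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ partial_resid / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

def pcr(X, y, k):
    """Regress y on the first k principal components, map back to X space."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V_k = Vt[:k].T
    gamma = np.linalg.lstsq(X @ V_k, y, rcond=None)[0]
    return V_k @ gamma

# Synthetic data (assumption): x2 is nearly a copy of x1, x3 is independent.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 3.0 * x1 + 1.5 * x2 + 2.0 * x3 + rng.normal(size=n)

# Hold-out split; standardize predictors using training statistics only.
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]
mu, sd, y_mean = X_train.mean(axis=0), X_train.std(axis=0), y_train.mean()
Xtr, Xte, ytr = (X_train - mu) / sd, (X_test - mu) / sd, y_train - y_mean

vifs = vif(X_train)
coefs = {
    "OLS": ols(Xtr, ytr),
    "RR": ridge(Xtr, ytr, lam=1.0),   # lam chosen arbitrarily
    "LAS": lasso(Xtr, ytr, lam=0.05), # lam chosen arbitrarily
    "PCR": pcr(Xtr, ytr, k=2),        # k chosen arbitrarily
}

def holdout_mse(b):
    return float(np.mean((y_test - (Xte @ b + y_mean)) ** 2))

mses = {name: holdout_mse(b) for name, b in coefs.items()}
print("VIF:", np.round(vifs, 1))
for name, value in mses.items():
    print(f"{name}: MSE={value:.3f}  RMSE={np.sqrt(value):.3f}")
```

In practice the penalty and the number of components would be chosen by cross-validation rather than fixed by hand, and the ranking of the four methods depends on the data set, so this sketch should not be read as reproducing the reported results.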