The Quick (and Ultimate) Guide to Regularization

Click to learn more about author Ram Tavva.

Whether in statistics, mathematics, or finance, and particularly in machine learning and inverse problems, regularization is any modification made to a learning algorithm that is intended to reduce its generalization error but not its training error. In layman’s terms, to “regularize” simply means to make things acceptable or regular.

So, in what sort of situations do we use regularization? Sometimes a machine learning algorithm performs exceptionally well on training data but poorly on testing data. This is overfitting, and it is where regularization comes into play. Regularization discourages learning an overly flexible or complex model. It lets us keep all variables or features in the model while reducing the magnitude of their coefficients. Hence, regularization helps maintain accuracy while improving the generalization of the model.

In the same way that cancer treatment aims to destroy cancer cells while sparing the healthy cells in the body, regularization aims to suppress the noise while leaving the signal largely intact.

What Is the Main Principle Behind Regularization?

The principle behind regularization is that it adds a penalty, or complexity term, to the cost function of a complex model.

Consider the linear regression equation y = β₁x₁ + β₂x₂ + β₃x₃ + … + βₙxₙ + b, where y represents the value to be predicted and x₁, x₂, …, xₙ represent the features used to predict y. β₁, β₂, …, βₙ represent the weights, or magnitudes, attached to the respective features, and b represents the intercept. Linear regression models try to optimize the values of the weights and b so as to minimize the cost function.

The cost function for the linear model is typically the sum of the squared differences between the actual and the predicted values.
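As a sketch in the notation of the regression equation above, and assuming the usual sum-of-squared-errors loss over M training samples (the sample index i and the count M are notation introduced here), the unregularized cost can be written as:

```latex
\text{Cost}(\beta, b) = \sum_{i=1}^{M} \left( y_i - \hat{y}_i \right)^2,
\qquad
\hat{y}_i = b + \sum_{j=1}^{n} \beta_j x_{ij}
```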

Ridge regularization and Lasso regularization are two of the most powerful techniques generally used for creating parsimonious models in the presence of a large number of features. These forms of regularization work on the premise that smaller weights lead to simpler models, which in turn helps prevent overfitting. To obtain smaller weights, these techniques add a regularization term to the loss to obtain the cost function: Cost Function = Loss + Regularization Term.

What Is Ridge Regularization?

Ridge regularization is a regularization technique for linear regression in which a small amount of bias is introduced so that we can get better long-term predictions. It is used to reduce the complexity of the model and is also known as L2 regularization.

In this technique, the amount of bias added to the model is known as the Ridge regularization penalty, and the cost function is altered by adding this penalty term to it. We calculate the penalty by multiplying lambda (λ) by the sum of the squared weights of the individual features, so the cost function in Ridge regularization is the loss plus λ times the sum of the squared weights.
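Written out as a sketch, assuming a sum-of-squared-errors loss over M samples with prediction ŷᵢ = b + Σⱼ βⱼxᵢⱼ (M and the indices i, j are notation introduced here), the Ridge cost takes the form:

```latex
\text{Cost}_{\text{ridge}}(\beta, b) = \sum_{i=1}^{M} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{n} \beta_j^2
```

The intercept b is conventionally left out of the penalty.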

The penalty term regularizes the coefficients of the model, reducing their magnitudes and thereby decreasing the complexity of the model. As λ tends to zero, the cost function reduces to the cost function of the ordinary linear regression model. Hence, for very small values of λ, the model will resemble the linear regression model.
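The effect of λ can be checked numerically. The following is a minimal sketch, assuming scikit-learn (where λ is exposed as the `alpha` parameter of `Ridge`) and synthetic data from `make_regression` rather than any real dataset; the specific alpha values are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic stand-in data (an assumption for illustration only).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Ordinary least squares as the unpenalized reference.
ols = LinearRegression().fit(X, y)

# Length of the coefficient vector for increasingly strong penalties.
alphas = (1e-8, 1.0, 100.0, 10000.0)
coef_norms = [float(np.linalg.norm(Ridge(alpha=a).fit(X, y).coef_)) for a in alphas]

# With a near-zero penalty, Ridge should be numerically close to OLS.
tiny_ridge = Ridge(alpha=1e-8).fit(X, y)
gap_to_ols = float(np.max(np.abs(tiny_ridge.coef_ - ols.coef_)))

print("coefficient norms:", coef_norms)
print("largest gap to OLS at alpha=1e-8:", gap_to_ols)
```

The printed norms should shrink as alpha grows, while the near-zero alpha should reproduce the ordinary least-squares coefficients almost exactly.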

A general linear or polynomial regression can fail if there is high collinearity between the independent variables, so Ridge regularization is often used to solve such problems. It also helps when we have more parameters than samples.

What Is Lasso Regularization?

Lasso regularization, also known as L1 regularization, is another regularization technique that reduces the complexity of the model. (Here’s a brief explanation of the differences between L1 and L2 regularization.)

The working principle of Lasso regularization is similar to that of Ridge regularization, except that the penalty term contains the absolute values of the weights rather than their squares. Lasso stands for Least Absolute Shrinkage and Selection Operator. Because this penalty can shrink some coefficients to exactly zero, Lasso regularization can help us reduce overfitting in the model and also perform feature selection. The cost function of Lasso regularization is the loss plus λ times the sum of the absolute values of the weights.
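As a sketch under the same assumptions as for Ridge (a sum-of-squared-errors loss over M samples, with ŷᵢ = b + Σⱼ βⱼxᵢⱼ; M, i, and j are notation introduced here), the Lasso cost can be written as:

```latex
\text{Cost}_{\text{lasso}}(\beta, b) = \sum_{i=1}^{M} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{n} \left| \beta_j \right|
```

Software libraries may scale the loss term differently (for example, by dividing by the number of samples), which changes the numerical meaning of λ but not the form of the penalty.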

Implementing Ridge and Lasso Regularization Using Python Programming

Step 1: The first step is to import the necessary libraries: pandas, numpy, matplotlib, sklearn.linear_model, sklearn.model_selection, and mean from the statistics library. train_test_split and cross_val_score are imported from sklearn.model_selection. Ridge minimizes a penalized residual sum of squares, whereas Lasso is a linear model that estimates sparse coefficients.
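The imports described in this step might look like the following sketch. The aliases `np`, `pd`, and `plt` are common conventions assumed here, not something the text specifies.

```python
# Data handling and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Averaging the per-fold cross-validation scores
from statistics import mean

# Regularized linear models and model-selection helpers
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import train_test_split, cross_val_score
```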

Step 2: Once the libraries are imported, the next step is to load the dataset. For this explanation, we have used kc_house_data.csv, which contains historic data on houses sold between May 2014 and May 2015.

The dataset can be loaded using the pd.read_csv command and then displayed for a quick inspection.
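A minimal sketch of the loading step follows. The helper name `load_housing` is hypothetical, and the default path assumes the CSV file sits in the working directory.

```python
import pandas as pd

def load_housing(path="kc_house_data.csv"):
    """Read the housing CSV into a DataFrame.

    `path` may be a file path or any file-like object accepted by pd.read_csv.
    """
    return pd.read_csv(path)

# Usage, assuming the file is available locally:
# data = load_housing()
# print(data.head())
```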

Step 3: It is necessary to drop null values from the dataset using data.dropna(), as null values may lead to errors and lower accuracy in model prediction.

Many datasets contain columns that are irrelevant and can be omitted. That is the case here, so we drop the “id”, “date”, and “zipcode” columns from our dataset. After that, we separate the dependent variable from the independent variables.
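The cleaning and separation steps can be sketched as below. The text does not name the dependent variable, so the target column `price` is an assumption, as are the helper name `prepare_features` and the made-up rows in the tiny demo frame.

```python
import pandas as pd

def prepare_features(data, target="price", drop_cols=("id", "date", "zipcode")):
    """Drop rows with nulls, drop unused columns, and split into X and y."""
    cleaned = data.dropna()
    cleaned = cleaned.drop(columns=list(drop_cols))
    X = cleaned.drop(columns=[target])  # independent variables
    y = cleaned[target]                 # dependent variable (assumed to be `price`)
    return X, y

# Tiny made-up frame to show the behaviour; not real housing data.
demo = pd.DataFrame({
    "id": [1, 2, 3],
    "date": ["20141013", "20141209", "20150225"],
    "zipcode": [98001, 98002, 98003],
    "sqft_living": [1000.0, None, 800.0],
    "price": [200000.0, 300000.0, 150000.0],
})
X, y = prepare_features(demo)
print(X.columns.tolist(), len(y))
```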

Step 4: To verify whether our model is working well, we divide our data into training and testing sets with test_size=0.25, so that a quarter of the rows are held out for testing.
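The split can be sketched as follows, using synthetic data from `make_regression` as a stand-in for the housing features. The `random_state` value is an arbitrary choice made here for reproducibility.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption for illustration only).
X, y = make_regression(n_samples=100, n_features=4, noise=5.0, random_state=0)

# Hold out 25% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
print(X_train.shape, X_test.shape)
```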

Step 5: The cross_val_score function helps estimate the expected accuracy of our model on data it was not trained on. An added advantage is that we need not set aside a separate validation set to obtain this metric, so we can still train our model on all of the available data.

So, to find the cross_val_score of our model, we follow a short sequence of steps.
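One plausible version of these steps is sketched below: score a grid of candidate penalty strengths with cross-validation and keep the best one. The grid of alpha values, the fold count `cv=10`, the helper name `mean_cv_scores`, and the synthetic data are all assumptions made for illustration.

```python
from statistics import mean

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def mean_cv_scores(model_cls, X, y, alphas, cv=10):
    """Return {alpha: mean cross-validated R^2} for each candidate alpha."""
    return {a: mean(cross_val_score(model_cls(alpha=a), X, y, cv=cv)) for a in alphas}

# Synthetic stand-in data (an assumption for illustration only).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Hypothetical grid of penalty strengths: 0.25, 0.5, ..., 2.0
alphas = [0.25 * i for i in range(1, 9)]
ridge_scores = mean_cv_scores(Ridge, X, y, alphas)

# Keep the alpha with the highest mean score across folds.
best_alpha = max(ridge_scores, key=ridge_scores.get)
print(best_alpha, ridge_scores[best_alpha])
```

For regressors, `cross_val_score` reports R² by default, so higher is better.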

Step 6: Now we are ready to build and fit our regression models, starting with Ridge regularization. Calling print(ridgeModelChosen.score(X_test, y_test)) evaluates the fitted model on the test set; for a regressor, score returns the coefficient of determination (R²).
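Building, fitting, and evaluating the Ridge model can be sketched as below. The name `ridgeModelChosen` comes from the text; the value `alpha=2` is a placeholder for whatever the cross-validation step selects, and the data is synthetic.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption for illustration only).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# alpha=2 is a hypothetical value standing in for the cross-validated choice.
ridgeModelChosen = Ridge(alpha=2)
ridgeModelChosen.fit(X_train, y_train)

# R^2 on the held-out test set.
ridge_r2 = ridgeModelChosen.score(X_test, y_test)
print(ridge_r2)
```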

Step 7: The same procedure applies when implementing Lasso regularization in our model. First we calculate the cross_val_score, then we build and fit the model, and finally we evaluate it.
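The three Lasso steps can be sketched together as follows. The synthetic data (ten features, only three of them informative), the alpha grid, and the name `lassoModelChosen` are assumptions made for illustration. The final count shows how many coefficients the L1 penalty set to exactly zero.

```python
from statistics import mean

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data: 10 features, only 3 of which carry signal.
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 1. Cross-validate a hypothetical grid of penalty strengths on the training data.
alphas = [0.25 * i for i in range(1, 9)]
cv_means = {
    a: mean(cross_val_score(Lasso(alpha=a), X_train, y_train, cv=10)) for a in alphas
}
best_alpha = max(cv_means, key=cv_means.get)

# 2. Build and fit the model with the selected alpha.
lassoModelChosen = Lasso(alpha=best_alpha).fit(X_train, y_train)

# 3. Evaluate on the test set and count coefficients that are exactly zero.
lasso_r2 = lassoModelChosen.score(X_test, y_test)
n_zero = int(np.sum(lassoModelChosen.coef_ == 0))
print(best_alpha, lasso_r2, n_zero)
```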

Step 8: The last and one of the most vital steps is to compare both the regularization models with the help of graphical representation. Let’s see how we can do that.

Step 9: Finally, we shall be plotting the scores with “Regularization Models” marked on the x-axis and “Score” on the y-axis.
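The comparison plot from the last two steps can be sketched as a simple bar chart. The helper name `plot_model_scores` is hypothetical, and the scores passed in are placeholders for illustration only, not results from the housing data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt

def plot_model_scores(scores):
    """Draw a bar chart of test scores keyed by model name and return the Axes."""
    fig, ax = plt.subplots()
    ax.bar(list(scores.keys()), list(scores.values()))
    ax.set_xlabel("Regularization Models")
    ax.set_ylabel("Score")
    return ax

# Placeholder values only; substitute the scores computed for each model.
ax = plot_model_scores({"Ridge": 0.5, "Lasso": 0.6})
# ax.figure.savefig("model_scores.png")  # or plt.show() with an interactive backend
```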
