What are L1 and L2 regularization?
L1 and L2 regularization are two popular techniques used to prevent overfitting in machine learning models.
What is Overfitting?
Overfitting occurs when a model is too complex and learns the noise in the training data, resulting in poor performance on new, unseen data.
L1 Regularization (Lasso Regression)
L1 regularization, also known as Lasso regression, adds a penalty term to the loss function to discourage large weights. The penalty term is proportional to the absolute value of the weights.
Mathematically:
Loss function = (Sum of squared errors) + α * (Sum of absolute values of weights)
where α is the regularization strength.
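As a sketch, the loss above can be computed directly with NumPy; the toy data, weights, and α below are made up purely for illustration:

```python
import numpy as np

def lasso_loss_sse(X, y, w, alpha):
    """Sum of squared errors plus the L1 penalty: SSE + α * Σ|w|."""
    residuals = y - X @ w            # prediction errors
    sse = np.sum(residuals ** 2)     # sum of squared errors
    l1_penalty = alpha * np.sum(np.abs(w))
    return sse + l1_penalty

# Toy data: 3 samples, 2 features (illustrative values only)
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.25])
print(lasso_loss_sse(X, y, w, alpha=0.1))  # → 7.325 (SSE 7.25 + penalty 0.075)
```

Increasing α makes the penalty term dominate, so the optimizer prefers smaller (and, for L1, sparser) weights.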
L2 Regularization (Ridge Regression)
L2 regularization, also known as Ridge regression, adds a penalty term to the loss function to discourage large weights. The penalty term is proportional to the square of the weights.
Mathematically:
Loss function = (Sum of squared errors) + α * (Sum of squares of weights)
where α is the regularization strength.
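The Ridge loss differs from the Lasso loss only in the penalty term: squared weights instead of absolute values. A minimal sketch, reusing the same illustrative toy data:

```python
import numpy as np

def ridge_loss_sse(X, y, w, alpha):
    """Sum of squared errors plus the L2 penalty: SSE + α * Σ(w^2)."""
    residuals = y - X @ w
    sse = np.sum(residuals ** 2)
    l2_penalty = alpha * np.sum(w ** 2)
    return sse + l2_penalty

# Same toy data as before (illustrative values only)
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.25])
print(ridge_loss_sse(X, y, w, alpha=0.1))  # → 7.28125 (SSE 7.25 + penalty 0.03125)
```

Because the penalty is squared, L2 punishes large weights much more heavily than small ones, which shrinks weights smoothly rather than cutting them to exactly zero.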
Key Differences
- L1 penalizes the absolute value of the weights; L2 penalizes their square.
- L1 can drive some weights exactly to zero, producing sparse models and acting as built-in feature selection; L2 shrinks weights toward zero but rarely makes them exactly zero.
- The L1 penalty is not differentiable at zero, so Lasso is typically fit with methods like coordinate descent; the L2 penalty is smooth, and Ridge even has a closed-form solution.
When to Use Each
- Use L1 when you suspect only a subset of features is relevant and want automatic feature selection.
- Use L2 when most features carry some signal, or when features are highly correlated and you want stable, evenly shrunk weights.
- Elastic Net combines both penalties when you want sparsity along with the stability of L2.
Hyperparameter Tuning
Both L1 and L2 regularization require tuning the regularization strength (α) to achieve optimal results. This can be done using techniques like cross-validation.
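A minimal sketch of tuning α with cross-validation, assuming scikit-learn is available; the synthetic data and the grid of α values are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Synthetic data: 5 features, 2 of which are irrelevant (coefficients of 0)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=100)

# 5-fold cross-validation over a small grid of regularization strengths
search = GridSearchCV(Lasso(), {"alpha": [0.001, 0.01, 0.1, 1.0]}, cv=5)
search.fit(X, y)
print("Best alpha:", search.best_params_["alpha"])
```

In practice the grid is usually spaced logarithmically, and scikit-learn also offers LassoCV and RidgeCV, which handle the α search internally.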
Why do we add a penalty in Lasso regression? Explain the loss function concept together with the penalty.
Let's break down the concept of loss functions and penalties in Lasso regression.
What is a Loss Function?
A loss function measures the difference between the model's predictions and the actual true values. The goal of training a model is to minimize the loss function.
Mean Squared Error (MSE) Loss Function
One common loss function is Mean Squared Error (MSE), which calculates the average squared difference between predicted and actual values:
MSE Loss Function = (1/n) * Σ(y_true - y_pred)^2
where:
- y_true: actual true values
- y_pred: model's predictions
- n: number of data points
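The MSE formula above translates directly into NumPy; the sample values below are made up for illustration:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: (1/n) * Σ(y_true - y_pred)^2."""
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, 5.0, 2.0])   # actual values (illustrative)
y_pred = np.array([2.5, 5.0, 4.0])   # model predictions (illustrative)
print(mse(y_true, y_pred))  # → (0.25 + 0 + 4) / 3 ≈ 1.4167
```

Squaring the errors makes large mistakes count disproportionately and keeps positive and negative errors from canceling out.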
What is the Penalty in Lasso Regression?
In Lasso regression, we add a penalty term to the loss function to discourage large weights. The penalty term is proportional to the absolute value of the weights:
Lasso Loss Function = MSE Loss Function + α * Σ|weights|
where:
- α: regularization strength (hyperparameter)
- Σ|weights|: sum of absolute values of model weights
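Putting the two pieces together, the Lasso loss above is just the MSE plus the L1 penalty; the predictions, weights, and α below are illustrative values only:

```python
import numpy as np

def lasso_loss(y_true, y_pred, weights, alpha):
    """Lasso loss: (1/n) * Σ(y_true - y_pred)^2 + α * Σ|w|."""
    mse = np.mean((y_true - y_pred) ** 2)
    l1_penalty = alpha * np.sum(np.abs(weights))
    return mse + l1_penalty

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 1.5, 2.0])
weights = np.array([0.5, -0.25])
print(lasso_loss(y_true, y_pred, weights, alpha=0.1))
# → 1.25/3 + 0.075 ≈ 0.4917 (MSE ≈ 0.4167 plus penalty 0.075)
```

Notice that the penalty depends only on the weights, not on the data: the optimizer can lower it only by shrinking the weights themselves.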
Why Add a Penalty?
We add a penalty to the loss function for several reasons:
- Without a penalty, the model is free to fit the training data as closely as possible, including its noise, which leads to overfitting.
- The penalty trades a small increase in training error for simpler weights that generalize better to unseen data.
- Because the L1 penalty sums absolute weights, it can push uninformative weights exactly to zero, acting as built-in feature selection.
How Does the Penalty Affect the Model?
The penalty term affects the model in several ways:
- It shrinks all weights toward zero, with the amount of shrinkage controlled by α.
- It sets the weights of uninformative features exactly to zero, yielding a sparse model.
- It accepts a small amount of bias on the training data in exchange for lower variance on new data.
By adding a penalty term to the loss function, Lasso regression encourages sparse models, reduces overfitting, and improves interpretability.
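A quick illustration of the sparsity effect, assuming scikit-learn is available: on synthetic data where only 3 of 10 features matter, Lasso typically zeroes out the uninformative coefficients, while Ridge merely shrinks them (note that scikit-learn's Lasso scales the error term by 1/(2n) in its objective):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 10 features, only the first 3 have nonzero true weights
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
true_w = np.zeros(10)
true_w[:3] = [2.0, -3.0, 1.5]
y = X @ true_w + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)
print("Lasso zeroed coefficients:", int(np.sum(np.abs(lasso.coef_) < 1e-8)))
print("Ridge zeroed coefficients:", int(np.sum(np.abs(ridge.coef_) < 1e-8)))
```

The exact counts depend on the data and α, but the qualitative contrast is robust: Lasso produces exact zeros and Ridge does not, which is why Lasso is favored when interpretability and feature selection matter.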