## Overview
Ridge Regression is a regularized version of Linear Regression that adds an L2 penalty to the loss function. This helps prevent overfitting by reducing the magnitude of the regression coefficients.
## Key Components
1. **Linear Regression Equation**
- The model predicts $Y$ using: $Y = w_0 + w_1X_1 + w_2X_2 + ... + w_nX_n$
- Where:
- $w_0, w_1, ..., w_n$ are the regression coefficients
- $X_1, X_2, ..., X_n$ are the input features
2. **Loss Function (Mean Squared Error with L2 Regularization)**
- Ridge Regression modifies the standard MSE loss function by adding a penalty term: $Loss = \sum (Y_{\text{actual}} - Y_{\text{predicted}})^2 + \lambda \sum w_i^2$
- Where:
- $\lambda$ (`alpha` in scikit-learn) controls the regularization strength
- Higher $\lambda$ values shrink the coefficients towards zero (a small NumPy sketch of this loss follows this list)
3. **Optimization (Gradient Descent or Normal Equation)**
- Finds the coefficients that minimize the regularized loss
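To make the prediction and loss formulas above concrete, here is a minimal NumPy sketch; the toy data, weights, and $\lambda$ value are made up purely for illustration.

```python
import numpy as np

# Toy data: 5 samples, 2 features (made-up values for illustration)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([3.1, 2.9, 7.2, 6.8, 10.1])

w0 = 0.5                   # intercept
w = np.array([1.0, 1.0])   # coefficients w_1, w_2
lam = 0.1                  # regularization strength (lambda / alpha)

# Prediction from the linear equation: Y = w_0 + w_1*X_1 + w_2*X_2
y_pred = w0 + X @ w

# Ridge loss = sum of squared errors + L2 penalty on the coefficients
loss = np.sum((y - y_pred) ** 2) + lam * np.sum(w ** 2)
print(f'Ridge loss: {loss:.3f}')
```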
## How It Works
1. **Compute Predictions**
- Uses the linear equation to predict values
2. **Calculate Loss**
- Combines MSE with the regularization term
3. **Optimize Weights**
- Uses Gradient Descent or the Normal Equation to minimize the loss (a closed-form sketch follows this list)
4. **Control Overfitting**
- Adjust $\lambda$ (`alpha`) to balance bias and variance
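For step 3, the Normal Equation for Ridge has a closed form, $w = (X^\top X + \lambda I)^{-1} X^\top y$. Below is a minimal NumPy sketch of that solution (toy data; features are centered so the intercept is not penalized), not a production implementation.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge solution: w = (X^T X + lam*I)^(-1) X^T y,
    with features centered so the intercept is not penalized."""
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - X_mean @ w
    return w, intercept

# Toy data for illustration
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([3.1, 2.9, 7.2, 6.8, 10.1])

w, b = ridge_fit(X, y, lam=1.0)
print('coefficients:', w, 'intercept:', b)
```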
## Implementation Example
```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic regression data (stand-in for a real dataset)
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize Ridge Regression model (alpha = regularization strength)
ridge_reg = Ridge(alpha=1.0)

# Train model
ridge_reg.fit(X_train, y_train)

# Make predictions and evaluate
predictions = ridge_reg.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print(f'MSE: {mse:.2f}')
```
## Advantages
- Reduces overfitting compared to standard Linear Regression
- Works well with multicollinear data (highly correlated features)
- Stabilizes model predictions
- Helps when there are many irrelevant features
## Disadvantages
- Does not perform automatic feature selection (unlike Lasso)
- Still assumes linear relationships
- Cannot shrink coefficients to exactly zero
## Hyperparameters
1. **Regularization Strength (`alpha`)**
- Controls the balance between bias and variance
- Higher $\alpha$ → More regularization (simpler model)
- Lower $\alpha$ → Less regularization (closer to Linear Regression); the sketch after this list shows the effect on coefficient size
2. **Solver (`solver`)**
- `auto`: Automatically selects the best solver
- `saga`: Works well for large datasets
- `cholesky`: Efficient for small datasets
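As an illustration of how `alpha` changes the fit, the sketch below (using synthetic data from `make_regression`, chosen only for demonstration) prints the coefficient norm for a few values: stronger regularization produces smaller coefficients.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Larger alpha -> stronger shrinkage -> smaller coefficient norm
for alpha in [0.01, 1.0, 100.0]:
    model = Ridge(alpha=alpha, solver='auto').fit(X, y)
    print(f'alpha={alpha:<6} ||w|| = {np.linalg.norm(model.coef_):.2f}')
```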
## Best Practices
1. **Tune Alpha (`alpha`)**
- Use Grid Search or Cross-Validation to find the best value (see the pipeline sketch after this list)
2. **Feature Scaling**
- Standardize features for better performance
3. **Handle Multicollinearity**
- Ridge Regression is effective when features are correlated
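A minimal sketch combining these practices (synthetic data from `make_regression` stands in for a real dataset): features are standardized in a pipeline and `alpha` is chosen by cross-validation with `RidgeCV`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Scale features, then search alpha over a log-spaced grid with 5-fold CV
pipeline = make_pipeline(
    StandardScaler(),
    RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5),
)
pipeline.fit(X, y)
print('best alpha:', pipeline.named_steps['ridgecv'].alpha_)
```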
## Common Applications
- Stock market prediction
- Medical data analysis
- Financial risk modeling
- Sales forecasting
- Climate modeling
## Performance Optimization
1. **Use Cross-Validation**
- Helps determine the best $\alpha$ value
2. **Feature Engineering**
- Remove highly redundant features
3. **Compare with Lasso Regression**
- [[Lasso Regression]] can be better if feature selection is needed (a quick comparison sketch follows this list)
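To see the difference in practice, the sketch below (synthetic data with deliberately uninformative features, purely for illustration) counts exactly-zero coefficients: Ridge shrinks coefficients but rarely zeroes them, while Lasso can.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

# Data where only 3 of 10 features are actually informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

# Ridge shrinks coefficients towards zero; Lasso can set them exactly to zero
print('Ridge zero coefficients:', np.sum(ridge.coef_ == 0))
print('Lasso zero coefficients:', np.sum(lasso.coef_ == 0))
```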
## Evaluation Metrics
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- R² Score
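All three can be computed with `sklearn.metrics`; a minimal sketch with made-up true and predicted values:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Toy true values and predictions, for illustration only
y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.8, 5.3, 7.0, 9.6])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)          # RMSE is the square root of MSE
r2 = r2_score(y_true, y_pred)
print(f'MSE: {mse:.3f}  RMSE: {rmse:.3f}  R2: {r2:.3f}')
```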