## Overview
Ridge Regression is a regularized version of Linear Regression that adds an L2 penalty to the loss function. This helps prevent overfitting by reducing the magnitude of the regression coefficients.
## Key Components
1. **Linear Regression Equation**
- The model predicts $Y$ using: $Y = w_0 + w_1X_1 + w_2X_2 + ... + w_nX_n$
- Where:
- $w_0, w_1, ..., w_n$ are the regression coefficients
- $X_1, X_2, ..., X_n$ are the input features
2. **Loss Function (Mean Squared Error with L2 Regularization)**
- Ridge Regression modifies the standard MSE loss function by adding a penalty term: $Loss = \sum (Y_{\text{actual}} - Y_{\text{predicted}})^2 + \lambda \sum w_i^2$
- Where:
- $\lambda$ (`alpha` in scikit-learn) controls the regularization strength
- Higher $\lambda$ values shrink the coefficients towards zero (a small NumPy sketch of this loss follows this list)
3. **Optimization (Gradient Descent or Normal Equation)**
- Finds the coefficients that minimize the regularized loss
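To make the prediction and loss formulas above concrete, here is a minimal NumPy sketch; the toy data, weights, and $\lambda$ value are made up purely for illustration.

```python
import numpy as np

# Toy data: 5 samples, 2 features (made-up values for illustration)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([3.1, 2.9, 7.2, 6.8, 10.1])

w0 = 0.5                   # intercept
w = np.array([1.0, 1.0])   # coefficients w_1, w_2
lam = 0.1                  # regularization strength (lambda / alpha)

# Prediction from the linear equation: Y = w_0 + w_1*X_1 + w_2*X_2
y_pred = w0 + X @ w

# Ridge loss = sum of squared errors + L2 penalty on the coefficients
loss = np.sum((y - y_pred) ** 2) + lam * np.sum(w ** 2)
print(f'Ridge loss: {loss:.3f}')
```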
## How It Works
1. **Compute Predictions**
- Uses the linear equation to predict values
2. **Calculate Loss**
- Combines MSE with the regularization term
3. **Optimize Weights**
- Uses Gradient Descent or the Normal Equation to minimize the loss (a closed-form sketch follows this list)
4. **Control Overfitting**
- Adjust $\lambda$ (`alpha`) to balance bias and variance
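For step 3, the Normal Equation for Ridge has a closed form, $w = (X^\top X + \lambda I)^{-1} X^\top y$. Below is a minimal NumPy sketch of that solution (toy data; features are centered so the intercept is not penalized), not a production implementation.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge solution: w = (X^T X + lam*I)^(-1) X^T y,
    with features centered so the intercept is not penalized."""
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - X_mean @ w
    return w, intercept

# Toy data for illustration
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([3.1, 2.9, 7.2, 6.8, 10.1])

w, b = ridge_fit(X, y, lam=1.0)
print('coefficients:', w, 'intercept:', b)
```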
## Implementation Example
```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic regression data (stand-in for a real dataset)
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize Ridge Regression model (alpha = regularization strength)
ridge_reg = Ridge(alpha=1.0)

# Train model
ridge_reg.fit(X_train, y_train)

# Make predictions and evaluate
predictions = ridge_reg.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print(f'MSE: {mse:.2f}')
```
## Advantages
- Reduces overfitting compared to standard Linear Regression
- Works well with multicollinear data (highly correlated features)
- Stabilizes model predictions
- Helps when there are many irrelevant features
## Disadvantages
- Does not perform automatic feature selection (unlike Lasso)
- Still assumes linear relationships
- Cannot shrink coefficients to exactly zero
## Hyperparameters
1. **Regularization Strength (`alpha`)**
- Controls the balance between bias and variance
- Higher $\alpha$ → More regularization (simpler model)
- Lower $\alpha$ → Less regularization (closer to Linear Regression); the sketch after this list shows the effect on coefficient size
2. **Solver (`solver`)**
- `auto`: Automatically selects the best solver
- `saga`: Works well for large datasets
- `cholesky`: Efficient for small datasets
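As an illustration of how `alpha` changes the fit, the sketch below (using synthetic data from `make_regression`, chosen only for demonstration) prints the coefficient norm for a few values: stronger regularization produces smaller coefficients.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Larger alpha -> stronger shrinkage -> smaller coefficient norm
for alpha in [0.01, 1.0, 100.0]:
    model = Ridge(alpha=alpha, solver='auto').fit(X, y)
    print(f'alpha={alpha:<6} ||w|| = {np.linalg.norm(model.coef_):.2f}')
```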
## Best Practices
1. **Tune Alpha (`alpha`)**
- Use Grid Search or Cross-Validation to find the best value (see the pipeline sketch after this list)
2. **Feature Scaling**
- Standardize features for better performance
3. **Handle Multicollinearity**
- Ridge Regression is effective when features are correlated
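A minimal sketch combining these practices (synthetic data from `make_regression` stands in for a real dataset): features are standardized in a pipeline and `alpha` is chosen by cross-validation with `RidgeCV`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Scale features, then search alpha over a log-spaced grid with 5-fold CV
pipeline = make_pipeline(
    StandardScaler(),
    RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5),
)
pipeline.fit(X, y)
print('best alpha:', pipeline.named_steps['ridgecv'].alpha_)
```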
## Common Applications
- Stock market prediction
- Medical data analysis
- Financial risk modeling
- Sales forecasting
- Climate modeling
## Performance Optimization
1. **Use Cross-Validation**
- Helps determine the best $\alpha$ value
2. **Feature Engineering**
- Remove highly redundant features
3. **Compare with Lasso Regression**
- [[Lasso Regression]] can be better if feature selection is needed (a quick comparison sketch follows this list)
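To see the difference in practice, the sketch below (synthetic data with deliberately uninformative features, purely for illustration) counts exactly-zero coefficients: Ridge shrinks coefficients but rarely zeroes them, while Lasso can.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

# Data where only 3 of 10 features are actually informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

# Ridge shrinks coefficients towards zero; Lasso can set them exactly to zero
print('Ridge zero coefficients:', np.sum(ridge.coef_ == 0))
print('Lasso zero coefficients:', np.sum(lasso.coef_ == 0))
```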
## Evaluation Metrics
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- R² Score
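All three can be computed with `sklearn.metrics`; a minimal sketch with made-up true and predicted values:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Toy true values and predictions, for illustration only
y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.8, 5.3, 7.0, 9.6])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)          # RMSE is the square root of MSE
r2 = r2_score(y_true, y_pred)
print(f'MSE: {mse:.3f}  RMSE: {rmse:.3f}  R2: {r2:.3f}')
```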