## Overview

Lasso Regression (Least Absolute Shrinkage and Selection Operator) is a regularized version of Linear Regression that adds an L1 penalty to the loss function. It not only helps prevent overfitting but also performs automatic feature selection by shrinking some coefficients to zero.

## Key Components

1. **Linear Regression Equation**
   - The model predicts Y using: $Y = w_0 + w_1X_1 + w_2X_2 + ... + w_nX_n$
   - Where:
     - $w_0, w_1, ..., w_n$ are regression coefficients
     - $X_1, X_2, ..., X_n$ are input features
2. **Loss Function (Mean Squared Error with L1 Regularisation)**
   - Lasso Regression modifies the standard MSE loss function by adding a penalty: $Loss = \sum (Y_{\text{actual}} - Y_{\text{predicted}})^2 + \lambda \sum |w_i|$
   - Where:
     - λ (`alpha` in scikit-learn) controls the strength of regularisation
     - The L1 penalty encourages sparsity, effectively setting some coefficients to zero
3. **Optimisation (Gradient Descent or Coordinate Descent)**
   - Finds the best coefficients by minimising the loss while imposing the L1 penalty

## How It Works

1. **Compute Predictions** - Uses the linear equation to predict values
2. **Calculate Loss** - Combines MSE with the regularisation term
3. **Optimise Weights** - Uses Gradient Descent or Coordinate Descent to minimise the loss
4. **Feature Selection** - Some coefficients are shrunk to zero, effectively removing irrelevant features

## Implementation Example

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Example data (replace with your own X and y)
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize Lasso Regression model
lasso_reg = Lasso(alpha=0.1)

# Train model
lasso_reg.fit(X_train, y_train)

# Make predictions and evaluate
predictions = lasso_reg.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print(f'MSE: {mse:.2f}')
```

## Advantages

- Performs automatic feature selection by shrinking some coefficients to zero
- Reduces overfitting by penalising large coefficients
- Handles high-dimensional datasets with many features
- Suitable for situations where you suspect many features are irrelevant

## Disadvantages

- Sensitive to the choice of λ (`alpha`)
- May be too aggressive in eliminating important features if λ is too large
- Assumes linear relationships between the input features and the output

## Hyperparameters

1. **Regularization Strength (`alpha`)**
   - Controls the strength of the L1 penalty
   - Larger α → More regularization, more feature selection
   - Smaller α → Less regularization, closer to linear regression
2. **Solver Options (`max_iter`, `tol`, `selection`)**
   - scikit-learn's `Lasso` is fitted with coordinate descent
   - `max_iter` and `tol` control when the solver stops
   - `selection='random'` updates coefficients in random order, which can converge faster than the default `'cyclic'` on high-dimensional data

## Best Practices

1. **Tune Alpha (`alpha`)** - Use Grid Search or Cross-Validation to find the optimal α value (see the sketch after this list)
2. **Feature Scaling** - Standardize features for better performance and more efficient optimization
3. **Feature Selection** - Use Lasso to automatically remove irrelevant features by checking which coefficients are driven to zero (also sketched below)
4. **Cross-Validation** - Helps determine the best model parameters
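Best practices 1, 2, and 4 can be combined in one small pipeline. The sketch below is a minimal illustration, assuming the `X_train`/`y_train` and `X_test`/`y_test` arrays from the implementation example above: features are standardized, and `LassoCV` picks α by cross-validation over its own regularisation path.

```python
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize features, then let LassoCV choose alpha with 5-fold cross-validation
pipe = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=42))
pipe.fit(X_train, y_train)

# Alpha selected by cross-validation and the R^2 score on held-out data
print(f'Best alpha: {pipe[-1].alpha_:.4f}')
print(f'Test R^2:   {pipe.score(X_test, y_test):.3f}')
```

`GridSearchCV` over `Lasso(alpha=...)` works just as well; `LassoCV` is usually cheaper because it reuses solutions along the regularisation path.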
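The feature selection described in the Overview and in best practice 3 can be verified directly: coefficients the L1 penalty shrank to exactly zero correspond to features the model has dropped. A small sketch, assuming the `lasso_reg` model fitted in the implementation example:

```python
import numpy as np

coef = lasso_reg.coef_

# Indices of features the model kept (non-zero) vs. dropped (exactly zero)
kept = np.flatnonzero(coef != 0)
dropped = np.flatnonzero(coef == 0)

print(f'Kept {len(kept)} features: {kept.tolist()}')
print(f'Dropped {len(dropped)} features')
```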
## Common Applications

- Predicting house prices (feature selection from multiple features)
- Selecting important financial features for risk modeling
- Gene selection in biological studies
- Sparse modeling in signal processing
- Medical data analysis for disease prediction

## Performance Optimisation

1. **Use Cross-Validation** - Helps identify the best α value and prevent overfitting
2. **Feature Engineering** - Remove highly correlated features before applying Lasso
3. **Use ElasticNet** - For a combination of L1 and L2 regularisation, use ElasticNet when you want the benefits of both Lasso and Ridge (a short sketch follows the Evaluation Metrics section)

## Evaluation Metrics

- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- R² Score
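The MSE, RMSE, and R² metrics listed under Evaluation Metrics are all available in scikit-learn (RMSE is just the square root of MSE). A short sketch, assuming the `y_test` and `predictions` arrays from the implementation example:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

mse = mean_squared_error(y_test, predictions)   # average squared error
rmse = np.sqrt(mse)                             # same units as the target
r2 = r2_score(y_test, predictions)              # proportion of variance explained

print(f'MSE:  {mse:.2f}')
print(f'RMSE: {rmse:.2f}')
print(f'R²:   {r2:.3f}')
```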
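For the ElasticNet variant mentioned under Performance Optimisation, the workflow is identical to Lasso; `l1_ratio` sets the mix between the L1 and L2 penalties (1.0 is pure Lasso, 0.0 is pure Ridge). A minimal sketch with illustrative parameter values, reusing the same train/test split:

```python
from sklearn.linear_model import ElasticNet

# alpha controls overall penalty strength, l1_ratio the L1/L2 mix
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
enet.fit(X_train, y_train)

print(f'Test R^2: {enet.score(X_test, y_test):.3f}')
```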