## Overview

A Support Vector Machine (SVM) is a powerful supervised learning algorithm used for classification and regression tasks. For classification, it finds the hyperplane that maximizes the margin between classes.

## Key Components

1. **Hyperplane** - The decision boundary that separates classes
2. **Support Vectors** - The data points closest to the hyperplane, which determine its position
3. **Margin** - The distance between the hyperplane and the nearest data points (the support vectors)
4. **Kernel Trick** - Implicitly maps data into a higher-dimensional space where non-linearly separable data can become separable

## How It Works

1. **Define the Hyperplane** - Find the decision boundary that maximizes the margin between classes
2. **Identify Support Vectors** - Determine the data points closest to the boundary; only these points influence the hyperplane
3. **Apply the Kernel Trick (if needed)** - Map the data into a higher-dimensional space where a separating hyperplane exists
4. **Optimize the Cost Function** - Solve the resulting optimization problem, typically with quadratic programming or, for linear SVMs, gradient-based methods

## Implementation Example

The snippet below loads a small sample dataset (iris, chosen purely for illustration) so that it runs end to end.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load a sample dataset and split into train/test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize model with an RBF kernel
svm = SVC(kernel='rbf', C=1.0, gamma='scale')

# Train model
svm.fit(X_train, y_train)

# Make predictions and evaluate
predictions = svm.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f'Accuracy: {accuracy:.2f}')
```

## Advantages

- Effective in high-dimensional spaces
- Works well with small to medium-sized datasets
- Robust to overfitting when regularization is tuned properly
- Handles both linear and non-linear classification via kernels

## Disadvantages

- Computationally expensive for large datasets, since training time grows rapidly with the number of samples
- Hyperparameters (`C`, `gamma`, kernel choice) require careful tuning
- Less interpretable than models such as decision trees

## Hyperparameters

1. **Kernel Function (`kernel`)**
   - `linear`: Works well for linearly separable data
   - `rbf`: A good default for non-linearly separable data
   - `poly`: Polynomial kernel for more complex decision boundaries
2. **Regularization Parameter (`C`)**
   - Controls the trade-off between margin size and misclassification of training points
   - High `C` → narrower margin, lower bias, higher risk of overfitting
   - Low `C` → wider margin, higher bias, often better generalization
3. **Gamma (`gamma`)** (for RBF and polynomial kernels)
   - Controls how far the influence of a single training example reaches
   - High `gamma` → more complex decision boundary, risk of overfitting
   - Low `gamma` → smoother decision boundary, risk of underfitting

## Best Practices

1. **Feature Scaling** - Standardize or normalize features; SVM performance is sensitive to feature scale
2. **Kernel Selection**
   - Use `rbf` for non-linear problems
   - Use `linear` when the data is already well separated or very high-dimensional
3. **Hyperparameter Tuning** - Use grid search or randomized search to find good `C` and `gamma` values (see the sketch at the end of this document)

## Common Applications

- Image classification
- Text categorization
- Medical diagnosis
- Spam detection
- Handwriting recognition

## Performance Optimization

1. **Reduce Overfitting** - Tune `C` and `gamma` for better generalization
2. **Speed Up Computation**
   - Use a linear SVM (`kernel='linear'` or `LinearSVC`) for large datasets
   - Use `SGDClassifier` with hinge loss (a linear SVM trained with stochastic gradient descent) for very large datasets, sketched at the end of this document
3. **Feature Engineering** - Use PCA or feature selection to remove redundant features

## Evaluation Metrics

- Accuracy
- Precision
- Recall
- F1 Score
- ROC AUC (area under the ROC curve)
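The feature-scaling, hyperparameter-tuning, and evaluation steps above can be combined into a single workflow. The sketch below is one way to do it with scikit-learn for a binary classification task; the dataset (breast cancer), parameter grid, and scoring choice are illustrative assumptions, not part of the original example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report, roc_auc_score

# Load a sample binary classification dataset (illustrative choice)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale features and fit an RBF-kernel SVM in one pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC(kernel='rbf', probability=True)),
])

# Search over C and gamma; the grid values are illustrative assumptions
param_grid = {
    'svm__C': [0.1, 1, 10, 100],
    'svm__gamma': ['scale', 0.01, 0.1, 1],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring='f1')
search.fit(X_train, y_train)
print('Best parameters:', search.best_params_)

# Evaluate the tuned model with the metrics listed above
predictions = search.predict(X_test)
probabilities = search.predict_proba(X_test)[:, 1]
print(classification_report(y_test, predictions))  # precision, recall, F1, accuracy
print('ROC AUC:', roc_auc_score(y_test, probabilities))
```

Because the scaler sits inside the pipeline, cross-validation fits it only on the training folds, and `GridSearchCV` refits the whole pipeline on the full training set with the best parameters before evaluation.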
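For the very-large-dataset case mentioned under Performance Optimization, a linear SVM trained with stochastic gradient descent is a common substitute for a kernel SVM. A minimal sketch follows, using a synthetic dataset as a stand-in for real data; the parameter values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Synthetic dataset standing in for a large real dataset
X, y = make_classification(n_samples=100_000, n_features=50, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# hinge loss makes SGDClassifier optimize a linear SVM objective;
# alpha sets the regularization strength (larger alpha ~ stronger regularization,
# playing a role similar to a smaller C)
model = make_pipeline(
    StandardScaler(),
    SGDClassifier(loss='hinge', alpha=1e-4, max_iter=1000, random_state=42),
)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print(f'Accuracy: {accuracy_score(y_test, predictions):.2f}')
```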