Synthetic Minority Oversampling Technique (SMOTE) is a [[data augmentation]] technique for imbalanced datasets.
SMOTE was first described by [Nitesh Chawla](https://www3.nd.edu/~nchawla/) et al. in their 2002 paper [SMOTE: Synthetic Minority Over-sampling Technique](https://arxiv.org/abs/1106.1813).
SMOTE generates each synthetic data point in three steps:
1. Take a random sample from the minority class.
2. Use [[k-nearest neighbors regression|KNN]] to identify its k nearest neighbors in the minority class, then randomly select one of them.
3. Generate a synthetic data point at a random position between the two examples in the feature space.
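
The three steps can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function name `smote`, its parameters, the brute-force Euclidean neighbor search, and the toy data are all assumptions made for the example.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Return n_synthetic new points built from minority-class feature vectors.

    Hypothetical helper for illustration; assumes numeric features and
    Euclidean distance.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other points
    synthetic = []
    for _ in range(n_synthetic):
        # Step 1: take a random sample from the minority class.
        i = rng.randrange(len(minority))
        x = minority[i]
        # Step 2: find its k nearest minority neighbors (brute force)
        # and randomly select one of them.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(x, p))
        neighbor = rng.choice(others[:k])
        # Step 3: interpolate at a random position on the segment
        # between the sample and the chosen neighbor.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbor)])
    return synthetic


# Toy minority class: four points at the corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=10, k=2, seed=0)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies. For real work, a maintained implementation such as `SMOTE` in the imbalanced-learn library is likely a better choice than this sketch.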
SMOTE can be used to generate as many synthetic data points as needed. Typically, SMOTE is used in conjunction with random undersampling of the majority class.
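
Random undersampling of the majority class can be sketched as follows. The function name `random_undersample` and its parameters are hypothetical, and the code assumes that simply keeping a uniform random subset of the majority examples is what is wanted.

```python
import random


def random_undersample(majority, n_keep, seed=None):
    """Keep n_keep majority-class examples chosen uniformly without replacement.

    Hypothetical helper for illustration.
    """
    if n_keep > len(majority):
        raise ValueError("n_keep cannot exceed the number of majority samples")
    rng = random.Random(seed)
    return rng.sample(majority, n_keep)


# Toy majority class of 100 one-dimensional examples, reduced to 20.
majority = [[float(i)] for i in range(100)]
kept = random_undersample(majority, n_keep=20, seed=0)
```

The reduced majority set can then be combined with the original and synthetic minority points to form a more balanced training set.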
> [!Tip]- Additional Resources
> - [SMOTE for Imbalanced Classification with Python](https://machinelearningmastery.com/smote-oversampling-for-imbalanced-classification/) | Machine Learning Mastery