In machine learning, class imbalance is a pervasive challenge that can distort model performance and lead to biased predictions. The Synthetic Minority Over-sampling Technique (SMOTE) is a popular strategy for mitigating this issue by artificially augmenting the minority class in a dataset. This publication presents a comprehensive study of 31 imbalanced datasets with minority class percentages ranging from 1.4% to 19.3%. We rigorously analyzed the impact of SMOTE on 15 different binary classification models trained across these datasets. By comparing the performance of models trained with and without SMOTE, the analysis reveals how SMOTE consistently improves certain metrics, such as recall, and how it influences model outcomes overall. Our findings provide robust evidence of SMOTE's effects, offering practical guidance for practitioners seeking more equitable model results.
To understand the impact of the Synthetic Minority Over-sampling Technique (SMOTE) on machine learning models, we conducted an extensive comparison using 15 binary classification models across 31 imbalanced datasets. Each model was trained twice on every dataset: once with SMOTE applied to balance the class distribution, and once without it. The primary goal was to observe and quantify the difference SMOTE makes in handling class imbalance and to evaluate its efficacy across varying conditions.
We focused on five key performance metrics to assess the impact of SMOTE: accuracy, precision, recall, F1-score, and AUC score. For each metric, predictions from models trained with and without SMOTE were compared to determine which scenario yielded better performance. This was quantified by calculating the percentage of instances where the application of SMOTE led to superior outcomes compared to the no-SMOTE scenario.
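To make the per-dataset comparison concrete, here is a minimal sketch of the procedure, assuming scikit-learn and imbalanced-learn are available. The synthetic dataset and logistic-regression classifier are illustrative stand-ins for the 31 datasets and 15 models used in the study, not the actual experimental setup.

```python
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Illustrative imbalanced dataset (~5% minority class).
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

def evaluate(model, X_tr, y_tr):
    """Fit on the given training data and score the held-out test set."""
    model.fit(X_tr, y_tr)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1_score": f1_score(y_test, y_pred),
        "auc_score": roc_auc_score(y_test, y_prob),
    }

# Scenario 1: train on the original, imbalanced training data.
no_smote = evaluate(LogisticRegression(max_iter=1000), X_train, y_train)

# Scenario 2: oversample the minority class with SMOTE, then train.
# SMOTE is applied only to the training split, never to the test set.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
with_smote = evaluate(LogisticRegression(max_iter=1000), X_res, y_res)

for metric in no_smote:
    print(f"{metric}: no-SMOTE={no_smote[metric]:.3f}, SMOTE={with_smote[metric]:.3f}")
```

Repeating this for every model-dataset pair yields the paired scores from which the percentage of SMOTE "wins" per metric can be computed.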
The results of the paired t-tests (the methodology is described below) were as follows:
| Metric | Statistic | P-Value |
|---|---|---|
| accuracy | 5.984707 | 1.455182e-06 |
| f1_score | -2.644405 | 1.289257e-02 |
| precision | 5.616030 | 4.095170e-06 |
| recall | -6.775046 | 1.635219e-07 |
| auc_score | 1.427832 | 1.636681e-01 |
The following bar chart shows the direction of the change per metric:
(placeholder for bar chart)
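As an illustration of how such a chart can be produced, the following is a minimal matplotlib sketch that plots the paired t statistics from the table above, with the sign of each bar indicating the direction of the change per metric. It is a stand-in for the original figure, not a reproduction of it.

```python
import matplotlib.pyplot as plt

# Paired t statistics from the results table above; the sign encodes
# the direction of the change for each metric.
metrics = ["accuracy", "f1_score", "precision", "recall", "auc_score"]
t_stats = [5.984707, -2.644405, 5.616030, -6.775046, 1.427832]

colors = ["tab:blue" if t >= 0 else "tab:orange" for t in t_stats]
plt.bar(metrics, t_stats, color=colors)
plt.axhline(0, color="black", linewidth=0.8)  # zero line separates the two directions
plt.ylabel("Paired t statistic")
plt.title("Direction of change per metric (SMOTE vs. no-SMOTE)")
plt.tight_layout()
plt.show()
```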
To rigorously determine whether the differences observed between the SMOTE and no-SMOTE scenarios were statistically significant, we applied a paired t-test to the aggregated results. This test was essential for validating the impact of SMOTE beyond descriptive comparisons, confirming whether the changes were attributable to the technique rather than to random variation in model performance.
For each metric, we calculated the mean scores across all 15 models for every dataset. This approach allowed us to treat the mean of these scores as a single observation per dataset for each metric. We then compared these aggregated mean scores between the SMOTE and no-SMOTE scenarios using a paired t-test, which is particularly suited for this type of analysis where two related samples are compared (in our case, the same models and datasets with and without SMOTE).
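A minimal sketch of this aggregation and test is shown below, assuming the per-run scores are collected in a pandas DataFrame with hypothetical columns `dataset`, `model`, `smote`, and one column per metric; the random placeholder scores exist only so the sketch runs, and the no-SMOTE-minus-SMOTE sign convention is an assumption about the study's setup.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
metrics = ["accuracy", "precision", "recall", "f1_score", "auc_score"]

# Placeholder results table: one row per (dataset, model, scenario) run,
# filled with random scores purely so the sketch is runnable.
rows = [
    {"dataset": d, "model": m, "smote": s,
     **{k: rng.uniform(0.5, 1.0) for k in metrics}}
    for d in range(31) for m in range(15) for s in (False, True)
]
results = pd.DataFrame(rows)

for metric in metrics:
    # Average across the 15 models so each dataset contributes a single
    # observation per scenario.
    per_dataset = results.groupby(["dataset", "smote"])[metric].mean().unstack("smote")
    # Paired t-test over the 31 datasets: the same datasets with and
    # without SMOTE (sign convention here: no-SMOTE minus SMOTE, an assumption).
    stat, p_value = ttest_rel(per_dataset[False], per_dataset[True])
    print(f"{metric}: statistic={stat:.6f}, p-value={p_value:.6e}")
```

Averaging over models first keeps the 31 datasets as the independent units of analysis, which is what justifies the pairing in the test.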