Credit Card Fraud Detection Using Machine Learning: A Case Study on Imbalanced Data
DOI: https://doi.org/10.61173/6hn7wg02
Keywords: Credit card fraud, Machine learning, Class imbalance, Fraud detection, XGBoost
Abstract
Credit card fraud remains a serious problem for financial institutions and consumers worldwide. The rapid growth of online and cashless payments has created new opportunities for fraudulent activity. Although fraud is rare, it can cause serious economic losses and damage clients’ trust. Traditional rule-based systems often fail to capture complex and evolving fraud patterns, making machine learning (ML) methods a promising alternative. This paper presents a case study on the widely used Kaggle credit card fraud detection dataset, which contains 284,807 transactions, of which only 492 (<0.2%) are fraudulent. It explores data preprocessing, class imbalance handling, and a variety of machine learning models, including logistic regression, decision trees, random forests, and eXtreme Gradient Boosting (XGBoost). The results show that the ensemble methods (especially XGBoost) outperform the baseline models, with an area under the receiver operating characteristic curve (AUC) of 0.98, and achieve a better balance between recall and accuracy. Beyond the technical results, the study highlights the commercial implications: reducing financial risk, improving efficiency, and strengthening customer trust. The findings suggest that integrating machine learning into fraud detection systems offers both analytical and strategic advantages, although continuous updates and real-time adaptation remain essential.
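To make the scale of the class imbalance concrete, the dataset counts quoted above can be worked through in a few lines. The following is a minimal sketch using only the two figures given in the abstract. The function name `imbalance_summary` and its dictionary keys are hypothetical, chosen for illustration. The negative-to-positive ratio is shown because it is a commonly used starting value for positive-class weighting (for example, XGBoost's `scale_pos_weight` parameter); whether the study used such weighting is an assumption, not something the abstract states.

```python
# Counts taken from the abstract (Kaggle credit card fraud dataset).
TOTAL_TRANSACTIONS = 284_807
FRAUD_CASES = 492


def imbalance_summary(total: int, positives: int) -> dict:
    """Summarise class imbalance for a binary-labelled dataset.

    Hypothetical helper for illustration; not part of the original study.
    """
    negatives = total - positives
    return {
        # Share of transactions labelled as fraud.
        "fraud_rate": positives / total,
        # Accuracy of a trivial model that labels every transaction legitimate.
        "all_legit_accuracy": negatives / total,
        # Recall of that same trivial model: it catches no fraud at all.
        "all_legit_recall": 0 / positives,
        # Negative-to-positive ratio, a common starting point for
        # positive-class weighting (an assumption, not the paper's setting).
        "neg_pos_ratio": negatives / positives,
    }


summary = imbalance_summary(TOTAL_TRANSACTIONS, FRAUD_CASES)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

A model that never flags fraud would be correct on more than 99.8% of these transactions while detecting none of the 492 fraud cases. This is why accuracy alone can mislead on this dataset, and why the abstract reports AUC and recall.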