Abstract

Fraud in financial services, especially account opening fraud, poses major operational and reputational risks. Static rules struggle to adapt to evolving tactics, missing novel patterns and generating excessive false positives. Machine learning promises adaptive detection, but deployment faces severe class imbalance: in the NeurIPS 2022 BAF Base benchmark used here, fraud prevalence is 1.10%. Standard metrics (accuracy, f1_weighted) can look strong while doing little for the minority class. We compare Logistic Regression, SVM (RBF), Random Forest, LightGBM, and a GRU model on N = 1,000,000 accounts under a unified preprocessing pipeline. All models are trained to minimize their respective loss functions, and configurations are selected on a stratified development set by weighted F1-score (f1_weighted). For the four classical models, class weighting in the loss (class_weight ∈ {None, 'balanced'}) is treated as a hyperparameter and tuned. The GRU is trained with a fixed class-weighted cross-entropy loss that up-weights fraud cases. Both model families therefore leverage weighted training objectives, while their final hyperparameters are consistently selected by the same f1_weighted metric. Despite similar AUCs and aligned feature importance across families, the classical models converge to high-precision, low-recall solutions (1–6% fraud recall), whereas the GRU recovers 78% recall at 5% precision (AUC = 0.8800). Under extreme imbalance, objective choice and operating point matter at least as much as architecture.
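
As a rough illustration of the training setup described in the abstract, the sketch below pairs (i) class_weight ∈ {None, 'balanced'} tuned as a hyperparameter with f1_weighted model selection, shown here with scikit-learn's GridSearchCV, and (ii) a fixed class-weighted cross-entropy objective for a small GRU in PyTorch. The libraries, grid values, class-weight magnitude, and network sizes are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: hyperparameter values, the class-weight magnitude,
# and the GRU architecture are assumptions, not the paper's configuration.
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# --- Classical models: class_weight in {None, 'balanced'} treated as a hyperparameter ---
param_grid = {
    "C": [0.1, 1.0, 10.0],               # illustrative regularization grid
    "class_weight": [None, "balanced"],  # loss weighting is itself tuned
}
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="f1_weighted",               # configurations selected by weighted F1
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
# search.fit(X_train, y_train)           # X_train, y_train: preprocessed BAF features/labels

# --- GRU: fixed class-weighted cross-entropy that up-weights fraud cases ---
# With ~1.1% fraud prevalence the positive class needs a much larger weight;
# 90.0 is a placeholder, not the value used in the paper.
class_weights = torch.tensor([1.0, 90.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)

class FraudGRU(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # two logits: legitimate vs. fraud

    def forward(self, x):                 # x: (batch, seq_len, n_features)
        _, h = self.gru(x)
        return self.head(h[-1])

# loss = criterion(FraudGRU(n_features=32)(batch_x), batch_y)  # one training step's loss
```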

Publication Info

Year: 2025
Type: article
Volume: 13
Issue: 12
Pages: 290-290
Citations: 0
Access: Closed

Cite This

Qiang Shen, Yijun Gao, Qingqing Mao et al. (2025). Objective over Architecture: Fraud Detection Under Extreme Imbalance in Bank Account Opening. Computation, 13(12), 290-290. https://doi.org/10.3390/computation13120290

Identifiers

DOI: 10.3390/computation13120290