Learning from Imbalanced Data | RDL Research Database

Abstract

With the continuous expansion of data availability in many large-scale, complex, and networked systems, such as surveillance, security, Internet, and finance, it becomes critical to advance the fundamental understanding of knowledge discovery and analysis from raw data to support decision-making processes. Although existing knowledge discovery and data engineering techniques have shown great success in many real-world applications, the problem of learning from imbalanced data (the imbalanced learning problem) is a relatively new challenge that has attracted growing attention from both academia and industry. The imbalanced learning problem is concerned with the performance of learning algorithms in the presence of underrepresented data and severe class distribution skews. Due to the inherent complex characteristics of imbalanced data sets, learning from such data requires new understandings, principles, algorithms, and tools to transform vast amounts of raw data efficiently into information and knowledge representation. In this paper, we provide a comprehensive review of the development of research in learning from imbalanced data. Our focus is to provide a critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario. Furthermore, in order to stimulate future research in this field, we also highlight the major opportunities and challenges, as well as potential important research directions for learning from imbalanced data.
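The abstract says imbalanced learning needs its own assessment metrics. The sketch below shows why, using made-up toy data. It is not taken from the paper. With a 95:5 class skew, a classifier that always predicts the majority class reaches 95% accuracy while finding no minority examples. The sketch assumes that label 1 marks the minority class. `confusion_counts` and `imbalance_metrics` are hypothetical helper names. Precision, recall, F-measure, and G-mean are standard metrics of the kind the review surveys.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the minority class label."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # minority example correctly flagged
            else:
                fp += 1  # majority example wrongly flagged
        elif truth == positive:
            fn += 1  # minority example missed
        else:
            tn += 1  # majority example correctly left alone
    return tp, fp, fn, tn


def imbalance_metrics(y_true, y_pred, positive=1):
    """Hypothetical helper: accuracy plus minority-aware metrics."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0       # minority-class hit rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # majority-class hit rate
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    g_mean = sqrt(recall * specificity)               # balances both classes
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }


if __name__ == "__main__":
    # Toy data (invented for illustration): 95 majority (0), 5 minority (1).
    y_true = [0] * 95 + [1] * 5
    # Classifier A ignores the minority class entirely.
    always_majority = [0] * 100
    # Classifier B wrongly flags 10 majority examples but catches 4 of 5 minority ones.
    detector = [1] * 10 + [0] * 85 + [1] * 4 + [0]
    for name, preds in [("always_majority", always_majority), ("detector", detector)]:
        m = imbalance_metrics(y_true, preds)
        print(name, {k: round(v, 3) for k, v in m.items()})
```

When run, the always-majority classifier leads on accuracy (0.95 vs 0.89). Its recall, F-measure, and G-mean are all 0, so it trails the detector on each of those metrics. The sketch uses only the Python standard library, so every number can be checked by hand from the confusion counts.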

Keywords

Computer science, Data science, Raw data, Machine learning, Artificial intelligence, Big data, Field (mathematics), Data mining

Affiliated Institutions

Stevens Institute of Technology, US

Related Publications

Statistical pattern recognition: a review

Anil K. Jain , Peter Duin , Jianchang Mao

The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated...

2000 IEEE Transactions on Pattern Analysis... 6667 citations

Object Detection With Deep Learning: A Review

Zhong‐Qiu Zhao , Peng Zheng , Shou-Tao Xu +1 more

Due to object detection's close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection ...

2019 IEEE Transactions on Neural Networks ... 5019 citations

Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder by the authors)

Jerome H. Friedman , Trevor Hastie , Robert Tibshirani

Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versi...

2000 The Annals of Statistics 6819 citations

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

Alex Wang , Amanpreet Singh , Julian Michael +3 more

Human ability to understand language is general, flexible, and robust. In contrast, most NLU models above the word level are designed for a specific task and struggle with out-o...

2018 3699 citations

Review of deep learning: concepts, CNN architectures, challenges, applications, future directions

Laith Alzubaidi , Jinglan Zhang , Amjad J. Humaidi +7 more

In the last few years, the deep learning (DL) computing paradigm has been deemed the Gold Standard in the machine learning (ML) community. Moreover, it has gradually become the ...

2021 Journal Of Big Data 6563 citations

Publication Info

Year: 2009
Type: article
Volume: 21
Issue: 9
Pages: 1263-1284
Citations: 8871
Access: Closed

Citation Metrics

8871 citations (OpenAlex)

Cite This

APA Style

He, H., & Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263–1284. https://doi.org/10.1109/tkde.2008.239

Identifiers

DOI: 10.1109/tkde.2008.239