Title: Using Random Forest to Learn Imbalanced Data
Author: Chao Chen, Andy Liaw and Leo Breiman
Date: July 2004
Pub: PDF
Url: http://www.stat.berkeley.edu/users/chenchao/666.pdf
Abstract: In this paper we propose two ways to deal with the imbalanced data classification problem using random forest. One is based on cost-sensitive learning, and the other is based on a sampling technique. Performance metrics such as precision and recall, false positive rate and false negative rate, $F$-measure, and weighted accuracy are computed. Both methods are shown to improve the prediction accuracy of the minority class and to perform favorably compared with existing algorithms.
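
The metrics named in the abstract can all be derived from the four counts of a binary confusion matrix, with the minority class treated as positive. The following is a minimal sketch, not the authors' code. The function name `imbalance_metrics`, the `beta` parameter, and the definition of weighted accuracy as `beta * TPR + (1 - beta) * TNR` are assumptions made for illustration; the abstract itself does not define these quantities.

```python
def imbalance_metrics(tp, fn, fp, tn, beta=0.5):
    """Compute minority-class metrics from confusion-matrix counts.

    tp, fn: minority (positive) examples predicted positive / negative.
    fp, tn: majority (negative) examples predicted positive / negative.
    beta:   assumed weight on minority-class accuracy (hypothetical parameter).
    """
    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)        # true positive rate
    tnr = ratio(tn, tn + fp)           # true negative rate
    precision = ratio(tp, tp + fp)
    f_measure = ratio(2 * precision * recall, precision + recall)

    return {
        "precision": precision,
        "recall": recall,
        "false_positive_rate": 1.0 - tnr if (tn + fp) else 0.0,
        "false_negative_rate": 1.0 - recall if (tp + fn) else 0.0,
        "f_measure": f_measure,
        # Assumed definition: convex combination of per-class accuracies.
        "weighted_accuracy": beta * recall + (1.0 - beta) * tnr,
    }


if __name__ == "__main__":
    # Illustrative counts only; these are not results from the paper.
    for name, value in imbalance_metrics(tp=8, fn=2, fp=10, tn=80).items():
        print(f"{name}: {value:.4f}")
```

With heavily skewed classes, plain accuracy is dominated by the majority class, which is why per-class rates and a weighted combination are reported instead.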