Sequential and robust data selection in active learning for classification

Active learning has become a popular learning process for classification. By selecting the most beneficial training data, an active classifier can achieve better classification accuracy than a passive classifier. In this paper, we first investigate methods of robustifying optimal active learning processes, either via a sequential approach or by taking into account that the classifier may be developed from a misspecified model. We compare the classifiers obtained from the proposed two-stage and sequential learning procedures, and the comparison indicates that the sequential method generally outperforms its competitor. We then analyze the sensitivity of three different classifiers (the linear discriminant classifier, the quadratic discriminant classifier, and the logistic regression classifier) in active learning for classification. Our analysis reveals that the logistic regression classifier is sensitive to misspecification of the assumed logistic model, whereas the linear discriminant classifier is relatively robust to moderate violations of the assumed homoscedasticity.