Review and evaluation of the concordance measures for assessing discrimination in the logistic regression methods

The concordance statistic (C-statistic), which is equivalent to the area under the receiver operating characteristic curve (AUC), is frequently used to quantify the discriminatory power (the ability of the model to distinguish low-risk from high-risk patients) of a risk prediction model developed in the logistic regression framework. Several methods for estimating the C-statistic, both non-parametric and parametric, have been proposed in the literature. Despite these proposals, it remains unclear to practitioners which approach should be applied in practice. This paper reviewed and evaluated some commonly used C-statistic estimators, illustrating them on two datasets with different prognostic abilities and comparing them in an extensive simulation study, in order to make practical recommendations. The simulation scenarios varied the sample size, the prevalence of the binary outcome, and the distribution of the prognostic index (the log-odds) derived from the model, to mimic situations encountered in practice. The results showed that the non-parametric and kernel-smoothing-based methods gave comparable results in most simulation scenarios and performed better than the parametric approach, particularly for small samples and skewed distributions of the prognostic index. Based on these findings, some practical recommendations are discussed.
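
To make the non-parametric estimator concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes the usual pairwise definition: the C-statistic is the proportion of (event, non-event) pairs in which the subject with the event has the higher prognostic index, with ties counted as one half. The function name `c_statistic` and the toy data are hypothetical and serve only as illustration.

```python
from itertools import product


def c_statistic(prognostic_index, outcome):
    """Non-parametric C-statistic from pairwise comparisons.

    prognostic_index: model log-odds, one value per subject.
    outcome: binary outcome per subject (1 = event, 0 = non-event).
    Assumption: tied pairs contribute 1/2, as in the Mann-Whitney form.
    """
    events = [p for p, y in zip(prognostic_index, outcome) if y == 1]
    non_events = [p for p, y in zip(prognostic_index, outcome) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    score = 0.0
    for e, n in product(events, non_events):
        if e > n:
            score += 1.0   # concordant pair
        elif e == n:
            score += 0.5   # tied pair
    return score / (len(events) * len(non_events))


# Hypothetical toy data: six subjects, three events, one tied pair.
toy_index = [-1.2, -0.4, 0.3, 0.3, 1.1, 2.0]
toy_outcome = [0, 0, 0, 1, 1, 1]
c = c_statistic(toy_index, toy_outcome)
```

With these toy values, 8 of the 9 event/non-event pairs are concordant and 1 is tied, so the estimate is 8.5/9. The double loop costs on the order of the number of events times the number of non-events; a rank-based formulation gives the same value more cheaply for large samples.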