{"id":696,"date":"2018-01-27T14:24:54","date_gmt":"2018-01-27T14:24:54","guid":{"rendered":"http:\/\/jsr.isrt.ac.bd\/?post_type=article&p=696"},"modified":"2018-01-27T14:25:00","modified_gmt":"2018-01-27T14:25:00","slug":"comparison-internal-validation-methods-validating-predictive-models-binary-data-rare-events","status":"publish","type":"article","link":"http:\/\/jsr.isrt.ac.bd\/article\/comparison-internal-validation-methods-validating-predictive-models-binary-data-rare-events\/","title":{"rendered":"A comparison of internal validation methods for validating predictive models for binary data with rare events"},"content":{"rendered":"
In clinical research, prediction models for binary data are frequently developed
\nwithin the logistic regression framework to predict the risk of adverse patient outcomes
\nsuch as death or illness. However, when the outcome is rare, standard logistic
\nregression based on maximum likelihood (ML) has been reported to show poor
\npredictive performance because it produces overfitted models. To overcome this,
\npenalized maximum likelihood (PML) based logistic models are widely used in
\nrisk prediction; however, their predictive performance in validation settings is
\nnot well documented. Several validation approaches, namely split-sample,
\ncross-validation, bootstrap validation and its two variants 0.632 and 0.632+, have been
\nwidely used to validate the performance of a prediction model; however, it is also
\nunclear which of these approaches is best for accurately estimating the predictive
\nperformance of a rare-outcome model. This paper focused on evaluating the
\npredictive performance of the PML based logistic model in such validation settings, in
\ncomparison with the ML based standard model, and on identifying the most effective
\nvalidation method. An extensive simulation study was performed by creating several
\nscenarios to reflect modeling in datasets with few events. The results revealed that
\nthe PML based model performed better than the ML based model, reducing overfitting to
\nsome extent and increasing discriminatory ability, irrespective of the
\nvalidation method under study. Of the validation methods, the regular bootstrap
\nand its variants 0.632 and 0.632+, particularly 0.632+, performed well, providing
\nnearly accurate and stable estimates of the true predictive performance. We
\nalso illustrated the methods by applying them to a cardiac data set with few events.<\/p>\n