Using external data to incorporate unmeasured confounders: A plasmode simulation study comparing alternative approaches to impute body mass index in a study of the relationship between osteoarthritis and cardiovascular disease

Background: Administrative databases do not contain body mass index (BMI) information. In the proportion-based imputation (PBI) technique, a BMI category is assigned to each individual according to the proportions observed in external survey data. Alternatively, BMI can be imputed using multiple imputation (MI).

Objectives: To compare MI with PBI for imputing BMI in a study of the osteoarthritis (OA)-cardiovascular disease (CVD) relationship.

Research Design: Plasmode simulation study.

Subjects: We used publicly available data from the Canadian Community Health Survey (CCHS) cycles 1.1, 2.1, and 3.1.

Measures: BMI was set to missing for everyone in each of the 500 simulated datasets created from CCHS 3.1 data. A dataset compiled from CCHS cycles 1.1 and 2.1 served as the external data (BMI observed). The missing BMI in copies of the simulated datasets was imputed using MI and PBI, drawing on the observed BMI information in the external data. After imputation, the distribution of the BMI variable and the adjusted odds ratio (aOR) estimated from a multivariable logistic regression model were compared.
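As an illustration of the PBI step described above, the following is a minimal Python sketch, not the authors' implementation. It assumes BMI is handled as a category and that proportions are computed within a single stratifying variable; the function names, the `sex` and `bmi_cat` keys, and the example rows are hypothetical and are not taken from the study.

```python
import random
from collections import Counter, defaultdict


def category_proportions(external_rows, stratum_key, bmi_key):
    """Proportion of each BMI category within each stratum of the external data."""
    counts = defaultdict(Counter)
    for row in external_rows:
        counts[row[stratum_key]][row[bmi_key]] += 1
    return {
        stratum: {cat: n / sum(c.values()) for cat, n in c.items()}
        for stratum, c in counts.items()
    }


def impute_pbi(target_rows, proportions, stratum_key, bmi_key, seed=0):
    """Assign each target row a BMI category drawn at random with the
    external proportions of its stratum (raises KeyError if the stratum
    never appears in the external data)."""
    rng = random.Random(seed)
    imputed = []
    for row in target_rows:
        props = proportions[row[stratum_key]]
        cats = list(props)
        weights = [props[c] for c in cats]
        new_row = dict(row)  # leave the input row untouched
        new_row[bmi_key] = rng.choices(cats, weights=weights, k=1)[0]
        imputed.append(new_row)
    return imputed


# Hypothetical external data (BMI observed) and target data (BMI missing).
external = [
    {"sex": "F", "bmi_cat": "normal"},
    {"sex": "F", "bmi_cat": "normal"},
    {"sex": "F", "bmi_cat": "normal"},
    {"sex": "F", "bmi_cat": "obese"},
    {"sex": "M", "bmi_cat": "overweight"},
    {"sex": "M", "bmi_cat": "overweight"},
]
target = [{"sex": "F"}, {"sex": "M"}]

props = category_proportions(external, "sex", "bmi_cat")
completed = impute_pbi(target, props, "sex", "bmi_cat", seed=1)
```

Because each category is drawn independently of the outcome and of the other covariates within a stratum, a sketch like this reproduces the external marginal proportions but carries no individual-level information about BMI.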

Results: Compared with PBI, MI produced proportions of individuals closer to the known proportions across all BMI categories except the overweight category. Relative to the known aOR of 1.59 (1.36, 1.82), BMI imputed using MI introduced less bias in the OA-CVD association than BMI imputed using PBI: the aORs were 1.62 (1.39, 1.86) and 1.66 (1.41, 1.90), respectively.
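To make the size of the difference concrete, the short Python sketch below computes the relative difference of each imputed aOR from the known aOR. This arithmetic is an illustration based only on the point estimates quoted above; the function name is hypothetical, and the relative-difference measure is an assumption, not necessarily the bias measure used in the study.

```python
def relative_bias_pct(estimate, truth):
    """Relative difference of an estimate from the known value, in percent."""
    return (estimate - truth) / truth * 100


known_aor = 1.59  # aOR with BMI observed
mi_aor = 1.62     # aOR after imputing BMI with MI
pbi_aor = 1.66    # aOR after imputing BMI with PBI

mi_bias = relative_bias_pct(mi_aor, known_aor)
pbi_bias = relative_bias_pct(pbi_aor, known_aor)
print(f"MI:  {mi_bias:.1f}%")
print(f"PBI: {pbi_bias:.1f}%")
```

On this measure, the MI estimate sits about 1.9% above the known aOR and the PBI estimate about 4.4% above it.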

Conclusions: This is the first study to compare MI with PBI in the context of imputing BMI information that is not recorded at the database level. MI was superior to the imputation method based on population-level proportions for imputing BMI when it was missing for everyone in the simulated datasets.