• Applying Bayesian network approaches to study health outcomes

      Lee, Sun-Mi; Abbott, Patricia A., Ph.D., M.S. (2003)
      Background. In today's healthcare environment, the proliferation of information systems has facilitated the growth of large clinical and administrative databases. Data mining techniques are actively used for knowledge discovery in these large healthcare databases. Of particular interest are Bayesian networks, which have recently emerged as powerful data mining algorithms for pattern recognition and classification. Purpose. The purpose of this study was to explore the feasibility of using Bayesian networks (BN) in studying health outcomes. The specific aims were to develop a BN model to identify predictors of limited health service utilization in HIV-positive persons and to evaluate the model by comparing its predictive performance with that of Naive Bayes (NB) and logistic regression (LG) models. Methods. This study used the HIV Cost and Services Utilization Study dataset, consisting of 2,864 HIV-positive adults. A total of 36 variables, including two service utilization variables (hospitalization and outpatient visits), were selected. HUGIN Researcher(TM) 6.3 was used to develop the BN and NB models; SAS/STAT PROC LOGISTIC was used to develop the LG models. Results. The BN model successfully captured relationships explaining complex patterns of human behavior in health service utilization. The area under the receiver operating characteristic curve (AUC), which measures the BN model's discriminatory power, was .72 (CI: .70, .74) when predicting hospitalization. The AUC of the BN model was significantly higher than that of the NB model (.68; CI: .66, .70), but not significantly higher than that of the LG model (.70; CI: .67, .72), using the 8 variables from a previous study by Shapiro and colleagues (1999b). In a second analysis using the 10 influential variables discovered by the BN approach, NB and LG performance improved (NB: .74; CI: .72, .76; LG: .74; CI: .72, .75).
      Conclusion/implication. The BN approach contributed to the discovery of influential predictors that led to an increase in the models' predictive performance. This study provided new insight into working with large healthcare databases. When attempting to discover unknown relationships that might be missed by traditional analysis methods alone, investigators should consider the use of BNs.