
dc.contributor.author: Rueda, Juan-David
dc.date.accessioned: 2019-06-25T13:39:26Z
dc.date.available: 2019-06-25T13:39:26Z
dc.date.issued: 2019 (en_US)
dc.identifier.uri: http://hdl.handle.net/10713/9813
dc.description: 2019
dc.description: Pharmaceutical Health Services Research
dc.description: University of Maryland, Baltimore
dc.description: Ph.D.
dc.description.abstract: Objective: To compare new machine-learning (ML) based alternatives for estimating health care costs in the presence of missing data. Introduction: Costs must be estimated correctly for value assessment and budget calculations. Biased cost estimates can lead to wrong decisions that affect population health. Cost estimation is challenging, and more so in the presence of missing data. Methods: We used Surveillance, Epidemiology, and End Results (SEER)-Medicare data on patients newly diagnosed with multiple myeloma from 2007 to 2013. We explored the problem of missing data with several approaches, creating artificial missing data. We hypothesized that ML techniques improve the prediction of mean total medical costs in the presence of missingness. ML methods included support vector machines, boosting, random forests, and classification and regression trees. First, we analyzed the problem in one dimension, with one variable missing in a cross-sectional scenario, using generalized linear models as the comparator for ML. Then, we added time as a factor for missingness, comparing reweighted estimators with ML. Finally, we explored different levels of censoring and determined how each level affected our cost estimates. For this aim, we fitted multiple linear spline models to establish the effect of censoring on the bias of the estimator. Results: We demonstrated that ML algorithms predicted better when data were missing completely at random and missing at random. All methods performed poorly in the missing-not-at-random scenario. In the second aim, we showed that ML-based methods predict the five-year total cost of a patient with multiple myeloma as well as reweighted estimators do. Lastly, we found that ML methods are consistent and robust at low and moderate levels of censoring; however, we could not show that they are better than the reweighted estimators. Conclusions: ML-based methods are a good alternative for predicting missing cost data in both cross-sectional and longitudinal settings.
dc.subject: claims analysis (en_US)
dc.subject: missing data (en_US)
dc.subject: predictive modeling (en_US)
dc.subject: statistical learning (en_US)
dc.subject.mesh: Costs and Cost Analysis (en_US)
dc.subject.mesh: Machine Learning (en_US)
dc.title: Application of Machine Learning Algorithms for Predicting Missing Cost Data
dc.type: dissertation (en_US)
dc.date.updated: 2019-06-24T16:04:22Z
dc.language.rfc3066: en
dc.contributor.advisor: Slejko, Julia
dc.contributor.orcid: 0000-0002-0907-7106 (en_US)
refterms.dateFOA: 2019-06-25T13:39:27Z


Files in this item

Name: Rueda_umaryland_0373D_153/list ...
Embargo: 2020-01-01
Size: 126.8 KB
Format: PDF

Name: Rueda_umaryland_0373D_11067.pdf
Embargo: 2020-01-01
Size: 4.767 MB
Format: PDF

