Estimating Healthcare Expenditure Using Parametric Change Point Models
Pub. online: 3 December 2024
Type: Data Science In Action
Open Access
Received
1 July 2024
Accepted
12 October 2024
Published
3 December 2024
Abstract
Estimating healthcare expenditures is important for policymakers and clinicians. The expenditures of patients facing a life-threatening illness can often be segmented into four distinct phases: diagnostic, treatment, stable, and terminal. The diagnostic phase covers expenses incurred before the disease is diagnosed, driven by frequent healthcare visits and diagnostic tests. The treatment phase, which follows diagnosis, typically brings high expenditure from various treatments; spending then tapers off over time, settles into a stable phase, and eventually enters a terminal phase. In this paper, we introduce a pre-disease phase preceding the diagnostic phase to serve as a baseline for healthcare expenditure, and thus propose a five-phase model for evaluating healthcare expenditures. We use a piecewise linear model with three population-level change points and $4p$ subject-level parameters, where $p$ is the number of covariates, to capture expenditure trajectories and identify transitions between phases. We estimate the model's coefficients with generalized estimating equations and estimate the change points with a grid search that minimizes the residual sum of squares. In our analysis of expenditures for patients with stage I–III pancreatic cancer in the SEER-Medicare database, we find that the diagnostic phase begins one month before diagnosis and is followed by an initial treatment phase lasting three months. The stable phase continues until eight months before death, when the terminal phase begins, marked by a renewed increase in expenditures.
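The two-stage estimation strategy described in the abstract can be sketched in Python: fit the regression coefficients for fixed change points, then choose the change points by a grid search on the residual sum of squares (RSS). This is a minimal illustration under stated assumptions, not the authors' implementation. The hinge (linear-spline) parameterization in the hypothetical `design_matrix` function, with knots `a` months before diagnosis, `b` months after diagnosis, and `c` months before death, is our own illustrative choice. The $4p$ covariate-dependent parameters are omitted, the data are simulated rather than drawn from SEER-Medicare, and each candidate fit uses ordinary least squares. For a linear mean model, OLS gives the same point estimates as generalized estimating equations with an identity link and an independence working correlation.

```python
import itertools

import numpy as np


def design_matrix(t, d, a, b, c):
    """Hypothetical hinge basis for a five-phase expenditure trajectory.

    t : months relative to diagnosis (negative = before diagnosis)
    d : months remaining until death
    a : diagnostic phase starts a months before diagnosis (knot at t = -a)
    b : treatment phase ends b months after diagnosis (knot at t = b)
    c : terminal phase starts c months before death (knot at d = c)
    """
    return np.column_stack([
        np.ones_like(t),          # intercept: pre-disease baseline level
        t,                        # pre-disease slope
        np.maximum(t + a, 0.0),   # slope change when the diagnostic phase begins
        np.maximum(t, 0.0),       # slope change at diagnosis (known time zero)
        np.maximum(t - b, 0.0),   # slope change when the treatment phase ends
        np.maximum(c - d, 0.0),   # extra slope as death approaches (terminal phase)
    ])


def fit_rss(y, X):
    """Least-squares fit; returns (residual sum of squares, coefficients)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), beta


def grid_search(y, t, d, a_grid, b_grid, c_grid):
    """Fit every change-point triple and keep the one with the smallest RSS."""
    best = None
    for a, b, c in itertools.product(a_grid, b_grid, c_grid):
        rss, beta = fit_rss(y, design_matrix(t, d, a, b, c))
        if best is None or rss < best[0]:
            best = (rss, (a, b, c), beta)
    return best


# Synthetic monthly data (NOT SEER-Medicare): 50 patients observed from
# 12 months before diagnosis until death, 15 to 40 months after diagnosis.
rng = np.random.default_rng(0)
t_parts, d_parts = [], []
for death in rng.integers(15, 41, size=50):
    t_i = np.arange(-12, death + 1, dtype=float)
    t_parts.append(t_i)
    d_parts.append(death - t_i)
t = np.concatenate(t_parts)
d = np.concatenate(d_parts)

# Illustrative truth chosen to mimic the reported phases: a=1, b=3, c=8 months.
true_beta = np.array([1000.0, 0.0, 2000.0, -2600.0, 600.0, 150.0])
y = design_matrix(t, d, 1, 3, 8) @ true_beta + rng.normal(0.0, 50.0, size=t.size)

best_rss, best_cps, best_beta = grid_search(
    y, t, d, a_grid=range(1, 4), b_grid=range(1, 7), c_grid=range(4, 13)
)
print("estimated change points (a, b, c):", best_cps)
```

The grid search costs one regression per candidate triple (here 3 × 6 × 9 = 162 fits), so coarse monthly grids keep it cheap. To use a non-independence working correlation, each candidate could instead be fit with the `GEE` class in statsmodels, passing patient identifiers as `groups` and, for example, an exchangeable `cov_struct`, with the RSS computed from the fitted values.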