<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn><issn pub-type="ppub">1680-743X</issn><issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1157</article-id>
<article-id pub-id-type="doi">10.6339/24-JDS1157</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Data Science in Action</subject></subj-group></article-categories>
<title-group>
<article-title>Estimating Healthcare Expenditure Using Parametric Change Point Models</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Ghosh</surname><given-names>Indranil</given-names></name><xref ref-type="aff" rid="j_jds1157_aff_001">1</xref><xref ref-type="aff" rid="j_jds1157_aff_002">2</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Zheng</surname><given-names>Qi</given-names></name><xref ref-type="aff" rid="j_jds1157_aff_001">1</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Egger</surname><given-names>Michael E</given-names></name><xref ref-type="aff" rid="j_jds1157_aff_003">3</xref><xref ref-type="aff" rid="j_jds1157_aff_004">4</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Kong</surname><given-names>Maiying</given-names></name><email xlink:href="mailto:maiying.kong@louisville.edu">maiying.kong@louisville.edu</email><xref ref-type="aff" rid="j_jds1157_aff_001">1</xref><xref ref-type="aff" rid="j_jds1157_aff_004">4</xref><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<aff id="j_jds1157_aff_001"><label>1</label>Department of Bioinformatics and Biostatistics, School of Public Health and Information Sciences, <institution>University of Louisville</institution>, Louisville, Kentucky, <country>USA</country></aff>
<aff id="j_jds1157_aff_002"><label>2</label>Department of Biostatistics, <institution>Apellis Pharmaceuticals</institution>, Waltham, Massachusetts, <country>USA</country></aff>
<aff id="j_jds1157_aff_003"><label>3</label>The Hiram C. Polk, Jr., MD Department of Surgery, School of Medicine, <institution>University of Louisville</institution>, Louisville, Kentucky, <country>USA</country></aff>
<aff id="j_jds1157_aff_004"><label>4</label>Brown Cancer Center, <institution>University of Louisville</institution>, Louisville, Kentucky, <country>USA</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author. Email: <ext-link ext-link-type="uri" xlink:href="mailto:maiying.kong@louisville.edu">maiying.kong@louisville.edu</ext-link>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2025</year></pub-date><pub-date pub-type="epub"><day>3</day><month>12</month><year>2024</year></pub-date><volume>23</volume><issue>3</issue><fpage>560</fpage><lpage>574</lpage><supplementary-material id="S1" content-type="document" xlink:href="jds1157_s001.pdf" mimetype="application" mime-subtype="pdf">
<caption>
<title>Supplementary Material</title>
<p>R Codes for Key Steps of the Case Study</p>
</caption>
</supplementary-material><history><date date-type="received"><day>1</day><month>7</month><year>2024</year></date><date date-type="accepted"><day>12</day><month>10</month><year>2024</year></date></history>
<permissions><copyright-statement>2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2025</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>Estimating healthcare expenditures is important for policymakers and clinicians. The expenditure of patients facing a life-threatening illness can often be segmented into four distinct phases: diagnosis, treatment, stable, and terminal. The diagnosis phase encompasses healthcare expenses incurred prior to the disease diagnosis, attributed to frequent healthcare visits and diagnostic tests. The treatment phase, following diagnosis, typically sees high expenditure due to various treatments; this expenditure gradually tapers off and settles into a stable phase, which eventually gives way to a terminal phase. In this project, we introduce a pre-disease phase preceding the diagnosis phase, serving as a baseline for healthcare expenditure, and thus propose a five-phase model to evaluate healthcare expenditures. We use a piecewise linear model with three population-level change points and <inline-formula id="j_jds1157_ineq_001"><alternatives><mml:math>
<mml:mn>4</mml:mn>
<mml:mi mathvariant="italic">p</mml:mi></mml:math><tex-math><![CDATA[$4p$]]></tex-math></alternatives></inline-formula> subject-level parameters to capture expenditure trajectories and identify transitions between phases, where <italic>p</italic> is the number of covariates. To estimate the model’s coefficients, we apply generalized estimating equations, while a grid-search approach is used to estimate the change-point parameters by minimizing the residual sum of squares. In our analysis of expenditures for stages I–III pancreatic cancer patients using the SEER-Medicare database, we find that the diagnosis phase begins one month before diagnosis, followed by an initial treatment phase lasting three months. The stable phase continues until eight months before death, at which point the terminal phase begins, marked by a renewed increase in expenditures.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>changepoint models</kwd>
<kwd>healthcare expenditures</kwd>
<kwd>pancreatic cancer</kwd>
<kwd>phase-based expenditure</kwd>
<kwd>SEER-Medicare</kwd>
</kwd-group>
<funding-group><funding-statement>M.E. Egger and M. Kong thank the American Cancer Society for its generous support of this study (CSDG-22-125-01-HOPS). M. Kong also acknowledges support from the Wendell Cherry Chair in Clinical Trial Research endowment funds at the University of Louisville, along with funding from the National Institutes of Health (P30ES030283, R01HL158779, and P20GM155899). Q. Zheng appreciates the support from the National Institutes of Health (R21AG070659) and the National Science Foundation (DMS-1952486).</funding-statement></funding-group>
</article-meta>
</front>
<back>
<ref-list id="j_jds1157_reflist_001">
<title>References</title>
<ref id="j_jds1157_ref_001">
<mixed-citation publication-type="journal"> <string-name><surname>Austin</surname> <given-names>PC</given-names></string-name> (<year>2011</year>). <article-title>An introduction to propensity score methods for reducing the effects of confounding in observational studies</article-title>. <source><italic>Multivariate Behavioral Research</italic></source>, <volume>46</volume>(<issue>3</issue>): <fpage>399</fpage>–<lpage>424</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/00273171.2011.568786" xlink:type="simple">https://doi.org/10.1080/00273171.2011.568786</ext-link></mixed-citation>
</ref>
<ref id="j_jds1157_ref_002">
<mixed-citation publication-type="journal"> <string-name><surname>Bang</surname> <given-names>H</given-names></string-name>, <string-name><surname>Tsiatis</surname> <given-names>AA</given-names></string-name> (<year>2002</year>). <article-title>Median regression with censored cost data</article-title>. <source><italic>Biometrics</italic></source>, <volume>58</volume>(<issue>3</issue>): <fpage>643</fpage>–<lpage>649</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1111/j.0006-341X.2002.00643.x" xlink:type="simple">https://doi.org/10.1111/j.0006-341X.2002.00643.x</ext-link></mixed-citation>
</ref>
<ref id="j_jds1157_ref_003">
<mixed-citation publication-type="journal"> <string-name><surname>Başer</surname> <given-names>O</given-names></string-name>, <string-name><surname>Gardiner</surname> <given-names>JC</given-names></string-name>, <string-name><surname>Bradley</surname> <given-names>CJ</given-names></string-name>, <string-name><surname>Given</surname> <given-names>CW</given-names></string-name> (<year>2004</year>). <article-title>Estimation from censored medical cost data</article-title>. <source><italic>Biometrical Journal: Journal of Mathematical Methods in Biosciences</italic></source>, <volume>46</volume>(<issue>3</issue>): <fpage>351</fpage>–<lpage>363</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1002/bimj.200210036" xlink:type="simple">https://doi.org/10.1002/bimj.200210036</ext-link></mixed-citation>
</ref>
<ref id="j_jds1157_ref_004">
<mixed-citation publication-type="journal"> <string-name><surname>Basu</surname> <given-names>A</given-names></string-name>, <string-name><surname>Polsky</surname> <given-names>D</given-names></string-name>, <string-name><surname>Manning</surname> <given-names>WG</given-names></string-name> (<year>2011</year>). <article-title>Estimating treatment effects on healthcare costs under exogeneity: is there a ‘magic bullet’?</article-title> <source><italic>Health Services and Outcomes Research Methodology</italic></source>, <volume>11</volume>(<issue>1–2</issue>): <fpage>1</fpage>–<lpage>26</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s10742-011-0072-8" xlink:type="simple">https://doi.org/10.1007/s10742-011-0072-8</ext-link></mixed-citation>
</ref>
<ref id="j_jds1157_ref_005">
<mixed-citation publication-type="journal"> <string-name><surname>Brown</surname> <given-names>ML</given-names></string-name>, <string-name><surname>Riley</surname> <given-names>GF</given-names></string-name>, <string-name><surname>Schussler</surname> <given-names>N</given-names></string-name>, <string-name><surname>Etzioni</surname> <given-names>R</given-names></string-name> (<year>2002</year>). <article-title>Estimating health care costs related to cancer treatment from SEER-Medicare data</article-title>. <source><italic>Medical Care</italic></source>, <volume>40</volume>(<issue>8</issue>): <fpage>IV104</fpage>–<lpage>IV117</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1157_ref_006">
<mixed-citation publication-type="journal"> <string-name><surname>Enewold</surname> <given-names>L</given-names></string-name>, <string-name><surname>Parsons</surname> <given-names>H</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>L</given-names></string-name>, <string-name><surname>Bott</surname> <given-names>D</given-names></string-name>, <string-name><surname>Rivera</surname> <given-names>DR</given-names></string-name>, <string-name><surname>Barrett</surname> <given-names>MJ</given-names></string-name>, <etal>et al.</etal> (<year>2020</year>). <article-title>Updated overview of the SEER-Medicare data: enhanced content and applications</article-title>. <source><italic>JNCI Monographs</italic></source>, <volume>2020</volume>(<issue>55</issue>): <fpage>3</fpage>–<lpage>13</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1157_ref_007">
<mixed-citation publication-type="journal"> <string-name><surname>Inan</surname> <given-names>G</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>L</given-names></string-name> (<year>2017</year>). <article-title>PGEE: an R package for analysis of longitudinal data with high-dimensional covariates</article-title>. <source><italic>R Journal</italic></source>, <volume>9</volume>(<issue>1</issue>): <fpage>393</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.32614/RJ-2017-030" xlink:type="simple">https://doi.org/10.32614/RJ-2017-030</ext-link></mixed-citation>
</ref>
<ref id="j_jds1157_ref_008">
<mixed-citation publication-type="journal"> <string-name><surname>Klabunde</surname> <given-names>CN</given-names></string-name>, <string-name><surname>Potosky</surname> <given-names>AL</given-names></string-name>, <string-name><surname>Legler</surname> <given-names>JM</given-names></string-name>, <string-name><surname>Warren</surname> <given-names>JL</given-names></string-name> (<year>2000</year>). <article-title>Development of a comorbidity index using physician claims data</article-title>. <source><italic>Journal of Clinical Epidemiology</italic></source>, <volume>53</volume>(<issue>12</issue>): <fpage>1258</fpage>–<lpage>1267</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/S0895-4356(00)00256-0" xlink:type="simple">https://doi.org/10.1016/S0895-4356(00)00256-0</ext-link></mixed-citation>
</ref>
<ref id="j_jds1157_ref_009">
<mixed-citation publication-type="journal"> <string-name><surname>Li</surname> <given-names>J</given-names></string-name>, <string-name><surname>Handorf</surname> <given-names>E</given-names></string-name>, <string-name><surname>Bekelman</surname> <given-names>J</given-names></string-name>, <string-name><surname>Mitra</surname> <given-names>N</given-names></string-name> (<year>2016</year>). <article-title>Propensity score and doubly robust methods for estimating the effect of treatment on censored cost</article-title>. <source><italic>Statistics in Medicine</italic></source>, <volume>35</volume>(<issue>12</issue>): <fpage>1985</fpage>–<lpage>1999</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1002/sim.6842" xlink:type="simple">https://doi.org/10.1002/sim.6842</ext-link></mixed-citation>
</ref>
<ref id="j_jds1157_ref_010">
<mixed-citation publication-type="journal"> <string-name><surname>Lin</surname> <given-names>D</given-names></string-name>, <string-name><surname>Feuer</surname> <given-names>E</given-names></string-name>, <string-name><surname>Etzioni</surname> <given-names>R</given-names></string-name>, <string-name><surname>Wax</surname> <given-names>Y</given-names></string-name> (<year>1997</year>). <article-title>Estimating medical costs from incomplete follow-up data</article-title>. <source><italic>Biometrics</italic></source>, <volume>53</volume>(<issue>2</issue>): <fpage>419</fpage>–<lpage>434</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.2307/2533947" xlink:type="simple">https://doi.org/10.2307/2533947</ext-link></mixed-citation>
</ref>
<ref id="j_jds1157_ref_011">
<mixed-citation publication-type="journal"> <string-name><surname>Manning</surname> <given-names>WG</given-names></string-name>, <string-name><surname>Mullahy</surname> <given-names>J</given-names></string-name> (<year>2001</year>). <article-title>Estimating log models: to transform or not to transform?</article-title> <source><italic>Journal of Health Economics</italic></source>, <volume>20</volume>(<issue>4</issue>): <fpage>461</fpage>–<lpage>494</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/S0167-6296(01)00086-8" xlink:type="simple">https://doi.org/10.1016/S0167-6296(01)00086-8</ext-link></mixed-citation>
</ref>
<ref id="j_jds1157_ref_012">
<mixed-citation publication-type="journal"> <string-name><surname>Mihaylova</surname> <given-names>B</given-names></string-name>, <string-name><surname>Briggs</surname> <given-names>A</given-names></string-name>, <string-name><surname>O’Hagan</surname> <given-names>A</given-names></string-name>, <string-name><surname>Thompson</surname> <given-names>SG</given-names></string-name> (<year>2011</year>). <article-title>Review of statistical methods for analysing healthcare resources and costs</article-title>. <source><italic>Health Economics</italic></source>, <volume>20</volume>(<issue>8</issue>): <fpage>897</fpage>–<lpage>916</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1002/hec.1653" xlink:type="simple">https://doi.org/10.1002/hec.1653</ext-link></mixed-citation>
</ref>
<ref id="j_jds1157_ref_013">
<mixed-citation publication-type="other"> <string-name><surname>NCI</surname></string-name> (<year>2014</year>). SEER-Medicare: Selecting the appropriate comorbidity SAS macro.</mixed-citation>
</ref>
<ref id="j_jds1157_ref_014">
<mixed-citation publication-type="journal"> <string-name><surname>Paulus</surname> <given-names>MT</given-names></string-name>, <string-name><surname>Claridge</surname> <given-names>DE</given-names></string-name>, <string-name><surname>Culp</surname> <given-names>C</given-names></string-name> (<year>2015</year>). <article-title>Algorithm for automating the selection of a temperature dependent change point model</article-title>. <source><italic>Energy and Buildings</italic></source>, <volume>87</volume>: <fpage>95</fpage>–<lpage>104</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.enbuild.2014.11.033" xlink:type="simple">https://doi.org/10.1016/j.enbuild.2014.11.033</ext-link></mixed-citation>
</ref>
<ref id="j_jds1157_ref_015">
<mixed-citation publication-type="journal"> <string-name><surname>Reeves</surname> <given-names>J</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>XL</given-names></string-name>, <string-name><surname>Lund</surname> <given-names>R</given-names></string-name>, <string-name><surname>Lu</surname> <given-names>QQ</given-names></string-name> (<year>2007</year>). <article-title>A review and comparison of changepoint detection techniques for climate data</article-title>. <source><italic>Journal of Applied Meteorology and Climatology</italic></source>, <volume>46</volume>(<issue>6</issue>): <fpage>900</fpage>–<lpage>915</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1175/JAM2493.1" xlink:type="simple">https://doi.org/10.1175/JAM2493.1</ext-link></mixed-citation>
</ref>
<ref id="j_jds1157_ref_016">
<mixed-citation publication-type="journal"> <string-name><surname>Roth</surname> <given-names>WE</given-names></string-name> (<year>1934</year>). <article-title>On direct product matrices</article-title>. <source><italic>Bulletin of the American Mathematical Society</italic></source>, <volume>40</volume>(<issue>6</issue>): <fpage>461</fpage>–<lpage>468</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1090/S0002-9904-1934-05899-3" xlink:type="simple">https://doi.org/10.1090/S0002-9904-1934-05899-3</ext-link></mixed-citation>
</ref>
<ref id="j_jds1157_ref_017">
<mixed-citation publication-type="journal"> <string-name><surname>Tramontano</surname> <given-names>AC</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Watson</surname> <given-names>TR</given-names></string-name>, <string-name><surname>Eckel</surname> <given-names>A</given-names></string-name>, <string-name><surname>Sheehan</surname> <given-names>DF</given-names></string-name>, <string-name><surname>Peters</surname> <given-names>MLB</given-names></string-name>, <etal>et al.</etal> (<year>2019</year>). <article-title>Pancreatic cancer treatment costs, including patient liability, by phase of care and treatment modality, 2000–2013</article-title>. <source><italic>Medicine</italic></source>, <volume>98</volume>(<issue>49</issue>): <elocation-id>e18082</elocation-id>.</mixed-citation>
</ref>
<ref id="j_jds1157_ref_018">
<mixed-citation publication-type="other"> US Department of Labor Bureau of Labor Statistics (<year>2021</year>). Consumer price index data.</mixed-citation>
</ref>
<ref id="j_jds1157_ref_019">
<mixed-citation publication-type="journal"> <string-name><surname>Wang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>J</given-names></string-name>, <string-name><surname>Qu</surname> <given-names>A</given-names></string-name> (<year>2012</year>). <article-title>Penalized generalized estimating equations for high-dimensional longitudinal data analysis</article-title>. <source><italic>Biometrics</italic></source>, <volume>68</volume>(<issue>2</issue>): <fpage>353</fpage>–<lpage>360</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1111/j.1541-0420.2011.01678.x" xlink:type="simple">https://doi.org/10.1111/j.1541-0420.2011.01678.x</ext-link></mixed-citation>
</ref>
<ref id="j_jds1157_ref_020">
<mixed-citation publication-type="journal"> <string-name><surname>Wijeysundera</surname> <given-names>HC</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Tomlinson</surname> <given-names>G</given-names></string-name>, <string-name><surname>Ko</surname> <given-names>DT</given-names></string-name>, <string-name><surname>Krahn</surname> <given-names>MD</given-names></string-name> (<year>2012</year>). <article-title>Techniques for estimating health care costs with censored data: an overview for the health services researcher</article-title>. <source><italic>ClinicoEconomics and Outcomes Research: CEOR</italic></source>, <volume>4</volume>: <fpage>145</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.2147/CEOR.S31552" xlink:type="simple">https://doi.org/10.2147/CEOR.S31552</ext-link></mixed-citation>
</ref>
</ref-list>
</back>
</article>
