<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn><issn pub-type="ppub">1680-743X</issn><issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1165</article-id>
<article-id pub-id-type="doi">10.6339/25-JDS1165</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Statistical Data Science</subject></subj-group></article-categories>
<title-group>
<article-title>Mortgage Prepayment Modeling via a Smoothing Spline State Space Model</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Lu</surname><given-names>Haoran</given-names></name><xref ref-type="aff" rid="j_jds1165_aff_001">1</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Cheng</surname><given-names>Huimin</given-names></name><xref ref-type="aff" rid="j_jds1165_aff_002">2</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Wang</surname><given-names>Ye</given-names></name><xref ref-type="aff" rid="j_jds1165_aff_001">1</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Xie</surname><given-names>Yaoguo</given-names></name><xref ref-type="aff" rid="j_jds1165_aff_003">3</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Yan</surname><given-names>Huan</given-names></name><xref ref-type="aff" rid="j_jds1165_aff_003">3</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Wang</surname><given-names>Xindong</given-names></name><xref ref-type="aff" rid="j_jds1165_aff_004">4</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Ma</surname><given-names>Ping</given-names></name><email xlink:href="mailto:pingma@uga.edu">pingma@uga.edu</email><xref ref-type="aff" rid="j_jds1165_aff_001">1</xref><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Zhong</surname><given-names>Wenxuan</given-names></name><email xlink:href="mailto:wenxuan@uga.edu">wenxuan@uga.edu</email><xref ref-type="aff" rid="j_jds1165_aff_001">1</xref><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<aff id="j_jds1165_aff_001"><label>1</label>Department of Statistics, <institution>University of Georgia</institution>, Athens, GA 30602, <country>United States</country></aff>
<aff id="j_jds1165_aff_002"><label>2</label>Department of Biostatistics, <institution>Boston University</institution>, Boston, MA 02118, <country>United States</country></aff>
<aff id="j_jds1165_aff_003"><label>3</label>Model Risk Management, <institution>Wells Fargo</institution>, Charlotte, NC 28202, <country>United States</country></aff>
<aff id="j_jds1165_aff_004"><label>4</label>Model Risk Management, <institution>Wells Fargo</institution>, McLean, VA 22102, <country>United States</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author. Email: <ext-link ext-link-type="uri" xlink:href="mailto:pingma@uga.edu">pingma@uga.edu</ext-link> or <ext-link ext-link-type="uri" xlink:href="mailto:wenxuan@uga.edu">wenxuan@uga.edu</ext-link>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2025</year></pub-date><pub-date pub-type="epub"><day>30</day><month>1</month><year>2025</year></pub-date><volume>23</volume><issue>4</issue><fpage>578</fpage><lpage>591</lpage><supplementary-material id="S1" content-type="archive" xlink:href="jds1165_s001.zip" mimetype="application" mime-subtype="x-zip-compressed">
<caption>
<title>Supplementary Material</title>
<p>Details of the EM algorithm for QuadS are provided in Appendix A. The code and usage instructions for the QuadS method are available on GitHub (<uri>https://github.com/haoranlustat/QuadS</uri>). The dataset used in the case study is publicly available from Fannie Mae Data Dynamics (<uri>https://capitalmarkets.fanniemae.com/tools-applications/data-dynamics</uri>).</p>
</caption>
</supplementary-material><history><date date-type="received"><day>25</day><month>8</month><year>2024</year></date><date date-type="accepted"><day>6</day><month>1</month><year>2025</year></date></history>
<permissions><copyright-statement>2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2025</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>Loan behavior modeling is crucial in financial engineering. In particular, predicting loan prepayment from large-scale historical time series data on a large number of customers is challenging. Existing approaches, such as logistic regression or nonparametric regression, can model only the direct relationship between the features and the prepayments. Motivated by the goal of extracting the hidden states of loan behavior, we propose the smoothing spline state space (QuadS) model, a hidden Markov model whose varying transition and emission matrices are modeled by smoothing splines. In contrast to existing methods, our method captures the loans’ unobserved state transitions, which not only improves predictive performance but also provides more interpretability. The overall model is fitted by the EM algorithm, and within each iteration the smoothing splines are fitted by penalized least squares. Simulation studies demonstrate the effectiveness of the proposed method. Furthermore, a real-world case study using loan data from the Federal National Mortgage Association illustrates the practical applicability of our model. The QuadS model not only provides reliable predictions but also uncovers meaningful hidden behavior patterns that can offer valuable insights for the financial industry.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>hidden Markov model</kwd>
<kwd>mortgage prepayment</kwd>
<kwd>nonparametric model</kwd>
<kwd>smoothing spline ANOVA</kwd>
</kwd-group>
<funding-group><funding-statement>This research was partially supported by the U.S. National Science Foundation under grants DMS-1925066, DMS-1903226, DMS-2124493, DMS-2311297, DMS-2319279, and DMS-2318809, and by the U.S. National Institutes of Health under grant R01GM152814.</funding-statement></funding-group>
</article-meta>
</front>
<back>
<ref-list id="j_jds1165_reflist_001">
<title>References</title>
<ref id="j_jds1165_ref_001">
<mixed-citation publication-type="other"> <string-name><surname>Agarwal</surname> <given-names>S</given-names></string-name>, <string-name><surname>Chomsisengphet</surname> <given-names>S</given-names></string-name>, <string-name><surname>Kiefer</surname> <given-names>H</given-names></string-name>, <string-name><surname>Kiefer</surname> <given-names>LC</given-names></string-name>, <string-name><surname>Medina</surname> <given-names>PC</given-names></string-name> (<year>2020</year>). Inequality during the COVID-19 pandemic: The case of savings from mortgage refinancing. <italic>Available at SSRN 3750133</italic>.</mixed-citation>
</ref>
<ref id="j_jds1165_ref_002">
<mixed-citation publication-type="journal"> <string-name><surname>Aldridge</surname> <given-names>I</given-names></string-name>, <string-name><surname>Avellaneda</surname> <given-names>M</given-names></string-name> (<year>2019</year>). <article-title>Neural networks in finance: Design and performance</article-title>. <source><italic>The Journal of Financial Data Science</italic></source>, <volume>1</volume>(<issue>4</issue>): <fpage>39</fpage>–<lpage>62</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.3905/jfds.2019.1.4.039" xlink:type="simple">https://doi.org/10.3905/jfds.2019.1.4.039</ext-link></mixed-citation>
</ref>
<ref id="j_jds1165_ref_003">
<mixed-citation publication-type="chapter"> <string-name><surname>Bengio</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Frasconi</surname> <given-names>P</given-names></string-name> (<year>1995</year>). <chapter-title>An input output HMM architecture</chapter-title>. In: <source><italic>Advances in Neural Information Processing Systems</italic></source>, volume <volume>7</volume>, <fpage>427</fpage>–<lpage>434</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1165_ref_004">
<mixed-citation publication-type="journal"> <string-name><surname>Bengio</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Frasconi</surname> <given-names>P</given-names></string-name> (<year>1996</year>). <article-title>Input-output HMMs for sequence processing</article-title>. <source><italic>IEEE Transactions on Neural Networks</italic></source>, <volume>7</volume>(<issue>5</issue>): <fpage>1231</fpage>–<lpage>1249</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/72.536317" xlink:type="simple">https://doi.org/10.1109/72.536317</ext-link></mixed-citation>
</ref>
<ref id="j_jds1165_ref_005">
<mixed-citation publication-type="other"> <string-name><surname>Berger</surname> <given-names>DW</given-names></string-name>, <string-name><surname>Milbradt</surname> <given-names>K</given-names></string-name>, <string-name><surname>Tourre</surname> <given-names>F</given-names></string-name>, <string-name><surname>Vavra</surname> <given-names>J</given-names></string-name> (<year>2018</year>). Mortgage prepayment and path-dependent effects of monetary policy. Technical report, <italic>National Bureau of Economic Research</italic>.</mixed-citation>
</ref>
<ref id="j_jds1165_ref_006">
<mixed-citation publication-type="chapter"> <string-name><surname>Fang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhong</surname> <given-names>W</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>P</given-names></string-name> (<year>2024</year>). <chapter-title>Bayesian knowledge distillation: A Bayesian perspective of distillation with uncertainty quantification</chapter-title>. In: <source><italic>Forty-first International Conference on Machine Learning</italic></source>, <fpage>12935</fpage>–<lpage>12956</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1165_ref_007">
<mixed-citation publication-type="other"> <string-name><surname>Federal Housing Finance Agency</surname></string-name> (<year>2024</year>). Prepayment Monitoring Report: First Quarter 2024. Technical report, <italic>Federal Housing Finance Agency</italic>.</mixed-citation>
</ref>
<ref id="j_jds1165_ref_008">
<mixed-citation publication-type="other"> <string-name><surname>Freddie Mac</surname></string-name> (<year>2024</year>). Primary Mortgage Market Survey (PMMS).</mixed-citation>
</ref>
<ref id="j_jds1165_ref_009">
<mixed-citation publication-type="other"> <string-name><surname>Fuster</surname> <given-names>A</given-names></string-name>, <string-name><surname>Hizmo</surname> <given-names>A</given-names></string-name>, <string-name><surname>Lambie-Hanson</surname> <given-names>L</given-names></string-name>, <string-name><surname>Vickery</surname> <given-names>J</given-names></string-name>, <string-name><surname>Willen</surname> <given-names>PS</given-names></string-name> (<year>2021</year>). How resilient is mortgage credit supply? Evidence from the COVID-19 pandemic. Technical report, <italic>National Bureau of Economic Research</italic>.</mixed-citation>
</ref>
<ref id="j_jds1165_ref_010">
<mixed-citation publication-type="book"> <string-name><surname>Gu</surname> <given-names>C</given-names></string-name> (<year>2013</year>). <source><italic>Smoothing Spline ANOVA Models</italic></source>, volume <volume>297</volume>. <publisher-name>Springer</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1165_ref_011">
<mixed-citation publication-type="journal"> <string-name><surname>Gu</surname> <given-names>C</given-names></string-name> (<year>2014</year>). <article-title>Smoothing spline ANOVA models: R package gss</article-title>. <source><italic>Journal of Statistical Software</italic></source>, <volume>58</volume>: <fpage>1</fpage>–<lpage>25</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.18637/jss.v058.i05" xlink:type="simple">https://doi.org/10.18637/jss.v058.i05</ext-link></mixed-citation>
</ref>
<ref id="j_jds1165_ref_012">
<mixed-citation publication-type="journal"> <string-name><surname>Gu</surname> <given-names>C</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>P</given-names></string-name> (<year>2005</year>). <article-title>Optimal smoothing in nonparametric mixed-effect models</article-title>. <source><italic>The Annals of Statistics</italic></source>, <volume>33</volume>(<issue>3</issue>): <fpage>1357</fpage>–<lpage>1379</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1214/009053605000000110" xlink:type="simple">https://doi.org/10.1214/009053605000000110</ext-link></mixed-citation>
</ref>
<ref id="j_jds1165_ref_013">
<mixed-citation publication-type="journal"> <string-name><surname>Gu</surname> <given-names>C</given-names></string-name>, <string-name><surname>Wahba</surname> <given-names>G</given-names></string-name> (<year>1991</year>). <article-title>Minimizing GCV/GML scores with multiple smoothing parameters via the Newton method</article-title>. <source><italic>SIAM Journal on Scientific and Statistical Computing</italic></source>, <volume>12</volume>(<issue>2</issue>): <fpage>383</fpage>–<lpage>398</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1137/0912021" xlink:type="simple">https://doi.org/10.1137/0912021</ext-link></mixed-citation>
</ref>
<ref id="j_jds1165_ref_014">
<mixed-citation publication-type="journal"> <string-name><surname>Guidotti</surname> <given-names>R</given-names></string-name>, <string-name><surname>Monreale</surname> <given-names>A</given-names></string-name>, <string-name><surname>Ruggieri</surname> <given-names>S</given-names></string-name>, <string-name><surname>Turini</surname> <given-names>F</given-names></string-name>, <string-name><surname>Giannotti</surname> <given-names>F</given-names></string-name>, <string-name><surname>Pedreschi</surname> <given-names>D</given-names></string-name> (<year>2018</year>). <article-title>A survey of methods for explaining black box models</article-title>. <source><italic>ACM Computing Surveys</italic></source>, <volume>51</volume>(<issue>5</issue>): <fpage>1</fpage>–<lpage>42</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1165_ref_015">
<mixed-citation publication-type="journal"> <string-name><surname>Helwig</surname> <given-names>NE</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>P</given-names></string-name> (<year>2015</year>). <article-title>Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples</article-title>. <source><italic>Journal of Computational and Graphical Statistics</italic></source>, <volume>24</volume>(<issue>3</issue>): <fpage>715</fpage>–<lpage>732</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/10618600.2014.926819" xlink:type="simple">https://doi.org/10.1080/10618600.2014.926819</ext-link></mixed-citation>
</ref>
<ref id="j_jds1165_ref_016">
<mixed-citation publication-type="journal"> <string-name><surname>Johnson</surname> <given-names>K</given-names></string-name>, <string-name><surname>Pasquale</surname> <given-names>F</given-names></string-name>, <string-name><surname>Chapman</surname> <given-names>J</given-names></string-name> (<year>2019</year>). <article-title>Artificial intelligence, machine learning, and bias in finance: Toward responsible innovation</article-title>. <source><italic>Fordham Law Review</italic></source>, <volume>88</volume>: <fpage>499</fpage>.</mixed-citation>
</ref>
<ref id="j_jds1165_ref_017">
<mixed-citation publication-type="journal"> <string-name><surname>Kung</surname> <given-names>J-Y</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>C-C</given-names></string-name>, <string-name><surname>Hsu</surname> <given-names>S-Y</given-names></string-name>, <string-name><surname>Lee</surname> <given-names>S</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>C-W</given-names></string-name> (<year>2010</year>). <article-title>Application of logistic regression analysis of home mortgage loan prepayment and default risk</article-title>. <source><italic>ICIC Express Letters</italic></source>, <volume>4</volume>(<issue>2</issue>): <fpage>325</fpage>–<lpage>331</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1165_ref_018">
<mixed-citation publication-type="journal"> <string-name><surname>Lai</surname> <given-names>TL</given-names></string-name>, <string-name><surname>Su</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>KH</given-names></string-name> (<year>2014</year>). <article-title>Dynamic empirical Bayes models and their applications to longitudinal data analysis and prediction</article-title>. <source><italic>Statistica Sinica</italic></source>, <volume>24</volume>(<issue>4</issue>): <fpage>1505</fpage>–<lpage>1528</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1165_ref_019">
<mixed-citation publication-type="journal"> <string-name><surname>Ma</surname> <given-names>P</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>JZ</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>N</given-names></string-name> (<year>2015</year>). <article-title>Efficient computation of smoothing splines via adaptive basis sampling</article-title>. <source><italic>Biometrika</italic></source>, <volume>102</volume>(<issue>3</issue>): <fpage>631</fpage>–<lpage>645</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1093/biomet/asv009" xlink:type="simple">https://doi.org/10.1093/biomet/asv009</ext-link></mixed-citation>
</ref>
<ref id="j_jds1165_ref_020">
<mixed-citation publication-type="journal"> <string-name><surname>Maxam</surname> <given-names>CL</given-names></string-name>, <string-name><surname>LaCour-Little</surname> <given-names>M</given-names></string-name> (<year>2001</year>). <article-title>Applied nonparametric regression techniques: Estimating prepayments on fixed-rate mortgage-backed securities</article-title>. <source><italic>Journal of Real Estate Finance and Economics</italic></source>, <volume>23</volume>(<issue>2</issue>): <fpage>139</fpage>–<lpage>160</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1023/A:1011102332025" xlink:type="simple">https://doi.org/10.1023/A:1011102332025</ext-link></mixed-citation>
</ref>
<ref id="j_jds1165_ref_021">
<mixed-citation publication-type="book"> <string-name><surname>McLachlan</surname> <given-names>GJ</given-names></string-name>, <string-name><surname>Krishnan</surname> <given-names>T</given-names></string-name> (<year>2007</year>). <source><italic>The EM Algorithm and Extensions</italic></source>, volume <volume>382</volume>. <publisher-name>John Wiley &amp; Sons</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1165_ref_022">
<mixed-citation publication-type="journal"> <string-name><surname>Meng</surname> <given-names>C</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Zhong</surname> <given-names>W</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>P</given-names></string-name> (<year>2020</year>). <article-title>More efficient approximation of smoothing splines via space-filling basis selection</article-title>. <source><italic>Biometrika</italic></source>, <volume>107</volume>(<issue>3</issue>): <fpage>723</fpage>–<lpage>735</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1093/biomet/asaa019" xlink:type="simple">https://doi.org/10.1093/biomet/asaa019</ext-link></mixed-citation>
</ref>
<ref id="j_jds1165_ref_023">
<mixed-citation publication-type="journal"> <string-name><surname>Ozbayoglu</surname> <given-names>AM</given-names></string-name>, <string-name><surname>Gudelek</surname> <given-names>MU</given-names></string-name>, <string-name><surname>Sezer</surname> <given-names>OB</given-names></string-name> (<year>2020</year>). <article-title>Deep learning for financial applications: A survey</article-title>. <source><italic>Applied Soft Computing</italic></source>, <volume>93</volume>: <fpage>106384</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.asoc.2020.106384" xlink:type="simple">https://doi.org/10.1016/j.asoc.2020.106384</ext-link></mixed-citation>
</ref>
<ref id="j_jds1165_ref_024">
<mixed-citation publication-type="other"> <string-name><surname>Sirignano</surname> <given-names>J</given-names></string-name>, <string-name><surname>Sadhwani</surname> <given-names>A</given-names></string-name>, <string-name><surname>Giesecke</surname> <given-names>K</given-names></string-name> (<year>2016</year>). Deep learning for mortgage risk. arXiv preprint: <uri>https://arxiv.org/abs/1607.02470</uri></mixed-citation>
</ref>
<ref id="j_jds1165_ref_025">
<mixed-citation publication-type="journal"> <string-name><surname>Sun</surname> <given-names>X</given-names></string-name>, <string-name><surname>Zhong</surname> <given-names>W</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>P</given-names></string-name> (<year>2021</year>). <article-title>An asymptotic and empirical smoothing parameters selection method for smoothing spline ANOVA models in large samples</article-title>. <source><italic>Biometrika</italic></source>, <volume>108</volume>(<issue>1</issue>): <fpage>149</fpage>–<lpage>166</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1093/biomet/asaa047" xlink:type="simple">https://doi.org/10.1093/biomet/asaa047</ext-link></mixed-citation>
</ref>
<ref id="j_jds1165_ref_026">
<mixed-citation publication-type="book"> <string-name><surname>Van Deventer</surname> <given-names>DR</given-names></string-name>, <string-name><surname>Imai</surname> <given-names>K</given-names></string-name>, <string-name><surname>Mesler</surname> <given-names>M</given-names></string-name> (<year>2013</year>). <source><italic>Advanced Financial Risk Management: Tools and Techniques for Integrated Credit Risk and Interest Rate Risk Management</italic></source>. <publisher-name>John Wiley &amp; Sons</publisher-name>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>
