Mortgage Prepayment Modeling via a Smoothing Spline State Space Model
Pub. online: 30 January 2025
Type: Statistical Data Science
Open Access
Received
25 August 2024
Accepted
6 January 2025
Published
30 January 2025
Abstract
Loan behavior modeling is crucial in financial engineering. In particular, predicting loan prepayment from large-scale historical time series data covering a massive number of customers is challenging. Existing approaches, such as logistic regression or nonparametric regression, model only the direct relationship between the features and prepayment. Motivated by the goal of extracting the hidden states of loan behavior, we propose the smoothing spline state space (QuadS) model, a hidden Markov model whose transition and emission matrices are allowed to vary and are modeled by smoothing splines. In contrast to existing methods, our method captures the loans' unobserved state transitions, which both improves predictive performance and provides greater interpretability. The model is estimated with the expectation-maximization (EM) algorithm; within each iteration, the smoothing splines are fitted by penalized least squares. Simulation studies demonstrate the effectiveness of the proposed method. Furthermore, a real-world case study using loan data from the Federal National Mortgage Association (Fannie Mae) illustrates the practical applicability of the model. The QuadS model not only provides reliable predictions but also uncovers meaningful hidden behavior patterns that can offer valuable insights for the financial industry.
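To make the estimation loop described in the abstract more concrete, the sketch below shows the two ingredients of one EM iteration for a hidden Markov model with time-varying transition and emission matrices. It is a minimal illustration under stated assumptions, not the authors' QuadS implementation (their code is on GitHub). The E-step is a standard scaled forward-backward pass in which the transition matrix `A[t]` and emission likelihoods `B[t]` may change at every step. For the M-step, `penalized_smooth` is a swapped-in stand-in: weighted penalized least squares with a second-difference penalty on an equally spaced grid, a discrete analogue of a cubic smoothing spline, rather than a true smoothing spline fit. The function names, the two-state setup, the simulated curve `p_stay`, the random emission likelihoods, and the smoothing parameter `lam=10.0` are all hypothetical.

```python
import numpy as np


def forward_backward(pi, A, B):
    """Scaled forward-backward pass (the E-step) for an HMM whose transition
    and emission probabilities change over time.

    pi : (K,)         initial state distribution
    A  : (T-1, K, K)  A[t, i, j] = P(z_{t+1} = j | z_t = i)
    B  : (T, K)       B[t, k] = likelihood of the observed y_t given z_t = k
    Returns posterior state probabilities gamma (T, K), pairwise posteriors
    xi (T-1, K, K), and the log-likelihood of the observed sequence.
    """
    T, K = B.shape
    alpha = np.zeros((T, K))
    c = np.zeros(T)  # per-step scaling constants; their logs sum to the log-likelihood
    alpha[0] = pi * B[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        # Propagate through the step-specific transition matrix, then weight by emissions.
        alpha[t] = (alpha[t - 1] @ A[t - 1]) * B[t]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    beta = np.ones((T, K))
    for t in range(T - 2, -1, -1):
        beta[t] = A[t] @ (B[t + 1] * beta[t + 1]) / c[t + 1]
    gamma = alpha * beta
    xi = (alpha[:-1, :, None] * A * (B[1:] * beta[1:])[:, None, :]
          / c[1:, None, None])
    return gamma, xi, np.log(c).sum()


def penalized_smooth(y, w, lam):
    """Weighted penalized least squares with a second-difference penalty
    (a discrete stand-in for a cubic smoothing spline on an equally spaced grid):
    minimize sum_t w_t (y_t - f_t)^2 + lam * sum_t (f_{t+1} - 2 f_t + f_{t-1})^2.
    """
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)  # (n-2, n) second-difference operator
    W = np.diag(w)
    return np.linalg.solve(W + lam * D.T @ D, W @ y)


# Hypothetical two-state example with a smoothly varying transition probability.
rng = np.random.default_rng(0)
T, K = 50, 2
s = np.linspace(0.0, 1.0, T - 1)
p_stay = 0.9 - 0.3 * s  # hypothetical P(stay in state 0) at each step
A = np.zeros((T - 1, K, K))
A[:, 0, 0], A[:, 0, 1] = p_stay, 1 - p_stay
A[:, 1, 0], A[:, 1, 1] = 0.2, 0.8
B = rng.uniform(0.1, 1.0, size=(T, K))  # placeholder emission likelihoods
gamma, xi, loglik = forward_backward(np.array([0.5, 0.5]), A, B)

# M-step sketch for one transition curve: smooth the posterior frequency of
# staying in state 0, weighting each step by the posterior mass in state 0.
target = xi[:, 0, 0] / gamma[:-1, 0]
p_hat = np.clip(penalized_smooth(target, gamma[:-1, 0], lam=10.0), 0.0, 1.0)
```

In a full M-step, each row of every transition matrix (and each emission curve) would be re-estimated this way and renormalized to sum to one before the next E-step. The smoothing parameter plays the role of the spline penalty: larger `lam` pulls the fitted curve toward a straight line, which the second-difference penalty leaves unpenalized.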
Supplementary material
Some details of the EM algorithm for QuadS are provided in Appendix A. The code and instructions for the QuadS method are available on GitHub (https://github.com/haoranlustat/QuadS). The dataset used in the case study is publicly available from Fannie Mae Data Dynamics (https://capitalmarkets.fanniemae.com/tools-applications/data-dynamics).