An Innovative Method of Singular Spectrum Analysis to Conduct Gap-filling and Denoising on Time Series Data
Pub. online: 28 January 2025
Type: Statistical Data Science
Open Access
Received
28 June 2024
Accepted
3 January 2025
Published
28 January 2025
Abstract
Heart rate data collected from wearable devices, one type of time series data, can provide insights into activities, stress levels, and health. Yet consecutive missing segments (i.e., gaps), which commonly occur due to improper device placement or device malfunction, can distort the temporal patterns inherent in the data and undermine the validity of downstream analyses. This study proposes an innovative iterative procedure for filling gaps in time series data that capitalizes on the denoising capability of Singular Spectrum Analysis (SSA) and eliminates SSA’s requirement of pre-specifying the window length and number of groups. Simulation results demonstrate that the performance of existing SSA-based gap-filling methods depends on the choice of window length, the number of groups, and the percentage of missing values. In contrast, the proposed method consistently achieves the lowest reconstruction error and gap-filling error across the combinations of factors manipulated in the simulations. The simulation findings also show that the commonly recommended long window length (half of the time series length) may not be appropriate for time series with varying frequencies, such as heart rate data. The initialization step of the proposed method, which uses a large window length and the first four singular values in the iterative singular value decomposition process, not only avoids convergence issues but also improves imputation accuracy in subsequent iterations. The proposed method gives researchers the flexibility to conduct gap-filling alone or in combination with denoising on time series data, and thus broadens its range of applications.
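To make the kind of procedure discussed in the abstract concrete, the following is a minimal sketch of generic iterative SSA gap-filling with a fixed window length and a fixed number of leading singular values. It is not the authors' implementation (their method is provided in the Julia module GapFilling.jl and removes the need to pre-specify these parameters). The function names `ssa_reconstruct` and `ssa_gap_fill`, the mean-value initial fill, the stopping tolerance, and the use of NumPy are all assumptions made for illustration.

```python
import numpy as np


def ssa_reconstruct(x, window, rank):
    """Rank-`rank` SSA reconstruction of a complete 1-D series."""
    n = len(x)
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    # Keep only the leading `rank` singular triples.
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    # Diagonal averaging: average all matrix entries that map to the
    # same time index to turn the matrix back into a series.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for j in range(k):
        recon[j:j + window] += low_rank[:, j]
        counts[j:j + window] += 1
    return recon / counts


def ssa_gap_fill(x, window, rank=4, tol=1e-6, max_iter=200):
    """Iteratively impute NaN entries of `x` (illustrative sketch only).

    Returns (filled, recon): `filled` keeps the observed values and
    replaces only the gaps; `recon` is the low-rank reconstruction of
    the whole series, i.e. gap-filling combined with denoising.
    """
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    # Assumed initial fill: the mean of the observed values.
    filled = np.where(missing, np.nanmean(x), x)
    for _ in range(max_iter):
        recon = ssa_reconstruct(filled, window, rank)
        if missing.any():
            change = np.max(np.abs(recon[missing] - filled[missing]))
        else:
            change = 0.0
        # Update only the gaps; observed values are never overwritten.
        filled[missing] = recon[missing]
        if change < tol:
            break
    return filled, recon


if __name__ == "__main__":
    t = np.arange(200)
    series = np.sin(2 * np.pi * t / 25)
    series[80:95] = np.nan  # one consecutive gap
    filled, recon = ssa_gap_fill(series, window=100, rank=2)
    print(np.isnan(filled).any())
```

In this sketch, `window` and `rank` must be chosen by the user, which is exactly the dependence the abstract reports for existing SSA-based gap-filling methods. Returning both `filled` and `recon` mirrors the choice between gap-filling alone and gap-filling combined with denoising.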
Supplementary material
The supplementary material includes the following files: (1) README.md, a brief explanation of all the files in the supplementary material; (2) HR.csv, the application dataset; (3) GapFilling.jl, the Julia module implementing the proposed method; and (4) main.jl, the demo program.