Journal of Data Science

An Innovative Method of Singular Spectrum Analysis to Conduct Gap-filling and Denoising on Time Series Data
James J. Yang, Anne Buu

https://doi.org/10.6339/25-JDS1164
Pub. online: 28 January 2025; Type: Statistical Data Science; Open Access

Received: 28 June 2024
Accepted: 3 January 2025
Published: 28 January 2025

Abstract

Heart rate data collected from wearable devices, one type of time series data, could provide insights into activities, stress levels, and health. Yet consecutive missing segments (i.e., gaps), which commonly occur because of improper device placement or device malfunction, can distort the temporal patterns inherent in the data and undermine the validity of downstream analyses. This study proposes an innovative iterative procedure for filling gaps in time series data. The procedure capitalizes on the denoising capability of Singular Spectrum Analysis (SSA) and eliminates SSA's requirement to pre-specify the window length and the number of groups. Simulation results demonstrate that the performance of SSA-based gap-filling methods depends on the choice of window length, the number of groups, and the percentage of missing values. In contrast, the proposed method consistently achieves the lowest reconstruction and gap-filling error rates across the various combinations of factors manipulated in the simulations. The simulation findings also highlight that the commonly recommended long window length (half of the time series length) may not be appropriate for time series with varying frequencies, such as heart rate data. The proposed method's initialization step uses a large window length and the first four singular values in the iterative singular value decomposition. This step not only avoids convergence issues but also improves imputation accuracy in subsequent iterations. The proposed method gives researchers the flexibility to perform gap-filling alone or in combination with denoising on time series data, which broadens its range of applications.
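The abstract does not spell out the algorithm. As background, here is a minimal sketch of the conventional iterative SSA gap-filling scheme that SSA-based methods share. It is written in Python with NumPy purely for illustration, since the supplementary code is in Julia. The scheme works in five steps:

1. Missing values start at the mean of the observed values.
2. The series is embedded in an L × K trajectory (Hankel) matrix.
3. That matrix is truncated to its first r singular triples. Here r stands in for the grouping choice, which is an assumption.
4. Diagonal averaging maps the truncated matrix back to a series.
5. The reconstructed values overwrite the missing ones, and steps 2 to 5 repeat until the imputations stop changing.

The window length `L` and component count `r` must be fixed in advance. That is exactly the requirement the proposed method is described as removing. This is therefore a baseline, not the authors' GapFilling.jl procedure. All function names and the synthetic demo data are hypothetical.

```python
import numpy as np


def hankelize(x, L):
    """Trajectory (Hankel) matrix: column j holds x[j : j + L], so X[i, j] = x[i + j]."""
    K = len(x) - L + 1
    return np.column_stack([x[j:j + L] for j in range(K)])


def diagonal_average(X):
    """Map an L x K matrix back to a series of length L + K - 1 by averaging anti-diagonals."""
    L, K = X.shape
    N = L + K - 1
    total = np.zeros(N)
    counts = np.zeros(N)
    for i in range(L):
        total[i:i + K] += X[i, :]
        counts[i:i + K] += 1
    return total / counts


def ssa_reconstruct(x, L, r):
    """Rank-r SSA reconstruction (denoising) of a complete series x."""
    U, s, Vt = np.linalg.svd(hankelize(x, L), full_matrices=False)
    X_r = (U[:, :r] * s[:r]) @ Vt[:r, :]  # keep the r leading singular triples
    return diagonal_average(X_r)


def ssa_gap_fill(y, L, r, max_iter=500, tol=1e-8):
    """Iterative SSA gap-filling with a fixed window length L and rank r.

    Returns (filled, denoised): `filled` keeps observed values and imputes NaNs;
    `denoised` is the rank-r reconstruction of the whole series.
    """
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)
    x = y.copy()
    x[missing] = np.nanmean(y)  # initial guess: mean of the observed values
    for _ in range(max_iter):
        recon = ssa_reconstruct(x, L, r)
        change = np.max(np.abs(recon[missing] - x[missing])) if missing.any() else 0.0
        x[missing] = recon[missing]  # only the gap entries are updated
        if change < tol:
            break
    return x, recon


# Hypothetical demo: a noisy sinusoid (period 25) with a 20-point gap.
rng = np.random.default_rng(0)
t = np.arange(200)
clean = np.sin(2 * np.pi * t / 25)
y = clean + 0.1 * rng.standard_normal(t.size)
y[80:100] = np.nan
gap = np.isnan(y)

filled, denoised = ssa_gap_fill(y, L=50, r=2)
rmse = np.sqrt(np.mean((filled[gap] - clean[gap]) ** 2))
print(f"gap RMSE: {rmse:.3f}")
```

Returning both arrays mirrors the two uses mentioned in the abstract. `filled` imputes only the gap (gap-filling alone), while `denoised` replaces the whole series with its low-rank reconstruction (gap-filling combined with denoising). Rerunning the sketch with other `L` and `r` values is a quick way to see the kind of sensitivity to these choices that the simulations report.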

Supplementary material

The supplementary material includes the following files: (1) README.md, a brief explanation of all the files in the supplementary material; (2) HR.csv, the application dataset; (3) GapFilling.jl, the Julia module implementing the proposed method; and (4) main.jl, the demo program.



Copyright
2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
heart rate; imputation; missing data; wearable device

Funding
This research was supported by a grant funded by the National Institutes of Health (NIH): R01 DA049154 to A. Buu. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The authors declare no conflict of interest.


Journal of Data Science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
