<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn><issn pub-type="ppub">1680-743X</issn><issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1164</article-id>
<article-id pub-id-type="doi">10.6339/25-JDS1164</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Statistical Data Science</subject></subj-group></article-categories>
<title-group>
<article-title>An Innovative Method of Singular Spectrum Analysis to Conduct Gap-filling and Denoising on Time Series Data</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0003-0842-7933</contrib-id>
<name><surname>Yang</surname><given-names>James J.</given-names></name><email xlink:href="mailto:James.J.Yang@uth.tmc.edu">James.J.Yang@uth.tmc.edu</email><xref ref-type="aff" rid="j_jds1164_aff_001">1</xref><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Buu</surname><given-names>Anne</given-names></name><xref ref-type="aff" rid="j_jds1164_aff_002">2</xref>
</contrib>
<aff id="j_jds1164_aff_001"><label>1</label>Department of Biostatistics and Data Science, <institution>University of Texas Health Science Center at Houston</institution>, <country>U.S.A.</country></aff>
<aff id="j_jds1164_aff_002"><label>2</label>Department of Health Promotion and Behavioral Sciences, <institution>University of Texas Health Science Center at Houston</institution>, <country>U.S.A.</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author. Email: <ext-link ext-link-type="uri" xlink:href="mailto:James.J.Yang@uth.tmc.edu">James.J.Yang@uth.tmc.edu</ext-link>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2025</year></pub-date><pub-date pub-type="epub"><day>28</day><month>1</month><year>2025</year></pub-date><volume content-type="ahead-of-print">0</volume><issue>0</issue><fpage>1</fpage><lpage>13</lpage><supplementary-material id="S1" content-type="archive" xlink:href="jds1164_s001.zip" mimetype="application" mime-subtype="x-zip-compressed">
<caption>
<title>Supplementary Material</title>
<p>The supplementary material includes the following files: (1) <monospace>README.md</monospace>, a brief explanation of all the files in the supplementary material; (2) <monospace>HR.csv</monospace>, the application dataset; (3) <monospace>GapFilling.jl</monospace>, the Julia module implementing the proposed method; and (4) <monospace>main.jl</monospace>, the demo program.</p>
</caption>
</supplementary-material><history><date date-type="received"><day>28</day><month>6</month><year>2024</year></date><date date-type="accepted"><day>3</day><month>1</month><year>2025</year></date></history>
<permissions><copyright-statement>2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2025</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>Heart rate data collected from wearable devices, one type of time series data, can provide insights into activities, stress levels, and health. Yet consecutive missing segments (i.e., gaps), which commonly arise from improper device placement or device malfunction, can distort the temporal patterns inherent in the data and undermine the validity of downstream analyses. This study proposes an innovative iterative procedure for filling gaps in time series data that capitalizes on the denoising capability of Singular Spectrum Analysis (SSA) while eliminating SSA’s requirement to pre-specify the window length and the number of groups. Simulation results show that the performance of SSA-based gap-filling methods depends on the choice of window length, the number of groups, and the percentage of missing values. In contrast, the proposed method consistently achieves the lowest reconstruction error and gap-filling error across a variety of combinations of the factors manipulated in the simulations. The simulation findings also indicate that the commonly recommended long window length (half the length of the time series) may not be appropriate for time series with varying frequencies, such as heart rate data. The initialization step of the proposed method, which uses a large window length and the first four singular values in the iterative singular value decomposition, not only avoids convergence issues but also improves imputation accuracy in subsequent iterations. The proposed method gives researchers the flexibility to perform gap-filling alone or in combination with denoising on time series data, thereby broadening its range of applications.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>heart rate</kwd>
<kwd>imputation</kwd>
<kwd>missing data</kwd>
<kwd>wearable device</kwd>
</kwd-group>
<funding-group><funding-statement>This research was supported by a grant funded by the National Institutes of Health (NIH): R01 DA049154 to A. Buu. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The authors declare no conflict of interest.</funding-statement></funding-group>
</article-meta>
</front>
<back>
<ref-list id="j_jds1164_reflist_001">
<title>References</title>
<ref id="j_jds1164_ref_001">
<mixed-citation publication-type="journal"> <string-name><surname>Bose</surname> <given-names>A</given-names></string-name>, <string-name><surname>Mitra</surname> <given-names>J</given-names></string-name> (<year>2002</year>). <article-title>Limiting spectral distribution of a special circulant</article-title>. <source><italic>Statistics &amp; Probability Letters</italic></source>, <volume>60</volume>(<issue>1</issue>): <fpage>111</fpage>–<lpage>120</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/S0167-7152(02)00289-4" xlink:type="simple">https://doi.org/10.1016/S0167-7152(02)00289-4</ext-link></mixed-citation>
</ref>
<ref id="j_jds1164_ref_002">
<mixed-citation publication-type="journal"> <string-name><surname>Bryc</surname> <given-names>W</given-names></string-name>, <string-name><surname>Dembo</surname> <given-names>A</given-names></string-name>, <string-name><surname>Jiang</surname> <given-names>T</given-names></string-name> (<year>2006</year>). <article-title>Spectral measure of large random Hankel, Markov and Toeplitz matrices</article-title>. <source><italic>Annals of Probability</italic></source>, <volume>34</volume>(<issue>1</issue>): <fpage>1</fpage>–<lpage>38</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1214/009117905000000495" xlink:type="simple">https://doi.org/10.1214/009117905000000495</ext-link></mixed-citation>
</ref>
<ref id="j_jds1164_ref_003">
<mixed-citation publication-type="journal"> <string-name><surname>Caussinus</surname> <given-names>H</given-names></string-name> (<year>1986</year>a). <article-title>Models and uses of principal component analysis</article-title>. <source><italic>Multidimensional Data Analysis</italic></source>, <volume>86</volume>: <fpage>149</fpage>–<lpage>170</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1164_ref_004">
<mixed-citation publication-type="journal"> <string-name><surname>Caussinus</surname> <given-names>H</given-names></string-name> (<year>1986</year>b). <article-title>Models and uses of principal component analysis</article-title>. <source><italic>Multidimensional Data Analysis</italic></source>, <volume>86</volume>: <fpage>149</fpage>–<lpage>170</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1164_ref_005">
<mixed-citation publication-type="journal"> <string-name><surname>Dempster</surname> <given-names>AP</given-names></string-name>, <string-name><surname>Laird</surname> <given-names>NM</given-names></string-name>, <string-name><surname>Rubin</surname> <given-names>DB</given-names></string-name> (<year>1977</year>). <article-title>Maximum likelihood from incomplete data via the EM algorithm</article-title>. <source><italic>Journal of the Royal Statistical Society, Series B, Methodological</italic></source>, <volume>39</volume>(<issue>1</issue>): <fpage>1</fpage>–<lpage>22</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1111/j.2517-6161.1977.tb01600.x" xlink:type="simple">https://doi.org/10.1111/j.2517-6161.1977.tb01600.x</ext-link></mixed-citation>
</ref>
<ref id="j_jds1164_ref_006">
<mixed-citation publication-type="other"> <string-name><surname>Fang</surname> <given-names>C</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>C</given-names></string-name> (<year>2020</year>). Time series data imputation: A survey on deep learning approaches. arXiv preprint: <uri>https://arxiv.org/abs/2011.11347</uri></mixed-citation>
</ref>
<ref id="j_jds1164_ref_007">
<mixed-citation publication-type="journal"> <string-name><surname>Fu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Fan</surname> <given-names>X</given-names></string-name>, <string-name><surname>Cao</surname> <given-names>J</given-names></string-name> (<year>2024</year>). <article-title>An imputation method based on the varimax variant of multivariate singular spectrum analysis</article-title>. <source><italic>IEEE Access</italic></source>, <volume>12</volume>: <fpage>127749</fpage>–<lpage>127767</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/ACCESS.2024.3429292" xlink:type="simple">https://doi.org/10.1109/ACCESS.2024.3429292</ext-link></mixed-citation>
</ref>
<ref id="j_jds1164_ref_008">
<mixed-citation publication-type="journal"> <string-name><surname>Golyandina</surname> <given-names>N</given-names></string-name> (<year>2010</year>). <article-title>On the choice of parameters in singular spectrum analysis and related subspace-based methods</article-title>. <source><italic>Statistics and Its Interface</italic></source>, <volume>3</volume>(<issue>3</issue>): <fpage>259</fpage>–<lpage>279</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.4310/SII.2010.v3.n3.a2" xlink:type="simple">https://doi.org/10.4310/SII.2010.v3.n3.a2</ext-link></mixed-citation>
</ref>
<ref id="j_jds1164_ref_009">
<mixed-citation publication-type="journal"> <string-name><surname>Golyandina</surname> <given-names>N</given-names></string-name> (<year>2020</year>). <article-title>Particularities and commonalities of singular spectrum analysis as a method of time series analysis and signal processing</article-title>. <source><italic>Wiley Interdisciplinary Reviews: Computational Statistics</italic></source>, <volume>12</volume>(<issue>4</issue>): <fpage>e1487</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1002/wics.1487" xlink:type="simple">https://doi.org/10.1002/wics.1487</ext-link></mixed-citation>
</ref>
<ref id="j_jds1164_ref_010">
<mixed-citation publication-type="book"> <string-name><surname>Golyandina</surname> <given-names>N</given-names></string-name>, <string-name><surname>Korobeynikov</surname> <given-names>A</given-names></string-name>, <string-name><surname>Zhigljavsky</surname> <given-names>A</given-names></string-name> (<year>2018</year>). <source><italic>Singular Spectrum Analysis with R</italic></source>. <publisher-name>Springer Berlin</publisher-name>, <publisher-loc>Heidelberg</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1164_ref_011">
<mixed-citation publication-type="book"> <string-name><surname>Golyandina</surname> <given-names>N</given-names></string-name>, <string-name><surname>Nekrutkin</surname> <given-names>V</given-names></string-name>, <string-name><surname>Zhigljavsky</surname> <given-names>AA</given-names></string-name> (<year>2001</year>). <source><italic>Analysis of Time Series Structure: SSA and Related Techniques</italic></source>. <publisher-name>CRC Press</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1164_ref_012">
<mixed-citation publication-type="book"> <string-name><surname>Golyandina</surname> <given-names>N</given-names></string-name>, <string-name><surname>Zhigljavsky</surname> <given-names>A</given-names></string-name> (<year>2020</year>). <source><italic>Singular Spectrum Analysis for Time Series</italic></source>, <edition>2</edition>nd edition. <publisher-name>Springer Berlin</publisher-name>, <publisher-loc>Heidelberg</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1164_ref_013">
<mixed-citation publication-type="journal"> <string-name><surname>Groth</surname> <given-names>A</given-names></string-name>, <string-name><surname>Ghil</surname> <given-names>M</given-names></string-name> (<year>2011</year>). <article-title>Multivariate singular spectrum analysis and the road to phase synchronization</article-title>. <source><italic>Physical Review E, Statistical, Nonlinear, and Soft Matter Physics</italic></source>, <volume>84</volume>(<issue>3</issue>): <fpage>036206</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1103/PhysRevE.84.036206" xlink:type="simple">https://doi.org/10.1103/PhysRevE.84.036206</ext-link></mixed-citation>
</ref>
<ref id="j_jds1164_ref_014">
<mixed-citation publication-type="journal"> <string-name><surname>Hassani</surname> <given-names>H</given-names></string-name>, <string-name><surname>Kalantari</surname> <given-names>M</given-names></string-name>, <string-name><surname>Ghodsi</surname> <given-names>Z</given-names></string-name> (<year>2019</year>). <article-title>Evaluating the performance of multiple imputation methods for handling missing values in time series data: A study focused on East Africa, soil-carbonate-stable isotope data</article-title>. <source><italic>Stats</italic></source>, <volume>2</volume>(<issue>4</issue>): <fpage>457</fpage>–<lpage>467</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.3390/stats2040032" xlink:type="simple">https://doi.org/10.3390/stats2040032</ext-link></mixed-citation>
</ref>
<ref id="j_jds1164_ref_015">
<mixed-citation publication-type="journal"> <string-name><surname>Indic</surname> <given-names>P</given-names></string-name>, <string-name><surname>Murray</surname> <given-names>G</given-names></string-name>, <string-name><surname>Maggini</surname> <given-names>C</given-names></string-name>, <string-name><surname>Amore</surname> <given-names>M</given-names></string-name>, <string-name><surname>Meschi</surname> <given-names>T</given-names></string-name>, <string-name><surname>Borghi</surname> <given-names>L</given-names></string-name>, <etal>et al.</etal> (<year>2012</year>). <article-title>Multi-scale motility amplitude associated with suicidal thoughts in major depression</article-title>. <source><italic>PLoS ONE</italic></source>, <volume>7</volume>(<issue>6</issue>): <fpage>e38761</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1371/journal.pone.0038761" xlink:type="simple">https://doi.org/10.1371/journal.pone.0038761</ext-link></mixed-citation>
</ref>
<ref id="j_jds1164_ref_016">
<mixed-citation publication-type="journal"> <string-name><surname>Indic</surname> <given-names>P</given-names></string-name>, <string-name><surname>Salvatore</surname> <given-names>P</given-names></string-name>, <string-name><surname>Maggini</surname> <given-names>C</given-names></string-name>, <string-name><surname>Ghidini</surname> <given-names>S</given-names></string-name>, <string-name><surname>Ferraro</surname> <given-names>G</given-names></string-name>, <string-name><surname>Baldessarini</surname> <given-names>RJ</given-names></string-name>, <etal>et al.</etal> (<year>2011</year>). <article-title>Scaling behavior of human locomotor activity amplitude: Association with bipolar disorder</article-title>. <source><italic>PLoS ONE</italic></source>, <volume>6</volume>(<issue>5</issue>): <fpage>e20650</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1371/journal.pone.0020650" xlink:type="simple">https://doi.org/10.1371/journal.pone.0020650</ext-link></mixed-citation>
</ref>
<ref id="j_jds1164_ref_017">
<mixed-citation publication-type="journal"> <string-name><surname>Ji</surname> <given-names>K</given-names></string-name>, <string-name><surname>Shen</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>F</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>Q</given-names></string-name> (<year>2025</year>). <article-title>An efficient improved singular spectrum analysis for processing GNSS position time series with missing data</article-title>. <source><italic>Geophysical Journal International</italic></source>, <volume>240</volume>(<issue>1</issue>): <fpage>189</fpage>–<lpage>200</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1093/gji/ggae381" xlink:type="simple">https://doi.org/10.1093/gji/ggae381</ext-link></mixed-citation>
</ref>
<ref id="j_jds1164_ref_018">
<mixed-citation publication-type="journal"> <string-name><surname>Kondrashov</surname> <given-names>D</given-names></string-name>, <string-name><surname>Ghil</surname> <given-names>M</given-names></string-name> (<year>2006</year>). <article-title>Spatio-temporal filling of missing points in geophysical data sets</article-title>. <source><italic>Nonlinear Processes in Geophysics</italic></source>, <volume>13</volume>(<issue>2</issue>): <fpage>151</fpage>–<lpage>159</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.5194/npg-13-151-2006" xlink:type="simple">https://doi.org/10.5194/npg-13-151-2006</ext-link></mixed-citation>
</ref>
<ref id="j_jds1164_ref_019">
<mixed-citation publication-type="chapter"> <string-name><surname>Miao</surname> <given-names>W</given-names></string-name>, <string-name><surname>Gel</surname> <given-names>YR</given-names></string-name>, <string-name><surname>Gastwirth</surname> <given-names>JL</given-names></string-name> (<year>2006</year>). <chapter-title>A new test of symmetry about an unknown median</chapter-title>. In: <string-name><surname>Hsiung</surname> <given-names>AC</given-names></string-name>, <string-name><surname>Ying</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>CH</given-names></string-name> (eds.), <source><italic>Random Walk, Sequential Analysis and Related Topics: A Festschrift in Honor of Yuan-Shih Chow</italic></source>, <fpage>199</fpage>–<lpage>214</lpage>. <publisher-name>World Scientific</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1164_ref_020">
<mixed-citation publication-type="book"> <string-name><surname>Sanei</surname> <given-names>S</given-names></string-name>, <string-name><surname>Hassani</surname> <given-names>H</given-names></string-name> (<year>2015</year>). <source><italic>Singular Spectrum Analysis of Biomedical Signals</italic></source>. <publisher-name>CRC Press</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1164_ref_021">
<mixed-citation publication-type="chapter"> <string-name><surname>Wu</surname> <given-names>X</given-names></string-name>, <string-name><surname>Mattingly</surname> <given-names>S</given-names></string-name>, <string-name><surname>Mirjafari</surname> <given-names>S</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>C</given-names></string-name>, <string-name><surname>Chawla</surname> <given-names>NV</given-names></string-name> (<year>2020</year>). <chapter-title>Personalized imputation on wearable-sensory time series via knowledge transfer</chapter-title>. In: <source><italic>Proceedings of the 29th ACM International Conference on Information &amp; Knowledge Management</italic></source>, <fpage>1625</fpage>–<lpage>1634</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1164_ref_022">
<mixed-citation publication-type="journal"> <string-name><surname>Yang</surname> <given-names>JJ</given-names></string-name>, <string-name><surname>Piper</surname> <given-names>ME</given-names></string-name>, <string-name><surname>Indic</surname> <given-names>P</given-names></string-name>, <string-name><surname>Buu</surname> <given-names>A</given-names></string-name> (<year>2024</year>). <article-title>Statistical methods for predicting e-cigarette use events based on beat-to-beat interval (BBI) data collected from wearable devices</article-title>. <source><italic>Statistics in Medicine</italic></source>, <volume>43</volume>(<issue>17</issue>): <fpage>3227</fpage>–<lpage>3238</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1002/sim.10124" xlink:type="simple">https://doi.org/10.1002/sim.10124</ext-link></mixed-citation>
</ref>
</ref-list>
</back>
</article>
