<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn><issn pub-type="ppub">1680-743X</issn><issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1135</article-id>
<article-id pub-id-type="doi">10.6339/24-JDS1135</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Statistical Data Science</subject></subj-group></article-categories>
<title-group>
<article-title>Predictive Mean Matching Imputation Procedure Based on Machine Learning Models for Complex Survey Data</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Chen</surname><given-names>Sixia</given-names></name><email xlink:href="mailto:sixia-chen@ouhsc.edu">sixia-chen@ouhsc.edu</email><xref ref-type="aff" rid="j_jds1135_aff_001">1</xref><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Xu</surname><given-names>Chao</given-names></name><email xlink:href="mailto:chao-xu@ouhsc.edu">chao-xu@ouhsc.edu</email><xref ref-type="aff" rid="j_jds1135_aff_001">1</xref>
</contrib>
<aff id="j_jds1135_aff_001"><label>1</label><institution>University of Oklahoma Health Sciences Center</institution>, 801 NE 13th ST, Oklahoma City, OK, 73104, <country>United States</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author. Email: <ext-link ext-link-type="uri" xlink:href="mailto:sixia-chen@ouhsc.edu">sixia-chen@ouhsc.edu</ext-link> or <ext-link ext-link-type="uri" xlink:href="mailto:chao-xu@ouhsc.edu">chao-xu@ouhsc.edu</ext-link>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2024</year></pub-date><pub-date pub-type="epub"><day>10</day><month>7</month><year>2024</year></pub-date><volume>22</volume><issue>3</issue><fpage>456</fpage><lpage>468</lpage><history><date date-type="received"><day>22</day><month>11</month><year>2023</year></date><date date-type="accepted"><day>16</day><month>4</month><year>2024</year></date></history>
<permissions><copyright-statement>2024 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2024</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>Missing data is a common occurrence across many fields, including social science, education, economics, and biomedical research. Disregarding missing data in statistical analyses can bias study results. Imputation methods mitigate this problem by reducing nonresponse bias and producing complete datasets for subsequent secondary analysis. The efficacy of an imputation method hinges on the assumptions of the underlying imputation model. While machine learning techniques such as regression trees, random forests, XGBoost, and deep learning are robust against model misspecification, achieving their best performance may require fine-tuning under specific conditions, and the imputed values they generate can fall outside the plausible range of the observed data. To address these challenges, we propose a novel predictive mean matching (PMM) imputation procedure that leverages popular machine learning methods. PMM strikes a balance between robustness and the generation of appropriate imputed values. In this paper, we present our PMM approach and assess its performance against other established methods through Monte Carlo simulation studies.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>imputation</kwd>
<kwd>missing data</kwd>
<kwd>nonresponse bias</kwd>
</kwd-group>
<funding-group><funding-statement>Dr. Sixia Chen was partially supported by the Oklahoma Shared Clinical and Translational Resources (U54GM104938) with an Institutional Development Award (IDeA) from NIGMS. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the Indian Health Service. Part of the computing for this project was performed at the OU Supercomputing Center for Education &amp; Research (OSCER) at the University of Oklahoma (OU).</funding-statement></funding-group>
</article-meta>
</front>
<back>
<ref-list id="j_jds1135_reflist_001">
<title>References</title>
<ref id="j_jds1135_ref_001">
<mixed-citation publication-type="other"> <string-name><surname>Akinbami</surname> <given-names>LJ</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>TC</given-names></string-name>, <string-name><surname>Davy</surname> <given-names>O</given-names></string-name>, <string-name><surname>Ogden</surname> <given-names>CL</given-names></string-name>, <string-name><surname>Fink</surname> <given-names>S</given-names></string-name>, <string-name><surname>Clark</surname> <given-names>J</given-names></string-name>, et al. (<year>2022</year>). National health and nutrition examination survey, 2017–March 2020 prepandemic file: Sample design, estimation, and analytic guidelines.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_002">
<mixed-citation publication-type="journal"> <string-name><surname>Andridge</surname> <given-names>RR</given-names></string-name>, <string-name><surname>Little</surname> <given-names>RJ</given-names></string-name> (<year>2010</year>). <article-title>A review of hot deck imputation for survey non-response</article-title>. <source><italic>International Statistical Review</italic></source>, <volume>78</volume>(<issue>1</issue>): <fpage>40</fpage>–<lpage>64</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1111/j.1751-5823.2010.00103.x" xlink:type="simple">https://doi.org/10.1111/j.1751-5823.2010.00103.x</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_003">
<mixed-citation publication-type="journal"> <string-name><surname>Aydilek</surname> <given-names>IB</given-names></string-name>, <string-name><surname>Arslan</surname> <given-names>A</given-names></string-name> (<year>2013</year>). <article-title>A hybrid method for imputation of missing values using optimized fuzzy c-means with support vector regression and a genetic algorithm</article-title>. <source><italic>Information Sciences</italic></source>, <volume>233</volume>: <fpage>25</fpage>–<lpage>35</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.ins.2013.01.021" xlink:type="simple">https://doi.org/10.1016/j.ins.2013.01.021</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_004">
<mixed-citation publication-type="chapter"> <string-name><surname>Bottou</surname> <given-names>L</given-names></string-name> (<year>2010</year>). <chapter-title>Large-scale machine learning with stochastic gradient descent</chapter-title>. In: <string-name><surname>Lechevallier</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Saporta</surname> <given-names>G</given-names></string-name> (eds.), <source><italic>Proceedings of COMPSTAT’2010: 19th International Conference on Computational Statistics, Paris, France, August 22–27, 2010 Keynote, Invited and Contributed Papers</italic></source>, <fpage>177</fpage>–<lpage>186</lpage>. <publisher-name>Springer</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_005">
<mixed-citation publication-type="journal"> <string-name><surname>Breiman</surname> <given-names>L</given-names></string-name> (<year>2001</year>). <article-title>Random forests</article-title>. <source><italic>Machine Learning</italic></source>, <volume>45</volume>: <fpage>5</fpage>–<lpage>32</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1023/A:1010933404324" xlink:type="simple">https://doi.org/10.1023/A:1010933404324</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_006">
<mixed-citation publication-type="journal"> <string-name><surname>Burgette</surname> <given-names>LF</given-names></string-name>, <string-name><surname>Reiter</surname> <given-names>JP</given-names></string-name> (<year>2010</year>). <article-title>Multiple imputation for missing data via sequential regression trees</article-title>. <source><italic>American Journal of Epidemiology</italic></source>, <volume>172</volume>(<issue>9</issue>): <fpage>1070</fpage>–<lpage>1076</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1093/aje/kwq260" xlink:type="simple">https://doi.org/10.1093/aje/kwq260</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_007">
<mixed-citation publication-type="journal"> <string-name><surname>Chen</surname> <given-names>J</given-names></string-name>, <string-name><surname>Shao</surname> <given-names>J</given-names></string-name> (<year>2000</year>). <article-title>Nearest neighbor imputation for survey data</article-title>. <source><italic>Journal of Official Statistics</italic></source>, <volume>16</volume>(<issue>2</issue>): <fpage>113</fpage>.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_008">
<mixed-citation publication-type="journal"> <string-name><surname>Chen</surname> <given-names>S</given-names></string-name>, <string-name><surname>Haziza</surname> <given-names>D</given-names></string-name>, <string-name><surname>Stubblefield</surname> <given-names>A</given-names></string-name> (<year>2021</year>). <article-title>A note on multiply robust predictive mean matching imputation with complex survey data</article-title>. <source><italic>Survey Methodology</italic></source>, <volume>47</volume>(<issue>1</issue>): <fpage>215</fpage>–<lpage>223</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_009">
<mixed-citation publication-type="journal"> <string-name><surname>Chen</surname> <given-names>S</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>C</given-names></string-name> (<year>2023</year>). <article-title>Handling high-dimensional data with missing values by modern machine learning techniques</article-title>. <source><italic>Journal of Applied Statistics</italic></source>, <volume>50</volume>(<issue>3</issue>): <fpage>786</fpage>–<lpage>804</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/02664763.2022.2068514" xlink:type="simple">https://doi.org/10.1080/02664763.2022.2068514</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_010">
<mixed-citation publication-type="journal"> <string-name><surname>Chen</surname> <given-names>S</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>S</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>JK</given-names></string-name> (<year>2022</year>). <article-title>Nonparametric mass imputation for data integration</article-title>. <source><italic>Journal of Survey Statistics and Methodology</italic></source>, <volume>10</volume>(<issue>1</issue>): <fpage>1</fpage>–<lpage>24</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1093/jssam/smaa036" xlink:type="simple">https://doi.org/10.1093/jssam/smaa036</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_011">
<mixed-citation publication-type="chapter"> <string-name><surname>Chen</surname> <given-names>T</given-names></string-name>, <string-name><surname>Guestrin</surname> <given-names>C</given-names></string-name> (<year>2016</year>). <chapter-title>XGBoost: A scalable tree boosting system</chapter-title>. In: <string-name><surname>Krishnapuram</surname> <given-names>B</given-names></string-name> et al. (eds.), <source><italic>Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</italic></source>, <fpage>785</fpage>–<lpage>794</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_012">
<mixed-citation publication-type="journal"> <string-name><surname>Cheng</surname> <given-names>PE</given-names></string-name> (<year>1994</year>). <article-title>Nonparametric estimation of mean functionals with data missing at random</article-title>. <source><italic>Journal of the American Statistical Association</italic></source>, <volume>89</volume>(<issue>425</issue>): <fpage>81</fpage>–<lpage>87</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/01621459.1994.10476448" xlink:type="simple">https://doi.org/10.1080/01621459.1994.10476448</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_013">
<mixed-citation publication-type="journal"> <string-name><surname>Chou</surname> <given-names>PA</given-names></string-name> (<year>1991</year>). <article-title>Optimal partitioning for classification and regression trees</article-title>. <source><italic>IEEE Transactions on Pattern Analysis and Machine Intelligence</italic></source>, <volume>13</volume>(<issue>04</issue>): <fpage>340</fpage>–<lpage>354</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/34.88569" xlink:type="simple">https://doi.org/10.1109/34.88569</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_014">
<mixed-citation publication-type="other"> <string-name><surname>Das</surname> <given-names>D</given-names></string-name>, <string-name><surname>Avancha</surname> <given-names>S</given-names></string-name>, <string-name><surname>Mudigere</surname> <given-names>D</given-names></string-name>, <string-name><surname>Vaidynathan</surname> <given-names>K</given-names></string-name>, <string-name><surname>Sridharan</surname> <given-names>S</given-names></string-name>, <string-name><surname>Kalamkar</surname> <given-names>D</given-names></string-name>, et al. (<year>2016</year>). Distributed deep learning using synchronous stochastic gradient descent. arXiv preprint: <uri>https://arxiv.org/abs/1602.06709</uri>.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_015">
<mixed-citation publication-type="journal"> <string-name><surname>Deng</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Lumley</surname> <given-names>T</given-names></string-name> (<year>2023</year>). <article-title>Multiple imputation through XGBoost</article-title>. <source><italic>Journal of Computational and Graphical Statistics</italic></source>, <volume>33</volume>(<issue>2</issue>): <fpage>352</fpage>–<lpage>363</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/10618600.2023.2252501" xlink:type="simple">https://doi.org/10.1080/10618600.2023.2252501</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_016">
<mixed-citation publication-type="journal"> <string-name><surname>Farrell</surname> <given-names>MH</given-names></string-name>, <string-name><surname>Liang</surname> <given-names>T</given-names></string-name>, <string-name><surname>Misra</surname> <given-names>S</given-names></string-name> (<year>2021</year>). <article-title>Deep neural networks for estimation and inference</article-title>. <source><italic>Econometrica</italic></source>, <volume>89</volume>(<issue>1</issue>): <fpage>181</fpage>–<lpage>213</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.3982/ECTA16901" xlink:type="simple">https://doi.org/10.3982/ECTA16901</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_017">
<mixed-citation publication-type="book"> <string-name><surname>Fuller</surname> <given-names>WA</given-names></string-name> (<year>2009</year>). <source><italic>Measurement Error Models</italic></source>. <publisher-name>John Wiley &amp; Sons</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_018">
<mixed-citation publication-type="book"> <string-name><surname>Goodfellow</surname> <given-names>I</given-names></string-name>, <string-name><surname>Bengio</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Courville</surname> <given-names>A</given-names></string-name> (<year>2016</year>). <source><italic>Deep Learning</italic></source>. <publisher-name>MIT Press</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_019">
<mixed-citation publication-type="chapter"> <string-name><surname>Hastie</surname> <given-names>TJ</given-names></string-name> (<year>2017</year>). <chapter-title>Generalized additive models</chapter-title>. In: <string-name><surname>Chambers</surname> <given-names>JM</given-names></string-name>, <string-name><surname>Hastie</surname> <given-names>TJ</given-names></string-name> (eds.), <source><italic>Statistical Models in S</italic></source>, <fpage>249</fpage>–<lpage>307</lpage>. <publisher-name>Routledge</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_020">
<mixed-citation publication-type="journal"> <string-name><surname>Hearst</surname> <given-names>MA</given-names></string-name>, <string-name><surname>Dumais</surname> <given-names>ST</given-names></string-name>, <string-name><surname>Osuna</surname> <given-names>E</given-names></string-name>, <string-name><surname>Platt</surname> <given-names>J</given-names></string-name>, <string-name><surname>Scholkopf</surname> <given-names>B</given-names></string-name> (<year>1998</year>). <article-title>Support vector machines</article-title>. <source><italic>IEEE Intelligent Systems &amp; Their Applications</italic></source>, <volume>13</volume>(<issue>4</issue>): <fpage>18</fpage>–<lpage>28</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/5254.708428" xlink:type="simple">https://doi.org/10.1109/5254.708428</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_021">
<mixed-citation publication-type="journal"> <string-name><surname>Heitjan</surname> <given-names>DF</given-names></string-name>, <string-name><surname>Little</surname> <given-names>RJ</given-names></string-name> (<year>1991</year>). <article-title>Multiple imputation for the fatal accident reporting system</article-title>. <source><italic>Journal of the Royal Statistical Society. Series C. Applied Statistics</italic></source>, <volume>40</volume>(<issue>1</issue>): <fpage>13</fpage>–<lpage>29</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_022">
<mixed-citation publication-type="other"> <string-name><surname>Hinton</surname> <given-names>G</given-names></string-name>, <string-name><surname>Srivastava</surname> <given-names>N</given-names></string-name>, <string-name><surname>Swersky</surname> <given-names>K</given-names></string-name> (<year>2012</year>). Neural networks for machine learning. Lecture 6a: Overview of mini-batch gradient descent. Coursera lecture slides.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_023">
<mixed-citation publication-type="book"> <string-name><surname>Imbens</surname> <given-names>GW</given-names></string-name>, <string-name><surname>Rubin</surname> <given-names>DB</given-names></string-name> (<year>2015</year>). <source><italic>Causal Inference in Statistics, Social, and Biomedical Sciences</italic></source>. <publisher-name>Cambridge University Press</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_024">
<mixed-citation publication-type="journal"> <string-name><surname>Kim</surname> <given-names>JK</given-names></string-name> (<year>2011</year>). <article-title>Parametric fractional imputation for missing data analysis</article-title>. <source><italic>Biometrika</italic></source>, <volume>98</volume>(<issue>1</issue>): <fpage>119</fpage>–<lpage>132</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1093/biomet/asq073" xlink:type="simple">https://doi.org/10.1093/biomet/asq073</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_025">
<mixed-citation publication-type="journal"> <string-name><surname>Kim</surname> <given-names>JK</given-names></string-name>, <string-name><surname>Fuller</surname> <given-names>W</given-names></string-name> (<year>2004</year>). <article-title>Fractional hot deck imputation</article-title>. <source><italic>Biometrika</italic></source>, <volume>91</volume>(<issue>3</issue>): <fpage>559</fpage>–<lpage>578</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1093/biomet/91.3.559" xlink:type="simple">https://doi.org/10.1093/biomet/91.3.559</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_026">
<mixed-citation publication-type="book"> <string-name><surname>Kim</surname> <given-names>JK</given-names></string-name>, <string-name><surname>Shao</surname> <given-names>J</given-names></string-name> (<year>2021</year>). <source><italic>Statistical Methods for Handling Incomplete Data</italic></source>. <publisher-name>CRC Press</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_027">
<mixed-citation publication-type="other"> <string-name><surname>Kingma</surname> <given-names>DP</given-names></string-name>, <string-name><surname>Ba</surname> <given-names>J</given-names></string-name> (<year>2014</year>). Adam: A method for stochastic optimization. arXiv preprint: <uri>https://arxiv.org/abs/1412.6980</uri>.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_028">
<mixed-citation publication-type="journal"> <string-name><surname>Lin</surname> <given-names>J</given-names></string-name>, <string-name><surname>Li</surname> <given-names>N</given-names></string-name>, <string-name><surname>Alam</surname> <given-names>MA</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>Y</given-names></string-name> (<year>2020</year>). <article-title>Data-driven missing data imputation in cluster monitoring system based on deep neural network</article-title>. <source><italic>Applied Intelligence</italic></source>, <volume>50</volume>: <fpage>860</fpage>–<lpage>877</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s10489-019-01560-y" xlink:type="simple">https://doi.org/10.1007/s10489-019-01560-y</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_029">
<mixed-citation publication-type="journal"> <string-name><surname>Little</surname> <given-names>RJ</given-names></string-name> (<year>1988</year>). <article-title>Missing-data adjustments in large surveys</article-title>. <source><italic>Journal of Business &amp; Economic Statistics</italic></source>, <volume>6</volume>(<issue>3</issue>): <fpage>287</fpage>–<lpage>296</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/07350015.1988.10509663" xlink:type="simple">https://doi.org/10.1080/07350015.1988.10509663</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_030">
<mixed-citation publication-type="book"> <string-name><surname>Little</surname> <given-names>RJ</given-names></string-name>, <string-name><surname>Rubin</surname> <given-names>DB</given-names></string-name> (<year>2019</year>). <source><italic>Statistical Analysis with Missing Data</italic></source>, volume <volume>793</volume>. <publisher-name>John Wiley &amp; Sons</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_031">
<mixed-citation publication-type="book"> <string-name><surname>Loehlin</surname> <given-names>JC</given-names></string-name> (<year>2004</year>). <source><italic>Latent Variable Models: An Introduction to Factor, Path, and Structural Equation Analysis</italic></source>. <publisher-name>Psychology Press</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_032">
<mixed-citation publication-type="other"> <string-name><surname>Mallinson</surname> <given-names>H</given-names></string-name>, <string-name><surname>Gammerman</surname> <given-names>A</given-names></string-name> (<year>2003</year>). Imputation using support vector machines. Technical report, <italic>Department of Computer Science, Royal Holloway, University of London, Egham, UK</italic>.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_033">
<mixed-citation publication-type="journal"> <string-name><surname>Noble</surname> <given-names>WS</given-names></string-name> (<year>2006</year>). <article-title>What is a support vector machine?</article-title> <source><italic>Nature Biotechnology</italic></source>, <volume>24</volume>(<issue>12</issue>): <fpage>1565</fpage>–<lpage>1567</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1038/nbt1206-1565" xlink:type="simple">https://doi.org/10.1038/nbt1206-1565</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_034">
<mixed-citation publication-type="journal"> <string-name><surname>Peterson</surname> <given-names>LE</given-names></string-name> (<year>2009</year>). <article-title>K-nearest neighbor</article-title>. <source><italic>Scholarpedia</italic></source>, <volume>4</volume>(<issue>2</issue>): <fpage>1883</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.4249/scholarpedia.1883" xlink:type="simple">https://doi.org/10.4249/scholarpedia.1883</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_035">
<mixed-citation publication-type="other"> <string-name><surname>Polley</surname> <given-names>EC</given-names></string-name>, <string-name><surname>Van der Laan</surname> <given-names>MJ</given-names></string-name> (<year>2010</year>). Super learner in prediction.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_036">
<mixed-citation publication-type="chapter"> <string-name><surname>Qiao</surname> <given-names>L</given-names></string-name>, <string-name><surname>Ran</surname> <given-names>R</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>H</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>S</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name> (<year>2018</year>). <chapter-title>Imputation method of missing values for dissolved gas analysis data based on iterative KNN and XGBoost</chapter-title>. In: <source><italic>Proceedings of the 2018 International Conference on Algorithms, Computing and Artificial Intelligence</italic></source>, <fpage>1</fpage>–<lpage>7</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_037">
<mixed-citation publication-type="chapter"> <string-name><surname>Rahman</surname> <given-names>MG</given-names></string-name>, <string-name><surname>Islam</surname> <given-names>MZ</given-names></string-name> (<year>2011</year>). <chapter-title>A decision tree-based missing value imputation technique for data pre-processing</chapter-title>. In: <string-name><surname>Vamplew</surname> <given-names>P</given-names></string-name>, <string-name><surname>Stranieri</surname> <given-names>A</given-names></string-name>, <string-name><surname>Ong</surname> <given-names>K-L</given-names></string-name>, <string-name><surname>Christen</surname> <given-names>P</given-names></string-name>, <string-name><surname>Kennedy</surname> <given-names>PJ</given-names></string-name> (eds.), <source><italic>The 9th Australasian Data Mining Conference: AusDM 2011</italic></source>, <fpage>41</fpage>–<lpage>50</lpage>. <publisher-name>Australian Computer Society Inc.</publisher-name></mixed-citation>
</ref>
<ref id="j_jds1135_ref_038">
<mixed-citation publication-type="journal"> <string-name><surname>Rao</surname> <given-names>JN</given-names></string-name>, <string-name><surname>Shao</surname> <given-names>J</given-names></string-name> (<year>1992</year>). <article-title>Jackknife variance estimation with survey data under hot deck imputation</article-title>. <source><italic>Biometrika</italic></source>, <volume>79</volume>(<issue>4</issue>): <fpage>811</fpage>–<lpage>822</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1093/biomet/79.4.811" xlink:type="simple">https://doi.org/10.1093/biomet/79.4.811</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_039">
<mixed-citation publication-type="journal"> <string-name><surname>Rubin</surname> <given-names>DB</given-names></string-name> (<year>1996</year>). <article-title>Multiple imputation after 18+ years</article-title>. <source><italic>Journal of the American Statistical Association</italic></source>, <volume>91</volume>(<issue>434</issue>): <fpage>473</fpage>–<lpage>489</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/01621459.1996.10476908" xlink:type="simple">https://doi.org/10.1080/01621459.1996.10476908</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_040">
<mixed-citation publication-type="chapter"> <string-name><surname>Rubin</surname> <given-names>DB</given-names></string-name> (<year>2018</year>). <chapter-title>Multiple imputation</chapter-title>. In: <string-name><surname>van Buuren</surname> <given-names>S</given-names></string-name> (ed.), <source><italic>Flexible Imputation of Missing Data</italic></source>, <edition>Second</edition> Edition, <fpage>29</fpage>–<lpage>62</lpage>. <publisher-name>Chapman and Hall/CRC</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_041">
<mixed-citation publication-type="journal"> <string-name><surname>Shah</surname> <given-names>AD</given-names></string-name>, <string-name><surname>Bartlett</surname> <given-names>JW</given-names></string-name>, <string-name><surname>Carpenter</surname> <given-names>J</given-names></string-name>, <string-name><surname>Nicholas</surname> <given-names>O</given-names></string-name>, <string-name><surname>Hemingway</surname> <given-names>H</given-names></string-name> (<year>2014</year>). <article-title>Comparison of random forest and parametric imputation models for imputing missing data using mice: A caliber study</article-title>. <source><italic>American Journal of Epidemiology</italic></source>, <volume>179</volume>(<issue>6</issue>): <fpage>764</fpage>–<lpage>774</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1093/aje/kwt312" xlink:type="simple">https://doi.org/10.1093/aje/kwt312</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_042">
<mixed-citation publication-type="chapter"> <string-name><surname>Steinberg</surname> <given-names>D</given-names></string-name>, <string-name><surname>Colla</surname> <given-names>P</given-names></string-name> (<year>2009</year>). <chapter-title>CART: Classification and regression trees</chapter-title>. In: <string-name><surname>Wu</surname> <given-names>X</given-names></string-name>, <string-name><surname>Kumar</surname> <given-names>V</given-names></string-name> (eds.), <source><italic>The Top Ten Algorithms in Data Mining</italic></source>, volume <volume>9</volume>, <fpage>179</fpage>.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_043">
<mixed-citation publication-type="journal"> <string-name><surname>Tang</surname> <given-names>F</given-names></string-name>, <string-name><surname>Ishwaran</surname> <given-names>H</given-names></string-name> (<year>2017</year>). <article-title>Random forest missing data algorithms</article-title>. <source><italic>Statistical Analysis and Data Mining: The ASA Data Science Journal</italic></source>, <volume>10</volume>(<issue>6</issue>): <fpage>363</fpage>–<lpage>377</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_044">
<mixed-citation publication-type="journal"> <string-name><surname>Toth</surname> <given-names>D</given-names></string-name>, <string-name><surname>Eltinge</surname> <given-names>JL</given-names></string-name> (<year>2011</year>). <article-title>Building consistent regression trees from complex sample data</article-title>. <source><italic>Journal of the American Statistical Association</italic></source>, <volume>106</volume>(<issue>496</issue>): <fpage>1626</fpage>–<lpage>1636</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1198/jasa.2011.tm10383" xlink:type="simple">https://doi.org/10.1198/jasa.2011.tm10383</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_045">
<mixed-citation publication-type="journal"> <string-name><surname>Van der Laan</surname> <given-names>MJ</given-names></string-name>, <string-name><surname>Polley</surname> <given-names>EC</given-names></string-name>, <string-name><surname>Hubbard</surname> <given-names>AE</given-names></string-name> (<year>2007</year>). <article-title>Super learner</article-title>. <source><italic>Statistical Applications in Genetics and Molecular Biology</italic></source>, <volume>6</volume>(<issue>1</issue>): <fpage>25</fpage>.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_046">
<mixed-citation publication-type="journal"> <string-name><surname>Wager</surname> <given-names>S</given-names></string-name>, <string-name><surname>Athey</surname> <given-names>S</given-names></string-name> (<year>2018</year>). <article-title>Estimation and inference of heterogeneous treatment effects using random forests</article-title>. <source><italic>Journal of the American Statistical Association</italic></source>, <volume>113</volume>(<issue>523</issue>): <fpage>1228</fpage>–<lpage>1242</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/01621459.2017.1319839" xlink:type="simple">https://doi.org/10.1080/01621459.2017.1319839</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_047">
<mixed-citation publication-type="chapter"> <string-name><surname>Yang</surname> <given-names>S</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>JK</given-names></string-name> (<year>2019</year>). <chapter-title>Nearest neighbor imputation for general parameter estimation in survey sampling</chapter-title>. In: <string-name><surname>Huynh</surname> <given-names>KP</given-names></string-name>, <string-name><surname>Jacho-Chávez</surname> <given-names>DT</given-names></string-name>, <string-name><surname>Tripathi</surname> <given-names>G</given-names></string-name> (eds.), <source><italic>The Econometrics of Complex Survey Data</italic></source>, volume <volume>39</volume>, <fpage>209</fpage>–<lpage>234</lpage>. <publisher-name>Emerald Publishing Limited</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1135_ref_048">
<mixed-citation publication-type="journal"> <string-name><surname>Yang</surname> <given-names>S</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>JK</given-names></string-name> (<year>2020</year>a). <article-title>Asymptotic theory and inference of predictive mean matching imputation using a superpopulation model framework</article-title>. <source><italic>Scandinavian Journal of Statistics</italic></source>, <volume>47</volume>(<issue>3</issue>): <fpage>839</fpage>–<lpage>861</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1111/sjos.12429" xlink:type="simple">https://doi.org/10.1111/sjos.12429</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_049">
<mixed-citation publication-type="journal"> <string-name><surname>Yang</surname> <given-names>S</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>JK</given-names></string-name> (<year>2020</year>b). <article-title>Statistical data integration in survey sampling: A review</article-title>. <source><italic>Japanese Journal of Statistics and Data Science</italic></source>, <volume>3</volume>: <fpage>625</fpage>–<lpage>650</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s42081-020-00093-w" xlink:type="simple">https://doi.org/10.1007/s42081-020-00093-w</ext-link></mixed-citation>
</ref>
<ref id="j_jds1135_ref_050">
<mixed-citation publication-type="journal"> <string-name><surname>Zhang</surname> <given-names>Z</given-names></string-name> (<year>2016</year>). <article-title>Missing data imputation: Focusing on single imputation</article-title>. <source><italic>Annals of Translational Medicine</italic></source>, <volume>4</volume>(<issue>1</issue>): <fpage>9</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.3978/j.issn.2305-5839.2015.12.38" xlink:type="simple">https://doi.org/10.3978/j.issn.2305-5839.2015.12.38</ext-link></mixed-citation>
</ref>
</ref-list>
</back>
</article>
