<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn><issn pub-type="ppub">1680-743X</issn><issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1206</article-id>
<article-id pub-id-type="doi">10.6339/25-JDS1206</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Data Science in Action</subject></subj-group></article-categories>
<title-group>
<article-title>Inside Out: Externalizing Assumptions in Data Analysis as Validation Checks</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-7122-1463</contrib-id>
<name><surname>Zhang</surname><given-names>H. Sherry</given-names></name><email xlink:href="mailto:huize.zhang@austin.utexas.edu">huize.zhang@austin.utexas.edu</email><xref ref-type="aff" rid="j_jds1206_aff_001">1</xref><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Peng</surname><given-names>Roger D.</given-names></name><xref ref-type="aff" rid="j_jds1206_aff_001">1</xref>
</contrib>
<aff id="j_jds1206_aff_001"><label>1</label>Department of Statistics and Data Sciences, <institution>University of Texas at Austin</institution>, Texas, <country>United States</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author. Email: <ext-link ext-link-type="uri" xlink:href="mailto:huize.zhang@austin.utexas.edu">huize.zhang@austin.utexas.edu</ext-link>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2025</year></pub-date><pub-date pub-type="epub"><day>9</day><month>3</month><year>2026</year></pub-date><volume content-type="ahead-of-print">0</volume><issue>0</issue><fpage>1</fpage><lpage>18</lpage><supplementary-material id="S1" content-type="archive" xlink:href="jds1206_s001.zip" mimetype="application" mime-subtype="x-zip-compressed">
<caption>
<title>Supplementary Material</title>
<p>The supplementary material includes the full script for the examples in the paper (<monospace>index.R</monospace>) and its output (<monospace>index.html</monospace>), the data used in the examples in Section 5 (<monospace>data/</monospace>), the package source (<monospace>adtoolbox_0.1.0.tar.gz</monospace>), and a <monospace>README.md</monospace> file with installation instructions for running the scripts.</p>
</caption>
</supplementary-material><history><date date-type="received"><day>24</day><month>12</month><year>2024</year></date><date date-type="accepted"><day>29</day><month>10</month><year>2025</year></date></history>
<permissions><copyright-statement>2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2025</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>In data analysis, unexpected results often prompt researchers to revisit their procedures to identify potential issues. While some researchers may struggle to identify the root causes, experienced researchers can often quickly diagnose problems by checking a few key assumptions. These checked assumptions, or expectations, are typically informal, difficult to trace, and rarely discussed in publications. In this paper, we introduce the term <italic>analysis validation checks</italic> to formalize and externalize these informal assumptions. We then introduce a procedure to identify a subset of checks that best predict the occurrence of unexpected outcomes, based on simulations of the original data. The checks are evaluated in terms of accuracy, determined by binary classification metrics, and independence, which measures the shared information among checks. We demonstrate this approach with a toy example using step count data and a generalized linear model example examining the effect of particulate matter air pollution on daily mortality.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>data analysis assumptions</kwd>
<kwd>diagnostic</kwd>
<kwd>logic regression</kwd>
</kwd-group>
</article-meta>
</front>
<back>
<ref-list id="j_jds1206_reflist_001">
<title>References</title>
<ref id="j_jds1206_ref_001">
<mixed-citation publication-type="other"> <string-name><surname>Allaire</surname> <given-names>JJ</given-names></string-name>, <string-name><surname>Teague</surname> <given-names>C</given-names></string-name>, <string-name><surname>Scheidegger</surname> <given-names>C</given-names></string-name>, <string-name><surname>Xie</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Dervieux</surname> <given-names>C</given-names></string-name> (<year>2022</year>). Quarto. URL <uri>https://github.com/quarto-dev/quarto-cli</uri>.</mixed-citation>
</ref>
<ref id="j_jds1206_ref_002">
<mixed-citation publication-type="journal"> <string-name><surname>Batini</surname> <given-names>C</given-names></string-name>, <string-name><surname>Cappiello</surname> <given-names>C</given-names></string-name>, <string-name><surname>Francalanci</surname> <given-names>C</given-names></string-name>, <string-name><surname>Maurino</surname> <given-names>A</given-names></string-name> (<year>2009</year>). <article-title>Methodologies for data quality assessment and improvement</article-title>. <source><italic>ACM Computing Surveys (CSUR)</italic></source>, <volume>41</volume>(<issue>3</issue>): <fpage>1</fpage>–<lpage>52</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1145/1541880.1541883" xlink:type="simple">https://doi.org/10.1145/1541880.1541883</ext-link></mixed-citation>
</ref>
<ref id="j_jds1206_ref_003">
<mixed-citation publication-type="journal"> <string-name><surname>Bell</surname> <given-names>ML</given-names></string-name>, <string-name><surname>McDermott</surname> <given-names>A</given-names></string-name>, <string-name><surname>Zeger</surname> <given-names>SL</given-names></string-name>, <string-name><surname>Samet</surname> <given-names>JM</given-names></string-name>, <string-name><surname>Dominici</surname> <given-names>F</given-names></string-name> (<year>2004</year>). <article-title>Ozone and short-term mortality in 95 US urban communities, 1987–2000</article-title>. <source><italic>Journal of the American Medical Association</italic></source>, <volume>292</volume>(<issue>19</issue>): <fpage>2372</fpage>–<lpage>2378</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1001/jama.292.19.2372" xlink:type="simple">https://doi.org/10.1001/jama.292.19.2372</ext-link></mixed-citation>
</ref>
<ref id="j_jds1206_ref_004">
<mixed-citation publication-type="journal"> <string-name><surname>Broderick</surname> <given-names>T</given-names></string-name>, <string-name><surname>Gelman</surname> <given-names>A</given-names></string-name>, <string-name><surname>Meager</surname> <given-names>R</given-names></string-name>, <string-name><surname>Smith</surname> <given-names>AL</given-names></string-name>, <string-name><surname>Zheng</surname> <given-names>T</given-names></string-name> (<year>2023</year>). <article-title>Toward a taxonomy of trust for probabilistic machine learning</article-title>. <source><italic>Science Advances</italic></source>, <volume>9</volume>(<issue>7</issue>): <fpage>eabn3999</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1126/sciadv.abn3999" xlink:type="simple">https://doi.org/10.1126/sciadv.abn3999</ext-link></mixed-citation>
</ref>
<ref id="j_jds1206_ref_005">
<mixed-citation publication-type="journal"> <string-name><surname>Cai</surname> <given-names>L</given-names></string-name>, <string-name><surname>Zhu</surname> <given-names>Y</given-names></string-name> (<year>2015</year>). <article-title>The challenges of data quality and data quality assessment in the big data era</article-title>. <source><italic>Data Science Journal</italic></source>, <volume>14</volume>: <fpage>2</fpage>–<lpage>2</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.5334/dsj-2015-002" xlink:type="simple">https://doi.org/10.5334/dsj-2015-002</ext-link></mixed-citation>
</ref>
<ref id="j_jds1206_ref_006">
<mixed-citation publication-type="journal"> <string-name><surname>Cichy</surname> <given-names>C</given-names></string-name>, <string-name><surname>Rass</surname> <given-names>S</given-names></string-name> (<year>2019</year>). <article-title>An overview of data quality frameworks</article-title>. <source><italic>IEEE Access</italic></source>, <volume>7</volume>: <fpage>24634</fpage>–<lpage>24648</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/ACCESS.2019.2899751" xlink:type="simple">https://doi.org/10.1109/ACCESS.2019.2899751</ext-link></mixed-citation>
</ref>
<ref id="j_jds1206_ref_007">
<mixed-citation publication-type="journal"> <string-name><surname>Dong</surname> <given-names>J</given-names></string-name>, <string-name><surname>Roth</surname> <given-names>A</given-names></string-name>, <string-name><surname>Su</surname> <given-names>WJ</given-names></string-name> (<year>2022</year>). <article-title>Gaussian differential privacy</article-title>. <source><italic>Journal of the Royal Statistical Society Series B: Statistical Methodology</italic></source>, <volume>84</volume>(<issue>1</issue>): <fpage>3</fpage>–<lpage>37</lpage>. <comment>ISSN 1369-7412</comment>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1111/rssb.12454" xlink:type="simple">https://doi.org/10.1111/rssb.12454</ext-link></mixed-citation>
</ref>
<ref id="j_jds1206_ref_008">
<mixed-citation publication-type="journal"> <string-name><surname>Donoghue</surname> <given-names>T</given-names></string-name>, <string-name><surname>Voytek</surname> <given-names>B</given-names></string-name>, <string-name><surname>Ellis</surname> <given-names>SE</given-names></string-name> (<year>2021</year>). <article-title>Teaching creative and practical data science at scale</article-title>. <source><italic>Journal of Statistics and Data Science Education</italic></source>, <volume>29</volume>(<issue>sup1</issue>): <fpage>S27</fpage>–<lpage>S39</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/10691898.2020.1860725" xlink:type="simple">https://doi.org/10.1080/10691898.2020.1860725</ext-link></mixed-citation>
</ref>
<ref id="j_jds1206_ref_009">
<mixed-citation publication-type="other"> <string-name><surname>Fischetti</surname> <given-names>T</given-names></string-name> (<year>2023</year>). assertr: Assertive programming for R analysis pipelines. URL <uri>https://CRAN.R-project.org/package=assertr</uri>. R package version 3.0.1.</mixed-citation>
</ref>
<ref id="j_jds1206_ref_010">
<mixed-citation publication-type="journal"> <string-name><surname>Grolemund</surname> <given-names>G</given-names></string-name>, <string-name><surname>Wickham</surname> <given-names>H</given-names></string-name> (<year>2014</year>). <article-title>A cognitive interpretation of data analysis</article-title>. <source><italic>International Statistical Review</italic></source>, <volume>82</volume>(<issue>2</issue>): <fpage>184</fpage>–<lpage>204</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1111/insr.12028" xlink:type="simple">https://doi.org/10.1111/insr.12028</ext-link></mixed-citation>
</ref>
<ref id="j_jds1206_ref_011">
<mixed-citation publication-type="chapter"> <string-name><surname>Gu</surname> <given-names>K</given-names></string-name>, <string-name><surname>Grunde-McLaughlin</surname> <given-names>M</given-names></string-name>, <string-name><surname>McNutt</surname> <given-names>A</given-names></string-name>, <string-name><surname>Heer</surname> <given-names>J</given-names></string-name>, <string-name><surname>Althoff</surname> <given-names>T</given-names></string-name> (<year>2024</year>). <chapter-title>How do data analysts respond to AI assistance? A Wizard-of-Oz study</chapter-title>. In: <source><italic>Proceedings of the CHI Conference on Human Factors in Computing Systems</italic></source>, <publisher-name>Association for Computing Machinery</publisher-name>, <publisher-loc>New York, NY, USA</publisher-loc>, <fpage>1</fpage>–<lpage>22</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1145/3613904.3641891" xlink:type="simple">https://doi.org/10.1145/3613904.3641891</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1206_ref_012">
<mixed-citation publication-type="other"> <string-name><surname>Henry</surname> <given-names>L</given-names></string-name>, <string-name><surname>Pedersen</surname> <given-names>TL</given-names></string-name>, <string-name><surname>Luciani</surname> <given-names>TJ</given-names></string-name>, <string-name><surname>Decorde</surname> <given-names>M</given-names></string-name>, <string-name><surname>Lise</surname> <given-names>V</given-names></string-name> (<year>2023</year>). vdiffr: Visual regression testing and graphical diffing. URL <uri>https://CRAN.R-project.org/package=vdiffr</uri>. R package version 1.0.7.</mixed-citation>
</ref>
<ref id="j_jds1206_ref_013">
<mixed-citation publication-type="other"> <string-name><surname>Iannone</surname> <given-names>R</given-names></string-name>, <string-name><surname>Vargas</surname> <given-names>M</given-names></string-name>, <string-name><surname>Choe</surname> <given-names>J</given-names></string-name> (<year>2024</year>). pointblank: Data validation and organization of metadata for local and remote tables. URL <uri>https://CRAN.R-project.org/package=pointblank</uri>. R package version 0.12.2.</mixed-citation>
</ref>
<ref id="j_jds1206_ref_014">
<mixed-citation publication-type="journal"> <string-name><surname>Leiner</surname> <given-names>J</given-names></string-name>, <string-name><surname>Duan</surname> <given-names>B</given-names></string-name>, <string-name><surname>Wasserman</surname> <given-names>L</given-names></string-name>, <string-name><surname>Ramdas</surname> <given-names>A</given-names></string-name> (<year>2023</year>). <article-title>Data fission: Splitting a single data point</article-title>. <source><italic>Journal of the American Statistical Association</italic></source>, <volume>120</volume>(<issue>549</issue>): <fpage>135</fpage>–<lpage>146</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/01621459.2023.2270748" xlink:type="simple">https://doi.org/10.1080/01621459.2023.2270748</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1206_ref_015">
<mixed-citation publication-type="chapter"> <string-name><surname>Li</surname> <given-names>C</given-names></string-name>, <string-name><surname>Chan</surname> <given-names>E</given-names></string-name>, <string-name><surname>Denny</surname> <given-names>P</given-names></string-name>, <string-name><surname>Luxton-Reilly</surname> <given-names>A</given-names></string-name>, <string-name><surname>Tempero</surname> <given-names>E</given-names></string-name> (<year>2019</year>). <chapter-title>Towards a framework for teaching debugging</chapter-title>. In: <source><italic>Proceedings of the Twenty-First Australasian Computing Education Conference</italic></source>, <publisher-name>Association for Computing Machinery</publisher-name>, <publisher-loc>New York, NY, USA</publisher-loc>, <fpage>79</fpage>–<lpage>86</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1206_ref_016">
<mixed-citation publication-type="book"> <string-name><surname>Stamatelatos</surname> <given-names>M</given-names></string-name>, <string-name><surname>Dugan</surname> <given-names>J</given-names></string-name>, <string-name><surname>Fragola</surname> <given-names>J</given-names></string-name>, <string-name><surname>Minarick</surname> <given-names>J</given-names></string-name>, <string-name><surname>Railsback</surname> <given-names>J</given-names></string-name> (<year>2002</year>). <source><italic>Fault Tree Handbook with Aerospace Applications</italic></source>. <publisher-name>NASA Office of Safety and Mission Assurance-NASA Headquarters</publisher-name>. <fpage>2</fpage>–<lpage>8</lpage>. <publisher-loc>Washington</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1206_ref_017">
<mixed-citation publication-type="journal"> <string-name><surname>Neufeld</surname> <given-names>A</given-names></string-name>, <string-name><surname>Dharamshi</surname> <given-names>A</given-names></string-name>, <string-name><surname>Gao</surname> <given-names>LL</given-names></string-name>, <string-name><surname>Witten</surname> <given-names>D</given-names></string-name> (<year>2024</year>). <article-title>Data thinning for convolution-closed distributions</article-title>. <source><italic>Journal of Machine Learning Research</italic></source>, <volume>25</volume>(<issue>57</issue>): <fpage>1</fpage>–<lpage>35</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1206_ref_018">
<mixed-citation publication-type="journal"> <string-name><surname>Peng</surname> <given-names>RD</given-names></string-name> (<year>2011</year>). <article-title>Reproducible research in computational science</article-title>. <source><italic>Science</italic></source>, <volume>334</volume>(<issue>6060</issue>): <fpage>1226</fpage>–<lpage>1227</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1126/science.1213847" xlink:type="simple">https://doi.org/10.1126/science.1213847</ext-link></mixed-citation>
</ref>
<ref id="j_jds1206_ref_019">
<mixed-citation publication-type="journal"> <string-name><surname>Peng</surname> <given-names>RD</given-names></string-name>, <string-name><surname>Parker</surname> <given-names>HS</given-names></string-name> (<year>2022</year>). <article-title>Perspective on data science</article-title>. <source><italic>Annual Review of Statistics and Its Application</italic></source>, <volume>9</volume>(<issue>1</issue>): <fpage>1</fpage>–<lpage>20</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1146/annurev-statistics-040220-013917" xlink:type="simple">https://doi.org/10.1146/annurev-statistics-040220-013917</ext-link></mixed-citation>
</ref>
<ref id="j_jds1206_ref_020">
<mixed-citation publication-type="journal"> <string-name><surname>Peng</surname> <given-names>RD</given-names></string-name>, <string-name><surname>Dominici</surname> <given-names>F</given-names></string-name>, <string-name><surname>Louis</surname> <given-names>TA</given-names></string-name> (<year>2006</year>). <article-title>Model choice in time series studies of air pollution and mortality</article-title>. <source><italic>Journal of the Royal Statistical Society Series A: Statistics in Society</italic></source>, <volume>169</volume>(<issue>2</issue>): <fpage>179</fpage>–<lpage>203</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1111/j.1467-985X.2006.00410.x" xlink:type="simple">https://doi.org/10.1111/j.1467-985X.2006.00410.x</ext-link></mixed-citation>
</ref>
<ref id="j_jds1206_ref_021">
<mixed-citation publication-type="journal"> <string-name><surname>Peng</surname> <given-names>RD</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>A</given-names></string-name>, <string-name><surname>Bridgeford</surname> <given-names>E</given-names></string-name>, <string-name><surname>Leek</surname> <given-names>JT</given-names></string-name>, <string-name><surname>Hicks</surname> <given-names>SC</given-names></string-name> (<year>2021</year>). <article-title>Diagnosing data analytic problems in the classroom</article-title>. <source><italic>Journal of Statistics and Data Science Education</italic></source>, <volume>29</volume>(<issue>3</issue>): <fpage>267</fpage>–<lpage>276</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/26939169.2021.1971586" xlink:type="simple">https://doi.org/10.1080/26939169.2021.1971586</ext-link></mixed-citation>
</ref>
<ref id="j_jds1206_ref_022">
<mixed-citation publication-type="journal"> <string-name><surname>Petersen</surname> <given-names>AH</given-names></string-name>, <string-name><surname>Ekstrøm</surname> <given-names>CT</given-names></string-name> (<year>2019</year>). <article-title>dataMaid: Your assistant for documenting supervised data quality screening in R</article-title>. <source><italic>Journal of Statistical Software</italic></source>, <volume>90</volume>(<issue>6</issue>): <fpage>1</fpage>–<lpage>38</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.18637/jss.v090.i06" xlink:type="simple">https://doi.org/10.18637/jss.v090.i06</ext-link></mixed-citation>
</ref>
<ref id="j_jds1206_ref_023">
<mixed-citation publication-type="journal"> <string-name><surname>Polyzotis</surname> <given-names>N</given-names></string-name>, <string-name><surname>Zinkevich</surname> <given-names>M</given-names></string-name>, <string-name><surname>Roy</surname> <given-names>S</given-names></string-name>, <string-name><surname>Breck</surname> <given-names>E</given-names></string-name>, <string-name><surname>Whang</surname> <given-names>S</given-names></string-name> (<year>2019</year>). <article-title>Data validation for machine learning</article-title>. <source><italic>Proceedings of Machine Learning and Systems</italic></source>, <volume>1</volume>: <fpage>334</fpage>–<lpage>347</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1206_ref_024">
<mixed-citation publication-type="book"> <string-name><surname>R Core Team</surname></string-name> (<year>2023</year>). <source><italic>R: A Language and Environment for Statistical Computing</italic></source>. <publisher-name>R Foundation for Statistical Computing</publisher-name>, <publisher-loc>Vienna, Austria</publisher-loc>. URL <uri>https://www.R-project.org/</uri>.</mixed-citation>
</ref>
<ref id="j_jds1206_ref_025">
<mixed-citation publication-type="journal"> <string-name><surname>Ruczinski</surname> <given-names>I</given-names></string-name>, <string-name><surname>Kooperberg</surname> <given-names>C</given-names></string-name>, <string-name><surname>LeBlanc</surname> <given-names>M</given-names></string-name> (<year>2003</year>). <article-title>Logic regression</article-title>. <source><italic>Journal of Computational and Graphical Statistics</italic></source>, <volume>12</volume>(<issue>3</issue>): <fpage>475</fpage>–<lpage>511</lpage>. <comment>ISSN 1061-8600</comment>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1198/1061860032238" xlink:type="simple">https://doi.org/10.1198/1061860032238</ext-link></mixed-citation>
</ref>
<ref id="j_jds1206_ref_026">
<mixed-citation publication-type="journal"> <string-name><surname>Samet</surname> <given-names>JM</given-names></string-name>, <string-name><surname>Dominici</surname> <given-names>F</given-names></string-name>, <string-name><surname>Curriero</surname> <given-names>FC</given-names></string-name>, <string-name><surname>Coursac</surname> <given-names>I</given-names></string-name>, <string-name><surname>Zeger</surname> <given-names>SL</given-names></string-name> (<year>2000</year>). <article-title>Fine particulate air pollution and mortality in 20 US cities, 1987–1994</article-title>. <source><italic>New England Journal of Medicine</italic></source>, <volume>343</volume>(<issue>24</issue>): <fpage>1742</fpage>–<lpage>1749</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1056/NEJM200012143432401" xlink:type="simple">https://doi.org/10.1056/NEJM200012143432401</ext-link></mixed-citation>
</ref>
<ref id="j_jds1206_ref_027">
<mixed-citation publication-type="journal"> <string-name><surname>Schelter</surname> <given-names>S</given-names></string-name>, <string-name><surname>Lange</surname> <given-names>D</given-names></string-name>, <string-name><surname>Schmidt</surname> <given-names>P</given-names></string-name>, <string-name><surname>Celikel</surname> <given-names>M</given-names></string-name>, <string-name><surname>Biessmann</surname> <given-names>F</given-names></string-name>, <string-name><surname>Grafberger</surname> <given-names>A</given-names></string-name> (<year>2018</year>). <article-title>Automating large-scale data quality verification</article-title>. <source><italic>Proceedings of the VLDB Endowment</italic></source>, <volume>11</volume>(<issue>12</issue>): <fpage>1781</fpage>–<lpage>1794</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.14778/3229863.3229867" xlink:type="simple">https://doi.org/10.14778/3229863.3229867</ext-link></mixed-citation>
</ref>
<ref id="j_jds1206_ref_028">
<mixed-citation publication-type="chapter"> <string-name><surname>Sidi</surname> <given-names>F</given-names></string-name>, <string-name><surname>Hassany Shariat Panahy</surname> <given-names>P</given-names></string-name>, <string-name><surname>Suriani Affendey</surname> <given-names>L</given-names></string-name>, <string-name><surname>Jabar</surname> <given-names>MA</given-names></string-name>, <string-name><surname>Ibrahim</surname> <given-names>H</given-names></string-name>, <string-name><surname>Mustapha</surname> <given-names>A</given-names></string-name> (<year>2012</year>). <chapter-title>Data quality: A survey of data quality dimensions</chapter-title>. In: <source><italic>2012 International Conference on Information Retrieval &amp; Knowledge Management</italic></source>, <publisher-loc>Kuala Lumpur, Malaysia</publisher-loc>, <fpage>300</fpage>–<lpage>304</lpage>. <publisher-name>IEEE</publisher-name>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/InfRKM.2012.6204995" xlink:type="simple">https://doi.org/10.1109/InfRKM.2012.6204995</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1206_ref_029">
<mixed-citation publication-type="journal"> <string-name><surname>Van der Loo</surname> <given-names>M</given-names></string-name>, <string-name><surname>de Jonge</surname> <given-names>E</given-names></string-name> (<year>2021</year>). <article-title>Data validation infrastructure for R</article-title>. <source><italic>Journal of Statistical Software</italic></source>, <volume>97</volume>(<issue>10</issue>): <fpage>1</fpage>–<lpage>33</lpage>. URL <uri>https://www.jstatsoft.org/article/view/v097i10</uri>.</mixed-citation>
</ref>
<ref id="j_jds1206_ref_030">
<mixed-citation publication-type="book"> <string-name><surname>Vesely</surname> <given-names>WE</given-names></string-name>, <string-name><surname>Goldberg</surname> <given-names>FF</given-names></string-name>, <string-name><surname>Roberts</surname> <given-names>NH</given-names></string-name>, <string-name><surname>Haasl</surname> <given-names>DF</given-names></string-name> (<year>1981</year>). <source><italic>Fault Tree Handbook</italic></source>, <series><italic>Technical report</italic></series>, <publisher-name>Nuclear Regulatory Commission</publisher-name>, <publisher-loc>Washington, DC</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1206_ref_031">
<mixed-citation publication-type="journal"> <string-name><surname>Wang</surname> <given-names>RY</given-names></string-name>, <string-name><surname>Strong</surname> <given-names>DM</given-names></string-name> (<year>1996</year>). <article-title>Beyond accuracy: What data quality means to data consumers</article-title>. <source><italic>Journal of Management Information Systems</italic></source>, <volume>12</volume>(<issue>4</issue>): <fpage>5</fpage>–<lpage>33</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/07421222.1996.11518099" xlink:type="simple">https://doi.org/10.1080/07421222.1996.11518099</ext-link></mixed-citation>
</ref>
<ref id="j_jds1206_ref_032">
<mixed-citation publication-type="other"> <string-name><surname>Waring</surname> <given-names>E</given-names></string-name>, <string-name><surname>Quinn</surname> <given-names>M</given-names></string-name>, <string-name><surname>McNamara</surname> <given-names>A</given-names></string-name>, <string-name><surname>Arino de la Rubia</surname> <given-names>E</given-names></string-name>, <string-name><surname>Zhu</surname> <given-names>H</given-names></string-name>, <string-name><surname>Ellis</surname> <given-names>S</given-names></string-name> (<year>2022</year>). skimr: Compact and flexible summaries of data. URL <uri>https://CRAN.R-project.org/package=skimr</uri>. R package version 2.1.5.</mixed-citation>
</ref>
<ref id="j_jds1206_ref_033">
<mixed-citation publication-type="journal"> <string-name><surname>Welty</surname> <given-names>LJ</given-names></string-name>, <string-name><surname>Zeger</surname> <given-names>SL</given-names></string-name> (<year>2005</year>). <article-title>Are the acute effects of particulate matter on mortality in the National Morbidity, Mortality, and Air Pollution Study the result of inadequate control for weather and season? A sensitivity analysis using flexible distributed lag models</article-title>. <source><italic>American Journal of Epidemiology</italic></source>, <volume>162</volume>(<issue>1</issue>): <fpage>80</fpage>–<lpage>88</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1093/aje/kwi157" xlink:type="simple">https://doi.org/10.1093/aje/kwi157</ext-link></mixed-citation>
</ref>
<ref id="j_jds1206_ref_034">
<mixed-citation publication-type="journal"> <string-name><surname>Wild</surname> <given-names>CJ</given-names></string-name>, <string-name><surname>Pfannkuch</surname> <given-names>M</given-names></string-name> (<year>1999</year>). <article-title>Statistical thinking in empirical enquiry</article-title>. <source><italic>International Statistical Review</italic></source>, <volume>67</volume>(<issue>3</issue>): <fpage>223</fpage>–<lpage>248</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1111/j.1751-5823.1999.tb00442.x" xlink:type="simple">https://doi.org/10.1111/j.1751-5823.1999.tb00442.x</ext-link></mixed-citation>
</ref>
<ref id="j_jds1206_ref_035">
<mixed-citation publication-type="journal"> <string-name><surname>Woodall</surname> <given-names>P</given-names></string-name>, <string-name><surname>Oberhofer</surname> <given-names>M</given-names></string-name>, <string-name><surname>Borek</surname> <given-names>A</given-names></string-name> (<year>2014</year>). <article-title>A classification of data quality assessment and improvement methods</article-title>. <source><italic>International Journal of Information Quality</italic></source>, <volume>3</volume>(<issue>4</issue>): <fpage>298</fpage>–<lpage>321</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1504/IJIQ.2014.068656" xlink:type="simple">https://doi.org/10.1504/IJIQ.2014.068656</ext-link></mixed-citation>
</ref>
<ref id="j_jds1206_ref_036">
<mixed-citation publication-type="book"> <string-name><surname>Yu</surname> <given-names>B</given-names></string-name>, <string-name><surname>Barter</surname> <given-names>RL</given-names></string-name> (<year>2024</year>). <source><italic>Veridical Data Science: The Practice of Responsible Data Analysis and Decision Making</italic></source>. <publisher-name>MIT Press</publisher-name>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>
