<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn><issn pub-type="ppub">1680-743X</issn><issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1074</article-id>
<article-id pub-id-type="doi">10.6339/22-JDS1074</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Computing in Data Science</subject></subj-group></article-categories>
<title-group>
<article-title>Vecchia Approximations and Optimization for Multivariate Matérn Models</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Fahmy</surname><given-names>Youssef</given-names></name><xref ref-type="aff" rid="j_jds1074_aff_001">1</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Guinness</surname><given-names>Joseph</given-names></name><email xlink:href="mailto:guinness@cornell.edu">guinness@cornell.edu</email><xref ref-type="aff" rid="j_jds1074_aff_001">1</xref><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<aff id="j_jds1074_aff_001"><label>1</label>Department of Statistics and Data Science, <institution>Cornell University</institution>, Ithaca, NY, <country>USA</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author. Email: <ext-link ext-link-type="uri" xlink:href="mailto:guinness@cornell.edu">guinness@cornell.edu</ext-link>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2022</year></pub-date><pub-date pub-type="epub"><day>14</day><month>10</month><year>2022</year></pub-date><volume>20</volume><issue>4</issue><fpage>475</fpage><lpage>492</lpage><supplementary-material id="S1" content-type="archive" xlink:href="jds1074_s001.zip" mimetype="application" mime-subtype="x-zip-compressed">
<caption>
<title>Supplementary Material</title>
<p>The datasets and code used for this project can be found at <uri>https://github.com/yf297/GpGp_multi_paper</uri>.</p>
</caption>
</supplementary-material><history><date date-type="received"><day>1</day><month>8</month><year>2022</year></date><date date-type="accepted"><day>11</day><month>10</month><year>2022</year></date></history>
<permissions><copyright-statement>2022 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2022</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>We describe our implementation of the multivariate Matérn model for multivariate spatial datasets, using Vecchia’s approximation and a Fisher scoring optimization algorithm. We consider various parameterizations for the multivariate Matérn that have been proposed in the literature for ensuring model validity, as well as an unconstrained model. A strength of our study is that the code is tested on many real-world multivariate spatial datasets. We use it to study the effect of ordering and conditioning in Vecchia’s approximation and the restrictions imposed by the various parameterizations. We also consider a model in which co-located nuggets are correlated across components and find that forcing this cross-component nugget correlation to be zero can have a serious impact on the other model parameters, so we suggest allowing cross-component correlation in co-located nugget terms.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>Gaussian process</kwd>
<kwd>Fisher scoring</kwd>
<kwd>software</kwd>
</kwd-group>
<funding-group><award-group><funding-source xlink:href="https://doi.org/10.13039/100000001">National Science Foundation</funding-source><award-id>ACI-1548562</award-id></award-group><award-group><funding-source xlink:href="https://doi.org/10.13039/100000121">National Science Foundation Division of Mathematical Sciences</funding-source><award-id>1916208</award-id><award-id>1953088</award-id></award-group><funding-statement>This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562. This work is supported by the National Science Foundation Division of Mathematical Sciences under grant numbers 1916208 and 1953088. </funding-statement></funding-group>
</article-meta>
</front>
<body/>
<back>
<ref-list id="j_jds1074_reflist_001">
<title>References</title>
<ref id="j_jds1074_ref_001">
<mixed-citation publication-type="other"> <string-name><surname>Abdulah</surname> <given-names>S</given-names></string-name>, <string-name><surname>Alamri</surname> <given-names>F</given-names></string-name>, <string-name><surname>Nag</surname> <given-names>P</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Ltaief</surname> <given-names>H</given-names></string-name>, <string-name><surname>Keyes</surname> <given-names>DE</given-names></string-name>, <string-name><surname>Genton</surname> <given-names>MG</given-names></string-name> (2022). The second competition on spatial statistics for large datasets. <italic>Journal of Data Science</italic>. In Press.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_002">
<mixed-citation publication-type="journal"> <string-name><surname>Abdulah</surname> <given-names>S</given-names></string-name>, <string-name><surname>Ltaief</surname> <given-names>H</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Genton</surname> <given-names>MG</given-names></string-name>, <string-name><surname>Keyes</surname> <given-names>DE</given-names></string-name> (<year>2018</year>). <article-title>ExaGeoStat: a high performance unified software for geostatistics on manycore systems</article-title>. <source>IEEE Transactions on Parallel and Distributed Systems</source>, <volume>29</volume>(<issue>12</issue>): <fpage>2771</fpage>–<lpage>2784</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_003">
<mixed-citation publication-type="journal"> <string-name><surname>Apanasovich</surname> <given-names>TV</given-names></string-name>, <string-name><surname>Genton</surname> <given-names>MG</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>Y</given-names></string-name> (<year>2012</year>). <article-title>A valid Matérn class of cross-covariance functions for multivariate random fields with any number of components</article-title>. <source>Journal of the American Statistical Association</source>, <volume>107</volume>(<issue>497</issue>): <fpage>180</fpage>–<lpage>193</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_004">
<mixed-citation publication-type="book"> <string-name><surname>Bevilacqua</surname> <given-names>M</given-names></string-name>, <string-name><surname>Morales-Oñate</surname> <given-names>V</given-names></string-name>, <string-name><surname>Caamaño-Carrillo</surname> <given-names>C</given-names></string-name> (<year>2018</year>). <source>GeoModels: Procedures for Gaussian and Non Gaussian Geostatistical (Large) Data Analysis</source>. <comment>R package version 1.0.0</comment>.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_005">
<mixed-citation publication-type="journal"> <string-name><surname>Eckel</surname> <given-names>FA</given-names></string-name>, <string-name><surname>Mass</surname> <given-names>CF</given-names></string-name> (<year>2005</year>). <article-title>Aspects of effective mesoscale, short-range ensemble forecasting</article-title>. <source>Weather and Forecasting</source>, <volume>20</volume>(<issue>3</issue>): <fpage>328</fpage>–<lpage>350</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_006">
<mixed-citation publication-type="journal"> <string-name><surname>Emery</surname> <given-names>X</given-names></string-name>, <string-name><surname>Porcu</surname> <given-names>E</given-names></string-name>, <string-name><surname>White</surname> <given-names>P</given-names></string-name> (<year>2022</year>). <article-title>New validity conditions for the multivariate Matérn coregionalization model, with an application to exploration geochemistry</article-title>. <source>Mathematical Geosciences</source>, <volume>54</volume>(<issue>6</issue>): <fpage>1043</fpage>–<lpage>1068</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_007">
<mixed-citation publication-type="other"> <string-name><surname>Finley</surname> <given-names>A</given-names></string-name>, <string-name><surname>Datta</surname> <given-names>A</given-names></string-name>, <string-name><surname>Banerjee</surname> <given-names>S</given-names></string-name> (2022). spNNGP: Spatial regression models for large datasets using nearest neighbor Gaussian processes. R package version 0.1.7.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_008">
<mixed-citation publication-type="journal"> <string-name><surname>Finley</surname> <given-names>AO</given-names></string-name>, <string-name><surname>Banerjee</surname> <given-names>S</given-names></string-name>, <string-name><surname>Gelfand</surname> <given-names>AE</given-names></string-name> (<year>2015</year>). <article-title>spBayes for large univariate and multivariate point-referenced spatio-temporal data models</article-title>. <source>Journal of Statistical Software</source>, <volume>63</volume>(<issue>13</issue>): <fpage>1</fpage>–<lpage>28</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_009">
<mixed-citation publication-type="journal"> <string-name><surname>Genton</surname> <given-names>MG</given-names></string-name>, <string-name><surname>Kleiber</surname> <given-names>W</given-names></string-name> (<year>2015</year>). <article-title>Cross-covariance functions for multivariate geostatistics</article-title>. <source>Statistical Science</source>, <volume>30</volume>(<issue>2</issue>): <fpage>147</fpage>–<lpage>163</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_010">
<mixed-citation publication-type="journal"> <string-name><surname>Gneiting</surname> <given-names>T</given-names></string-name>, <string-name><surname>Kleiber</surname> <given-names>W</given-names></string-name>, <string-name><surname>Schlather</surname> <given-names>M</given-names></string-name> (<year>2010</year>). <article-title>Matérn cross-covariance functions for multivariate random fields</article-title>. <source>Journal of the American Statistical Association</source>, <volume>105</volume>(<issue>491</issue>): <fpage>1167</fpage>–<lpage>1177</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_011">
<mixed-citation publication-type="journal"> <string-name><surname>Guinness</surname> <given-names>J</given-names></string-name> (<year>2018</year>). <article-title>Permutation and grouping methods for sharpening Gaussian process approximations</article-title>. <source>Technometrics</source>, <volume>60</volume>(<issue>4</issue>): <fpage>415</fpage>–<lpage>429</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_012">
<mixed-citation publication-type="journal"> <string-name><surname>Guinness</surname> <given-names>J</given-names></string-name> (<year>2021</year>). <article-title>Gaussian process learning via Fisher scoring of Vecchia’s approximation</article-title>. <source>Statistics and Computing</source>, <volume>31</volume>(<issue>3</issue>): <fpage>1</fpage>–<lpage>8</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_013">
<mixed-citation publication-type="journal"> <string-name><surname>Guinness</surname> <given-names>J</given-names></string-name> (<year>2022</year>). <article-title>Nonparametric spectral methods for multivariate spatial and spatial–temporal data</article-title>. <source>Journal of Multivariate Analysis</source>, <volume>187</volume>: <fpage>104823</fpage>.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_014">
<mixed-citation publication-type="other"> <string-name><surname>Guinness</surname> <given-names>J</given-names></string-name>, <string-name><surname>Katzfuss</surname> <given-names>M</given-names></string-name>, <string-name><surname>Fahmy</surname> <given-names>Y</given-names></string-name> (2021). GpGp: Fast Gaussian process computation using Vecchia’s approximation. R package version 0.4.0.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_015">
<mixed-citation publication-type="journal"> <string-name><surname>Huang</surname> <given-names>H</given-names></string-name>, <string-name><surname>Abdulah</surname> <given-names>S</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Ltaief</surname> <given-names>H</given-names></string-name>, <string-name><surname>Keyes</surname> <given-names>DE</given-names></string-name>, <string-name><surname>Genton</surname> <given-names>MG</given-names></string-name> (<year>2021</year>). <article-title>Competition on spatial statistics for large datasets</article-title>. <source>Journal of Agricultural, Biological, and Environmental Statistics</source>, <volume>26</volume>(<issue>4</issue>): <fpage>580</fpage>–<lpage>595</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_016">
<mixed-citation publication-type="other"> <string-name><surname>Katzfuss</surname> <given-names>M</given-names></string-name>, <string-name><surname>Jurek</surname> <given-names>M</given-names></string-name>, <string-name><surname>Zilber</surname> <given-names>D</given-names></string-name>, <string-name><surname>Gong</surname> <given-names>W</given-names></string-name>, <string-name><surname>Guinness</surname> <given-names>J</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>J</given-names></string-name>, et al. (2020). GPvecchia: Scalable Gaussian-process approximations. R package version 0.1.3.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_017">
<mixed-citation publication-type="other"> <string-name><surname>Kinniburgh</surname> <given-names>D</given-names></string-name>, <string-name><surname>Smedley</surname> <given-names>P</given-names></string-name> (2001). Arsenic contamination of groundwater in Bangladesh, <italic>British Geological Survey Technical Report WC/00/19</italic>.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_018">
<mixed-citation publication-type="journal"> <string-name><surname>Kleiber</surname> <given-names>W</given-names></string-name> (<year>2017</year>). <article-title>Coherence for multivariate random fields</article-title>. <source>Statistica Sinica</source>, <volume>27</volume>(<issue>4</issue>): <fpage>1675</fpage>–<lpage>1697</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_019">
<mixed-citation publication-type="journal"> <string-name><surname>Li</surname> <given-names>B</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>H</given-names></string-name> (<year>2011</year>). <article-title>An approach to modeling asymmetric multivariate spatial covariance structures</article-title>. <source>Journal of Multivariate Analysis</source>, <volume>102</volume>(<issue>10</issue>): <fpage>1445</fpage>–<lpage>1453</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_020">
<mixed-citation publication-type="journal"> <string-name><surname>Lindgren</surname> <given-names>F</given-names></string-name>, <string-name><surname>Rue</surname> <given-names>H</given-names></string-name>, <string-name><surname>Lindström</surname> <given-names>J</given-names></string-name> (<year>2011</year>). <article-title>An explicit link between Gaussian fields and Gaussian Markov random fields: The stochastic partial differential equation approach</article-title>. <source>Journal of the Royal Statistical Society, Series B, Statistical Methodology</source>, <volume>73</volume>(<issue>4</issue>): <fpage>423</fpage>–<lpage>498</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_021">
<mixed-citation publication-type="other"> <string-name><surname>Nychka</surname> <given-names>D</given-names></string-name>, <string-name><surname>Furrer</surname> <given-names>R</given-names></string-name>, <string-name><surname>Paige</surname> <given-names>J</given-names></string-name>, <string-name><surname>Sain</surname> <given-names>S</given-names></string-name> (2021). fields: Tools for spatial data. R package version 14.0.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_022">
<mixed-citation publication-type="other"> <string-name><surname>Nychka</surname> <given-names>D</given-names></string-name>, <string-name><surname>Hammerling</surname> <given-names>D</given-names></string-name>, <string-name><surname>Sain</surname> <given-names>S</given-names></string-name>, <string-name><surname>Lenssen</surname> <given-names>N</given-names></string-name> (2016). LatticeKrig: Multiresolution Kriging based on Markov random fields. R package version 8.4.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_023">
<mixed-citation publication-type="journal"> <string-name><surname>Pinheiro</surname> <given-names>JC</given-names></string-name>, <string-name><surname>Bates</surname> <given-names>DM</given-names></string-name> (<year>1996</year>). <article-title>Unconstrained parameterizations for variance-covariance matrices</article-title>. <source>Statistics and Computing</source>, <volume>6</volume>: <fpage>289</fpage>–<lpage>296</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_024">
<mixed-citation publication-type="journal"> <string-name><surname>Qadir</surname> <given-names>GA</given-names></string-name>, <string-name><surname>Euán</surname> <given-names>C</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>Y</given-names></string-name> (<year>2021</year>). <article-title>Flexible modeling of variable asymmetries in cross-covariance functions for multivariate random fields</article-title>. <source>Journal of Agricultural, Biological, and Environmental Statistics</source>, <volume>26</volume>(<issue>1</issue>): <fpage>1</fpage>–<lpage>22</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_025">
<mixed-citation publication-type="journal"> <string-name><surname>Qadir</surname> <given-names>GA</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>Y</given-names></string-name> (<year>2021</year>). <article-title>Semiparametric estimation of cross-covariance functions for multivariate random fields</article-title>. <source>Biometrics</source>, <volume>77</volume>(<issue>2</issue>): <fpage>547</fpage>–<lpage>560</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_026">
<mixed-citation publication-type="journal"> <string-name><surname>Rue</surname> <given-names>H</given-names></string-name>, <string-name><surname>Martino</surname> <given-names>S</given-names></string-name>, <string-name><surname>Chopin</surname> <given-names>N</given-names></string-name> (<year>2009</year>). <article-title>Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations</article-title>. <source>Journal of the Royal Statistical Society, Series B, Statistical Methodology</source>, <volume>71</volume>(<issue>2</issue>): <fpage>319</fpage>–<lpage>392</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_027">
<mixed-citation publication-type="journal"> <string-name><surname>Saby</surname> <given-names>N</given-names></string-name>, <string-name><surname>Thioulouse</surname> <given-names>J</given-names></string-name>, <string-name><surname>Jolivet</surname> <given-names>C</given-names></string-name>, <string-name><surname>Ratié</surname> <given-names>C</given-names></string-name>, <string-name><surname>Boulonne</surname> <given-names>L</given-names></string-name>, <string-name><surname>Bispo</surname> <given-names>A</given-names></string-name>, <etal>et al.</etal> (<year>2009</year>). <article-title>Multivariate analysis of the spatial patterns of 8 trace elements using the French soil monitoring network data</article-title>. <source>Science of the Total Environment</source>, <volume>407</volume>(<issue>21</issue>): <fpage>5644</fpage>–<lpage>5652</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_028">
<mixed-citation publication-type="other"> <string-name><surname>Schlather</surname> <given-names>M</given-names></string-name>, <string-name><surname>Malinowski</surname> <given-names>A</given-names></string-name>, <string-name><surname>Oesting</surname> <given-names>M</given-names></string-name> (2022). RandomFields (archived on CRAN). R package version 3.3.14.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_029">
<mixed-citation publication-type="journal"> <string-name><surname>Vecchia</surname> <given-names>AV</given-names></string-name> (<year>1988</year>). <article-title>Estimation and model identification for continuous spatial processes</article-title>. <source>Journal of the Royal Statistical Society, Series B, Methodological</source>, <volume>50</volume>(<issue>2</issue>): <fpage>297</fpage>–<lpage>312</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1074_ref_030">
<mixed-citation publication-type="journal"> <string-name><surname>Zammit-Mangion</surname> <given-names>A</given-names></string-name>, <string-name><surname>Cressie</surname> <given-names>N</given-names></string-name> (<year>2021</year>). <article-title>FRK: an R package for spatial and spatio-temporal prediction with large datasets</article-title>. <source>Journal of Statistical Software</source>, <volume>98</volume>(<issue>4</issue>): <fpage>1</fpage>–<lpage>48</lpage>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>
