The Second Competition on Spatial Statistics for Large Datasets
Volume 20, Issue 4 (2022): Special Issue: Large-Scale Spatial Data Science, pp. 439–460
Pub. online: 8 November 2022
Type: Statistical Data Science
Open Access
Received
14 August 2022
Accepted
29 October 2022
Published
8 November 2022
Abstract
In the last few decades, the size of spatial and spatio-temporal datasets in many research areas has rapidly increased with the development of data collection technologies. As a result, classical statistical methods in spatial statistics are facing computational challenges. For example, the kriging predictor in geostatistics becomes prohibitive on traditional hardware architectures for large datasets, as it requires substantial computing power and memory to carry out large dense matrix operations. Over the years, various approximation methods have been proposed to address such computational issues; however, the community lacks a holistic process to assess their approximation efficiency. To provide a fair assessment, in 2021 we organized the first competition on spatial statistics for large datasets, generated by our ExaGeoStat software, and asked participants to report the results of estimation and prediction. Thanks to its widely acknowledged success and at the request of many participants, we organized the second competition in 2022, focusing on predictions for more complex spatial and spatio-temporal processes, including univariate nonstationary spatial processes, univariate stationary space-time processes, and bivariate stationary spatial processes. In this paper, we describe in detail the data generation procedure and make the valuable datasets publicly available for wider adoption. Then, we review the submitted methods from fourteen teams worldwide, analyze the competition outcomes, and assess the performance of each team.
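To make the computational bottleneck mentioned in the abstract concrete, the following is a minimal sketch of a zero-mean simple kriging predictor and of the RMSE score used to compare predictions. It is not the ExaGeoStat implementation: the exponential covariance, its parameter values, the nugget, and all function names (`exp_cov`, `cholesky`, `chol_solve`, `simple_krige`, `rmse`) are assumptions chosen for illustration. The dense Cholesky factorization of the n-by-n covariance matrix is the step whose O(n^3) time and O(n^2) memory become prohibitive for large n.

```python
import math


def exp_cov(s, t, variance=1.0, length_scale=0.2):
    """Exponential covariance between two 2-D locations (an assumed model)."""
    return variance * math.exp(-math.dist(s, t) / length_scale)


def cholesky(a):
    """Dense Cholesky factor L with a = L L^T: O(n^3) time, O(n^2) memory."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def chol_solve(low, b):
    """Solve (L L^T) x = b by forward, then backward, substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def simple_krige(train_locs, train_vals, test_locs, nugget=1e-8):
    """Zero-mean simple kriging: z_hat(s0) = c0^T Sigma^{-1} z."""
    n = len(train_locs)
    # Dense covariance matrix; a small nugget on the diagonal aids stability.
    sigma = [[exp_cov(train_locs[i], train_locs[j]) + (nugget if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    weights = chol_solve(cholesky(sigma), train_vals)  # Sigma^{-1} z
    return [sum(exp_cov(s0, s) * w for s, w in zip(train_locs, weights))
            for s0 in test_locs]


def rmse(pred, truth):
    """Root mean squared error between predictions and held-out truth."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


# Toy usage on a 5 x 5 grid with a smooth, made-up surface.
train_locs = [(i / 4, j / 4) for i in range(5) for j in range(5)]
train_vals = [math.sin(3 * x) + math.cos(2 * y) for x, y in train_locs]
test_locs = [(0.1, 0.1), (0.6, 0.4)]
test_vals = [math.sin(3 * x) + math.cos(2 * y) for x, y in test_locs]
score = rmse(simple_krige(train_locs, train_vals, test_locs), test_vals)
```

With 25 training locations the factorization is instantaneous. At the scale the competition targets, the same dense approach would be impractical on ordinary hardware, which is what motivates the approximation methods under comparison.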
Supplementary material
In the Supplementary Material, we list the members of all the teams participating in this competition in Table S1. Moreover, Tables S2 to S11 summarize the RMSE values obtained by the different teams on each dataset of the sub-competitions, as well as those obtained with ExaGeoStat for reference.