Pub. online: 19 Jan 2026 | Type: Computing In Data Science | Open Access
Journal:Journal of Data Science
Volume 24, Issue 2 (2026): Special Issue: The 2025 Symposium on Data Science and Statistics (SDSS 2025), pp. 436–454
Abstract
Land use and land cover (LULC) change in agriculture is a critical area of concern, as it directly impacts food security, environmental health, and economic stability. One of the leading LULC data products is the U.S. Department of Agriculture’s (USDA) Cropland Data Layer (CDL). Produced annually by the USDA National Agricultural Statistics Service (NASS) using satellite imagery, the CDL provides crop-specific data with an estimated classification accuracy of 85% to 95% for major crop types across the U.S. However, several limitations inherent to the CDL, such as crop underestimation bias, pixel misclassification, and difficulty distinguishing certain vegetation types, have raised questions about the accuracy of LULC change estimates derived from this dataset. In this paper, we introduce the R package cdlsim, designed to quantify the sensitivity of CDL-derived metrics by simulating CDL data at the patch level using NASS-published accuracy statistics. We present a case study that uses landscape metrics calculated with the popular landscapemetrics R package to demonstrate the utility of cdlsim in quantifying the sensitivity of those metrics to random perturbations in the data. The case study examines a mixed agricultural and grassland landscape in South Dakota, illustrating how our package enables researchers to achieve a more nuanced representation of land-use change.
Pub. online: 4 Aug 2022 | Type: Research Article | Open Access
Journal:Journal of Data Science
Volume 18, Issue 3 (2020): Special issue: Data Science in Action in Response to the Outbreak of COVID-19, pp. 536–549
Abstract
As the COVID-19 pandemic has severely disrupted people’s daily work and life, a great deal of scientific research has been conducted to understand the key characteristics of this new epidemic. In this manuscript, we focus on four crucial epidemic metrics for COVID-19, namely the basic reproduction number, the incubation period, the serial interval, and the epidemic doubling time. We collect relevant studies based on COVID-19 data from China and conduct a meta-analysis to obtain pooled estimates of the four metrics. From the summary results, we conclude that COVID-19 has stronger transmissibility than SARS, implying that stringent public health strategies are necessary.