Pub. online: 19 Jan 2026 | Type: Computing In Data Science | Open Access
Journal: Journal of Data Science
Volume 24, Issue 2 (2026): Special Issue: The 2025 Symposium on Data Science and Statistics (SDSS 2025), pp. 436–454
Abstract
Land use and land cover (LULC) change in agriculture is a critical area of concern, as it directly impacts food security, environmental health, and economic stability. One of the leading LULC data products is the U.S. Department of Agriculture’s (USDA) Cropland Data Layer (CDL). Produced annually by the USDA National Agricultural Statistics Service (NASS) using satellite imagery, the CDL provides crop-specific data with an estimated classification accuracy of 85% to 95% for major crop types across the U.S. However, several limitations inherent to the CDL, such as crop underestimation bias, pixel misclassification, and difficulty distinguishing certain vegetation types, have raised questions about the accuracy of LULC change estimates derived from this dataset. In this paper, we introduce the R package cdlsim, designed to quantify the sensitivity of CDL-derived metrics through simulations of CDL data at the patch level using NASS-published accuracy statistics. We present a case study that uses landscape metrics calculated with the popular landscapemetrics R package to demonstrate the utility of cdlsim in quantifying the sensitivity of metrics to random perturbations in the data. The case study examines a mixed agricultural and grassland landscape in South Dakota, illustrating how our package enables researchers to achieve a more nuanced representation of land-use change.
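The general idea of perturbing a classified raster and watching how a metric responds can be illustrated with a minimal sketch. This is not the cdlsim API and it works at the pixel level rather than the patch level. The function names (`perturb`, `class_share`, `simulate_metric`), the toy grid, and the transition probabilities are all hypothetical and chosen only for illustration; real accuracy statistics would come from NASS.

```python
import random

def perturb(grid, transition, rng):
    """Redraw each pixel's class from the transition row of its mapped class.

    transition[c][k] is an assumed probability that a pixel mapped as class c
    is really class k (illustrative values, not NASS statistics).
    """
    labels = range(len(transition))
    return [[rng.choices(labels, weights=transition[c])[0] for c in row]
            for row in grid]

def class_share(grid, target):
    """Fraction of pixels assigned to the target class (a very simple metric)."""
    cells = [c for row in grid for c in row]
    return sum(c == target for c in cells) / len(cells)

def simulate_metric(grid, transition, metric, n_sim=500, seed=0):
    """Return the metric evaluated on n_sim independently perturbed grids."""
    rng = random.Random(seed)
    return [metric(perturb(grid, transition, rng)) for _ in range(n_sim)]

# Toy landscape: 0 = grassland, 1 = corn, 2 = soybeans (hypothetical coding).
grid = [[0, 0, 1, 1],
        [0, 1, 1, 2],
        [0, 0, 2, 2]]

# Hypothetical per-class accuracies on the diagonal.
transition = [[0.85, 0.10, 0.05],
              [0.03, 0.92, 0.05],
              [0.04, 0.06, 0.90]]

shares = simulate_metric(grid, transition, lambda g: class_share(g, 1))
print("mapped corn share:", class_share(grid, 1))
print("simulated range:", min(shares), max(shares))
```

The spread of the simulated values gives a rough sense of how much a metric could move under classification error alone, which is the kind of sensitivity the abstract describes.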
Abstract: Many studies of the natural history of Human Immunodeficiency Virus Type-1 (HIV-1) infection have included participants who were seropositive at the time of enrollment. Estimating the unknown times since exposure to HIV-1 in these prevalent cohorts is of primary importance for estimating the incubation period of Acquired Immunodeficiency Syndrome (AIDS). To estimate the incubation period of AIDS, we used a prior distribution of incubation times based on external data, as suggested by Bacchetti and Jewell (1991, Biometrics, 47, 947-960). In the present study, our estimate was nonparametric, based on a method proposed by Wang, Jewell and Tsai (1986, Annals of Statistics, 14, 1597-1605).
Pub. online: 7 May 2021 | Type: Statistical Data Science | Open Access
Journal: Journal of Data Science
Volume 19, Issue 2 (2021): Special issue: Continued Data Science Contributions to COVID-19 Pandemic, pp. 253–268
Abstract
Following the outbreak of COVID-19, various containment measures have been taken, including the use of quarantine. At present, the quarantine period is the same for everyone, since it is implicitly assumed that the incubation period distribution of COVID-19 is the same regardless of age or gender. To test the effects of age and gender on the incubation period of COVID-19, a novel two-component mixture regression model is proposed. An expectation-maximization (EM) algorithm is adopted to obtain estimates of the parameters of interest, and the simulation results show that the proposed method outperforms the simple regression method and is robust. The proposed method is applied to a Zhejiang COVID-19 dataset, and it is found that age and gender have no statistically significant effect on the incubation period of COVID-19, which indicates that the quarantine measure currently in operation is reasonable.
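To make the EM step concrete, the following is a minimal sketch of EM for a generic two-component Gaussian mixture of simple linear regressions with one covariate. It is not the model proposed in the article, whose exact form is not given in the abstract. The function names (`weighted_line`, `em_two_lines`), the random initialization, and the simulated data are all assumptions made for illustration.

```python
import math
import random

def weighted_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b, sigma)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    var = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y)) / sw
    return a, b, max(math.sqrt(var), 1e-6)  # floor avoids a collapsing component

def em_two_lines(x, y, n_iter=100, seed=0):
    """EM for a two-component mixture of regression lines (illustrative only)."""
    rng = random.Random(seed)
    # resp[i] = current probability that observation i belongs to component 0.
    resp = [rng.uniform(0.25, 0.75) for _ in x]
    for _ in range(n_iter):
        # M-step: refit each line with the responsibilities as weights.
        comps = [weighted_line(x, y, resp),
                 weighted_line(x, y, [1.0 - r for r in resp])]
        pi = sum(resp) / len(resp)
        # E-step: update responsibilities from the Gaussian log-densities.
        new_resp = []
        for xi, yi in zip(x, y):
            logs = [math.log(wk) - math.log(s) - 0.5 * ((yi - a - b * xi) / s) ** 2
                    for (a, b, s), wk in zip(comps, (pi, 1.0 - pi))]
            m = max(logs)
            d0, d1 = math.exp(logs[0] - m), math.exp(logs[1] - m)
            r = d0 / (d0 + d1)
            new_resp.append(min(max(r, 1e-9), 1.0 - 1e-9))  # keep weights positive
        resp = new_resp
    return comps, pi

# Simulated data from two hypothetical regression lines.
rng = random.Random(42)
x = [rng.uniform(0.0, 1.0) for _ in range(200)]
y = [(2.0 + 3.0 * xi if rng.random() < 0.5 else 10.0 - 2.0 * xi) + rng.gauss(0.0, 0.3)
     for xi in x]

comps, pi = em_two_lines(x, y)
for a, b, s in comps:
    print(f"intercept={a:.2f} slope={b:.2f} sigma={s:.2f}")
print(f"mixing weight of first component: {pi:.2f}")
```

In a setting like the one described, testing whether a covariate such as age matters would amount to examining the fitted slope in each component; the sketch only shows the estimation loop, not any inference procedure.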