Satellite precipitation products could help improve understanding of extreme precipitation events in remote mountainous terrain, where weather stations and radar data tend to be sparse. It is therefore important to assess how closely satellite estimates agree with ground observations during extreme events, and how that agreement varies across such regions. The primary tool in this study is asymptotic dependence from multivariate extreme value theory. After presenting two measures of asymptotic dependence and their associated estimators, we illustrate these ideas using simulated data. We then model the level of asymptotic dependence between PERSIANN-CDR satellite estimates and SNOTEL station data over the US Northern Rocky Mountains. Based on hypothesis tests and visual diagnostics, both asymptotic dependence estimators indicate positive spatial dependence. We also investigate whether geographical factors influence the level of asymptotic dependence over this region. A spatial correlation analysis shows that elevation is negatively correlated, and average summer temperature positively correlated, with both asymptotic dependence estimators. However, none of the geographical covariates is statistically significant in the model.
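As a concrete illustration of what an asymptotic dependence estimator computes, the sketch below implements a common empirical version of the tail dependence measure χ(u) = P(F_Y(Y) > u | F_X(X) > u). The abstract does not name its two measures, so treating χ(u) as one of them is an assumption, and the function name `empirical_chi` is hypothetical. Margins are moved to a uniform scale with average ranks, so ties (for example, many zero-precipitation days) are not broken arbitrarily.

```python
import numpy as np
from scipy.stats import rankdata


def empirical_chi(x, y, u):
    """Hypothetical helper: empirical chi(u) = P(F_Y(Y) > u | F_X(X) > u).

    x, y : paired observations (e.g. satellite and station precipitation).
    u    : threshold on the uniform scale, 0 < u < 1.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same length")
    n = x.size
    # Rank-based pseudo-uniform margins; average ranks handle ties.
    ux = rankdata(x) / (n + 1)
    uy = rankdata(y) / (n + 1)
    exceed_x = ux > u
    if not exceed_x.any():
        raise ValueError("no exceedances of u in x")
    # Fraction of x-exceedances at which y also exceeds u.
    joint = exceed_x & (uy > u)
    return joint.sum() / exceed_x.sum()
```

For perfectly dependent series χ(u) equals 1, and for independent series it is close to 1 − u. Asymptotic dependence is usually judged by how χ(u) behaves as u approaches 1, for example by plotting it over a grid of thresholds.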
Our contribution is to widen the scope of extreme value analysis applied to discrete-valued data. Extreme values of a random variable are commonly modeled with the generalized Pareto distribution, a peaks-over-threshold method that often performs well in practice. When the data are discrete, we propose two alternative methods, based on a discrete generalized Pareto distribution and a generalized Zipf distribution, respectively. Both are theoretically motivated, and we show that they perform well in estimating rare events in several simulated and real data cases, such as word frequencies, tornado outbreaks, and multiple births.
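One common way to build a discrete generalized Pareto distribution is to take the integer part of a continuous GPD variable, so that P(Y = k) = S(k) − S(k + 1), where S is the GPD survival function with scale σ and shape ξ. The abstract does not give its exact parametrization, so this floor-based construction and the function names below are assumptions for illustration; the generalized Zipf alternative is not sketched.

```python
import math


def gpd_survival(x, sigma, xi):
    """Survival function S(x) of the continuous GPD for x >= 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if xi == 0.0:
        # Exponential limit of the GPD as xi -> 0.
        return math.exp(-x / sigma)
    base = 1.0 + xi * x / sigma
    if base <= 0.0:
        # Beyond the finite upper endpoint -sigma/xi when xi < 0.
        return 0.0
    return base ** (-1.0 / xi)


def discrete_gpd_pmf(k, sigma, xi):
    """Hypothetical helper: P(Y = k) for Y = floor(X), X ~ GPD(sigma, xi)."""
    if k < 0 or int(k) != k:
        return 0.0
    return gpd_survival(k, sigma, xi) - gpd_survival(k + 1, sigma, xi)
```

With ξ = 0 this pmf reduces to a geometric distribution with success probability 1 − exp(−1/σ), and with ξ < 0 the support is bounded above by −σ/ξ.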
Pub. online: 25 Jan 2023
Type: Statistical Data Science
Open Access
Journal: Journal of Data Science
Volume 21, Issue 2 (2023): Special Issue: Symposium Data Science and Statistics 2022, pp. 368–390
Abstract
The potential weight of accumulated snow on the roof of a structure has long been an important consideration in structural design. However, the historical approach to modeling snow weight on structures is incompatible with structures whose surfaces and geometry allow snow to slide off, such as standalone solar panels. This paper proposes a “storm-level” adaptation of previous structure-related snow studies, designed to estimate short-term, rather than season-long, accumulations of snow water equivalent or snow load. One key development is a climate-driven random forests model that imputes missing snow water equivalent values at stations that measure only snow depth, producing continuous snow load records. The paper also compares six approaches to extreme value estimation for short-term snow accumulations. For the 50-year mean recurrence interval (MRI) of short-term snow accumulations across different weather station types, the traditional block maxima approach, the mean-adjusted quantile method with a gamma distribution, and the Bayesian peak-over-threshold approach most often give MRI estimates near the median of all six approaches. Bootstrap simulation further shows that peak-over-threshold estimation with automatic threshold selection tends to have higher variance than the other approaches considered. The results suggest that there is no one-size-fits-all option for extreme value estimation of short-term snow accumulations, and they highlight the potential value of integrating multiple estimation approaches.
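The traditional block maxima approach can be sketched as follows: fit a generalized extreme value (GEV) distribution to yearly maxima and take the 50-year MRI value as the quantile with annual exceedance probability 1/50. The bootstrap function mirrors the kind of resampling used to compare estimator variance. Annual blocks, SciPy's maximum likelihood fit, and the helper names here are assumptions; the paper's storm-level accumulation definitions, the gamma mean-adjusted quantile method, and the Bayesian threshold models are not reproduced.

```python
import numpy as np
from scipy import stats


def block_maxima_return_level(annual_maxima, mri_years=50):
    """Hypothetical helper: GEV fit to annual maxima, then the level
    exceeded on average once every `mri_years` years."""
    data = np.asarray(annual_maxima, dtype=float)
    # Maximum likelihood fit; returns (shape c, loc, scale).
    c, loc, scale = stats.genextreme.fit(data)
    return stats.genextreme.ppf(1.0 - 1.0 / mri_years, c, loc=loc, scale=scale)


def bootstrap_return_levels(annual_maxima, mri_years=50, n_boot=200, seed=0):
    """Hypothetical helper: nonparametric bootstrap of the return level."""
    rng = np.random.default_rng(seed)
    data = np.asarray(annual_maxima, dtype=float)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # Resample years with replacement and refit the GEV.
        resample = rng.choice(data, size=data.size, replace=True)
        estimates[b] = block_maxima_return_level(resample, mri_years)
    return estimates
```

SciPy parametrizes the GEV shape as c = −ξ relative to the convention common in the extremes literature, so heavy-tailed fits have negative c. The spread of the bootstrap estimates (for example, their variance) is one way to compare the stability of different estimation approaches.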
In this work, we introduce a new distribution for modeling extreme values and derive some of its important mathematical properties. A simulation study assesses the performance of the maximum likelihood method in terms of bias and mean squared error. The new model fits the repair times data and the breaking stress data better than several important competing models.
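The simulation study described above evaluates maximum likelihood estimators by their bias and mean squared error across repeated samples. Because the new distribution is not specified here, the sketch below uses the Gumbel distribution as a stand-in (an assumption, not the paper's model) to show the Monte Carlo loop: simulate from known parameters, refit by maximum likelihood, and average the errors. The function name `mle_bias_mse` is hypothetical.

```python
import numpy as np
from scipy import stats


def mle_bias_mse(true_loc=0.0, true_scale=1.0, n=100, n_rep=500, seed=1):
    """Hypothetical helper: Monte Carlo bias and MSE of Gumbel MLEs.

    Returns two arrays ordered as (loc, scale).
    """
    rng = np.random.default_rng(seed)
    est = np.empty((n_rep, 2))
    for r in range(n_rep):
        # Draw a sample from the known (stand-in) model.
        sample = stats.gumbel_r.rvs(
            loc=true_loc, scale=true_scale, size=n, random_state=rng
        )
        # Refit by maximum likelihood; fit returns (loc, scale).
        est[r] = stats.gumbel_r.fit(sample)
    truth = np.array([true_loc, true_scale])
    bias = est.mean(axis=0) - truth
    mse = ((est - truth) ** 2).mean(axis=0)
    return bias, mse
```

Swapping in the new model would require its density and a fitting routine; rerunning over several sample sizes n shows whether bias and MSE shrink as n grows.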