Discrete Extremes
Volume 22, Issue 4 (2024), pp. 524–536
Pub. online: 23 February 2024
Type: Statistical Data Science
Open Access
Received
28 January 2024
Accepted
31 January 2024
Published
23 February 2024
Abstract
Our contribution is to widen the scope of extreme value analysis applied to discrete-valued data. Extreme values of a random variable are commonly modeled with the generalized Pareto distribution within the peaks-over-threshold framework, an approach that often gives good results in practice. When the data are discrete, we propose two alternative methods, based on a discrete generalized Pareto distribution and a generalized Zipf distribution, respectively. Both are theoretically motivated, and we show that they perform well in estimating rare events in several simulated and real data cases, such as word frequency, tornado outbreaks, and multiple births.
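As a rough illustration of what a discrete generalized Pareto model can look like, the sketch below builds a probability mass function by differencing the continuous generalized Pareto survival function at consecutive integers. This construction is an assumption made here for illustration (it is a common way to discretize the distribution) and is not taken from the article. The function names `gpd_sf`, `dgpd_pmf`, and `dgpd_tail`, and the parameter names `sigma` (scale) and `xi` (shape), are hypothetical.

```python
import math


def gpd_sf(x: float, sigma: float, xi: float) -> float:
    """Survival function P(Y > x) of a continuous generalized Pareto variable.

    Assumes sigma > 0 and x >= 0; xi == 0 is treated as the exponential limit.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if xi == 0:
        return math.exp(-x / sigma)
    base = 1.0 + xi * x / sigma
    if base <= 0:  # beyond the upper endpoint, which exists only when xi < 0
        return 0.0
    return base ** (-1.0 / xi)


def dgpd_pmf(k: int, sigma: float, xi: float) -> float:
    """P(X = k) for k = 0, 1, 2, ... under the assumed discretization:
    the difference of the continuous survival function at k and k + 1."""
    if k < 0:
        return 0.0
    return gpd_sf(k, sigma, xi) - gpd_sf(k + 1, sigma, xi)


def dgpd_tail(k: int, sigma: float, xi: float) -> float:
    """P(X >= k), the probability of an exceedance at least as large as k."""
    if k <= 0:
        return 1.0
    return gpd_sf(k, sigma, xi)


if __name__ == "__main__":
    # Illustrative parameter values only; they are not estimates from the article.
    sigma, xi = 2.0, 0.5
    for k in (0, 1, 5, 20):
        print(k, dgpd_pmf(k, sigma, xi), dgpd_tail(k, sigma, xi))
```

In a peaks-over-threshold setting, `k` would stand for the integer excess above a chosen threshold, and `dgpd_tail` would give the conditional probability of an excess of at least `k`.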
Supplementary material
The data and code supporting this article are available in the GitHub repository at https://github.com/adhi1000/discrete_extremes. This archive includes the file Simulated Data.R, which details the simulation study discussed in Section 3, and the files Word Frequency.R, Tornado.R and Multiple Birth.R, which replicate the real data analysis presented in Section 4.