Journal of Data Science

Discrete Extremes
Volume 22, Issue 4 (2024), pp. 524–536
Adrien S. Hitz   Richard A. Davis   Gennady Samorodnitsky  

https://doi.org/10.6339/24-JDS1120
Pub. online: 23 February 2024      Type: Statistical Data Science      Open Access

Received: 28 January 2024
Accepted: 31 January 2024
Published: 23 February 2024

Abstract

Our contribution is to widen the scope of extreme value analysis applied to discrete-valued data. Extreme values of a random variable are commonly modeled with the generalized Pareto distribution in a peaks-over-threshold approach that often gives good results in practice. When the data are discrete, we propose two alternative methods, based on a discrete generalized Pareto distribution and a generalized Zipf distribution, respectively. Both are theoretically motivated, and we show that they perform well in estimating rare events in several simulated and real data cases, such as word frequency, tornado outbreaks, and multiple births.
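
As a rough illustration of the first approach, the sketch below fits a discrete generalized Pareto distribution to threshold exceedances. It is not the authors' code (their analysis scripts are in R), and it rests on assumptions not stated in the abstract: the probability mass at k is taken to be the mass a continuous generalized Pareto distribution puts on [k, k + 1), the shape parameter xi is restricted to be positive, and the function names and the grid search are hypothetical choices made for brevity.

```python
import math

def gpd_survival(x, xi, sigma):
    """Survival function of a generalized Pareto distribution (x >= 0, xi > 0)."""
    return (1.0 + xi * x / sigma) ** (-1.0 / xi)

def dgpd_pmf(k, xi, sigma):
    """Assumed discrete GPD mass at k = 0, 1, 2, ...: GPD mass on [k, k + 1)."""
    return gpd_survival(k, xi, sigma) - gpd_survival(k + 1, xi, sigma)

def dgpd_neg_loglik(xi, sigma, exceedances):
    """Negative log-likelihood of integer exceedances (value minus threshold)."""
    if xi <= 0 or sigma <= 0:
        return math.inf
    return -sum(math.log(dgpd_pmf(k, xi, sigma)) for k in exceedances)

def fit_dgpd_grid(exceedances, xi_grid, sigma_grid):
    """Crude grid-search fit; a numerical optimizer would be used in practice."""
    candidates = [(xi, sigma) for xi in xi_grid for sigma in sigma_grid]
    return min(candidates, key=lambda p: dgpd_neg_loglik(p[0], p[1], exceedances))

# Hypothetical counts and threshold, for illustration only.
counts = [3, 4, 3, 5, 7, 3, 4, 15, 3, 6, 9, 3]
threshold = 3
exceedances = [c - threshold for c in counts if c >= threshold]
xi_hat, sigma_hat = fit_dgpd_grid(exceedances, [0.1, 0.3, 0.5, 0.8], [0.5, 1.0, 2.0, 4.0])

# Estimated probability that a count reaches at least threshold + 20,
# given that it reaches the threshold.
tail_prob = gpd_survival(20, xi_hat, sigma_hat)
```

Under this assumed form the masses telescope, so the probability of an exceedance of at least k is simply the continuous survival function evaluated at k, which is what makes tail estimates easy to read off once the two parameters are fitted.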

Supplementary material

The data and code supporting this article are available in the GitHub repository at https://github.com/adhi1000/discrete_extremes. This archive includes the file Simulated Data.R, which details the simulation study discussed in Section 3, and the files Word Frequency.R, Tornado.R and Multiple Birth.R, which replicate the real data analysis presented in Section 4.



Copyright
2024 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
count data; discrete distribution; extreme value theory; generalized Pareto distribution; peaks over threshold; tail approximation; Zipf distribution

Funding
The first author is grateful to the Berrow Foundation for financial support. This research was partially supported by the ARO grant W911NF-12-10385.


Journal of data science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
