Heavy Tailed Pareto Distribution: Properties and Applications

In this article, we introduce a class of distributions with heavier tails than the Pareto distribution of the third kind, which we term the Heavy Tailed Pareto (HP) distribution. Various structural properties of the new distribution are derived. It is shown that the HP distribution is in the domain of attraction for minima of the Weibull distribution. A representation of the HP distribution in terms of a Weibull random variable is obtained, together with two characterizations. The method of maximum likelihood is used to estimate the model parameter, and simulation results are presented to assess the performance of the new model. The Marshall-Olkin Heavy Tailed Pareto (MOHP) distribution is also introduced and some of its properties are studied. It is shown that the MOHP distribution is geometric extreme stable. An autoregressive time series model with the new model as marginal distribution is developed and its properties are studied.


Introduction
Data with heavy tails have been studied by researchers in many areas, including economics, finance, reliability, telecommunications, high speed network traffic, hydrology, insurance, linguistics, physics and biology. The emergence of the Internet and the World Wide Web gave new impetus to the study of heavy-tailed distributions, owing to their prevalence in internet packet and flow data, the topological structure of the Web, the size of computer files, and so on. The tail dependent nature of heavy-tailed phenomena has also received attention in modelling and analysis. As a first approximation, the tail of a heavy-tailed distribution resembles the hyperbolic shape of the Pareto distribution, which is characterized by the so-called tail index. The tail parameter plays a major role in modelling, as it governs the thickness of the tail of the distribution.
The Pareto distribution, also known as the distribution of income, is a very popular heavy-tailed distribution with wide applications in different contexts. It arises as a tractable model in actuarial science, economics, finance, life testing, reliability, survival analysis and engineering. The family of Pareto distributions is prominent in the literature for its ability to model heavy-tailed data. Arnold (2015) discussed various properties and applications of the Pareto distribution and its extensions.
The survival functions of the classical Pareto distribution, denoted by P I(α, β), and of the Pareto distribution of the third kind, denoted by P III(λ), are respectively given by
$$\bar{G}(x; \alpha, \beta) = \left(\frac{\beta}{x}\right)^{\alpha};\quad x \ge \beta,\ \alpha, \beta > 0, \tag{1}$$
and
$$\bar{G}(x; \lambda) = \frac{1}{1 + x^{\lambda}};\quad x > 0,\ \lambda > 0. \tag{2}$$
The Pareto distribution was first proposed as a model to explain the allocation of income among individuals. Later, various forms of the Pareto distribution were formulated for modelling and analysing data from engineering, environment, geology, hydrology, actuarial science, telecommunications, reliability, risk modelling, etc. These diverse applications led researchers to develop different generalizations of the Pareto distribution. Many such forms have appeared in the literature for modelling heavy-tailed data: the generalized Pareto distribution of Pickands III (1975), the Marshall-Olkin Pareto distribution of Thomas and Jose (2003), the Pareto positive-stable distribution of Sarabia and Prieto (2009), the Kumaraswamy Pareto distribution of Bourguignon et al. (2013), the transmuted Pareto (TP) distribution of Merovci and Puka (2014), the Pareto ArcTan distribution of Gómez-Déniz and Calderín-Ojeda (2015), the New Generalized Pareto distribution of Jayakumar et al. (2020), and so on. Even so, in many cases no available distribution fits the data adequately.
Despite the importance of the Pareto distribution in data modelling, a comprehensive study of distributions with heavier tails than the Pareto distribution has not been undertaken so far. Motivated by this fact, we introduce a new distribution whose tail is heavier than that of the Pareto distribution. The new heavy-tailed distribution competes with the Pareto distribution of the third kind. Our study reveals that it is a good model for data sets in which extreme observations are significant and for lifetime data sets.
With regard to modelling time series data with Pareto marginals, Yeh et al. (1988) defined Pareto processes and discussed the applications of their model in income analysis. Pillai (1991) and Pillai et al. (1995) generalized this process in terms of semi-Pareto random variables. Balakrishna and Jayakumar (1997) introduced a bivariate minification process of the first order with semi-Pareto distribution. As the moment generating function of the Pareto distribution is not available in closed form, we develop here minification time series models with our new distributions as marginals.
The paper is organized as follows. In Section 2, we introduce the Heavy Tailed Pareto distribution and discuss the shapes of its density function and distribution function. Reliability properties and structural properties of the new distribution are also investigated. In Section 3, maximum likelihood estimation of the model parameter is discussed and a simulation is carried out to assess the performance of the proposed estimation method. In Section 4, the Marshall-Olkin Heavy Tailed Pareto (MOHP) distribution is introduced and some of its properties are studied. Some representations and characterizations are obtained in Section 5. A first order autoregressive minification process with MOHP distribution as marginal is developed and studied in Section 6. We conclude the article in Section 7 with some remarks on the main results and their significance.

Heavy Tailed Pareto Distribution
In this section, we introduce a new distribution with survival function
$$\bar{F}(x; \lambda) = \frac{1}{1 + \log(1 + x^{\lambda})};\quad x > 0,\ \lambda > 0, \tag{3}$$
where log is the natural logarithm. It is obvious that $\lim_{x \to \infty} e^{-\theta x} / \bar{F}(x; \lambda) = 0$ for any θ > 0, and so the function (3) represents the survival function of a heavy-tailed distribution. Also, it can be easily verified that (3) has a heavier tail than the Pareto distribution of the third kind. Hence we refer to the distribution with survival function (3) as the Heavy Tailed Pareto distribution with parameter λ, denoted by HP(λ).
The cumulative distribution function (cdf) of HP(λ) is given by
$$F(x; \lambda) = \frac{\log(1 + x^{\lambda})}{1 + \log(1 + x^{\lambda})};\quad x > 0,\ \lambda > 0. \tag{4}$$
The probability density function (pdf) of HP(λ) is
$$f(x; \lambda) = \frac{\lambda x^{\lambda - 1}}{(1 + x^{\lambda})\,(1 + \log(1 + x^{\lambda}))^{2}};\quad x > 0,\ \lambda > 0. \tag{5}$$
From equations (2) and (3), we can see that $\bar{F}(x; \lambda) \ge \bar{G}(x; \lambda)$, since $\log(1 + x^{\lambda}) \le x^{\lambda}$. Hence if X ∼ HP(λ) and Y ∼ P III(λ), then X is stochastically larger than Y.
Jayakumar and Ajitha (2003) showed that $\phi(\lambda) = \frac{1}{1 + \log(1 + \lambda^{\alpha})}$ is the Laplace transform of a probability distribution. Hence $F(x) = 1 - \phi(x)$ is a distribution with complete monotone derivative (c.m.d.), since φ(x) is completely monotone with φ(0) = 1 and φ(∞) = 0. Therefore, by Pillai and Sandhya (1990), F(x) is a mixture of exponentials. Also, F(x) is geometrically infinitely divisible (g.i.d.) and hence infinitely divisible (i.d.). Thus the HP distribution we have introduced is a distribution with c.m.d., and hence is a mixture of exponentials and is g.i.d. For applications of g.i.d. distributions and distributions with c.m.d., see Klebanov et al. (1985) and Aly and Bouzar (2000).
Figure 1 compares the pdf of the HP distribution with the pdf of the Pareto distribution of the third kind for different values of λ. From the graphs in Figure 1, it is clear that the distribution with pdf (5) is heavier tailed than the Pareto distribution.
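The survival function, cdf and pdf of HP(λ) can be checked numerically. The following is a minimal sketch, assuming the forms 1/(1 + log(1 + x^λ)) for the HP survival function and 1/(1 + x^λ) for P III(λ); the function names are illustrative and not part of the original article.

```python
import math


def hp_survival(x, lam):
    """HP survival function: 1 / (1 + log(1 + x**lam)) for x > 0."""
    return 1.0 / (1.0 + math.log1p(x ** lam))


def hp_cdf(x, lam):
    """HP distribution function, the complement of the survival function."""
    log_term = math.log1p(x ** lam)
    return log_term / (1.0 + log_term)


def hp_pdf(x, lam):
    """HP density, the derivative of hp_cdf with respect to x."""
    t = x ** lam
    return lam * x ** (lam - 1.0) / ((1.0 + t) * (1.0 + math.log1p(t)) ** 2)


def pareto3_survival(x, lam):
    """Survival function of the Pareto distribution of the third kind."""
    return 1.0 / (1.0 + x ** lam)


if __name__ == "__main__":
    # The HP survival probability should never fall below the P III one.
    for x in (0.5, 1.0, 5.0, 50.0):
        print(x, hp_survival(x, 2.0), pareto3_survival(x, 2.0))
```

Evaluating both survival functions on a grid of x values gives a quick check of the stochastic ordering between HP(λ) and P III(λ).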

Shapes of pdf and cdf
Graphs of pdf and cdf for various values of parameter λ are presented in Figure 2 and Figure 3 respectively.

Hazard Rate Function
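The hazard rate function of HP(λ) is
$$h_{HP}(x; \lambda) = \frac{f(x; \lambda)}{\bar{F}(x; \lambda)} = \frac{\lambda x^{\lambda - 1}}{(1 + x^{\lambda})\,(1 + \log(1 + x^{\lambda}))};\quad x > 0,\ \lambda > 0. \tag{6}$$
Graphs of the hazard rate function for various values of λ are presented in Figure 4. From Figure 4, it is clear that the hazard rate function of HP is decreasing when λ ≤ 1, and hence the HP distribution belongs to the class of DFR distributions if λ ≤ 1.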
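When λ > 1, the hazard rate function behaves differently in two regions:
1. For x ≤ x_0, the hazard rate function is increasing.
2. For x > x_0, the hazard rate function is decreasing,
where x_0 denotes the point at which the hazard rate attains its maximum.
The hazard rate function of P III(λ) is
$$h_{PIII}(x; \lambda) = \frac{\lambda x^{\lambda - 1}}{1 + x^{\lambda}};\quad x > 0,\ \lambda > 0. \tag{7}$$
From the hazard rate functions given by (6) and (7), we obtain $h_{HP}(x; \lambda) \le h_{PIII}(x; \lambda)$, as shown in Figure 5. Hence, at the same age, a unit whose life distribution is HP is less likely to fail instantaneously than one whose life distribution is Pareto of the third kind.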
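The shape of the hazard rate can be explored numerically. The sketch below assumes the HP hazard λx^(λ−1) / [(1 + x^λ)(1 + log(1 + x^λ))] and the P III hazard λx^(λ−1) / (1 + x^λ); the grid and the helper `hazard_mode` are illustrative choices, not taken from the article.

```python
import math


def hp_hazard(x, lam):
    """Hazard rate of HP(lam): pdf divided by survival function."""
    t = x ** lam
    return lam * x ** (lam - 1.0) / ((1.0 + t) * (1.0 + math.log1p(t)))


def pareto3_hazard(x, lam):
    """Hazard rate of the Pareto distribution of the third kind."""
    return lam * x ** (lam - 1.0) / (1.0 + x ** lam)


# Evaluation grid on (0, 20]; the step size is an arbitrary choice.
GRID = [0.05 * k for k in range(1, 401)]


def hazard_mode(lam):
    """Grid point at which the HP hazard is largest (a crude estimate of x_0)."""
    return max(GRID, key=lambda x: hp_hazard(x, lam))


if __name__ == "__main__":
    print("approximate x_0 for lam = 2:", hazard_mode(2.0))
```

For λ ≤ 1 the grid maximum sits at the left end of the grid, in line with a decreasing hazard, while for λ > 1 it falls at an interior point.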

Reverse Hazard Rate Function
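Here we consider the reverse hazard rate, which is useful in constructing the information matrix and in estimating the survival function for censored data. The reverse hazard rate function of HP(λ) is given by
$$r(x; \lambda) = \frac{f(x; \lambda)}{F(x; \lambda)} = \frac{\lambda x^{\lambda - 1}}{(1 + x^{\lambda})\,\log(1 + x^{\lambda})\,(1 + \log(1 + x^{\lambda}))};\quad x > 0,\ \lambda > 0.$$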

Mean Residual Life Function
An important ageing property in reliability theory is the mean residual life (MRL) function which is defined simply as the expected value of the remaining lifetime beyond an age x.
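The MRL function of X ∼ HP(λ) is given by
$$m(x) = E(X - x \mid X > x) = \frac{1}{\bar{F}(x; \lambda)} \int_{x}^{\infty} \bar{F}(t; \lambda)\, dt = (1 + \log(1 + x^{\lambda})) \int_{x}^{\infty} \frac{dt}{1 + \log(1 + t^{\lambda})}.$$
An estimate of the MRL function can be obtained using the ordered data. Let $X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$ be the ordered HP data. Then the MRL function is easily estimated at $x = X_{(n-k)}$, for some k = 1, 2, ..., n − 1, by the empirical average excess of the k data points higher than $X_{(n-k)}$, which is given by
$$\hat{m}(X_{(n-k)}) = \frac{1}{k} \sum_{i=1}^{k} \left( X_{(n-k+i)} - X_{(n-k)} \right).$$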
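The empirical estimator described above can be sketched in a few lines. This is an illustrative implementation, assuming the estimator is the average excess of the k largest observations over X_(n−k); the function name `empirical_mrl` is hypothetical.

```python
def empirical_mrl(data, k):
    """Average excess of the k largest observations over X_(n-k)."""
    xs = sorted(data)
    n = len(xs)
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n - 1")
    threshold = xs[n - k - 1]  # X_(n-k) in 1-based order-statistic notation
    excesses = [x - threshold for x in xs[n - k:]]
    return sum(excesses) / k


if __name__ == "__main__":
    sample = [3.0, 1.0, 10.0, 2.0, 4.0]
    # Threshold is X_(3) = 3; the excesses are 4 - 3 and 10 - 3.
    print(empirical_mrl(sample, 2))
```

For heavy-tailed data the estimate at high thresholds is driven by a few extreme observations, so it is usually inspected over a range of k rather than at a single value.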

Log-Odds Rate
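The log-odds function of a random variable with distribution function F is defined as $LO(x) = \log \frac{F(x)}{1 - F(x)}$. For X ∼ HP(λ), equations (3) and (4) give $F(x; \lambda) / \bar{F}(x; \lambda) = \log(1 + x^{\lambda})$. Hence a distribution is the HP distribution if and only if $LO(x) = \log(\log(1 + x^{\lambda}))$. The log-odds rate (LOR) is defined as
$$LOR(x) = \frac{d}{dx} LO(x) = \frac{h(x)}{F(x)},$$
where h(.) is the hazard rate function.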

Structural Properties
The statistical properties of the HP distribution including moments, random number generation, median, quantile function, entropies and distribution of order statistics are presented in this section.

Moments
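The r-th moment of X ∼ HP(λ) is given by
$$\mu_r' = E(X^{r}) = \int_{0}^{\infty} \frac{\lambda x^{r + \lambda - 1}}{(1 + x^{\lambda})\,(1 + \log(1 + x^{\lambda}))^{2}}\, dx = \int_{0}^{\infty} \frac{t^{r/\lambda}}{(1 + t)\,(1 + \log(1 + t))^{2}}\, dt,$$
by substituting $x^{\lambda} = t$. Again, by letting log(1 + t) = u, the above integral becomes
$$\mu_r' = \int_{0}^{\infty} \frac{(e^{u} - 1)^{r/\lambda}}{(1 + u)^{2}}\, du.$$
As the above integral is not easy to evaluate, we present in Table 1 the behaviour of the mean, variance, skewness and kurtosis for selected values of the parameter λ, computed using R software with a finite upper limit of integration. From Table 1, we can see that as λ increases, the mean also increases, but the skewness and kurtosis decrease. The variance appears to decrease first and then increase when λ > 2.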

Random Number Generation
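A random observation X from the HP(λ) distribution can be simulated as
$$X = \left( \exp\left( \frac{U}{1 - U} \right) - 1 \right)^{1/\lambda},$$
where U is a uniform random variable on (0, 1).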
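Inverse-transform sampling from HP(λ) can be sketched as follows, assuming the cdf log(1 + x^λ) / (1 + log(1 + x^λ)). Because the tail is so heavy, exp(U/(1 − U)) exceeds double-precision range once U/(1 − U) is larger than about 709, so this illustrative sketch returns log X rather than X; the function name `hp_log_sample` is hypothetical.

```python
import math
import random


def hp_log_sample(n, lam, rng):
    """Draw n values of log(X), X ~ HP(lam), by inverting the cdf."""
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u == 0.0:
            continue  # avoid log(0); this event has negligible probability
        v = u / (1.0 - u)  # v plays the role of log(1 + X**lam)
        # log(exp(v) - 1), written so that large v cannot overflow
        draws.append((v + math.log(-math.expm1(-v))) / lam)
    return draws


if __name__ == "__main__":
    rng = random.Random(12345)
    logs = sorted(hp_log_sample(20001, 2.0, rng))
    print("sample median of log X:", logs[10000])
    print("log of theoretical median:", 0.5 * math.log(math.e - 1.0))
```

Values of X itself can be recovered with `math.exp` whenever the logarithm is below roughly 709; larger draws have to stay on the log scale.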

Quantile Function
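The q-th quantile of the HP distribution is given by
$$x_q = \left( \exp\left( \frac{q}{1 - q} \right) - 1 \right)^{1/\lambda};\quad 0 < q < 1.$$
In particular, the median is $M = (e - 1)^{1/\lambda}$.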
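A short numerical check of the quantile function, assuming the HP cdf log(1 + x^λ) / (1 + log(1 + x^λ)): the quantile should invert the cdf and return (e − 1)^(1/λ) at q = 1/2. The function names are illustrative.

```python
import math


def hp_cdf(x, lam):
    """HP distribution function."""
    log_term = math.log1p(x ** lam)
    return log_term / (1.0 + log_term)


def hp_quantile(q, lam):
    """q-th quantile of HP(lam); overflows for q very close to 1."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    return math.expm1(q / (1.0 - q)) ** (1.0 / lam)


if __name__ == "__main__":
    for q in (0.1, 0.5, 0.9):
        x_q = hp_quantile(q, 2.0)
        print(q, x_q, hp_cdf(x_q, 2.0))
```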

Rényi Entropy and Shannon Entropy
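Entropy measures the variation or uncertainty of a random variable X. The Rényi entropy of a random variable X with pdf f(.) is defined as
$$I_R(\eta) = \frac{1}{1 - \eta} \log \int_{0}^{\infty} f^{\eta}(x)\, dx;\quad \eta > 0,\ \eta \ne 1.$$
The Rényi entropy of HP(λ) is
$$I_R(\eta) = \frac{1}{1 - \eta} \log \left\{ \lambda^{\eta - 1} \int_{0}^{\infty} \frac{t^{(\eta - 1)(\lambda - 1)/\lambda}}{(1 + t)^{\eta}\,(1 + \log(1 + t))^{2\eta}}\, dt \right\},$$
by letting $x^{\lambda} = t$. The Shannon entropy of X ∼ HP(λ) is given by
$$E[-\log f(X)] = -\log \lambda - (\lambda - 1) E[\log X] + E[\log(1 + X^{\lambda})] + 2 E[\log(1 + \log(1 + X^{\lambda}))].$$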

Order Statistics
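Let $X_1, X_2, \dots, X_n$ be a random sample of size n from HP(λ) and let $X_{(1)}, X_{(2)}, \dots, X_{(n)}$ denote the corresponding order statistics. Then, the pdf of the i-th order statistic is
$$f_{i:n}(x) = \frac{n!}{(i - 1)!\,(n - i)!} \cdot \frac{\lambda x^{\lambda - 1}\,[\log(1 + x^{\lambda})]^{i - 1}}{(1 + x^{\lambda})\,(1 + \log(1 + x^{\lambda}))^{n + 1}};\quad x > 0.$$
Hence the pdf of the largest order statistic, $X_{(n)}$, is
$$f_{n:n}(x) = \frac{n \lambda x^{\lambda - 1}\,[\log(1 + x^{\lambda})]^{n - 1}}{(1 + x^{\lambda})\,(1 + \log(1 + x^{\lambda}))^{n + 1}},$$
and the pdf of the smallest order statistic, $X_{(1)}$, is
$$f_{1:n}(x) = \frac{n \lambda x^{\lambda - 1}}{(1 + x^{\lambda})\,(1 + \log(1 + x^{\lambda}))^{n + 1}}.$$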

Estimation of Parameters
In this section, we consider the method of maximum likelihood for the estimation of parameter of HP distribution.
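Let $x_1, x_2, \dots, x_n$ be a random sample of size n from this distribution with unknown parameter λ. Then, the likelihood function is
$$L(\lambda) = \prod_{i=1}^{n} \frac{\lambda x_i^{\lambda - 1}}{(1 + x_i^{\lambda})\,(1 + \log(1 + x_i^{\lambda}))^{2}},$$
so that the log-likelihood function becomes
$$\log L = n \log \lambda + (\lambda - 1) \sum_{i=1}^{n} \log x_i - \sum_{i=1}^{n} \log(1 + x_i^{\lambda}) - 2 \sum_{i=1}^{n} \log(1 + \log(1 + x_i^{\lambda})).$$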
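The first order derivative of the log-likelihood function with respect to the parameter λ is given by
$$\frac{d \log L}{d\lambda} = \frac{n}{\lambda} + \sum_{i=1}^{n} \log x_i - \sum_{i=1}^{n} \frac{x_i^{\lambda} \log x_i}{1 + x_i^{\lambda}} - 2 \sum_{i=1}^{n} \frac{x_i^{\lambda} \log x_i}{(1 + x_i^{\lambda})\,(1 + \log(1 + x_i^{\lambda}))}. \tag{10}$$
The maximum likelihood estimator of the parameter λ is obtained by solving the equation $\frac{d \log L}{d\lambda} = 0$. Equation (10) cannot be solved analytically, and hence statistical software can be used to solve it numerically by means of iterative techniques such as the Newton-Raphson algorithm.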
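The likelihood equation can be solved numerically in several ways. The sketch below swaps Newton-Raphson for plain bisection on the score, which needs no second derivative, and it works with y_i = log x_i so that x_i^λ never has to be formed explicitly (very large observations would otherwise overflow). It assumes the HP log-likelihood n log λ + (λ − 1) Σ log x_i − Σ log(1 + x_i^λ) − 2 Σ log(1 + log(1 + x_i^λ)); the function names and the bracket [1e-3, 1e3] are illustrative assumptions.

```python
import math


def _sigmoid(z):
    """x**lam / (1 + x**lam) with z = lam * log(x), computed without overflow."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _softplus(z):
    """log(1 + x**lam) with z = lam * log(x), computed without overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def hp_score(lam, log_x):
    """Derivative of the HP log-likelihood with respect to lam."""
    total = len(log_x) / lam
    for y in log_x:
        s = _sigmoid(lam * y)
        total += y - y * s - 2.0 * y * s / (1.0 + _softplus(lam * y))
    return total


def hp_mle(log_x, lo=1e-3, hi=1e3, tol=1e-10):
    """Root of the score on [lo, hi] found by bisection."""
    if hp_score(lo, log_x) <= 0.0 or hp_score(hi, log_x) >= 0.0:
        raise ValueError("score does not change sign on [lo, hi]")
    while hi - lo > tol * max(1.0, lo):
        mid = 0.5 * (lo + hi)
        if hp_score(mid, log_x) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Deterministic pseudo-sample: log-quantiles of HP(2) at equally spaced levels.
    n, lam0 = 2000, 2.0
    levels = [(i + 0.5) / n for i in range(n)]
    data = [(v + math.log(-math.expm1(-v))) / lam0
            for v in (q / (1.0 - q) for q in levels)]
    print("estimate of lam:", hp_mle(data))
```

Bisection is slower than Newton-Raphson but only requires a bracket on which the score changes sign, which makes it a convenient cross-check for an iterative solver.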
Let $\hat{\lambda}$ be the estimate of the parameter. The asymptotic confidence interval for the parameter λ is obtained by considering the Fisher information.
The Fisher information corresponding to the HP(λ) distribution is given by
$$I(\lambda) = -E\left[ \frac{\partial^{2} \log f(X; \lambda)}{\partial \lambda^{2}} \right].$$
Using asymptotic normality results, the asymptotic confidence interval can be obtained from
$$\sqrt{n}\,(\hat{\lambda} - \lambda) \longrightarrow N(0, I^{-1}(\lambda)). \tag{11}$$
We can easily show that the HP family satisfies all the regularity conditions for λ > 0 and hence (11) holds, where I(λ) can be estimated by $I(\hat{\lambda})$. Also, the 95% confidence interval for the parameter λ is given by
$$\hat{\lambda} \pm 1.96 \sqrt{\frac{I^{-1}(\hat{\lambda})}{n}}.$$

Simulation
We conduct a Monte Carlo simulation study to assess the performance of the maximum likelihood procedure for estimating the HP parameter. Samples of sizes n = 50, 100, 200, 500 and 1000 are generated from the HP model for different values of the parameter λ. We repeated the simulation 1000 times and calculated the MLEs and mean squared errors (MSEs) of the parameter estimate. The empirical results are listed in Table 2. We note that the MSE of the estimate of λ decreases as the sample size increases, and the mean estimates of the parameter tend to be closer to the true parameter values.

Marshall-Olkin Heavy Tailed Pareto Distribution
Marshall and Olkin (1997) introduced a general method to incorporate a shape parameter in the base random variable X and showed the flexibility gained by the transformed model through the introduction of the shape parameter. If $\bar{G}(x)$ is the survival function of a continuous random variable X, then
$$\bar{F}(x) = \frac{\alpha \bar{G}(x)}{1 - (1 - \alpha) \bar{G}(x)};\quad \alpha > 0,$$
is a proper survival function. The introduction of the new parameter α makes the distribution with survival function $\bar{F}(x)$ more flexible than the base distribution with survival function $\bar{G}(x)$, especially in data modelling. Sankaran and Jayakumar (2008) discussed the physical interpretation of the Marshall-Olkin family of distributions using the proportional odds model. The Marshall and Olkin (1997) extended class of distributions offers a wider range of behaviour than the basic distributions from which they are derived.
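If X is a random variable defined on [0, ∞) with survival function (3), then, using the survival function defined by Marshall and Olkin (1997), we obtain a new family of distributions with survival function
$$\bar{F}(x; \alpha, \lambda) = \frac{1}{1 + \frac{1}{\alpha} \log(1 + x^{\lambda})};\quad x > 0,\ \alpha, \lambda > 0.$$
We refer to this distribution as the Marshall-Olkin Heavy Tailed Pareto distribution, denoted by MOHP(α, λ). The corresponding probability density function is
$$f(x; \alpha, \lambda) = \frac{\lambda x^{\lambda - 1}}{\alpha\,(1 + x^{\lambda}) \left( 1 + \frac{1}{\alpha} \log(1 + x^{\lambda}) \right)^{2}};\quad x > 0,\ \alpha, \lambda > 0.$$
Note that the MOHP distribution is a class of distributions that contains the HP distribution (α = 1) and is a viable competitor of the heavy-tailed Pareto distribution. Plots of the pdf of the MOHP distribution for different values of the parameters α and λ are presented in Figure 6.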

Properties
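The hazard rate function of the MOHP(α, λ) distribution is given by
$$h_{MOHP}(x; \alpha, \lambda) = \frac{\lambda x^{\lambda - 1}}{(1 + x^{\lambda})\,(\alpha + \log(1 + x^{\lambda}))};\quad x > 0.$$
From the hazard rates of the HP distribution and the MOHP distribution, it follows that
$$\frac{h_{MOHP}(x; \alpha, \lambda)}{h_{HP}(x; \lambda)} = \frac{1 + \log(1 + x^{\lambda})}{\alpha + \log(1 + x^{\lambda})}.$$
Also, it can be easily verified that $h_{HP}(x) \le h_{MOHP}(x)$ when α ≤ 1 and $h_{HP}(x) \ge h_{MOHP}(x)$ when α ≥ 1. Plots of the hazard rate function of MOHP(α, λ) for various values of α and λ are presented in Figure 7.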
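The quantile function of X ∼ MOHP(α, λ) is obtained by inverting the cdf of MOHP(α, λ), giving $x_u = F^{-1}(u)$ as
$$x_u = \left( \exp\left( \frac{\alpha u}{1 - u} \right) - 1 \right)^{1/\lambda};\quad 0 < u < 1.$$
In particular, the median is given by $M = (e^{\alpha} - 1)^{1/\lambda}$. Simulating the MOHP random variable is therefore straightforward.
The maximum likelihood estimates of the parameters α and λ of the MOHP distribution are obtained by solving the equations
$$\frac{n}{\lambda} + \sum_{i=1}^{n} \log x_i - \sum_{i=1}^{n} \frac{x_i^{\lambda} \log x_i}{1 + x_i^{\lambda}} - \frac{2}{\alpha} \sum_{i=1}^{n} \frac{x_i^{\lambda} \log x_i}{(1 + x_i^{\lambda}) \left( 1 + \frac{1}{\alpha} \log(1 + x_i^{\lambda}) \right)} = 0,$$
$$-\frac{n}{\alpha} + 2 \sum_{i=1}^{n} \frac{\log(1 + x_i^{\lambda})}{\alpha^{2} \left( 1 + \frac{1}{\alpha} \log(1 + x_i^{\lambda}) \right)} = 0.$$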
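The MOHP survival and quantile functions can be checked against the Marshall-Olkin construction. This is a minimal sketch, assuming the survival function 1 / (1 + log(1 + x^λ)/α) and the quantile (exp(αu/(1 − u)) − 1)^(1/λ); the function names are illustrative.

```python
import math


def hp_survival(x, lam):
    """Survival function of the base HP(lam) distribution."""
    return 1.0 / (1.0 + math.log1p(x ** lam))


def mohp_survival(x, alpha, lam):
    """Survival function of MOHP(alpha, lam)."""
    return 1.0 / (1.0 + math.log1p(x ** lam) / alpha)


def marshall_olkin(g_bar, alpha):
    """Marshall-Olkin transform of a base survival probability g_bar."""
    return alpha * g_bar / (1.0 - (1.0 - alpha) * g_bar)


def mohp_quantile(u, alpha, lam):
    """u-th quantile of MOHP(alpha, lam); overflows for u very close to 1."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    return math.expm1(alpha * u / (1.0 - u)) ** (1.0 / lam)


if __name__ == "__main__":
    alpha, lam = 0.5, 2.0
    for x in (0.3, 1.0, 4.0):
        print(x, mohp_survival(x, alpha, lam),
              marshall_olkin(hp_survival(x, lam), alpha))
    print("median:", mohp_quantile(0.5, alpha, lam))
```

Setting alpha = 1 in `mohp_survival` recovers the HP survival function, which gives a simple consistency check.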

Transformations and Characterizations
The relation between the HP distribution and the Pareto (III) distribution is established in the following theorem.
Theorem 1. If X ∼ HP(λ), then $Y = (\log(1 + X^{\lambda}))^{1/\beta}$ follows P III(β), for β > 0.
Proof. If X ∼ HP(λ) and $Y = (\log(1 + X^{\lambda}))^{1/\beta}$, then by using (5), the pdf of Y is given by
$$g(y) = \frac{\beta y^{\beta - 1}}{(1 + y^{\beta})^{2}};\quad y > 0,$$
which is the pdf of P III(β).
If we define $W_n = \min(X_1, X_2, \dots, X_n)$, where the $X_i$'s are independent and identically distributed (i.i.d.) random variables with distribution function F(x), then the distribution function of $W_n$ is given by
$$F_{W_n}(x) = 1 - [1 - F(x)]^{n}. \tag{15}$$
The distribution F(x) is said to belong to the domain of attraction for minimum of a given cumulative distribution function H(x) if there exist sequences $\{a_n\}$ and $\{b_n\}$, with $b_n > 0$, of constants depending on n such that
$$\lim_{n \to \infty} F_{W_n}(a_n + b_n x) = H(x), \tag{16}$$
for every x.
The next theorem proves the result related to HP and Weibull distributions with regard to the domain of attraction for minimum.
Theorem 2. The Heavy tailed Pareto distribution is in the domain of attraction of minimum of the Weibull distribution.
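Proof. If $a_n = 0$ and $b_n = n^{-1/\lambda} > 0$, then from equations (15) and (16), we have
$$\lim_{n \to \infty} F_{W_n}(b_n x) = 1 - \lim_{n \to \infty} \left[ 1 + \log\left( 1 + \frac{x^{\lambda}}{n} \right) \right]^{-n} = 1 - e^{-x^{\lambda}},$$
which is the cdf of the Weibull distribution.
Now we establish the relationship between the Weibull and Heavy Tailed Pareto distributions in the following theorem.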
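The limit in the proof of Theorem 2 can be illustrated numerically. The sketch below assumes the HP survival function 1 / (1 + log(1 + x^λ)) and the normalizing constants a_n = 0, b_n = n^(−1/λ); the function names are illustrative.

```python
import math


def scaled_min_cdf(x, lam, n):
    """cdf of n**(1/lam) * min(X_1, ..., X_n) for i.i.d. HP(lam) variables."""
    return 1.0 - (1.0 + math.log1p(x ** lam / n)) ** (-n)


def weibull_cdf(x, lam):
    """Limiting Weibull cdf 1 - exp(-x**lam)."""
    return -math.expm1(-(x ** lam))


if __name__ == "__main__":
    x, lam = 1.0, 2.0
    for n in (10, 100, 1000, 10 ** 6):
        gap = abs(scaled_min_cdf(x, lam, n) - weibull_cdf(x, lam))
        print(n, gap)
```

The printed gap shrinks roughly in proportion to 1/n, which matches the first-order expansion log(1 + x^λ/n) ≈ x^λ/n.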
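Theorem 3. If X and Y are independent random variables such that X has pdf g(x) with Laplace transform $L(t) = \frac{1}{1 + \log(1 + t)}$ and Y follows the Weibull distribution with survival function $e^{-y^{\lambda}}$; y > 0, λ > 0, then $X^{-1/\lambda}\, Y$ follows the Heavy Tailed Pareto distribution.
Proof. If $U = X^{-1/\lambda}\, Y$, then its survival function is given by
$$\bar{F}_U(u) = P(Y > u X^{1/\lambda}) = \int_{0}^{\infty} e^{-u^{\lambda} x}\, g(x)\, dx.$$
Since the Laplace transform of X with pdf g(x) is $L(t) = \frac{1}{1 + \log(1 + t)}$, see Pillai (1991), we have
$$\bar{F}_U(u) = \frac{1}{1 + \log(1 + u^{\lambda})},$$
which is the survival function of HP(λ).
The following theorem gives a characterization property of the HP distribution which is similar to the characterization property satisfied by the Pareto (III) distribution.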
Theorem 4. A random variable X ∼ HP(λ) if and only if it satisfies the functional equation
Proof. The proof is straightforward and hence is omitted.
Now we give some results related to the MOHP distribution.
Theorem 5. Marshall-Olkin Heavy tailed Pareto distribution is geometric extreme stable.
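Proof. Let $X_1, X_2, \dots$ be i.i.d. MOHP(α, λ) random variables with survival function $\bar{F}(x)$, and let N be a geometric random variable, independent of the $X_i$'s, with pmf
$$P(N = n) = p(1 - p)^{n - 1};\quad n = 1, 2, \dots,\ 0 < p < 1. \tag{17}$$
Let $U_N = \min(X_1, X_2, \dots, X_N)$. Then the survival function of $U_N$ is given by
$$\bar{F}_{U_N}(x) = \sum_{n=1}^{\infty} [\bar{F}(x)]^{n}\, p(1 - p)^{n - 1} = \frac{p \bar{F}(x)}{1 - (1 - p) \bar{F}(x)} = \frac{1}{1 + \frac{1}{\alpha p} \log(1 + x^{\lambda})}.$$
That is, $U_N$ ∼ MOHP(αp, λ). Hence the MOHP distribution is geometric minimum stable. Similarly, we can establish that the MOHP distribution is geometric maximum stable. So we conclude that the MOHP distribution is geometric extreme stable.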
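The geometric minimum stability can be verified numerically by summing the geometric series directly. This sketch assumes the MOHP survival function α / (α + log(1 + x^λ)) and the geometric pmf p(1 − p)^(n−1) on n = 1, 2, ...; the function names and the truncation at 500 terms are illustrative choices.

```python
import math


def mohp_survival(x, alpha, lam):
    """Survival function of MOHP(alpha, lam), written as alpha / (alpha + log term)."""
    return alpha / (alpha + math.log1p(x ** lam))


def geometric_min_survival(x, alpha, lam, p, terms=500):
    """Partial sum of sum_{n >= 1} p (1 - p)**(n - 1) * S(x)**n, S the MOHP survival."""
    s = mohp_survival(x, alpha, lam)
    return sum(p * (1.0 - p) ** (n - 1) * s ** n for n in range(1, terms + 1))


if __name__ == "__main__":
    alpha, lam, p = 2.0, 1.5, 0.4
    for x in (0.2, 1.3, 7.0):
        # The two columns should agree: geometric minimum versus MOHP(alpha * p, lam).
        print(x, geometric_min_survival(x, alpha, lam, p),
              mohp_survival(x, alpha * p, lam))
```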
The next theorem provides a characterization result of the MOHP distribution.
Theorem 6. Let $\{X_i, i \ge 1\}$ be a sequence of i.i.d. random variables with common survival function $\bar{H}$ and let N be a geometric random variable with pmf (17), which is independent of the $X_i$'s. Let $U_N = \min(X_1, X_2, \dots, X_N)$. Then $\{U_N\}$ is distributed as MOHP(p, λ) if and only if $\{X_i\}$ is distributed as HP(λ).
Proof. Let $\bar{F}(x)$ be the survival function of $U_N$. Then
$$\bar{F}(x) = \frac{p \bar{H}(x)}{1 - (1 - p) \bar{H}(x)}.$$
If $X_i$ ∼ HP(λ), then $\bar{H}(x) = \frac{1}{1 + \log(1 + x^{\lambda})}$ and hence
$$\bar{F}(x) = \frac{1}{1 + \frac{1}{p} \log(1 + x^{\lambda})},$$
which is the survival function of MOHP(p, λ). This proves the sufficiency part of the theorem, and the converse is straightforward.

Autoregressive Minification Process
In this section, we develop a first order autoregressive (AR(1)) minification process with MOHP distribution as marginal distribution.
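Consider the AR(1) structure in which $X_n = \varepsilon_n$ with probability p and $X_n = \min(X_{n-1}, \varepsilon_n)$ with probability 1 − p, where 0 < p < 1 and $\{\varepsilon_n\}$ is a sequence of i.i.d. random variables independent of $X_{n-1}$. That is,
$$\bar{F}_{X_n}(x) = p \bar{F}_{\varepsilon}(x) + (1 - p) \bar{F}_{X_{n-1}}(x) \bar{F}_{\varepsilon}(x).$$
Assuming stationarity,
$$\bar{F}_{X}(x) = \frac{p \bar{F}_{\varepsilon}(x)}{1 - (1 - p) \bar{F}_{\varepsilon}(x)},$$
so that $\{X_n\}$ has MOHP(p, λ) marginals when $\varepsilon_n$ ∼ HP(λ).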
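A simulation sketch of a first order minification process follows. It assumes the standard Marshall-Olkin minification structure, namely X_n = ε_n with probability p and X_n = min(X_{n−1}, ε_n) with probability 1 − p, with HP(λ) innovations; this structure is an assumption made for illustration and is not confirmed by the equations that survive in the text. The simulation runs on the log scale to avoid overflow, and all function names are hypothetical.

```python
import math
import random


def _log_expm1(v):
    """log(exp(v) - 1) for v > 0, stable for large v."""
    return v + math.log(-math.expm1(-v))


def _uniform_open(rng):
    """Uniform draw on (0, 1), excluding the endpoint 0."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def log_mohp_draw(lam, rng, alpha=1.0):
    """One draw of log(X) with X ~ MOHP(alpha, lam); alpha = 1 gives HP(lam)."""
    u = _uniform_open(rng)
    return _log_expm1(alpha * u / (1.0 - u)) / lam


def simulate_log_minification(n, p, lam, rng):
    """Simulate log(X_1), ..., log(X_n) from the assumed minification model."""
    current = log_mohp_draw(lam, rng, alpha=p)  # start from MOHP(p, lam)
    path = []
    for _ in range(n):
        innovation = log_mohp_draw(lam, rng)  # innovation with HP(lam) law
        if rng.random() < p:
            current = innovation
        else:
            # log is increasing, so the minimum can be taken on the log scale
            current = min(current, innovation)
        path.append(current)
    return path


if __name__ == "__main__":
    p, lam = 0.5, 2.0
    path = simulate_log_minification(40000, p, lam, random.Random(2024))
    log_median = math.log(math.expm1(p)) / lam  # log of (e**p - 1)**(1/lam)
    share = sum(y <= log_median for y in path) / len(path)
    print("share of observations below the MOHP(p, lam) median:", share)
```

Under the assumed structure the share printed at the end should be close to one half, since (e^p − 1)^(1/λ) is the median of MOHP(p, λ).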

Conclusion
We have presented a new heavy-tailed distribution, called the Heavy Tailed Pareto distribution, that is suitable for applications in areas such as finance, economics, reliability, survival analysis and actuarial science. We have derived expressions for the ordinary moments, median, quantile function, Rényi entropy and Shannon entropy, and the distribution of order statistics. The model parameter is estimated by the method of maximum likelihood, and Monte Carlo simulations were conducted to study the performance of the maximum likelihood estimates. The estimates obtained from this method are close to the true parameter values. The Marshall-Olkin Heavy Tailed Pareto distribution was also introduced and some of its properties were studied. The role of the new distributions in the study of autoregressive minification processes was also established, and some properties of the resulting process were studied.