The Exponentiated Generalized Class of Distributions

Abstract: We propose a new method of adding two parameters to a continuous distribution that extends the idea first introduced by Lehmann (1953) and studied by Nadarajah and Kotz (2006). This method leads to a new class of exponentiated generalized distributions that can be interpreted as a double construction of Lehmann alternatives. Some special models are discussed. We derive some mathematical properties of this class including the ordinary moments, generating function, mean deviations and order statistics. Maximum likelihood estimation is investigated and four applications to real data are presented.

1. Introduction

Gupta et al. (1998) first proposed a generalization of the standard exponential distribution, called the exponentiated exponential (EE) distribution, defined by the cumulative distribution function (cdf) $F(x) = (1 - e^{-\lambda x})^{\alpha}$ for $x > 0$, $\lambda > 0$ and $\alpha > 0$. This cdf is simply the $\alpha$th power of the exponential cdf. For a full discussion and some of its mathematical properties, see Gupta and Kundu (2001). In a similar manner, Nadarajah and Kotz (2006) proposed the exponentiated gamma (EΓ), exponentiated Fréchet (EF) and exponentiated Gumbel (EGu) distributions, although the way they defined the cdf of the last two distributions is slightly different. For instance, the EGu cumulative distribution (for $-\infty < x < \infty$) is defined by

$$F(x) = 1 - \left[1 - \exp\left\{-\exp\left(-\frac{x-\mu}{\sigma}\right)\right\}\right]^{\alpha},$$

where $\mu \in \mathbb{R}$ and $\sigma > 0$.
In this article, we propose a new class of distributions that extends the exponentiated type distributions and obtain some of its structural properties. Given a continuous cdf $G(x)$, we define the exponentiated generalized (EG) class of distributions by

$$F(x) = \left[1 - \{1 - G(x)\}^{\alpha}\right]^{\beta}, \tag{1}$$

where $\alpha > 0$ and $\beta > 0$ are two additional shape parameters. We note that there is no complicated function in (1), in contrast with the beta generalized family (Eugene et al., 2002), which also includes two extra parameters but involves the incomplete beta function.
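Equation (1) has tractable properties, especially for simulations, since its quantile function takes a simple form, namely

$$x = Q(u) = Q_G\left(1 - \left(1 - u^{1/\beta}\right)^{1/\alpha}\right), \quad 0 < u < 1,$$

where $Q_G(u)$ is the baseline quantile function.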
The baseline distribution $G(x)$ is clearly a special case of (1) when $\alpha = \beta = 1$. Setting $\alpha = 1$ gives the exponentiated type distributions defined by Gupta et al. (1998). Further, the EE and EΓ distributions are obtained by taking $G(x)$ to be the exponential and gamma cumulative distributions, respectively. For $\beta = 1$, and if $G(x)$ is the Gumbel or Fréchet cumulative distribution, we obtain the EGu and EF distributions, respectively, as defined by Nadarajah and Kotz (2006). Thus, the class of distributions (1) extends both exponentiated type distributions. The probability density function (pdf) of the new class has the form

$$f(x) = \alpha\beta\, \{1 - G(x)\}^{\alpha-1} \left[1 - \{1 - G(x)\}^{\alpha}\right]^{\beta-1} g(x), \tag{2}$$

where $g(x) = dG(x)/dx$ is the baseline pdf. The EG family of densities (2) allows for greater flexibility of its tails and can be widely applied in many areas of engineering and biology. We study some mathematical properties of the class (2) because it extends several well-known distributions in the literature. Note that even if $g(x)$ is symmetric, the density $f(x)$ will not, in general, be symmetric. The two extra parameters in (2) can control both tail weights and possibly add entropy to the center of the EG density function.
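The cdf (1), the pdf (2) and the quantile function above translate directly into code. The following is a minimal sketch, not taken from the paper: the function names (`eg_cdf`, `eg_pdf`, `eg_quantile`) and the exponential baseline with rate `lam` are illustrative assumptions.

```python
import math

def eg_cdf(x, alpha, beta, G):
    """EG cdf (1): F(x) = [1 - {1 - G(x)}^alpha]^beta."""
    return (1.0 - (1.0 - G(x)) ** alpha) ** beta

def eg_pdf(x, alpha, beta, G, g):
    """EG pdf (2) for a baseline cdf G and baseline pdf g."""
    s = 1.0 - G(x)  # baseline survival function
    return alpha * beta * g(x) * s ** (alpha - 1.0) * (1.0 - s ** alpha) ** (beta - 1.0)

def eg_quantile(u, alpha, beta, QG):
    """EG quantile: Q_G(1 - (1 - u^(1/beta))^(1/alpha))."""
    return QG(1.0 - (1.0 - u ** (1.0 / beta)) ** (1.0 / alpha))

# Illustrative baseline (an assumption): exponential with rate lam.
lam = 2.0
G = lambda x: 1.0 - math.exp(-lam * x)
g = lambda x: lam * math.exp(-lam * x)
QG = lambda p: -math.log(1.0 - p) / lam

# Median of the EG distribution with alpha = 1.5 and beta = 3.
x_med = eg_quantile(0.5, 1.5, 3.0, QG)
```

Under these assumptions, a random variate can be drawn by inversion as `eg_quantile(random.random(), alpha, beta, QG)`, which is the simulation convenience the text refers to.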
Hereafter, we define the exponentiated-G ("Exp-G" for short) distribution for an arbitrary parent distribution $G(x)$, say $X \sim \mathrm{Exp}^{c}(G)$, if $X$ has cumulative and density functions given by $H_c(x) = G(x)^{c}$ and $h_c(x) = c\, g(x)\, G(x)^{c-1}$, respectively. This is also called the Lehmann type I distribution. For larger values of $x$, the multiplicative factor $c\, G(x)^{c-1}$ is greater than one when $c > 1$ and smaller than one when $c < 1$. The reverse assertion holds for smaller values of $x$. This immediately implies that the ordinary moments associated with the density function $h_c(x)$ are strictly larger (smaller) than those associated with the density $g(x)$ when $c > 1$ ($c < 1$).
Note that there is a dual transformation $\mathrm{Exp}^{c}(1 - G)$, referred to as the Lehmann type II distribution, corresponding to the cdf

$$F(x) = 1 - \{1 - G(x)\}^{c}.$$

Thus, (1) encompasses both Lehmann type I ($\mathrm{Exp}^{\beta}(G)$ for $\alpha = 1$) and Lehmann type II ($\mathrm{Exp}^{\alpha}(1 - G)$ for $\beta = 1$) distributions (Lehmann, 1953). Clearly, the double construction $\mathrm{Exp}^{\beta}[\mathrm{Exp}^{\alpha}(1 - G)]$ generates the EG class of distributions. The derivations of several properties of the EG class can be facilitated by this double transformation.
The class of EG distributions has an attractive physical interpretation whenever $\alpha$ and $\beta$ are positive integers. Consider a device made of $\beta$ independent components in a parallel system. Furthermore, each component is made of $\alpha$ independent subcomponents, identically distributed according to $G(x)$, in a series system. The device fails if all $\beta$ components fail, and each component fails if any of its subcomponents fails. Let $X_{j1}, \ldots, X_{j\alpha}$ denote the lifetimes of the subcomponents within the $j$th component, $j = 1, \ldots, \beta$, with common cdf $G(x)$. Let $X_j$ denote the lifetime of the $j$th component and let $X$ denote the lifetime of the device. Thus, the cdf of $X$ is

$$P(X \le x) = P(X_1 \le x, \ldots, X_\beta \le x) = \left[1 - P(X_1 > x)\right]^{\beta} = \left[1 - P(X_{11} > x, \ldots, X_{1\alpha} > x)\right]^{\beta} = \left[1 - \{1 - G(x)\}^{\alpha}\right]^{\beta}.$$

So, the lifetime of the device obeys the EG family of distributions.
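The series/parallel interpretation can be checked by simulation. The sketch below assumes unit-rate exponential subcomponent lifetimes (an illustrative choice, not from the paper) and compares the empirical cdf of the simulated device lifetime with the EG cdf at a single point; the helper names are hypothetical.

```python
import math
import random

def device_lifetime(alpha, beta, draw, rng):
    """Parallel system of beta components, each a series of alpha subcomponents.

    A component lives as long as its weakest subcomponent (min);
    the device lives as long as its strongest component (max).
    """
    return max(min(draw(rng) for _ in range(alpha)) for _ in range(beta))

def simulate_cdf(x, alpha, beta, draw, n=20000, seed=1):
    """Empirical P(device lifetime <= x) from n simulated devices."""
    rng = random.Random(seed)
    hits = sum(device_lifetime(alpha, beta, draw, rng) <= x for _ in range(n))
    return hits / n

# Assumed baseline: unit-rate exponential subcomponent lifetimes.
draw_exp = lambda rng: rng.expovariate(1.0)

alpha, beta, x = 3, 2, 0.5
Gx = 1.0 - math.exp(-x)                               # baseline cdf at x
theoretical = (1.0 - (1.0 - Gx) ** alpha) ** beta     # EG cdf at x
empirical = simulate_cdf(x, alpha, beta, draw_exp)
```

With a sample of this size the two values should agree to about two decimal places; the agreement is statistical, not exact.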
The rest of the article is organized as follows. In Section 2, we present four special models of the EG class corresponding to the Fréchet, normal, gamma and Gumbel distributions. Section 3 provides some general useful expansions for the EG density function. Moments of the EG class are derived in Section 4 with applications to eight special models. Generating function and mean deviations are derived in Sections 5 and 6, respectively. Order statistics are studied in Section 7. Maximum likelihood estimation is investigated in Section 8. Applications to four real data sets are performed in Section 9. Some concluding remarks are given in Section 10.

Special Models
Here, we discuss some special EG distributions. The density function (2) will be most tractable when the cdf $G(x)$ and the pdf $g(x)$ have simple analytic expressions.

Exponentiated Generalized Normal
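Let $\Phi(\cdot)$ and $\phi(\cdot)$ denote the standard normal cumulative and density functions, respectively. The exponentiated generalized normal (EGN) cumulative distribution is

$$F(x) = \left[1 - \left\{1 - \Phi\left(\frac{x-\mu}{\sigma}\right)\right\}^{\alpha}\right]^{\beta}, \tag{4}$$

where $x \in \mathbb{R}$, $\mu \in \mathbb{R}$ is a location parameter, $\sigma > 0$ is a scale parameter, $\alpha > 0$ and $\beta > 0$. The EGN density function becomes

$$f(x) = \frac{\alpha\beta}{\sigma}\, \phi\left(\frac{x-\mu}{\sigma}\right) \left\{1 - \Phi\left(\frac{x-\mu}{\sigma}\right)\right\}^{\alpha-1} \left[1 - \left\{1 - \Phi\left(\frac{x-\mu}{\sigma}\right)\right\}^{\alpha}\right]^{\beta-1}. \tag{5}$$

Plots of the EGN density function for some parameter values are given in Figure 2.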

Exponentiated Generalized Gamma
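The gamma cumulative distribution (for $x > 0$) with shape parameter $a > 0$ and scale parameter $b > 0$ is $G_{a,b}(x) = \gamma(a, bx)/\Gamma(a)$, where $\Gamma(a) = \int_0^{\infty} w^{a-1} e^{-w}\, dw$ is the gamma function and $\gamma(a, x) = \int_0^{x} w^{a-1} e^{-w}\, dw$ is the incomplete gamma function. The exponentiated generalized gamma (EGGa) cumulative distribution becomes

$$F(x) = \left[1 - \left\{1 - \frac{\gamma(a, bx)}{\Gamma(a)}\right\}^{\alpha}\right]^{\beta}, \tag{6}$$

and the associated density function reduces to

$$f(x) = \frac{\alpha\beta\, b^{a} x^{a-1} e^{-bx}}{\Gamma(a)} \left\{1 - \frac{\gamma(a, bx)}{\Gamma(a)}\right\}^{\alpha-1} \left[1 - \left\{1 - \frac{\gamma(a, bx)}{\Gamma(a)}\right\}^{\alpha}\right]^{\beta-1}. \tag{7}$$

Plots of the density function (7) for selected parameter values are given in Figure 3.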

Exponentiated Generalized Gumbel
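The Gumbel cumulative distribution with parameters $\mu \in \mathbb{R}$ and $\sigma > 0$ is $G_{\mu,\sigma}(x) = \exp\{-\exp(-(x - \mu)/\sigma)\}$. The exponentiated generalized Gumbel (EGGu) cumulative distribution becomes

$$F(x) = \left[1 - \left\{1 - \exp\left[-\exp\left(-\frac{x-\mu}{\sigma}\right)\right]\right\}^{\alpha}\right]^{\beta}, \tag{8}$$

and the corresponding pdf reduces to

$$f(x) = \frac{\alpha\beta}{\sigma} \exp\left\{-\frac{x-\mu}{\sigma} - \exp\left(-\frac{x-\mu}{\sigma}\right)\right\} \left\{1 - \exp\left[-\exp\left(-\frac{x-\mu}{\sigma}\right)\right]\right\}^{\alpha-1} \left[1 - \left\{1 - \exp\left[-\exp\left(-\frac{x-\mu}{\sigma}\right)\right]\right\}^{\alpha}\right]^{\beta-1}. \tag{9}$$

Plots of this density function for selected parameter values are given in Figure 4.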

Expansions for the Density Function
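For any real non-integer $\beta$, we consider the power series expansion

$$(1 - z)^{\beta-1} = \sum_{k=0}^{\infty} (-1)^{k} \binom{\beta-1}{k} z^{k}, \tag{10}$$

which is valid for $|z| < 1$, where $\binom{b}{k} = b(b-1)\cdots(b-k+1)/k!$ denotes the generalized binomial coefficient. Applying (10) in (1) (with $\beta + 1$ in place of $\beta$ and $z = \{1 - G(x)\}^{\alpha}$) and using the binomial expansion for a positive real power yields

$$F(x) = \sum_{j=0}^{\infty} w_j\, G(x)^{j}, \tag{11}$$

where the coefficients $w_j = w_j(\alpha, \beta)$ are

$$w_j = \sum_{k=0}^{\infty} (-1)^{j+k} \binom{\beta}{k} \binom{\alpha k}{j}.$$

Equation (11) gives the generated cdf $F(x)$ as an infinite power series of the parent $G(x)$. Using again the series expansion (10), we can express (2) (for $\alpha$ real non-integer) as

$$f(x) = \alpha\beta\, g(x) \sum_{j=0}^{\infty} t_j\, G(x)^{j}, \tag{12}$$

where the coefficients $t_j = t_j(\alpha, \beta)$ are

$$t_j = \sum_{k=0}^{\infty} (-1)^{j+k} \binom{\beta-1}{k} \binom{\alpha(k+1)-1}{j}.$$

Further, (12) can be rewritten as

$$f(x) = \sum_{j=0}^{\infty} t_j^{*}\, h_{j+1}(x), \tag{13}$$

where $t_j^{*} = \alpha\beta\, t_j/(j + 1)$ and $h_{j+1}(x) = (j + 1)\, g(x)\, G(x)^{j}$ is the $\mathrm{Exp}^{j+1}(G)$ density function.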
Equation (13) reveals that the EG density function is a linear combination of Exp-G density functions. Thus, some structural properties of the EG class of distributions, such as ordinary and incomplete moments and generating function, can be obtained from well-established properties of the Exp-G distributions.
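The power series representation of the density can be checked numerically. The sketch below is an illustration under stated assumptions: it takes $\beta$ to be a positive integer, so that the sum over $k$ in the coefficient is finite, writes the coefficient as $t_j = \sum_{k} (-1)^{j+k} \binom{\beta-1}{k} \binom{\alpha(k+1)-1}{j}$ (the form obtained by expanding (2) binomially twice), and truncates the series in $j$. The function names are hypothetical.

```python
def gbinom(a, j):
    """Generalized binomial coefficient a(a-1)...(a-j+1)/j! for real a."""
    out = 1.0
    for i in range(j):
        out *= (a - i) / (i + 1)
    return out

def t_coef(j, alpha, beta):
    """Coefficient t_j for integer beta: finite sum over k = 0, ..., beta - 1."""
    return sum((-1) ** (j + k) * gbinom(beta - 1, k) * gbinom(alpha * (k + 1) - 1, j)
               for k in range(beta))

def eg_pdf_series(Gx, gx, alpha, beta, terms=200):
    """Truncated series alpha*beta*g(x) * sum_j t_j G(x)^j."""
    return alpha * beta * gx * sum(t_coef(j, alpha, beta) * Gx ** j for j in range(terms))

def eg_pdf_exact(Gx, gx, alpha, beta):
    """Closed-form EG density (2) evaluated at baseline values G(x), g(x)."""
    s = 1.0 - Gx
    return alpha * beta * gx * s ** (alpha - 1.0) * (1.0 - s ** alpha) ** (beta - 1.0)

# Both functions depend on x only through G(x) and g(x), so we pass those values.
series_value = eg_pdf_series(0.6, 1.0, alpha=1.5, beta=3)
exact_value = eg_pdf_exact(0.6, 1.0, alpha=1.5, beta=3)
```

The truncated series and the closed form should agree to many decimal places for values of $G(x)$ well inside $(0, 1)$; convergence slows as $G(x)$ approaches 1.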

Moments
Hereafter, we shall assume that $G$ is the cdf of a random variable $X$ and that $F$ is the cdf of a random variable $Y$ having density function (2). The moments of the EG distribution can be obtained from the $(r, j)$th probability weighted moment (PWM) of $X$ defined by

$$\tau_{r,j} = E[X^{r} G(X)^{j}] = \int_{-\infty}^{\infty} x^{r}\, G(x)^{j}\, g(x)\, dx. \tag{14}$$

In fact, we have

$$E(Y^{r}) = \alpha\beta \sum_{j=0}^{\infty} t_j\, \tau_{r,j}. \tag{15}$$

Thus, the moments of any EG distribution can be expressed as an infinite weighted sum of PWMs of the parent distribution. A second formula for $\tau_{r,j}$ can be based on the parent quantile function $Q_G(u)$, obtained by setting $u = G(x)$ in (14):

$$\tau_{r,j} = \int_{0}^{1} Q_G(u)^{r}\, u^{j}\, du, \tag{16}$$

where the integral is now calculated over $(0, 1)$. The PWMs for various distributions will be determined in the following sections using alternatively (14) and (16).
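To make the moment formula concrete, the sketch below computes the mean of an EG distribution as a weighted sum of first-order PWMs. It rests on assumptions that are not in the paper: a unit-rate exponential baseline, for which $\tau_{1,j} = \int_0^1 \{-\log(1-u)\}\, u^{j}\, du = H_{j+1}/(j+1)$ with $H_m$ the $m$th harmonic number, and an integer $\beta$, so that the sum over $k$ inside $t_j$ is finite. The function name is hypothetical.

```python
import math

def eg_exponential_mean(alpha, beta, terms=2000):
    """E(Y) = alpha*beta * sum_j t_j * tau_{1,j} for a unit-rate exponential baseline.

    Assumes beta is a positive integer. The coefficient t_j is accumulated as
    sum_k (-1)^k C(beta-1, k) * c_j(k), where c_j(k) = (-1)^j C(alpha(k+1)-1, j)
    is built recursively in j.
    """
    total = 0.0
    for k in range(beta):
        a = alpha * (k + 1) - 1.0
        outer = (-1) ** k * math.comb(beta - 1, k)
        c, harmonic, inner = 1.0, 0.0, 0.0
        for j in range(terms):
            harmonic += 1.0 / (j + 1)          # H_{j+1}
            inner += c * harmonic / (j + 1)    # c_j * tau_{1,j}
            c *= (j - a) / (j + 1)             # c_{j+1} from c_j
        total += outer * inner
    return alpha * beta * total

series_mean = eg_exponential_mean(1.5, 3)
# For integer beta, Y is the maximum of beta i.i.d. exponentials with rate alpha,
# whose mean is (1 + 1/2 + ... + 1/beta) / alpha.
direct_mean = (1.0 + 1.0 / 2.0 + 1.0 / 3.0) / 1.5
```

The truncated series converges slowly for small $\alpha$, so the agreement with `direct_mean` is only to a few decimal places at this truncation level.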

EGF
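With the Fréchet cdf written as $G_{\sigma,\lambda}(x) = \exp\{-(\sigma/x)^{\lambda}\}$ for $x > 0$ (the form implied by the substitution used below), the expansion for (3) reduces to

$$F(x) = \sum_{j=1}^{\infty} w_j\, G_{\sigma^{*},\lambda}(x),$$

where $\sigma^{*} = \sigma j^{1/\lambda}$ and $G_{\sigma^{*},\lambda}(x)$ is the Fréchet cumulative function with parameters $\sigma^{*}$ and $\lambda$ (the $j = 0$ term vanishes because $F(x) \to 0$ as $G(x) \to 0$). This equation reveals that the EGF cumulative function can be expressed as an infinite mixture of Fréchet cdf's. Correspondingly, the EGF density function follows a similar mixture

$$f(x) = \sum_{j=1}^{\infty} w_j\, g_{\sigma^{*},\lambda}(x),$$

where $g_{\sigma^{*},\lambda}(x) = dG_{\sigma^{*},\lambda}(x)/dx$. The $(r, j)$th PWM of the Fréchet distribution is

$$\tau_{r,j} = \lambda \sigma^{\lambda} \int_{0}^{\infty} x^{r-\lambda-1} \exp\{-(j+1)(\sigma/x)^{\lambda}\}\, dx.$$

Setting $u = (j + 1)(\sigma/x)^{\lambda}$, $\tau_{r,j}$ reduces to

$$\tau_{r,j} = \sigma^{r} (j+1)^{r/\lambda - 1} \int_{0}^{\infty} u^{-r/\lambda} e^{-u}\, du.$$

The integral converges absolutely for $r < \lambda$. Finally, for $r < \lambda$,

$$\tau_{r,j} = \sigma^{r} (j+1)^{r/\lambda - 1}\, \Gamma(1 - r/\lambda).$$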

EGN
The moments of $X \sim N(\mu, \sigma^{2})$ can be obtained from the moments of $Z \sim N(0, 1)$ using $E(X^{r}) = \sum_{k=0}^{r} \binom{r}{k} \mu^{r-k} \sigma^{k} E(Z^{k})$, and then we can work with the standard normal distribution. Consider the error function $\mathrm{erf}(\cdot)$ defined by

$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^{2}}\, dt.$$

From equation we can expand the EGN cumulative function (4) (with $\mu = 0$ and $\sigma = 1$) as From the series expansion for the error function $\mathrm{erf}(\cdot)$ we obtain a series expansion for (5) (with $\mu = 0$ and $\sigma = 1$) given by From Cordeiro and Nadarajah (2011, (11)), the normal PWMs can be expressed in terms of the Lauricella functions of type A (Exton, 1978; Aarts, 2000) defined by is the ascending factorial given by (with the convention that $(a)_0 = 1$). The $(r, j)$th PWM of the normal distribution is This equation holds when $r + j - l$ is even and it vanishes when $r + j - l$ is odd. So, any EGN moment can be expressed as an infinite weighted linear combination of Lauricella functions of type A.

EGGa
Using the power series expansion for the incomplete gamma function we obtain the following series expansion for (7) respectively. From (9) of Cordeiro and Nadarajah (2011), the quantities $\tau_{r,j}$ can be determined as

EGGu
The expansion for the density function (9) becomes where $g_{\mu^{*},\sigma}(x)$ is the Gumbel density function with parameters $\mu^{*} = \mu + \sigma \log(j)$ and $\sigma$, since the $j$th power of the Gumbel cdf $G_{\mu,\sigma}(x)$ is again a Gumbel cdf with location shifted by $\sigma \log(j)$. So, the EGGu density function can be expressed as an infinite mixture of Gumbel densities. The PWMs of the Gumbel distribution are Using the binomial expansion, we have By (2.6.21.1) in Prudnikov et al. (1986), the integral becomes Finally,

Exponentiated Generalized Exponential
For the exponential cumulative function and then the moments of the exponentiated generalized exponential (EGE) distribution can be readily determined from (15).

Exponentiated Generalized Beta
The cdf of the beta distribution (for $0 < x < 1$) is

$$I_x(a, b) = \frac{1}{B(a, b)} \int_{0}^{x} w^{a-1} (1 - w)^{b-1}\, dw,$$

where $a > 0$, $b > 0$ and $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a + b)$ is the beta function. The exponentiated generalized beta (EGB) cumulative distribution is $F(x) = [1 - \{1 - I_x(a, b)\}^{\alpha}]^{\beta}$ and the corresponding density function becomes

$$f(x) = \frac{\alpha\beta}{B(a, b)}\, x^{a-1} (1 - x)^{b-1} \{1 - I_x(a, b)\}^{\alpha-1} \left[1 - \{1 - I_x(a, b)\}^{\alpha}\right]^{\beta-1}.$$

Using the incomplete beta power series for real non-integer $b > 0$ we can obtain an expansion for $f(x)$ as follows The PWMs of the beta distribution can be expressed in terms of the generalized Kampé de Fériet function (Exton, 1978; Mathai, 1993; Aarts, 2000; Chaudhry and Zubair, 2002). They are given by (Cordeiro and Nadarajah, 2011) where the generalized Kampé de Fériet function is defined by Hence, the EGB moments can be expressed as an infinite weighted linear combination of generalized Kampé de Fériet functions.

Generating Function
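Here, we provide three formulae for the moment generating function (mgf) $M(s) = E[\exp(sY)]$ of $Y$. Clearly, the first one is simply

$$M(s) = \sum_{r=0}^{\infty} \frac{\mu'_r}{r!}\, s^{r}, \tag{18}$$

where $\mu'_r = E(Y^{r})$ is obtained from (15). A second formula for $M(s)$ comes from (13) as

$$M(s) = \sum_{j=0}^{\infty} t_j^{*}\, M_{j+1}(s), \tag{19}$$

where $t_j^{*} = \alpha\beta\, t_j/(j + 1)$ as in (13) and $M_{j+1}(s)$ is the generating function of the $\mathrm{Exp}^{j+1}(G)$ distribution. Hence, $M(s)$ can be determined from the Exp-G generating function.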
A third formula for $M(s)$ can be derived from (12) as where the quantity We can derive the mgf of several EG distributions directly from (20) and (21). For example, the mgf's of the EGE (with parameter $\lambda$), EGL and EGPa (with parameter $\nu > 0$) distributions are given by Clearly, three representations for the characteristic function (chf) $\varphi(s) = E[\exp(i s Y)]$ of the EG distributions are derived from (18)-(20) by $\varphi(s) = M(i s)$, where $i = \sqrt{-1}$.

Mean Deviations
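The mean deviations about the mean ($\delta_1(Y)$) and about the median ($\delta_2(Y)$) can be expressed as

$$\delta_1(Y) = 2\mu'_1 F(\mu'_1) - 2 m_1(\mu'_1) \quad \text{and} \quad \delta_2(Y) = \mu'_1 - 2 m_1(M), \tag{22}$$

respectively, where $\mu'_1 = E(Y)$, $M$ is the median of $Y$, $F(\mu'_1)$ comes from (1) and $m_1(z) = \int_{-\infty}^{z} x f(x)\, dx$ is the first incomplete moment of $Y$. Now, we provide two alternative ways to compute $\delta_1(Y)$ and $\delta_2(Y)$. A general equation for $m_1(z)$ can be derived from (13) as

$$m_1(z) = \sum_{j=0}^{\infty} t_j^{*}\, J_{j+1}(z), \tag{23}$$

where

$$J_{j+1}(z) = \int_{-\infty}^{z} x\, h_{j+1}(x)\, dx. \tag{24}$$

Equation (24) is the basic quantity to compute the mean deviations for the EG distributions. The mean deviations (22) depend only on the first incomplete moment of the Exp-G distributions. So, alternative representations for $\delta_1(Y)$ and $\delta_2(Y)$ are

$$\delta_1(Y) = 2\mu'_1 F(\mu'_1) - 2 \sum_{j=0}^{\infty} t_j^{*}\, J_{j+1}(\mu'_1) \quad \text{and} \quad \delta_2(Y) = \mu'_1 - 2 \sum_{j=0}^{\infty} t_j^{*}\, J_{j+1}(M).$$

A simple application of (23) and (24) refers to the exponentiated generalized Weibull (EGW) distribution. The exponentiated Weibull density function (for $x > 0$) with power parameter $j + 1$, shape parameter $c$ and scale parameter $\beta$ (here $\beta$ denotes the Weibull scale parameter) is

$$h_{j+1}(x) = (j+1)\, c\, \beta^{c} x^{c-1} \exp\{-(\beta x)^{c}\} \left[1 - \exp\{-(\beta x)^{c}\}\right]^{j},$$

and then

$$J_{j+1}(z) = (j+1)\, c\, \beta^{c} \sum_{r=0}^{j} (-1)^{r} \binom{j}{r} \int_{0}^{z} x^{c} \exp\{-(r+1)(\beta x)^{c}\}\, dx.$$

The last integral reduces to the incomplete gamma function and then

$$J_{j+1}(z) = \beta^{-1} \sum_{r=0}^{j} \frac{(-1)^{r} (j+1) \binom{j}{r}}{(r+1)^{1+1/c}}\, \gamma\left(1 + c^{-1}, (r+1)(\beta z)^{c}\right).$$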
A second general formula for $m_1(z)$ can be derived by setting $u = G(x)$ in (13):

$$m_1(z) = \alpha\beta \sum_{j=0}^{\infty} t_j\, T_j(z), \tag{25}$$

where $T_j(z)$ is given by

$$T_j(z) = \int_{0}^{G(z)} Q_G(u)\, u^{j}\, du. \tag{26}$$

For example, we can obtain the mean deviations of the EGE (with parameter $\lambda$), EGL and EGPa (with parameter $\nu > 0$) distributions from (25)-(26). The quantities $T_j(z)$ can be derived from the following integrals (for $a > 0$) using Maple and Mathematica By changing variables and using the binomial expansion and these integrals, we obtain and for the EGE, EGL and EGPa distributions, respectively. These equations can be applied to obtain Bonferroni and Lorenz curves, defined for a given probability $\pi$ by $B(\pi) = m_1(q)/(\pi \mu'_1)$ and $L(\pi) = m_1(q)/\mu'_1$, respectively, where $q = Q_G(1 - (1 - \pi^{1/\beta})^{1/\alpha})$ is immediately calculated from the parent quantile function.
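As a numerical illustration of the mean deviation about the mean, the sketch below evaluates $\delta_1(Y) = 2\mu'_1 F(\mu'_1) - 2 m_1(\mu'_1)$ by direct midpoint quadrature instead of the series representations. This is a swapped-in technique for checking purposes, with an exponential baseline of rate `lam`, a truncated upper integration limit, and hypothetical function names as assumptions.

```python
import math

def midpoint(f, a, b, n=20000):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def eg_exp_pdf(x, alpha, beta, lam):
    """EG density (2) with exponential baseline: g(x){1-G(x)}^(alpha-1) = lam*exp(-alpha*lam*x)."""
    s = math.exp(-lam * x)  # baseline survival function
    return alpha * beta * lam * s ** alpha * (1.0 - s ** alpha) ** (beta - 1.0)

def eg_exp_cdf(x, alpha, beta, lam):
    """EG cdf (1) with exponential baseline."""
    return (1.0 - math.exp(-alpha * lam * x)) ** beta

def mean_deviation_about_mean(alpha, beta, lam, upper=60.0):
    """delta_1 = 2*mu*F(mu) - 2*m_1(mu), with the support truncated at `upper`."""
    xf = lambda x: x * eg_exp_pdf(x, alpha, beta, lam)
    mu = midpoint(xf, 0.0, upper)   # mean mu'_1
    m1 = midpoint(xf, 0.0, mu)      # first incomplete moment m_1(mu'_1)
    return 2.0 * mu * eg_exp_cdf(mu, alpha, beta, lam) - 2.0 * m1

# Sanity check on the baseline itself (alpha = beta = 1, unit rate):
# the mean absolute deviation of a unit exponential is 2/e.
delta1_baseline = mean_deviation_about_mean(1.0, 1.0, 1.0)
```

The same routine can be called with other values of `alpha` and `beta`; the truncation point `upper` should be increased for small `alpha * lam`.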

Order Statistics
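The density $f_{i:n}(x)$ of the $i$th order statistic, for $i = 1, \ldots, n$, from a random sample of size $n$ from the EG distribution is

$$f_{i:n}(x) = \frac{f(x)\, F(x)^{i-1} \{1 - F(x)\}^{n-i}}{B(i, n-i+1)}.$$

Substituting (1) and (2) in this equation, we can write

$$f_{i:n}(x) = \frac{\alpha\beta\, g(x)}{B(i, n-i+1)} \{1 - G(x)\}^{\alpha-1} \left[1 - \{1 - G(x)\}^{\alpha}\right]^{\beta i - 1} \left(1 - \left[1 - \{1 - G(x)\}^{\alpha}\right]^{\beta}\right)^{n-i}.$$

Using the binomial expansion, $f_{i:n}(x)$ can be expressed as

$$f_{i:n}(x) = \frac{\alpha\beta\, g(x)}{B(i, n-i+1)} \{1 - G(x)\}^{\alpha-1} \sum_{k=0}^{n-i} (-1)^{k} \binom{n-i}{k} \left[1 - \{1 - G(x)\}^{\alpha}\right]^{\beta(i+k) - 1}.$$

For $\beta$ real non-integer, by applying (10) to the last term, we obtain

$$f_{i:n}(x) = \frac{\alpha\beta\, g(x)}{B(i, n-i+1)} \sum_{k=0}^{n-i} \sum_{m=0}^{\infty} (-1)^{k+m} \binom{n-i}{k} \binom{\beta(i+k)-1}{m} \{1 - G(x)\}^{\alpha(m+1) - 1},$$

which can be rewritten as

$$f_{i:n}(x) = g(x) \sum_{l=0}^{\infty} s_l\, G(x)^{l}, \tag{27}$$

where the coefficients $s_l$ can be calculated as

$$s_l = \frac{\alpha\beta}{B(i, n-i+1)} \sum_{k=0}^{n-i} \sum_{m=0}^{\infty} (-1)^{k+m+l} \binom{n-i}{k} \binom{\beta(i+k)-1}{m} \binom{\alpha(m+1)-1}{l}.$$

Equation (27) can be rewritten in terms of the Exp-G density functions as

$$f_{i:n}(x) = \sum_{l=0}^{\infty} \frac{s_l}{l+1}\, h_{l+1}(x). \tag{28}$$

Equation (28) is the main result of this section. So, several mathematical properties of the EG order statistics (like ordinary and incomplete moments, generating function, mean deviations) can be obtained from those properties of the Exp-G distribution, as shown previously for the EG distribution.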

Maximum Likelihood Estimation
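We determine the maximum likelihood estimates (MLEs) of the parameters of the EG distribution from complete samples only. Let $x_1, \ldots, x_n$ be a random sample of size $n$ from the EG($\alpha, \beta, \gamma$) distribution, where $\gamma$ is a $p \times 1$ vector of unknown parameters in the parent distribution $G(x; \gamma)$. The log-likelihood function for the vector of parameters $\theta = (\alpha, \beta, \gamma^{T})^{T}$ can be expressed as

$$\ell(\theta) = n \log(\alpha\beta) + \sum_{i=1}^{n} \log g(x_i; \gamma) + (\alpha - 1) \sum_{i=1}^{n} \log\{1 - G(x_i; \gamma)\} + (\beta - 1) \sum_{i=1}^{n} \log\left[1 - \{1 - G(x_i; \gamma)\}^{\alpha}\right]. \tag{29}$$

The log-likelihood can be maximized either directly, by using SAS (Proc NLMixed) or the MaxBFGS routine in the matrix programming language Ox (see Doornik, 2007), or by solving the nonlinear likelihood equations obtained by differentiating (29). Writing $G_i = G(x_i; \gamma)$ and $g_i = g(x_i; \gamma)$, the components of the score vector $U(\theta)$ are

$$U_{\alpha}(\theta) = \frac{n}{\alpha} + \sum_{i=1}^{n} \log(1 - G_i) - (\beta - 1) \sum_{i=1}^{n} \frac{(1 - G_i)^{\alpha} \log(1 - G_i)}{1 - (1 - G_i)^{\alpha}},$$

$$U_{\beta}(\theta) = \frac{n}{\beta} + \sum_{i=1}^{n} \log\left[1 - (1 - G_i)^{\alpha}\right],$$

$$U_{\gamma_j}(\theta) = \sum_{i=1}^{n} \frac{1}{g_i} \frac{\partial g_i}{\partial \gamma_j} - (\alpha - 1) \sum_{i=1}^{n} \frac{1}{1 - G_i} \frac{\partial G_i}{\partial \gamma_j} + \alpha(\beta - 1) \sum_{i=1}^{n} \frac{(1 - G_i)^{\alpha-1}}{1 - (1 - G_i)^{\alpha}} \frac{\partial G_i}{\partial \gamma_j}, \quad j = 1, \ldots, p.$$

For interval estimation and hypothesis tests on the model parameters, we require the $(p + 2) \times (p + 2)$ observed information matrix $J = J(\theta)$ given in the Appendix. Under conditions that are fulfilled for parameters in the interior of the parameter space but not on the boundary, the asymptotic distribution of $\sqrt{n}(\hat{\theta} - \theta)$ is $N_{p+2}(0, I(\theta)^{-1})$, where $I(\theta)$ is the expected information matrix. In practice, we can replace $I(\theta)$ by the observed information matrix evaluated at $\hat{\theta}$ (say $J(\hat{\theta})$). We can construct approximate confidence intervals and confidence regions for the parameters based on the multivariate normal $N_{p+2}(0, J(\hat{\theta})^{-1})$ distribution.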
Further, the likelihood ratio (LR) statistic can be used for comparing the EG distribution with some of its special models. We can compute the maximum values of the unrestricted and restricted log-likelihoods to construct LR statistics for testing some sub-models of the EG distribution. For example, the test of $H_0: \alpha = 1$ versus $H$: $H_0$ is not true is equivalent to comparing the EG and exponentiated type distributions, and the LR statistic reduces to

$$w = 2\{\ell(\hat{\alpha}, \hat{\beta}, \hat{\gamma}) - \ell(1, \tilde{\beta}, \tilde{\gamma})\},$$

where $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\gamma}$ are the MLEs under $H$ and $\tilde{\beta}$ and $\tilde{\gamma}$ are the estimates under $H_0$.
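Given the two maximized log-likelihoods, the LR statistic and an approximate p-value are simple to compute. The sketch below assumes the usual asymptotic chi-square reference distribution with one degree of freedom (one restricted parameter, under standard regularity conditions), for which the upper tail probability equals $\mathrm{erfc}(\sqrt{w/2})$. The function name and the log-likelihood values are hypothetical.

```python
import math

def lr_test_1df(loglik_full, loglik_null):
    """LR statistic w = 2(l_full - l_null) and its chi-square(1) upper tail probability."""
    w = 2.0 * (loglik_full - loglik_null)
    # P(chi2_1 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)); guard against tiny negative w.
    p_value = math.erfc(math.sqrt(max(w, 0.0) / 2.0))
    return w, p_value

# Hypothetical maximized log-likelihoods, chosen so that w is near the 5% critical value.
w, p_value = lr_test_1df(-100.0, -101.9207)
```

For tests that restrict both shape parameters at once, the reference distribution would have two degrees of freedom and a different tail formula applies.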

Applications
In this section, we use four real data sets to compare the fits of the EG distribution with those of three sub-models, i.e., the two exponentiated type distributions and the parent distribution itself. In each case, the parameters are estimated by maximum likelihood (Section 8) using the NLMixed procedure in SAS. First, we describe the data sets and give the MLEs (and the corresponding standard errors in parentheses) of the parameters and the values of the Akaike Information Criterion (AIC), Consistent Akaike Information Criterion (CAIC) and Bayesian Information Criterion (BIC) statistics. The lower the values of these criteria, the better the fit. Note that over-parameterization is penalized in these criteria, so that the two additional parameters in the EG model do not necessarily lead to smaller values of the AIC, BIC or CAIC statistics. Next, we perform LR tests (Section 8) for formal tests of the additional shape parameters. Finally, we provide histograms of the data sets to show a visual comparison of the fitted density functions.

(i) Ethylene data. These data were taken from a study by the University of São Paulo, ESALQ (Laboratory of Physiology and Post-harvest Biochemistry), which evaluated the effects of mechanical damage on banana fruits (genus Musa spp.); see Saavedra del Aguila et al. (2010) for more details. The major problem affecting bananas during and after harvest is the susceptibility of the mature fruit to physical damage caused during transport and marketing. Ethylene is a plant hormone important in post-harvest fruit. High ethylene production can cause rapid senescence of the fruit. We use 630 data points on ethylene and assume a normal parent distribution.

(ii) Wheaton River data. The data are the exceedances of flood peaks (in m³/s) of the Wheaton River near Carcross in Yukon Territory, Canada. The data consist of 72 exceedances for the years 1958-1984, rounded to one decimal place. These data were analyzed by Akinsete et al. (2008).
(iii) Stress level data. These 101 data points represent the stress-rupture life of Kevlar 49/epoxy strands, which were subjected to constant sustained pressure at the 90% stress level until all had failed, so that we have complete data with exact times of failure. The failure times in hours are given by Cooray and Ananda (2008).

(iv) Carbon data. These uncensored data on the breaking stress of carbon fibres (in GPa) are obtained from Nichols and Padgett (2006).
Table 1 gives a descriptive summary of each sample. The carbon data have negative skewness. The ethylene, Wheaton River and stress level data have positive skewness and kurtosis, larger values of these sample moments being apparent in the ethylene data. We compute the MLEs and the AIC, BIC and CAIC information criteria for each data set. For the ethylene data, we compare the fitted EGN model (with parameters $\alpha$, $\beta$ and $\gamma = (\mu, \sigma)^{T}$) with the fitted exponentiated normal (EN), Lehmann II normal (LIIN) and normal distributions. The MLEs of $\mu$ and $\sigma$ for the normal distribution are taken as starting values for the numerical iterative procedure. For the Wheaton River data, we compare the fitted EGGu model (with parameters $\alpha$, $\beta$ and $\gamma = (\mu, \sigma)^{T}$) with the fitted exponentiated Gumbel (EGu), Lehmann II Gumbel (LIIGu) and Gumbel distributions. The MLEs of $\mu$ and $\sigma$ for the Gumbel distribution are taken as starting values for the iterative procedure. For the stress level data, we compare the EGF model (with parameters $\alpha$, $\beta$ and $\gamma = (\lambda, \sigma)^{T}$) with the fitted exponentiated Fréchet (EF), Lehmann II Fréchet (LIIF) and Fréchet distributions. The MLEs of $\lambda$ and $\sigma$ for the Fréchet distribution are taken as starting values for the iterative procedure. Further, for the carbon data, the fitted EGGa distribution (with parameters $\alpha$, $\beta$ and $\gamma = (a, b)^{T}$) is compared with the fitted EGa, Lehmann II gamma (LIIGa) and gamma models. Here, the MLEs of $a$ and $b$ for the gamma distribution are taken as starting values.
The results are reported in Table 2. Note that the three information criteria agree on the ranking of the models in every case. For the ethylene, Wheaton River and stress level data, the lowest values of the information criteria correspond to the fitted EG distribution. For the carbon data, based on the values of these statistics, we can conclude that the top two models are the EGGa and LIIGa distributions and that the other distributions are far worse.
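For reference, the information criteria can be computed from a maximized log-likelihood as in the sketch below. The AIC and BIC formulas are standard. The CAIC line uses Bozdogan's definition, $-2\ell + k(\log n + 1)$, which is one common choice; the paper does not state which definition it uses, so treat that line as an assumption. The function name and the log-likelihood value are hypothetical.

```python
import math

def information_criteria(loglik, k, n):
    """AIC, BIC and (Bozdogan's) CAIC for a model with k parameters fitted to n observations."""
    return {
        "AIC": -2.0 * loglik + 2.0 * k,
        "BIC": -2.0 * loglik + k * math.log(n),
        # Assumption: Bozdogan's CAIC; other definitions appear in the literature.
        "CAIC": -2.0 * loglik + k * (math.log(n) + 1.0),
    }

# Hypothetical example: a four-parameter model fitted to 72 observations.
crit = information_criteria(loglik=-100.0, k=4, n=72)
```

Because every criterion adds a penalty that grows with `k`, a model with two extra parameters attains a lower value only if its log-likelihood gain outweighs that penalty.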
A formal test for the need of the extra parameters in the EG models can be performed using LR statistics (Section 8). Applying these LR tests to our four data sets yields the results in Table 3. For the carbon data, the additional parameters of the EGGa distribution are not, in fact, necessary because the LR tests provide no indications against the LIIGa distribution. However, for the ethylene, Wheaton River and stress level data sets, we reject the null hypotheses in all three LR tests in favor of the new distributions. The rejection is extremely highly significant for the ethylene and stress level data, and highly or very highly significant for the Wheaton River data. This gives clear evidence of the potential need for the extra parameters in the proposed model when modelling real data.
In order to assess if the model is appropriate, the histograms of the data and the plots of the fitted EGN, EN, LIIN, EGGu, EGu, LIIGu, EGF, EF, LIIF, EGGa, EGa, LIIGa, normal, Gumbel, Fréchet and gamma density functions are displayed in Figure 5. We can conclude that the EG distributions are very suitable for these data.

Concluding Remarks
We propose a new class of exponentiated generalized (EG) distributions which includes as special cases the two classes of Lehmann's (1953) alternatives. The EG class extends several common distributions studied recently, such as the exponentiated exponential, exponentiated Weibull, exponentiated gamma, exponentiated Fréchet and exponentiated Gumbel distributions (see, for example, Mudholkar and Srivastava, 1993; Gupta et al., 1998; Gupta and Kundu, 2001; Nadarajah and Kotz, 2006), among several others. Indeed, for any baseline distribution, we can easily define the corresponding EG distribution. The quantile, moments, generating function and mean deviations of some generated distributions have tractable mathematical properties. Some of these properties are readily obtained from those of the exponentiated and baseline distributions. For example, the moments of the EG distribution can be expressed explicitly in terms of an infinite sum of probability weighted moments of the baseline G distribution. The same happens for the moments of their order statistics. We discuss maximum likelihood estimation and inference on the parameters based on likelihood ratio statistics for testing nested models. Four applications of the new class of distributions to real data are given to show the feasibility of our proposal. We hope this generalization may attract wider applications in statistics. Some suggestions and directions for future research on the new class of models include simulation studies, asymptotic properties of the maximum likelihood estimates, and the performance and comparison of the Bayesian, bootstrap and jackknife methods for estimation of the model parameters. Finally, we can also define regression models for the logarithm of the random variable of the EG class of distributions.

Figure 5: Estimated densities of the EG models for the analysed data sets

Table 2: MLEs and information criteria