Generating correlated random vectors by the Johnson system

Abstract: This paper aims to generate multivariate random vectors with a prescribed correlation matrix by the Johnson system. The probability weighted moment (PWM) is employed to estimate the parameters of the Johnson system. By equating the first four PWMs of the Johnson system with those of the target distribution, a system of equations for the parameters is established. With suitable initial values, solutions to the equations are obtained by the Newton iteration procedure. To allow for the generation of random vectors with a prescribed correlation matrix, approaches to accommodate the dependency are put forward. For the four transformation models of the Johnson system, nine cases are addressed. Analytical formulae are derived to determine the equivalent correlation coefficient in the standard normal space for six cases; the remaining three are handled by an interpolation method. Finally, several numerical examples are given to verify the proposed method.


Introduction
A general problem in stochastic simulation modeling entails representing data whose underlying distribution is unknown, and generating random vectors with a desired correlation matrix and marginal distributions. To achieve this goal, several empirical distributions have been proposed, such as the generalized lambda distribution (GLD) (Ramberg & Schmeiser, 1974), the power transformation (Fleishman, 1978), and the Johnson system (Johnson, 1949). The related works have been summarized by Tadikamalla (1980). Among these families of distributions, the Johnson system stands out for its generality in simulating various probability distributions and its potential for modeling correlated random variables.
The Johnson system comprises four models:

S_N: z = (x − μ_z)/σ_z,
S_L: z = γ + δ ln(x − ξ),
S_U: z = γ + δ sinh⁻¹[(x − ξ)/λ],
S_B: z = γ + δ ln[(x − ξ)/(ξ + λ − x)], (1)

where z is a standard normal variable, γ and δ are parameters determined by the skewness and kurtosis, and ξ and λ are parameters determined by the mean and the standard deviation, respectively. S_N is the normal distribution with mean μ_z and standard deviation σ_z. S_L represents the Lognormal family. S_U denotes the unbounded family, in which the variation range of x is infinite. S_B is the bounded family, handling x with finite boundaries. Together, these four models cover all types of unimodal probability distributions.
To utilize the Johnson system to fit data or simulate a distribution, it is necessary to select an appropriate model and to evaluate its parameters. Based on the skewness and kurtosis, a suitable model for the target random variable can be determined; the central problem is then to estimate the parameters. The standardized central moment matching method (Hill et al., 1976) and the percentile method (Slifker & Shapiro, 1980) are two of the approaches developed for this purpose. This paper focuses on the moment matching method.
Since the statistical information of a random variable is characterized by its statistical moments, the underlying principle of the moment matching method is to equate the moments of the corresponding transformation model to those of the target random variable. The parameters of the Johnson system are then obtained by solving the resulting system of nonlinear equations. In general, central or raw moments are involved, leading to complicated equations. Recently, L-moments have been widely used as good alternatives to conventional moments; they have been implemented to characterize the GLD (Karvanen & Nuutinen, 2008) and the power transformation (Headrick, 2011), yielding much simpler parameter estimation procedures. Since L-moments are defined as linear combinations of probability weighted moments (PWMs) (Hosking, 1990), it is worthwhile to perform the moment matching in terms of PWMs directly, which may further simplify the problem.
On the issue of simulating correlated random vectors, relatively little research has been conducted, which may be due to the tedious and difficult computations required to evaluate the correlation coefficient in the standard normal space. When modeling correlated multivariate distributions with samples available, the Johnson system is workable: by normalizing each sample value via the associated model in Eq.(1), the correlation matrix in the standard normal space can be calculated directly. However, random vectors with a prescribed correlation matrix cannot be generated in this way. To circumvent the computational complexity, Stanfield et al. (1996) developed an alternative approach. Yet inadequacies are found both theoretically and practically: the method fails to match the kurtosis for some distributions and is infeasible for a near-singular correlation matrix.
In this paper, PWM is employed to characterize the Johnson system. By setting the first four PWMs of the Johnson system equal to those of the simulated random variable, a system of equations for the parameters is established. With suitable initial values, the solutions are obtained by the Newton iteration method. Furthermore, approaches to accommodate the dependency are also put forward. For the cases involving only S_N, S_L and S_U, together with the case S_N − S_B, six analytical formulae are derived to evaluate the equivalent correlation coefficient in the standard normal space. The other three cases, S_L − S_B, S_U − S_B and S_B − S_B, are handled by an interpolation method.

Model selection
With the Lognormal distribution as the boundary, the skewness and kurtosis of a random variable are used for model selection. Let κ₃, κ₄ denote the skewness and kurtosis, respectively. For the normal distribution S_N, it follows that κ₃ = 0 and κ₄ = 3. For the Lognormal distribution, the following relationships are satisfied (Johnson, 1949):

κ₃² = (ω − 1)(ω + 2)², (2)

κ₄ = ω⁴ + 2ω³ + 3ω² − 3, (3)

where ω = exp(1/δ²). Suppose x is a non-normal random variable. Solving the equation in Eq.(2) for ω > 0 and substituting the root into Eq.(3) gives the Lognormal kurtosis κ₄,L; the model is chosen as follows:

κ₄ = κ₄,L → S_L,  κ₄ < κ₄,L → S_B,  κ₄ > κ₄,L → S_U. (4)

Since the S_L family is simply the Lognormal distribution, the parameters of the S_L model can be obtained directly:

δ = (ln ω)^(−1/2),  γ = (δ/2) ln[ω(ω − 1)/σ²],  ξ = μ − σ/√(ω − 1), (5)

where μ is the mean of x, σ is the standard deviation of x.
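The selection rule can be sketched as follows (a minimal sketch assuming the standard Lognormal moment relations κ₃² = (ω − 1)(ω + 2)² and κ₄ = ω⁴ + 2ω³ + 3ω² − 3, and the usual S_L parameter formulae; function names are illustrative, not from the paper):

```python
import math

def select_model(skew, kurt):
    """Pick a Johnson family from skewness and kurtosis, assuming the
    Lognormal boundary: solve (w - 1)(w + 2)^2 = skew^2 for w >= 1."""
    target = skew * skew
    lo, hi = 1.0, 2.0
    while (hi - 1.0) * (hi + 2.0) ** 2 < target:  # bracket the root
        hi *= 2.0
    for _ in range(200):                          # bisection for w
        mid = 0.5 * (lo + hi)
        if (mid - 1.0) * (mid + 2.0) ** 2 < target:
            lo = mid
        else:
            hi = mid
    w = 0.5 * (lo + hi)
    kurt_L = w**4 + 2 * w**3 + 3 * w**2 - 3       # Lognormal kurtosis at w
    if abs(skew) < 1e-12 and abs(kurt - 3.0) < 1e-9:
        return "SN"
    if abs(kurt - kurt_L) < 1e-6:
        return "SL"
    return "SB" if kurt < kurt_L else "SU"

def sl_params(mu, sigma, w):
    """S_L parameters from mean, standard deviation and w (assumed forms)."""
    delta = 1.0 / math.sqrt(math.log(w))
    gamma = 0.5 * delta * math.log(w * (w - 1.0) / sigma**2)
    xi = mu - sigma / math.sqrt(w - 1.0)
    return xi, gamma, delta
```

For example, a symmetric heavy-tailed variable (κ₃ = 0, κ₄ > 3) falls in the S_U region, while the Gamma distribution Γ(10, 1) (κ₃ ≈ 0.632, κ₄ = 3.6) lies below the Lognormal line and is assigned to S_B.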
3. Evaluating the parameters of S_U and S_B models

Probability weighted moment
The PWM of a random variable x is defined as (Greenwood et al., 1979):

M_{p,r,s} = E[xᵖ F(x)ʳ (1 − F(x))ˢ], (6)

where F(·) is the cumulative distribution function (CDF) of x. A particular type of PWM, β_r = M_{1,r,0}, is considered:

β_r = E[x F(x)ʳ] = ∫ x F(x)ʳ f(x) dx, (7)

where f(·) is the probability density function (PDF) of x.
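When fitting to data rather than to a known distribution, the β_r can be estimated from an ordered sample. The sketch below uses the standard unbiased PWM estimator based on order statistics (the function name is ours, not the paper's):

```python
def sample_pwm(data, r):
    """Unbiased estimator of beta_r = E[x F(x)^r] from a sample:
    b_r = (1/n) * sum_i [(i-1)(i-2)...(i-r) / ((n-1)(n-2)...(n-r))] * x_(i),
    where x_(i) is the i-th order statistic."""
    xs = sorted(data)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):   # i = 1-based rank of x
        w = 1.0
        for j in range(1, r + 1):
            w *= (i - j) / (n - j)
        total += w * x
    return total / n
```

Note that b₀ is simply the sample mean, since all weights equal one when r = 0.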
For the sake of computational convenience, the Johnson system is rewritten in an inverse form:

S_N: x = μ_z + σ_z z,
S_L: x = ξ + exp[(z − γ)/δ],
S_U: x = ξ + λ sinh[(z − γ)/δ],
S_B: x = ξ + λ/{1 + exp[−(z − γ)/δ]}, (8)

where μ_z and σ_z are the mean and the standard deviation of the normal variable, respectively. This paper pays attention to evaluating the parameters of the S_U and S_B models.
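These inverse transformations can be collected in a small helper (a sketch assuming the standard Johnson inverse forms; naming is ours):

```python
import math

def johnson_inverse(z, family, xi=0.0, lam=1.0, gamma=0.0, delta=1.0,
                    mu_z=0.0, sigma_z=1.0):
    """Map a standard normal deviate z to x via a Johnson family.

    Assumed inverse forms:
      SN: x = mu_z + sigma_z * z
      SL: x = xi + exp((z - gamma) / delta)
      SU: x = xi + lam * sinh((z - gamma) / delta)
      SB: x = xi + lam / (1 + exp(-(z - gamma) / delta))
    """
    if family == "SN":
        return mu_z + sigma_z * z
    u = (z - gamma) / delta
    if family == "SL":
        return xi + math.exp(u)
    if family == "SU":
        return xi + lam * math.sinh(u)
    if family == "SB":
        return xi + lam / (1.0 + math.exp(-u))
    raise ValueError("unknown family: %s" % family)
```

The S_B branch maps every z into the bounded interval (ξ, ξ + λ), which is the property exploited later when choosing initial values for that model.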

S_U model
The β_r of the S_U model is:

β_r = ∫_{−∞}^{+∞} {ξ + λ sinh[(z − γ)/δ]} Φ(z)ʳ ϕ(z) dz, (9)

where Φ(·) is the CDF of the standard normal random variable and ϕ(·) is its PDF. Equating the first four PWMs with those of the target random variable x, four equations with four unknowns (ξ, λ, γ, δ) are established. Denote the equations as follows:

g_r(ξ, λ, γ, δ) = β_r − β̂_r = 0,  r = 0, 1, 2, 3, (10)

where β̂_r denotes the rth PWM of x. Solving the system of equations yields the parameters of the S_U model. With proper initial values, the solutions can be found by the Newton iteration method, with the Jacobian matrix

J = [∂g_r/∂ξ  ∂g_r/∂λ  ∂g_r/∂γ  ∂g_r/∂δ],  r = 0, 1, 2, 3. (11)

At each iteration, the integrals involved can be evaluated by the Gauss-Hermite quadrature in the Appendix.
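The matching procedure can be sketched as follows (a minimal sketch with illustrative names; the Jacobian here is approximated by forward differences rather than analytical partial derivatives, and the standard S_U inverse transform x = ξ + λ sinh((z − γ)/δ) is assumed):

```python
import math
import numpy as np

NODES, WEIGHTS = np.polynomial.hermite.hermgauss(21)  # 21-point GH rule

def phi_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def su_pwm(theta, r):
    """beta_r = E[x(z) * Phi(z)^r] for the assumed SU inverse transform,
    evaluated by Gauss-Hermite quadrature with z = sqrt(2) * t."""
    xi, lam, gamma, delta = theta
    total = 0.0
    for t, w in zip(NODES, WEIGHTS):
        z = math.sqrt(2.0) * t
        x = xi + lam * math.sinh((z - gamma) / delta)
        total += w * x * phi_cdf(z) ** r
    return total / math.sqrt(math.pi)

def fit_su(beta_target, theta0, iters=50, h=1e-6):
    """Newton iteration on g_r(theta) = beta_r(theta) - beta_target[r],
    with a forward-difference Jacobian."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(iters):
        g = np.array([su_pwm(theta, r) - beta_target[r] for r in range(4)])
        J = np.empty((4, 4))
        for j in range(4):
            tp = theta.copy()
            tp[j] += h
            # column j: (beta_r(theta + h e_j) - beta_r(theta)) / h
            J[:, j] = [(su_pwm(tp, r) - beta_target[r] - g[r]) / h
                       for r in range(4)]
        step = np.linalg.solve(J, g)
        theta = theta - step
        if np.max(np.abs(step)) < 1e-10:
            break
    return theta
```

Starting sufficiently close to the solution, the iteration typically converges in a handful of steps.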
The initial values δ₀ and γ₀ are evaluated as follows: δ₀ is given by Hill et al. (1976), and γ₀ is given by Draper (1952). If φ > 0.52, the S_L model is more appropriate.
Once δ₀ and γ₀ are determined, the initial values ξ₀ and λ₀ are easily calculated by solving the following system of linear equations, obtained by matching β₀ and β₁ with δ and γ fixed at δ₀ and γ₀:

ξ ∫ Φ(z)ʳ ϕ(z) dz + λ ∫ sinh[(z − γ₀)/δ₀] Φ(z)ʳ ϕ(z) dz = β̂_r,  r = 0, 1. (12)

S_B model

The parameters of the S_B model are obtained in a similar way to the S_U case. Here β_r is:

β_r = ∫_{−∞}^{+∞} {ξ + λ/(1 + exp[−(z − γ)/δ])} Φ(z)ʳ ϕ(z) dz, (13)

and the entries of the Jacobian matrix are the partial derivatives of g_r = β_r − β̂_r with respect to (ξ, λ, γ, δ). Noticing that a random variable x generated by the S_B model is located in the interval [ξ, ξ + λ], the initial values of ξ and λ can be determined from the variation range of x.
ξ₀ = F⁻¹(α),  λ₀ = F⁻¹(1 − α) − F⁻¹(α), (14)

where α represents a small percentage value, F⁻¹(·) is the inverse CDF of x.
Once ξ₀ and λ₀ are known, γ₀ and δ₀ are calculated as follows (Johnson & Kitchen, 1971):

4. Accommodating the dependency

Suppose x₁ and x₂ are two correlated random variables with correlation coefficient ρ_x. Through the transformation models in Eq.
(1), two correlated standard normal variables, z₁ and z₂, are obtained. Let ρ_z denote the correlation coefficient between z₁ and z₂. This section is devoted to evaluating ρ_z for a given value of ρ_x. For the four models of the Johnson system, there are ten combinations. For two correlated standard normal variables, the following properties hold (Pearson & Young, 1918):

E[z^{2m}] = (2m)!/(2ᵐ m!),  E[z^{2m+1}] = 0,  E[z₁ z₂ⁿ] = n ρ_z E[z₂ⁿ⁻¹], (20)

where m is a nonnegative integer and n is a positive integer. Let c denote a constant. Expanding the exponential function in a power series and using the formulae in Eq.(20), the derivation below is carried out:

E[exp(c z)] = exp(c²/2),  E[z₁ exp(c z₂)] = c ρ_z exp(c²/2). (21)

S_N − S_L

Let x₁, x₂ denote the random variables obtained by the S_N and S_L models in Eq.(8), respectively. The correlation coefficient ρ_x is calculated as:

ρ_x = (E[x₁ x₂] − μ₁ μ₂)/(σ₁ σ₂), (22)

where μ_i and σ_i denote the mean and standard deviation of x_i. Substituting the expressions of x₁ and x₂ into Eq.(22) and using the formula in Eq.(21) yields:

ρ_z = ρ_x δ √(ω − 1),  ω = exp(1/δ²), (23)

where δ is the S_L parameter of x₂.

S_N − S_U

To handle the case of S_N − S_U conveniently, the following derivation is performed.
Suppose x, y are two random variables, and let y = y₁ + y₂. Since cov(x, y) = cov(x, y₁) + cov(x, y₂), the correlation coefficients satisfy:

ρ_{xy} σ_y = ρ_{xy₁} σ_{y₁} + ρ_{xy₂} σ_{y₂}. (24)
Let x₂ denote the random variable resulting from the S_U model. Since sinh(u) = [exp(u) − exp(−u)]/2, x₂ can be written as the sum of two Lognormal-type terms.
Using the formula in Eq.(24), the problem associated with S_N − S_U can be transformed into S_N − S_L. ρ_z for ρ_x between x₁ (S_N) and x₂ (S_U) is evaluated as:

ρ_z = ρ_x δ √[(ω − 1)(ω cosh(2Ω) + 1)/2] / [√ω cosh(Ω)],  Ω = γ/δ,  ω = exp(1/δ²), (26)

where γ and δ are the S_U parameters of x₂.

4.3 S_N − S_B

ρ_x and ρ_z are related by the following equation:

ρ_x = [∫_{−∞}^{+∞} ∫_{−∞}^{+∞} x₁(z₁) x₂(z₂) ϕ(z₁, z₂, ρ_z) dz₁ dz₂ − μ₁ μ₂]/(σ₁ σ₂), (27)

where ϕ(z₁, z₂, ρ_z) is the bivariate standard normal PDF.
Substituting the expressions of x₁ (S_N) and x₂ (S_B) into Eq.(27), the following formula is obtained after some manipulations:

ρ_x = (ρ_z λ)/(δ σ₂) ∫_{−∞}^{+∞} exp[−(z − γ)/δ] / {1 + exp[−(z − γ)/δ]}² ϕ(z) dz, (28)

where σ₂ is the standard deviation of x₂. The integral on the right-hand side can be evaluated by the Gauss-Hermite quadrature.
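Analytical mappings of this kind are easy to spot-check by Monte Carlo. The sketch below assumes the normal-Lognormal relation ρ_z = ρ_x δ √(ω − 1), ω = exp(1/δ²), for an S_N − S_L pair (taking γ = 0 and ξ = 0 for simplicity; names are illustrative) and verifies that transformed samples reproduce the target ρ_x:

```python
import math
import random

def sn_sl_rho_z(rho_x, delta):
    """Equivalent normal-space correlation for an SN-SL pair,
    assuming rho_z = rho_x * delta * sqrt(w - 1), w = exp(1/delta^2)."""
    w = math.exp(1.0 / delta**2)
    return rho_x * delta * math.sqrt(w - 1.0)

def check(rho_x, delta, n=200_000, seed=1):
    """Sample correlation of (z1, exp(z2/delta)) with corr(z1, z2) = rho_z."""
    rng = random.Random(seed)
    rho_z = sn_sl_rho_z(rho_x, delta)
    sx = sy = sxx = syy = sxy = 0.0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho_z * z1 + math.sqrt(1.0 - rho_z**2) * rng.gauss(0.0, 1.0)
        x = z1                        # SN marginal: standard normal
        y = math.exp(z2 / delta)      # SL marginal with gamma = 0, xi = 0
        sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y
    mx, my = sx / n, sy / n
    cov = sxy / n - mx * my
    return cov / math.sqrt((sxx / n - mx * mx) * (syy / n - my * my))
```

For ρ_x = 0.5 and δ = 2, the required ρ_z is about 0.533, and the sample correlation of the transformed pair should return to roughly 0.5.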

S_L − S_L
As for the case of S_L − S_L, x₁ and x₂ both follow the Lognormal distribution. The functional relationship between ρ_z and ρ_x is as follows (Mostafa & Mahmoud, 1964):

ρ_x = [exp(ρ_z/(δ₁δ₂)) − 1] / √[(ω₁ − 1)(ω₂ − 1)],  ω_i = exp(1/δ_i²). (29)

S_L − S_U

Using the formula in Eq.(24), the problem is converted into S_L − S_L, and ρ_x between x₁ (S_L) and x₂ (S_U) is estimated by Eq.(30). For a given value of ρ_x, the equation is solved for ρ_z, with the valid solution restricted by the following conditions:

|ρ_z| ≤ 1 and ρ_x ρ_z ≥ 0. (31)
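The S_L − S_L relation is invertible in closed form, so ρ_z can be computed directly. A sketch assuming the Lognormal-Lognormal relation ρ_x = [exp(ρ_z/(δ₁δ₂)) − 1]/√((ω₁ − 1)(ω₂ − 1)) with ω_i = exp(1/δ_i²) (the function name is ours):

```python
import math

def sl_sl_rho_z(rho_x, delta1, delta2):
    """Invert the assumed Lognormal-Lognormal correlation relation
    rho_x = (exp(rho_z/(d1*d2)) - 1) / sqrt((w1 - 1)(w2 - 1)),
    w_i = exp(1/d_i^2), for rho_z."""
    w1 = math.exp(1.0 / delta1**2)
    w2 = math.exp(1.0 / delta2**2)
    arg = 1.0 + rho_x * math.sqrt((w1 - 1.0) * (w2 - 1.0))
    return delta1 * delta2 * math.log(arg)
```

A round trip through the forward relation recovers the original ρ_x, and ρ_x = 0 maps to ρ_z = 0 as required by the sign condition in Eq.(31).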

S_U − S_U
Transforming the problem into S_L − S_U via Eq.(24), the equation solved for ρ_z is obtained as Eq.(32).

S_L − S_B, S_U − S_B and S_B − S_B

For the cases of S_L − S_B, S_U − S_B and S_B − S_B, analytical expressions are unobtainable, and other approaches must be tried. Suppose x₂ represents the random variable generated by the S_B model. Substituting the expression of x₁ into Eq.(27) yields three equations, Eqs.(33)-(35), in each of which ρ_x is a continuous function of ρ_z on the interval [−1, 1]. According to the Weierstrass approximation theorem (Saxe, 2001), ρ_x can therefore be approximated to any required accuracy by a polynomial in ρ_z.
Choose several values of ρ_z over the interval [−1, 1], evaluate the integral by Gauss-Hermite quadrature for each, and obtain the corresponding values of ρ_x. A polynomial is then established by an interpolation method:

ρ_x = a_n ρ_zⁿ + · · · + a₁ ρ_z + a₀.

For a specified ρ_x, ρ_z can be determined by solving the polynomial equation above; the valid solution is again restricted by the conditions in Eq.(31). In general, a satisfactory result is obtained with a polynomial of degree less than 10. Numerical investigations over |κ₃| ≤ 20 and |κ₄| ≤ 120 show that the integral can be accurately evaluated by a 9-point Gauss-Hermite quadrature when |δ| > 0.5; if |δ| < 0.5, more quadrature nodes are required.
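A minimal sketch of this interpolation scheme for the S_B − S_B case (assuming the standard S_B inverse transform; function and parameter names are ours). The bivariate expectation is computed with a tensor Gauss-Hermite rule, using the decomposition z₂ = ρ z₁ + √(1 − ρ²) u rather than the joint-density form, which is equivalent:

```python
import numpy as np

T, W = np.polynomial.hermite.hermgauss(21)   # nodes/weights for weight exp(-t^2)

def sb(z, xi, lam, gamma, delta):
    """Assumed SB inverse transform x = xi + lam/(1 + exp(-(z - gamma)/delta))."""
    return xi + lam / (1.0 + np.exp(-(z - gamma) / delta))

def rho_x_of_rho_z(rho_z, p1, p2):
    """Correlation of two SB variables for a given normal-space rho_z."""
    z = np.sqrt(2.0) * T
    x1, x2 = sb(z, *p1), sb(z, *p2)
    m1, m2 = (W @ x1) / np.sqrt(np.pi), (W @ x2) / np.sqrt(np.pi)
    s1 = np.sqrt(W @ (x1 - m1) ** 2 / np.sqrt(np.pi))
    s2 = np.sqrt(W @ (x2 - m2) ** 2 / np.sqrt(np.pi))
    e12 = 0.0
    for zk, wk in zip(z, W):
        z2 = rho_z * zk + np.sqrt(1.0 - rho_z**2) * z
        e12 += wk * sb(zk, *p1) * (W @ sb(z2, *p2))
    e12 /= np.pi
    return (e12 - m1 * m2) / (s1 * s2)

def solve_rho_z(rho_x_target, p1, p2, deg=9):
    """Fit the polynomial rho_x(rho_z) on a grid, then pick the valid root
    (|rho_z| <= 1 with the same sign as rho_x)."""
    grid = np.linspace(-0.999, 0.999, 15)
    vals = np.array([rho_x_of_rho_z(r, p1, p2) for r in grid])
    coef = np.polyfit(grid, vals, deg)
    coef[-1] -= rho_x_target                 # roots of p(r) - rho_x_target
    roots = np.roots(coef)
    real = roots[np.abs(roots.imag) < 1e-6].real
    valid = [r for r in real if abs(r) <= 1.0 and r * rho_x_target >= 0.0]
    return min(valid, key=lambda r: abs(r - rho_x_target))
```

Because ρ_x(ρ_z) is smooth and monotone for these cases, the degree-9 fit is accurate and leaves a single valid root.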

Generating multivariate random vector
Suppose X = (x₁, . . . , x_i, . . . , x_m)ᵀ is an m-dimensional random vector with correlation matrix R_x. The steps for generating X are as follows:

1. Choose a suitable transformation model for each variable x_i of X and determine its parameters.
2. Calculate ρ_z(i, j) (i ≠ j) for each entry ρ_x(i, j) of the matrix R_x, and assemble the equivalent correlation matrix R_z in the standard normal space.
3. Generate an m-dimensional vector of independent standard normal variables, U = (u₁, . . . , u_i, . . . , u_m)ᵀ, and transform U into the correlated standard normal vector Z = (z₁, . . . , z_i, . . . , z_m)ᵀ by performing Z = LU, where L is the lower triangular matrix resulting from the Cholesky decomposition of R_z, that is, R_z = LLᵀ.

4. Convert each element z_i into x_i by the corresponding model. The random vector X with the prescribed marginal distributions and correlation matrix R_x is thus obtained.
As long as the correlation matrix R_x is a positive definite symmetric matrix (Cario & Nelson, 1997), and the marginal distribution of each x_i can be well simulated by the Johnson system, the correlated random vector X can be generated by the proposed method.
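The four steps above can be sketched end-to-end as follows (illustrative names; the marginal inverse transforms are passed in as plain functions):

```python
import numpy as np

def generate(n, inverses, R_z, seed=0):
    """Generate n samples of a random vector: independent standard normals
    are correlated via the Cholesky factor of R_z (R_z = L L^T), then each
    component is mapped through its Johnson inverse transform."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(R_z)                   # fails if R_z is not SPD
    U = rng.standard_normal((len(inverses), n))   # independent N(0,1)
    Z = L @ U                                     # correlated N(0,1)
    return np.vstack([g(Z[i]) for i, g in enumerate(inverses)])

# Illustration: an SN marginal and an SL marginal (gamma = 0, delta = 1)
inverses = [lambda z: z, lambda z: np.exp(z)]
R_z = np.array([[1.0, 0.5], [0.5, 1.0]])
X = generate(100_000, inverses, R_z)
```

The Cholesky factorization itself enforces the positive-definiteness requirement noted above: `np.linalg.cholesky` raises an error for a non-SPD matrix.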

Numerical examples
Four numerical examples are performed in Matlab to demonstrate the proposed approach. All integrals involved are evaluated by a 21-point Gauss-Hermite quadrature, and the Newton iteration is terminated when five digits of accuracy are reached.
Consider the T-distribution with 4 degrees of freedom (T(4)); its first four standardized central moments are μ = 0, σ = 1.4142, κ₃ = 0, κ₄ = +∞. Solving the equation in Eq.(2) yields ω = 1. According to Eq.(4), the S_U model is chosen. To enable the calculation of the initial values, κ₄ is set to 10¹⁶. The initial values are ξ₀ = 0, λ₀ = 0.0806, δ₀ = 0.4617, γ₀ = 0, and the solutions of the system of equations are ξ = 0, λ = 1.4399, δ = 1.3760, γ = 0. The graphs of the PDF are depicted in Fig. 1. For a given percentage p, the percentile t_p = T⁻¹(p) of T(4) and the percentile x_p given by the S_U model in Eq.(8) at z = Φ⁻¹(p) are evaluated, and the relative error between t_p and x_p is calculated as:

ε = |x_p − t_p| / |t_p|, (38)

where T⁻¹(·) is the inverse CDF of T(4) and Φ⁻¹(·) is the inverse CDF of the standard normal variable. 10000 values of p are chosen evenly within the interval [0.01, 0.99]. Evaluating ε for each p as stated in Eq.(38), the average value is 1.55%, the maximum value is 4.63%, and the minimum value is 0.0002%. The Johnson system thus offers satisfactory accuracy for simulating T(4).
Another example concerns the Gamma distribution Γ(10, 1). The relative error is evaluated as in Eq.(38), with t_p replaced by G⁻¹(p), where G⁻¹(·) is the inverse CDF of Γ(10, 1). The average value of ε is 2.52 × 10⁻⁵ %, the maximum value is 1.81%, the minimum value is 0.18%. Suppose x₁ and x₂ both follow the Gamma distribution Γ(10, 1). Select four values of ρ_z evenly spaced over the interval [−1, 1], evaluate the double integral in Eq.(35) to obtain the associated values of ρ_x, and establish the polynomial by Newton's interpolation method.
201 values of ρ_z are chosen from −1 to 1 in steps of 0.01. For each value of ρ_z, 10⁶ bivariate Gamma random vectors are generated by the procedure of Section 5, and ρ_x is calculated as the sample correlation coefficient of the bivariate Gamma samples. Along with the ρ_x evaluated by the polynomial in Eq.(39), the resulting curves are plotted in Fig. 3, and several pairs of (ρ_z, ρ_x) values are presented in Table 1. Inspection of Fig. 3 and Table 1 indicates that the polynomial in Eq.(39) provides a good approximation of the functional relationship between ρ_z and ρ_x.
Finally, an example of generating a multivariate random vector with specified marginal distributions and correlation matrix is worked out. Consider the random vector X = (x₁, x₂, x₃, x₄)ᵀ, where x₁ follows the standard normal distribution, x₂ the Lognormal distribution lnN(0, 1), x₃ the T-distribution with 4 degrees of freedom, and x₄ the Gamma distribution Γ(10, 1). For the desired correlation matrix R_x, the equivalent correlation matrix R_z is evaluated by the proposed method, and 5 × 10⁶ random vectors X = (x₁, x₂, x₃, x₄)ᵀ are generated. Comparing the sample correlation matrix R*_x with R_x indicates that the correlation matrix of the samples is in close agreement with the desired one.

Conclusion
Through the transformation models of the Johnson system, a variety of distributions can be simulated from the standard normal distribution, and random numbers with a specified distribution can be efficiently generated by an elementary transformation of standard normal deviates. In the context of probability weighted moments, this paper develops a simpler approach to evaluate the parameters of the Johnson system. To allow for the generation of multivariate non-normal distributions with a desired correlation matrix, formulae are derived to calculate the equivalent correlation coefficient in the standard normal space. Extending the Johnson system to multivariate data generation makes it more applicable to stochastic modeling and simulation studies.
Appendix. Gauss-Hermite quadrature

The Gauss-Hermite quadrature approximates integrals with the weight function exp(−t²):

∫_{−∞}^{+∞} exp(−t²) g(t) dt ≈ Σ_{k=1}^{n} ω_k g(t_k), (43)

where n is the number of quadrature nodes and t_k is the kth zero of the Hermite polynomial H_n(t),

H_n(t) = (−1)ⁿ exp(t²) (dⁿ/dtⁿ) exp(−t²). (44)

The associated weights ω_k are given by:

ω_k = 2ⁿ⁻¹ n! √π / [n² H_{n−1}(t_k)²]. (45)

Using Eq.(43), the following formula can be easily derived:

∫_{−∞}^{+∞} g(z) (1/√(2π)) exp(−z²/2) dz ≈ (1/√π) Σ_{k=1}^{n} ω_k g(√2 t_k). (46)

The extension of Gauss-Hermite quadrature to the double integral is as follows:

∫∫ g(z₁, z₂) (1/(2π)) exp[−(z₁² + z₂²)/2] dz₁ dz₂ ≈ (1/π) Σ_{k=1}^{n} Σ_{l=1}^{n} ω_k ω_l g(√2 t_k, √2 t_l). (47)

The nodes t and weights ω of the 21-point Gauss-Hermite quadrature are presented in Table 2.
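These quadrature rules can be exercised with numpy's `hermgauss`, which returns the zeros of H_n and the associated weights. For the bivariate case, the sketch below uses the conditional decomposition z₂ = ρ z₁ + √(1 − ρ²) u, an equivalent formulation of the tensor rule (function names are ours):

```python
import numpy as np

t, w = np.polynomial.hermite.hermgauss(21)    # zeros of H_21 and weights

def normal_expect(g):
    """E[g(z)] for z ~ N(0,1), via the substitution z = sqrt(2) * t."""
    return (w @ g(np.sqrt(2.0) * t)) / np.sqrt(np.pi)

def normal_expect2(g, rho):
    """E[g(z1, z2)] for correlated standard normals with correlation rho,
    using a tensor rule and z2 = rho*z1 + sqrt(1 - rho^2)*u."""
    z1 = np.sqrt(2.0) * t[:, None]
    u = np.sqrt(2.0) * t[None, :]
    z2 = rho * z1 + np.sqrt(1.0 - rho**2) * u
    return float((w[:, None] * w[None, :] * g(z1, z2)).sum()) / np.pi
```

An n-point rule is exact for polynomial integrands up to degree 2n − 1, so the 21-point rule reproduces the normal moments E[z²] = 1 and E[z⁴] = 3, and E[z₁ z₂] = ρ, to machine precision.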