Bayesian Estimation and Prediction for the Power Law Process with Left-Truncated Data

Abstract: The power law process (PLP) (i.e., the nonhomogeneous Poisson process with power intensity law) is perhaps the most widely used model for analyzing failure data from reliability growth studies. Statistical inferences and prediction analyses for the PLP with left-truncated data with classical methods were extensively studied by Yu et al. (2008) recently. However, the topics discussed in Yu et al. (2008) only included maximum likelihood estimates and confidence intervals for parameters of interest, hypothesis testing and goodness-of-fit test. In addition, the prediction limits of future failure times for failure-truncated case were also discussed. In this paper, with Bayesian method we consider seven totally different prediciton issues besides point estimates and prediction limits for xn+k. Specifically, we develop estimation and prediction methods for the PLP in the presence of left-truncated data by using the Bayesian method. Bayesian point and credible interval estimates for the parameters of interest are derived. We show how five single-sample and three two-sample issues are addressed by the proposed Bayesian method. Two real examples from an engine development program and a repairable system are used to illustrate the proposed methodologies.


Introduction
When failure times from different systems during their development programs are collected and analyzed, an approximate straight line pattern in the corresponding log-log plot of the cumulative mean time between failures (MTBF) against the cumulative operating time is usually observed (see, Duane, 1964).Crow (1974) extended the Duane model to the nonhomogeneous Poisson process (NHPP) with a power intensity law, which is also known as AMSAA model due to its adoption by the U.S. Army Materiel Systems Analysis Activity.The NHPP model is generally used to monitor the reliability growth of repairable systems, to assess the reliability growth of software and to predict failure behaviors.Statistically, a NHPP {N (t), t ≥ 0} with the power intensity law λ(t) = (β/α)(t/α) β−1 , α, β > 0, (1.1) is also known as power law process (PLP) or Weibull process.The corresponding mean function is defined by It was shown by Rigdon (2002) that a linear Duane plot does not imply a PLP, and a PLP does not imply a linear Duane plot.
Assume that we conduct a reliability growth test on some repairable system in the time interval (0, t].For the failure truncated case, the number of failures, n, is predetermined.Let 0 < x 1 < x 2 < • • • < x n be the first n ordered failure times of the PLP.The time to the first failure (i.e., x 1 ) can be shown to follow Weibull distribution with scale parameter α and shape parameter β.That is, the probability density function (pdf) of x 1 is (Crowder et al., 1991) with the corresponding distribution function ) denote the conditional density function of x i given x 1 , • • • , x i−1 , we have (1. 2) The joint pdf of x 1 , • • • , x n is thus given by For the time truncated case, the test time t is predetermined.Let 0 < x 1 < • • • < x n < t be the observed failure times of the PLP.Similar to (1.3), the joint pdf of (x 1 , • • • , x n ; N = n) is given by where N = N (t) and the symbol = means "equal by definition".
In practice, left-truncated data occur due to various reasons such as not being able to observe in the early developmental phase of a testing program.Recently, Yu et al. (2008) developed classical methods for statistical inferences and prediction analyses for the PLP with the first r − 1 failure times (i.e., {x i } r−1 i=1 ) being missing.Using their notations, we denote the observed data as Y ft obs = {x r , • • • , x n } for the failure-truncated case and Y tt obs = {x r , • • • , x n ; t} for the time-truncated case.First, the topics discussed in Yu et al. (2008) only included maximum likelihood estimates and confidence intervals for parameters of interest (e.g., α, β and MTBF), hypothesis testing on α and β, and goodnessof-fit test.In addition, the prediction limit of the (n + k)-th future failure time (i.e., x n+k ) for failure-truncated case was also discussed.In this paper, we will consider seven totally different issues (see, Sections 4.2-4.4 and 5) besides point estimates (see, Section 3) and prediction limits for x n+k (see, Section 4.1), and solutions to the seven prediction issues are not yet available to date for both classical and Bayesian methods.Second, the prediction limit of x n+k was available only for the failure-truncated case (Yu et al., 2008).A similar result does not yet exist for the time-truncated case.The main reason is our inability to find an adequate prediction statistic in the framework of classical methods.Fortunately, the Bayesian method can be applied (see, Section 4.1).Finally, there is a computational challenge in obtaining exact solutions from the classical methods and the accuracy of approximate formulae are heavily dependent of large sample sizes, while the Bayesian method can facilitate the computation by employing the conditional sampling procedure (see, two conditional sampling procedures in Sections 3.1 and 3.3).
This article aims to develop Bayesian estimation and prediction methods for the PLP with the first r − 1 failure times being missing.This left-truncated data pattern commonly occurs when (i) the importance of a reliability growth program is eventually recognized only when manufacturers reported the failures several times; and (ii) a new data-recording person may not be able to determine the exact failure times during the early stage of the process.This article is organized as follows.Section 2 presents the posterior and predictive distributions.
Section 3 derives Bayesian point estimates and credible interval estimates for the parameters of interest.Five practical single-sample issues are discussed and then addressed by the Bayesian approach in Section 4. In Section 5, we develop Bayesian methods for three two-sample issues.Two real examples from an engine development program and a repairable system are used to illustrate the proposed methodologies in Section 6.A brief discussion is given in Section 7.

Posterior and Predictive Distributions
In this article, we consider a common scenario in which missing data are produced only in the early stage of the test time (see, Yu et al., 2008).That is, {x i } r−1 i=1 are missing.For the failure-truncated PLP, the joint pdf of the observed data (2.1)For the time-truncated PLP, we also assume that {x i } r−1 i=1 are missing.The joint pdf of the observed data where 0 Combining (2.1) with (2.2), the likelihood function for α and β is then given by (2.3) where (2.4) When prior information is not available, it is reasonable to use noninformative prior distributions.
(3.2) Therefore, the Bayes point estimator and two-sided 100γ% Bayes credible interval (CI) for α are respectively given by where χ 2 (n; γ) denotes the γ percentage point of the chi-square distribution with n degrees of freedom such that Pr{χ When β is unknown, we can integrate (2.10) with respect to α to obtain the following marginal posterior density of Obviously, (3.5) implies that  2008), we can immediately conclude that the Bayes CI for β is identical to the classical confidence interval for β when the joint prior is chosen to be (2.9).
To derive the Bayes estimates for α, we adopt the conditional sampling method.From (2.10), we have where X is defined in (3.1).Hence, the conditional sampling algorithm can be summarized as follows.
The Bayes estimates for α can then be obtained from the above i.i.d.posterior samples {α ( ) } m =1 .For example, the Bayes point estimator of α is then given by α

The Pseudo Bayesian Estimate of M (x n )
In this subsection, we restrict our discussion to the failure-truncated case.If no improvements are incorporated into the repairable system after the time of the n-th failure (i.e., x n ) and the intensity λ(x n ) = (β/α)(x n /α) β−1 remains constant thereafter, then the subsequent times between failures of the system independently follow exponential distribution with the common failure rate λ(x n ) and the MTBF M (x n ) = 1/λ(x n ).Since M (x n ) involves the random variable x n , its posterior density cannot be obtained. 1Following the idea of Higgins and Tsokos (1981), we consider the pseudo Bayes point and CI estimates for λ(x n ) (or M (x n ) equivalently) instead.For this purpose, we noted that (see, Theorem 1 in Yu et al., 2008) λ , For the sake of convenience, we define As a result, we have where By treating λ n as a parameter, we consider the transformation λM = aλ n /Q.Since (3.10) implies that Q may be approximated by the chi-square random variable χ 2 (n * ), a pseudo likelihood function for λM takes the following form If we adopt a noninformative prior distribution of λ n (i.e., g(λ n ) ∝ 1/λ n ), then the pseudo-posterior density of λ n is Therefore, the pseudo Bayes point estimator and the two-sided 100γ% Bayes CI for λ n are respectively given by λn = n * λM /a, (3.12) Usually, n * in (3.11) is not a positive integer.In this case, we can approximate χ 2 (n * ; γ) by where u γ denotes the γ percentage point of the standard normal distribution N (0, 1).On the other hand, there exist algorithms to calculate χ 2 (n * ; γ) for fractional degrees of freedom.For example, the built-in S-plus function qchisq can handle this case.

The Bayesian estimate of M (t)
In this subsection, we consider the time-truncated case.Equivalently, we consider the posterior estimate for When β is known, from (2.6), the posterior density of λ t is Therefore, the Bayes point estimator and the two-sided 100γ% Bayes CI for λ t are respectively given by λt (β) = nβ/t, (3.15) When β is unknown, we consider the following transformation According to (2.10), the joint posterior distribution of (λ t , β) can be shown to be Integrating (3.17) with respect to β yield the following posterior distribution of (3.18) Therefore, the Bayes point estimator of To construct a Bayes CI of λ t , we consider the following conditional sampling method to obtain i.i.d.posterior samples of λ t .

Bayesian Predictions and Estimations for Single-Sample Problems
In this section, we consider five practical single-sample issues.
4.1 Prediction limits for x n+k 4.1.1Shape parameter β is known Let f (x n+k |Y obs ) denote the predictive density function of the (n+k)-th future failure time (i.e., x n+k ) given the observed data Y obs .According to (2.8), the Bayesian UPL x It should be noted that h(α|Y obs ) is given by (2.6) and , where f (Y obs |α) is given by (2.3) and f (x r , x r+1 , • • • , x n+k |α) is also given by (2.3) with (n, τ ) being replaced by (n + k, x n+k ), that is, By using the identity (2.2) in Yu et al. (2008), we immediately have the numerator in the right-hand side of (4.3) being equal to Hence, (4.3) (cf.Calabria, Guida and Pulcini, 1990) and (4.2) become respectively.Substituting (4.5) into (4.1)yields U,B (n, k, γ)] β equals to the 1 − γ percentage point of the Beta(n, k) distribution.From the relationship between the quantiles of beta distribution and F -distribution, we have where F (m, n; γ) represents the γ percentage point of the F -distribution with m and n degrees of freedom.

Estimating the probability of N (τ, T ) ≤ k
Based on Y ft obs or Y tt obs , we are interested in the following question: Issue A1: What is the probability that at most k failures will occur in the future time period (τ, T ] with T > τ ?

Estimating the probability of λ(T ) ≤ λ 0
Based on Y ft obs or Y tt obs , we are interested in the following question Issue B1: Given that a pre-determined target value λ 0 for the failure rate of the system undergoing development testing is not achieved at time τ , what is the probability that the target λ 0 will be achieved at time T with T > τ ?

Shape parameter β is unknown
Consider the following transformation Similar to (3.17) and (3.18), the posterior distribution of λ T is Substituting this expression into (4.21),we have In this subsection, we are interested in the following issue: Issue C1: Given that the target value λ 0 for the system failure-rate is not achieved at τ , how long will it take in order that the system failure rate will be attained at λ 0 ?
Statistically, for a given confidence level γ and the target failure-rate λ 0 , we wish to estimate the time T (T > τ ) that satisfies (4.21).When β is known, it is easy to see from (4.22) that 2τ β (βT From (4.21), we have . (4.26) When β is unknown, the desired T that satisfies (4.25) can be solved by using an iterative algorithm.
Equivalently, we notice that Issue C1 can be re-formulated as follows: Issue D1: Based on Y ft obs or Y tt obs , what is the Bayes UPL of λ(T ) with T being a pre-specified value larger than τ ?Statistically, it is desired to find λ U,B (T ) satisfying γ = Pr{λ(T ) ≤ λ U,B (T )|Y obs } for a given level γ.When β is known, solving λ 0 from (4.26), we obtain When β is unknown, the corresponding λ U,B (T ) equals to the λ 0 satisfying (4.25).Thus, an iterative algorithm can be applied.

Bayesian Predictions and Estimations for Two-Sample Problems
Suppose that the successive failure times of two repairable systems follow the same PLP with intensity function (1.1).Furthermore, for the first system, we assume that the first (r − 1) failure times were missing and the remainder failure times (i.e., Y ft obs or Y tt obs ) have been recorded exactly.We are interested in the following two-sample problems.
Issue A2: How to predict the k-th (k ≥ 1) failure time y k of the second system?Issue B2: Assume that the number of failures in the time interval (0, t 2 ] for the second system is m but that the exact failure times are unknown, how to predict the k-th (1 ≤ k ≤ m) failure time y k of the second system?
Issue C2: What is the probability that at most m failures will occur in (0, t 2 ] for the second system?
In this section, we utilize a Bayesian approach to address the above three issues.

Shape parameter β is known
The predictive density of the k-th failure time y k of the second system is . by (2.6) and (5.1) (5.2) Hence, the Bayes UPL y Note that the integrand in the above integration is exactly the pdf of the F (2k, 2n) distribution; thus n[y (5.3)

Shape parameter β is unknown
Note that the independency of y k and Y obs , then, the predictive density of y k is by (2.10) and (5.1) (5.4) Thus, the Bayes UPL y U,B (k, n, r, γ) for y k satisfies It is easy to show that where V is the solution to the following equation with F (•|m, n) and χ 2 (•|n) being the cdf of F (m, n) and the density of χ 2 (n), respectively.

Prediction limits for
For Issue B2, we first need to find the conditional density f (y k |N (t 2 ) = m).Setting r = 1 and replacing n, t, and x i by m, t 2 and y i respectively in (2.2), we obtain By integrating out the variables y 1 , • • • , y k−1 , y k+1 , • • • , y m from the above joint density, we have and It is noteworthy that (5.7) does not depend on α.

Shape parameter β is known
Since y k is independent of the observed data Y obs for the first system, we also have Given N (t 2 ) = m, the Bayes UPL y Hence, [y (β) U,B (k, m, γ)/t 2 ] β is equal to the γ percentage point of the Beta(k, m − k + 1) distribution.Similar to (4.6), we obtain (5.8)

Shape parameter β is unknown
From (2.10) and (5.7), we have The Bayes UPL y U,B (k, m, n, r, γ) for y k with level γ satisfies where W can be determined by r) . (5.10) In particular, when k = m we have W = (γ − 1 n−r − 1)/m and (5.11)

Estimating the probability of N (t 2 ) ≤ m
For Issue C2, the probability that at most m (a pre-specified value) failures will occur in (0, t 2 ] for the second system is given by Pr{N (t 2 ) = k|Y obs }.

A repairable system failure data
The following failure data from a prototype of a repairable system is given in ReliaSoft Corporation (2005).Briefly, a total of 12 failures for a system tested for t = 6500 hours are given by: (80), ( 175), ( 265), (400), 590, 1100, 1650, 2010, 2400, 3380, 5100, 6400. ReliaSoft Corporation (2005) showed that the above life data followed a PLP.For illustrative purpose, we assume that the exact failure times for the first four failures are unknown.Since these data are time-truncated, we have t = 6500, n = 12 and r = 5.The MLE of β is β = (n − r + 1)/z = 0.4389.

Conclusion
In this article, we consider Bayesian estimation and prediction methods for the PLP in the presence of left-truncated data.Bayesian point and credible interval estimates for the parameters of interest are derived.Bayesian prediction limits of future failure times are available for both failure-and time-truncated cases.We also show how five single-sample and three two-sample issues are addressed by the proposed Bayesian method.
It is interesting to note that (4.6) does not depend on r.In other words, when β is known, the Bayes prediction limit of x n+k remains the same regardless of the absence of the first (r − 1) failure times (i.e., {x i } r−1 i=1 ).This result is not surprising because the conditional distribution of x n+k given x 1 , • • • , x n+k−1 depends only on x n+k−1 but not x 1 , • • • , x n+k−2 (cf.(1.2)).
In addition, we notice that the Bayesian UPL of x n+k under the noninformative prior (2.5) is identical to the classical UPL of x n+k by comparing (4.6) with (3.18) in Yu, Tian and Tang (2008).Most importantly, the prediction limit for x n+k is available only for the failure-truncated case under the classical approach while the prediction limits for x n+k are available for both the failure-and time-truncated cases under Bayesian approach.
By comparing (4.9) with (3.18) and (3.19) in Yu, Tian and Tang (2008), we find that the Bayesian UPL for x n+k under the joint noninformative prior (2.9) is identical to the classical UPL for x n+k .Again, the Bayes prediction limits for x n+k are available for both failure-and time-truncated cases.

Figure 1 :
Figure1: The posterior densities of β and α generated by the conditional sampling method introduced in Section 3.1 with m = 50, 000 for the engine failure data.

Figure 2 :
Figure2: The posterior densities of β and λ(t) generated by the conditional sampling method introduced in Section 3.3 with m = 50, 000 for the repairable system failure data.