STATISTICAL ANALYSIS OF SURVIVAL TIMES BASED ON PROPORTIONAL GENERALIZED ODDS MODELS

: In the area of survival analysis the most popular regression model is the Cox proportional hazards (PH) model. Unfortunately, in practice not all data sets satisfy the PH condition and thus the PH model cannot be used. To overcome the problem, the proportional odds (PO) model ( Pettitt 1982 and Bennett 1983a) and the generalized proportional odds (GPO) model ( Dabrowska and Doksum, 1988) were proposed, which can be considered in some sense generalizations of the PH model. However, there are examples indicating that the use of the PO or GPO model is not appropriate. As a consequence, a more general model must be considered. In this paper, a new model, called the proportional generalized odds (PGO) model, is introduced, which covers PO and GPO models as special cases. Estimation of the regression parameters as well as the underlying survival function of the GPO model is discussed. An application of the model to a data set is presented.


Introduction
When lifetime is subject to right censoring one of the commonly used regression models is the Cox proportional hazards (PH) model, provided that the data satisfies certain PH conditions (Cox 1972). Since it is not uncommon that the PH conditions are violated, different regression models need to be constructed to fit the data. One of them is the proportional odds (PO) model proposed by Pettitt (1982) and Bennett (1983a, b). Just like the PH model, the PO model assumes that the odds, instead of the hazard rates, of the distribution of a lifetime are proportional to the odds of a baseline distribution. It should be mentioned that the PO model is not a simple generalization of the PH model. Studies on the PO model can be found in Pettitt (1984), Bickel (1986), Cheng et al (1995), Cuzik (1988), Dabrowska and Doksum (1988), Shen (1995), Wu (1995), and Murphy et al (1997) among others. Most of the studies mainly focus on estimation of the underlying survival function. In addition, for two independent samples with right-censored lifetimes, Dauxois and Kirmani (2003) developed a method for testing whether the data satisfies the PO assumption.
Besides the PO model a generalization of it was proposed. For two-sample lifetime data, Dabrowska and Doksum (1988) introduced a model, called the generalized proportional odds (GPO) model, and developed a testing method for comparison of two survival functions satisfying the GPO condition. For more details, one refers to Clayton and Cuzick (1986), Dabrowska et al (1989), Banerjee et al (2007) and references therein.
Although the GPO model is quite general and flexible to fit data, there are still many cases where the application of the GPO model is not suitable (See Section 4). To this end, we propose a more general model, called proportional generalized odds (PGO) model. The main difference between the GPO model and the PGO model is that the former assumes a common frailty factor for two groups of survival times under study and the latter allows the two groups to have different frailty factors. As will be seen in Section 4, the GPO model does not fit the data, while the proposed PGO model fits. The major differences between Bennett (1983a) and the current study are that (i) the current PGO model is more general than the PO model and (ii) the current study includes regression models but Bennett (1983a) does not.
The rest of this paper is organized as follows: In Section 2, the PGO model is defined and discussions on the relation between it and other related models are presented. Section 3 discusses semi-parametric estimation of the PGO model. In Section 4, the PGO model is applied to a data set about kidney disease recurrent times. Some discussions are presented in Section 5.

Proportional generalized odds model
Let T ≥ 0 denote a lifetime and z a vector of covariates. Assume that the survival function S(t) = P (T > t) of T is continuous. The ratio [1 − S(t)]/S(t) is called odds, which measures the odds of "failure" against "survival" at time t ≥ 0. Pettitt (1982) and Bennett (1983a, b) introduced the PO model as follows, in which S0 is a baseline survival function. This model assumes that lifetime T is associated with its covariate z only through θ, which is independent of time t. This model actually assumes that the odds ratio (against a baseline) at any time t is a constant θ, free of t. This model has been applied to various data sets where the use of the PH model is not appropriate. As mentioned in Section 1, the PO model is a different model, not an extension of the PH model. In particular, if we let λ(t) denote the hazard function corresponding to S(t) then, under the PO model, the hazard ratio which it is not a constant unless θ(z) = 1. A more general model was introduced by Dabrowska and Doksum (1988) as follows. First, they extended the odds to generalized odds defined as Obviously, it reduces to the usual odds when r = 1. Then they proposed the following GPO model where S1 and S2 are, respectively, survival functions of two independent samples, and r is a (frailty) parameter. Since it is shown that the generalized odds converge to the cumulative hazard − log S(t) as r → ∞ , the GPO model becomes the PH model as r → ∞ . That is, the GPO model covers the PH and PO models as special cases. Studies of the GPO model focus on testing for the hypothesis of equality of S1 and S2 when they satisfy the GPO condition, and the model has been studied extensively in the past two decades. Recently, we tried to apply the model to a data but found that the data does not satisfy the GPO assumption. For this reason, we extend the GPO model by allowing different r for different sample. In particular, we propose the following proportional generalized odds (PGO) model, where S0 denotes a baseline survival function, r > 0 and θ > 0 are parameters. To compare it with the GPO model, let us consider two independent samples. Let Si denote the survival function of sample i, i = 1, 2. Then it is seen that So the PGO model is a generalization of the GPO model. It can be verified that Burr distribution (Burr, 1942) with different scale parameters (See Section 3 below) satisfies the PGO condition but not the PO condition.
In addition to the natural extension, the PGO model can also be obtained from an engineering point of view. Let X1, · · · , Xn, · · · be a random sample of observations with distribution function F and survival function S = 1 − F , and let N be a random variable having a geometric distribution with probability of "success" θ ∈ (0, 1). Then Kirmani and Gupta (2001) showed that the distribution of lifetime Y= min

Estimation
A nonnegative random variable X is said to follow a Burr distribution (Burr, 1942), denoted as X ∼ Bur(α, σ, r), if it has the survival function where the shape parameter α, the scale parameter σ and the frailty parameter r are all positive. Bur(1, 1, r) is called the standard Burr distribution.
We now discuss the PGO regression model. Assume that in model (2.1) survival function S is affected by covariates z only through θ. Specifically, model (2.1) becomes where θ(z) = exp{β′z}, Let T1 ∼ S1 and T2 ∼ S2 be two independent lifetimes satisfying the PGO model Apparently it is a semi-parametric model. We now use the likelihood principle to estimate the baseline S0, parameters β and ri, and survival functions Si(t; z), i = 1, 2, based on censored data. This can be considered as a two-sample problem with covariates. The same method can be used for three or more samples. Define

Then, by Proposition 3, Yi follows a Burr distribution with survival function
Since both samples share the same baseline S0, we combine two samples into one in order to construct a pooled estimate of S0. For simplicity let to denote the distinct observed lifetimes from the combined sample, of which some are from sample 1 and the others are from sample 2. Further, we let , j = 1, 2, . . . denote censored lifetimes from the combined sample. In addition, we use ( ) and , respectively, to denote the corresponding transformed values of ( ) and , . Note that although it is not clearly labeled (for simplicity purpose), it is clear to us as for which sample an observation belongs to. Still for simplicity, we are going to use r for both r1 and r2, Q for both Q1 and Q2 and the density function of Q for q1 and q2, the density functions of Q1 and Q2. With these notations, the likelihood function in terms of Y takes the form of where Oi is the set of subscripts of observed lifetimes tied at t(i), Cj is that of right-censored lifetimes in [t(j), t(j+1)), and the derivative is Jacobian due to the transformation. Notice that y's as a function of S0 are also unknown parameters. To obtain the likelihood estimates of all parameters, following two approximations are used, which are similar to Bennett (1983a). First we replace the derivative by its approximation ′ is the largest observed lifetime before from the same sample. Then the above likelihood function is approximated by Once the likelihood (3.4) is maximized with respect to parameters β, r1, r2, and ( ) i =1, 2, . . . , k, the baseline survival function S0(t) can be estimated by

An application
This section applies the proposed PGO model to a data set about kidney patients recurrent times.
The data in Table 1 contains 38 kidney patients' second recurrent times to infection at point of insertion of the catheter for kidney patients using portable dialysis equipment, the risk variables are age, sex and type of disease coded as 0 = glomerulo nephritis (GN), 1 = acute nephritis (AN), 2 = polycystic kidney disease (PKD), 3 = other; in the event column, 1 denotes that time to infection was observed and 0 denotes that it was censored. The original data containing the first two consecutive recurrent times and sex had been studied by McGilchrist and Aisbett (1991) using a frailty model and by McGilchrist (1993) and Lam (1994) using the proportional hazards model. They all found that "sex" was not significant.
As an illustration we divide the data into two groups: AN and PKD in one group and GN and Other in another group. To see whether the Cox's PH model, the PO model or the GPO model is suitable for the data, we ignore covariates "age" and "sex" and then for each of the groups we obtained the estimate of the survival function using three different methods: Kaplan-Meier (KM) estimate, the PO model, and the GPO model. If, for example, the PO model fits the data, then there will be some θ > 0 such that  . In other words, when taking log on both sides of the above equation, the two log(odds) curves (vs. time t) should be parallel if the model works. However, as is seen in Figure 1, the two log(hazard) curves based on the estimated hazard functions are not parallel (Fig 1(a)); the two log(odds) curves based on PO or GPO are not parallel either (Fig 1(b), 1(c)). By contrast, the two curves based on the PGO model are nearly parallel (Fig 1(d)). This is evidence that the PGO model is more appropriate to fit this data than the other three models. In addition, goodness-of-fit tests for the hypothesis that the two groups (AN and PKD vs. GN and Other) satisfy the PO model using Neyman's smooth test and Kolmogorov-Smirnov test (see Kraus, 2009) have p-values of 0.014 and 0.094, respectively. This confirms our observation in Figure  1. By the way, if the data is divided into four groups, AN, PKD, GN, and Other, then the PH assumption is satisfied (p-value=0.503); however, when the data is divided into two groups as described above, besides Figure 1(a), we performed a test and the result shows that the PH assumption is not satisfied (p-value=0.01329), which is consistent with Figure 1(a).
We fit the PGO model to this data using likelihood (3.4). The genetic algorithm was employed for the computation. The maximum likelihood estimates of parameters r1, r2, and the coefficients of "age" and "sex" are, respectively, e 5.5 , e 7.7 , 0.006 and -1.29. A likelihood ratio test was performed for "age" and "sex" and found that none of them is significant (p-value=0.82 for "age" and 0.17 for "sex").

Discussions
This paper proposes a model which generalizes existing PH, PO and GPO models. The example in Section 4 supports the proposed model. Although the proposed PGO model is more flexible than the existing models, it remains to be done to develop a method to verify whether a data satisfies the PGO condition. We believe that it is not an easy but a 'must' task. One of the most amazing part of the PH model is the partial likelihood estimate of the model parameters. Unfortunately, we are not able to construct a similar estimator for the proposed PGO model. So, large sample properties of our maximum likelihood estimators are not discussed. In addition, still because of the lack of partial likelihood function, estimations of the baseline survival function and other parameters are obtained in one step, instead of two steps like the PH model. This one step calculation involves heavier computation and more computer time. For our data in Section 4 Newton-Raphson algorithm was used first but it failed to converge; then we employed the genetic algorithm which worked very well. Without theoretical results, simulation studies are usually desired in order to investigate the performance of a proposed model. Unfortunately simulations are not done in this paper due to technical difficulties but we decided to conduct a thorough simulation study in the future. To summarize, this paper is a preliminary study of the problem; the above mentioned issues need to be studied in order to make the proposed model useful in practice.