COMPARISON OF COX REGRESSION AND PARAMETRIC MODELS FOR SURVIVAL ANALYSIS OF GENETIC VARIANTS IN HNF1B GENE RELATED TO AGE AT ONSET OF CANCER

Semi-parametric Cox regression and parametric methods have been used to analyze cancer survival data; however, no study has compared survival models in genetic association analysis of age at onset (AAO) of cancer. The hepatocyte nuclear factor-1 beta (HNF1B) gene has been associated with the risk of endometrial and prostate cancers; however, no study has focused on the effect of the HNF1B gene on the AAO of cancer. This study examined 23 single nucleotide polymorphisms (SNPs) within the HNF1B gene in the Marshfield sample of 716 cancer cases and 2,848 non-cancer controls. Cox proportional hazards models in PROC PHREG and parametric survival models (exponential, Weibull, log-normal, log-logistic, and gamma) in PROC LIFEREG in SAS 9.4 were used to detect the genetic association of the HNF1B gene with AAO. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) were used to compare the Cox models and parametric survival models. Both AIC and BIC values showed that the Weibull distribution was the best model for all 23 SNPs and the gamma distribution was the second best. Based on the Weibull model, the top two SNPs were rs4239217, with time ratio (TR) = 1.08 (p < 0.0001 for both the AA and AG genotypes), and rs7501939, with TR = 1.07 (p = 0.0004 and 0.0002 for the CC and CT genotypes, respectively). This study shows that the parametric Weibull distribution is the best model for the genetic association of AAO of cancer and provides the first evidence of several genetic variants within the HNF1B gene associated with AAO of cancer.


Introduction
Survival analysis methods, including the non-parametric Kaplan-Meier method (with the log-rank and Wilcoxon tests), the semi-parametric Cox proportional hazards model, and parametric methods (such as exponential, Weibull, gamma, log-normal, and log-logistic models), have been used in cancer survival studies. However, previous studies have shown inconsistent results regarding which of these methods performs best in cancer survival studies. For example, one study found the Cox model to be similar to the exponential model (Pourhoseingholi et al., 2007); while the Weibull and exponential models were similarly the best models in the survival analysis of stomach cancer and AAO of cancer was defined by the date of the earliest cancer diagnosis in the registry. Covariates included in this study were age, gender, alcohol use in the past month (yes or no), obesity status, and smoking status (never, current, or past smoking). Obesity was defined as a body mass index (BMI) ≥ 30 kg/m². Genotyping data from the ILLUMINA Human660W-Quad_v1_A array were available for 3,894 individuals. The genotypes of 23 SNPs within the HNF1B gene were available in these data.
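The obesity covariate is a simple threshold rule on BMI. As a minimal sketch, assuming weight (kg) and height (m) are available for each participant (the function names and the example subject are hypothetical; the source data may already supply BMI directly), the coding could look like this in Python:

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def obesity_status(weight_kg, height_m, cutoff=30.0):
    """Obese if BMI >= 30 kg/m^2, the definition stated in this study."""
    return bmi(weight_kg, height_m) >= cutoff


# Hypothetical subject: 90 kg, 1.5 m
print(bmi(90, 1.5), obesity_status(90, 1.5))
```

Note that the cutoff is inclusive (≥), so a BMI of exactly 30 kg/m² is classified as obese.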

Descriptive Statistics and Quality Control
Categorical variables were presented as frequencies and percentages, while continuous variables were reported as means ± standard deviation (SD). HelixTree software (http://www.goldenhelix.com/SNP_Variation/HelixTree/index.html) was used to test the control genotype data for conformity with Hardy-Weinberg equilibrium (HWE). Genotype call rates and minor allele frequencies (MAF) were also calculated. To account for population stratification, the principal component analysis approach (Price et al., 2006) in HelixTree was used to identify outlier individuals (Wang et al., 2012); outliers identified from the first 5 principal components of the genome-wide genotype data were removed. Consequently, 3,564 Caucasian individuals (716 cancer cases and 2,848 controls) were included in the analysis.
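The per-SNP quality-control checks can be illustrated with a short sketch. It uses the classical 1-df chi-square goodness-of-fit test for HWE, which may differ from the test HelixTree actually applies, and borrows the thresholds reported in the results (HWE p > 0.001, MAF > 5%). Function names are hypothetical; inputs are the genotype counts observed in controls.

```python
import math


def hwe_chi2(n_hom1, n_het, n_hom2):
    """Chi-square (1 df) goodness-of-fit test of Hardy-Weinberg equilibrium."""
    n = n_hom1 + n_het + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * n)  # frequency of allele 1
    q = 1.0 - p
    if p in (0.0, 1.0):
        return 0.0, 1.0  # monomorphic SNP: expected counts of zero, nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom1, n_het, n_hom2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def minor_allele_frequency(n_hom1, n_het, n_hom2):
    p = (2 * n_hom1 + n_het) / (2 * (n_hom1 + n_het + n_hom2))
    return min(p, 1.0 - p)


def passes_qc(n_hom1, n_het, n_hom2, hwe_alpha=0.001, maf_min=0.05):
    """Keep a SNP if it is in HWE (p > hwe_alpha) and common (MAF > maf_min)."""
    _, p_hwe = hwe_chi2(n_hom1, n_het, n_hom2)
    return p_hwe > hwe_alpha and minor_allele_frequency(n_hom1, n_het, n_hom2) > maf_min


print(hwe_chi2(25, 50, 25), passes_qc(25, 50, 25))
```

Call rate (the fraction of samples with a non-missing genotype) is a simple ratio and is left out of the sketch.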

Cox Proportional Hazards Model
The proportional hazards model, or Cox regression model (Cox, 1972), is widely used in the analysis of time-to-event data to explain the effect of explanatory variables on hazard rates (Cantor, 2007; George et al., 2014):

$$h(t \mid x_1,\ldots,x_p) = h_0(t)\exp(\beta_1 x_1 + \cdots + \beta_p x_p),$$

where h(t | x1,…,xp) is the hazard function at time t for a subject with a set of predictors x1,…,xp, h0(t) is the baseline hazard function, and β1,…,βp are the model parameters describing the effect of the predictors on the overall hazard. The hazard ratio (HR) is defined as the ratio of predicted hazard rates under two different values of a predictor variable (George et al., 2014); because the baseline hazard cancels in this ratio, a one-unit increase in xj with the other predictors held fixed corresponds to HR = exp(βj). The PHREG procedure in SAS was used to fit the Cox regression model, accounting for censoring by maximizing the partial likelihood function.
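To make the partial likelihood concrete, here is a minimal pure-Python sketch for a single 0/1 covariate (for example, carrying a given genotype), fitted by Newton-Raphson and assuming no tied event times. PROC PHREG handles many covariates and tied times (Breslow's method by default), which this sketch does not. The function names and toy data are hypothetical.

```python
import math


def _risk_set(times, t):
    """Indices of subjects still under observation (time >= t)."""
    return [j for j, tj in enumerate(times) if tj >= t]


def cox_partial_loglik(beta, times, events, x):
    """Log partial likelihood for one covariate; assumes no tied event times."""
    ll = 0.0
    for i, (ti, di) in enumerate(zip(times, events)):
        if not di:
            continue  # censored subjects contribute only through the risk sets
        denom = sum(math.exp(beta * x[j]) for j in _risk_set(times, ti))
        ll += beta * x[i] - math.log(denom)
    return ll


def cox_score_info(beta, times, events, x):
    """Score (first derivative) and information (minus second derivative)."""
    score, info = 0.0, 0.0
    for i, (ti, di) in enumerate(zip(times, events)):
        if not di:
            continue
        risk = _risk_set(times, ti)
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
        mean_x = s1 / s0  # risk-weighted mean covariate in the risk set
        score += x[i] - mean_x
        info += s2 / s0 - mean_x ** 2
    return score, info


def fit_cox(times, events, x, max_iter=50, tol=1e-10):
    """Newton-Raphson maximization of the partial likelihood from beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_info(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical toy data: time to onset or censoring, event flag, carrier indicator
times = [5, 8, 12, 3, 9, 15, 7, 11]
events = [1, 1, 0, 1, 1, 1, 0, 1]
carrier = [1, 0, 1, 1, 0, 0, 1, 0]
beta_hat = fit_cox(times, events, carrier)
print(f"beta = {beta_hat:.3f}, HR = {math.exp(beta_hat):.3f}")
```

The baseline hazard h0(t) never appears in the code: the partial likelihood compares each failing subject only with the others at risk at that moment, which is why it cancels.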

Parametric Survival Models
Some commonly assumed parametric distributions in survival models include the exponential, Weibull, gamma, log-normal, and log-logistic distributions (Klein and Moeschberger, 2003). In accelerated failure time form, the model can be written as

$$\ln(T) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \ln(\varepsilon),$$

where T is the time to event; x1,…,xp and β1,…,βp are the predictor variables and their corresponding coefficients, respectively; β0 is the intercept; ε is the error term assumed to have a particular parametric distribution; and ln(ε) is the natural log of the error term (George et al., 2014). The exponentiated coefficients, exp(βj), may be interpreted as time ratios (TR) (Hernán et al., 2005; George et al., 2014; Kasza et al., 2014). If TR > 1, the time to the event is lengthened, so the event is expected to occur later; if TR < 1, the time to the event is shortened, so the event is expected to occur sooner. The LIFEREG procedure in SAS fits parametric survival models in which the distribution can be chosen from a class that includes the exponential, Weibull, log-normal, log-logistic, and gamma distributions.
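The TR interpretation can be checked numerically. In the Weibull case, writing the error term ln(ε) as σW, where σ is the LIFEREG scale parameter and W has a standard extreme-value distribution, T is Weibull with scale exp(β0 + β1x) and shape 1/σ. The median then shifts by exactly exp(β1) when x goes from 0 to 1. The sketch below computes a TR with a Wald 95% CI and verifies the median shift; all coefficient values are hypothetical, with β1 = ln(1.08) chosen to mirror the size of the top TR in this study.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def time_ratio_with_ci(beta, se):
    """TR = exp(beta) with a Wald 95% CI exp(beta -/+ 1.96 * se)."""
    return math.exp(beta), math.exp(beta - Z_975 * se), math.exp(beta + Z_975 * se)


def weibull_aft_median(beta0, beta1, x, sigma):
    """Median of T when ln(T) = beta0 + beta1*x + sigma*W, W standard extreme-value.

    T is Weibull with scale exp(beta0 + beta1*x) and shape 1/sigma, so
    S(t) = exp(-(t/scale)**(1/sigma)) and the median is scale * ln(2)**sigma.
    """
    scale = math.exp(beta0 + beta1 * x)
    return scale * math.log(2.0) ** sigma


# Hypothetical values: intercept, genotype coefficient, Weibull scale parameter
beta0, beta1, sigma = 4.2, math.log(1.08), 0.25
ratio = weibull_aft_median(beta0, beta1, 1, sigma) / weibull_aft_median(beta0, beta1, 0, sigma)
print(f"median ratio = {ratio:.4f}")  # equals exp(beta1) = 1.08 up to rounding
print(time_ratio_with_ci(beta1, 0.02))
```

Because every quantile of T is multiplied by the same factor, a TR of 1.08 means the whole onset-time distribution is stretched by 8%, not just its median.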

Evaluation Criteria for Goodness of Fit
The Akaike information criterion (AIC) was used as a measure of goodness of model fit that balances model fit against model simplicity (Akaike, 1979, 1981), while the Bayesian information criterion (BIC) was used as a similar measure (Simonoff, 2003):

$$\mathrm{AIC} = -2\ln L(\hat{\theta} \mid x) + 2k$$

$$\mathrm{BIC} = -2\ln L(\hat{\theta} \mid x) + k\ln(n)$$

where L is the likelihood function, x is the random variable, θ̂ is the maximum likelihood estimate, k is the number of parameters, and n is the sample size. The model with the smaller AIC and BIC values is considered to fit the data better.
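SAS reports these criteria directly, so the sketch below only shows the arithmetic and the "smaller is better" selection rule. The function names are hypothetical; the example values are the Weibull and gamma AIC and BIC reported for rs4239217 in this paper's LIFEREG results.

```python
import math


def aic(loglik, k):
    """Akaike information criterion: -2 ln L + 2k."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian information criterion: -2 ln L + k ln(n)."""
    return -2.0 * loglik + k * math.log(n)


def best_model(criterion_by_model):
    """Name of the model with the smallest criterion value."""
    return min(criterion_by_model, key=criterion_by_model.get)


# Values reported for rs4239217 (Weibull vs. gamma)
aic_values = {"Weibull": 5582.38, "gamma": 5583.94}
bic_values = {"Weibull": 5623.38, "gamma": 5629.6}
print(best_model(aic_values), best_model(bic_values))
```

Since ln(n) exceeds 2 for any realistic sample size, BIC charges more for each extra parameter than AIC does, which is why the Weibull-gamma gap is wider under BIC.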

Survival Analysis of Age at Onset of Cancer
The association between the genotypes of each SNP and AAO was initially assessed with the log-rank and Wilcoxon tests in Kaplan-Meier (KM) survival analysis using the LIFETEST procedure. KM curves were used to plot the survival functions. The PHREG procedure in SAS was used to fit the Cox model, while the LIFEREG procedure was used to fit parametric survival models, including the exponential, Weibull, log-normal, log-logistic, and gamma distributions. Multivariate Cox regression and parametric survival analyses were conducted to detect associations of each SNP with AAO, adjusting for gender, alcohol use in the past month, smoking status, and obesity status. AIC and BIC were used to compare the Cox regression and parametric survival models. Descriptive statistics, KM survival analysis, Cox regression, and parametric model analyses were conducted with SAS v.9.4 (SAS Institute, Cary, NC, USA). The SAS code is provided in the Appendix.
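The KM estimator and the log-rank test can be sketched in a few lines, assuming age is the time scale, cases have an event at their AAO, and non-cases are censored at their age at last observation (our assumption about the data layout). The study ran these tests across three genotype groups in PROC LIFETEST; this sketch implements only the two-group log-rank test (for example, one genotype versus another) and omits the Wilcoxon test. Function names and data are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event time, S(t)) pairs."""
    surv, curve = 1.0, []
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        n_at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        surv *= 1.0 - deaths / n_at_risk
        curve.append((t, surv))
    return curve


def logrank_two_groups(times, events, groups):
    """Two-group log-rank chi-square (1 df) and p-value; groups holds 0/1 labels."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        d1 = sum(1 for ti, ei, g in zip(times, events, groups) if ti == t and ei and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:
            # hypergeometric variance of d1 at this event time
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0.0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df


print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
print(logrank_two_groups([1, 2, 3, 4, 5, 6], [1] * 6, [1, 1, 1, 0, 0, 0]))
```

The log-rank statistic weights every event time equally, whereas the Wilcoxon test weights early event times more heavily, which is one reason the two tests can give different p-values for the same SNP.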

Genotype Quality Control and Descriptive Statistics
All 23 SNPs were in HWE in the controls (p > 0.001) and had MAF > 5%. The demographic characteristics of the subjects are presented in Table 1. There were slightly more females than males in both cases and controls. Age ranged from 46 to 90 years, and AAO of cancer ranged from 23 to 90 years.

Comparison of Cox Regression and Parametric Models
The AIC and BIC values consistently showed that the Weibull distribution was the best model for all 23 SNPs and the gamma distribution was the second best. Table 2 shows the AIC and BIC values of the different models for the 5 SNPs associated with AAO in the Weibull model (p < 0.05). The top two SNPs were rs4239217 and rs7501939, in that order. For rs4239217, the Weibull distribution was the best model (AIC = 5582.38 and BIC = 5623.38) and the gamma distribution was the second best (AIC = 5583.94 and BIC = 5629.6). For rs7501939, the AIC and BIC values also consistently showed that the Weibull distribution was the best model (AIC = 5582.38 and BIC = 5623.38) and the gamma model was the second best (AIC = 5583.94 and BIC = 5629.6).

Survival Analysis of Age at Onset using the Weibull Model
The results of the parametric survival analysis using the Weibull model are presented in Table 3. Five SNPs (rs3110649, rs757210, rs4430796, rs4239217, and rs7501939) were associated with AAO of cancer. The top SNP was rs4239217 (TR = 1.08), and the second was rs7501939 (TR = 1.07). The Kaplan-Meier survival curves for the genotypes of SNPs rs4239217 and rs7501939 are shown in Figures 1 and 2, respectively. For rs4239217, the mean AAO was approximately 5.5 and 4.6 years later for individuals with the AA and AG genotypes, respectively, than for those with the GG genotype (p = 0.0002 by the log-rank test and p < 0.0001 by the Wilcoxon test). For rs7501939, the mean AAO was approximately 5.2 and 4.3 years later for individuals with the CC and CT genotypes, respectively, than for those with the TT genotype (p = 0.0058 by the log-rank test and p = 0.0002 by the Wilcoxon test).

Discussion
In the present study, we explored the associations of 23 HNF1B SNPs with the AAO of cancer. Based on AIC and BIC, the Weibull distribution was the best model for the genetic association of polymorphisms within the HNF1B gene with the AAO of cancer. To our knowledge, this is the first study to compare Cox regression and parametric survival models in genetic association analysis of AAO of cancer. It is also the first candidate gene study to provide evidence that several genetic variants within the HNF1B gene (rs3110649, rs757210, rs4430796, rs4239217, and rs7501939) may be involved in the AAO of cancer.
Semi-parametric Cox regression and parametric survival methods (such as exponential, Weibull, gamma, log-normal, and log-logistic models) have been used in cancer survival studies. However, previous studies have shown inconsistent results regarding which model fits cancer survival data best. For example, one study showed that Cox regression is similar to the exponential model (Pourhoseingholi et al., 2007), while another showed that parametric models such as the Weibull, log-normal, and gamma models may perform better than the Cox model in oral cancer (Köhler and Kowalski, 2012). Furthermore, some studies favored the Weibull model in stomach cancer (Moghimi-Dehkordi et al., 2008), gastric cancer (Baghestani et al., 2009; Zhu et al., 2011), and colorectal cancer (Baghestani et al., 2015). In addition, one study found that the log-normal survival model may fit gallbladder cancer well (Wang et al., 2010), while the log-logistic model with gamma frailty was the best model for gastrointestinal cancer in northern Iran (Ghadimi et al., 2011). Several studies have used the semi-parametric Cox model to examine the association of genetic variants with AAO of colorectal cancer (Jones et al., 2004; Wang et al., 2015). However, we found no study that examined associations between genetic variants and AAO of cancer using parametric models (including exponential, Weibull, log-normal, log-logistic, and gamma models). Therefore, the current study is the first attempt to compare Cox regression with parametric survival models in genetic association analysis of AAO of cancer. Furthermore, consistent with some previous studies (such as Moghimi-Dehkordi et al., 2008; Baghestani et al., 2009; Zhu et al., 2011; Baghestani et al., 2015), our results showed that the Weibull distribution is the best model for the genetic associations of all 23 SNPs within the HNF1B gene with AAO of cancer, and that the gamma distribution is the second best model. The differences among these comparisons may be due to different cancer types, sample sizes, and sample origins. In addition, unlike these studies, we applied parametric survival models to a genetic association study of the AAO of cancer.
Previous work showed that HNF1B regulates the expression of polycystic kidney and hepatic disease-1 (PKHD1), suggesting that HNF1B may function as a tumor suppressor gene in chromophobe renal cell carcinogenesis (Rebouissou et al., 2005); HNF1B may also be involved in the development of ovarian cancer and in gastric, pancreatic, and colorectal cancer cell lines (Terasawa et al., 2006; Wang et al., 2014). Several studies also found that rs4430796 was associated with lung cancer in a Chinese population (Sun et al., 2011) and with prostate cancer in Korean men (Kim et al., 2008), Chinese men (Zhang et al., 2012), and African American men (Chornokur et al., 2013). However, no study has focused on the effect of the HNF1B gene on the AAO of cancer. In the present study, we provided the first evidence that two SNPs within the HNF1B gene previously associated with cancer risk (rs4430796 and rs7501939) were associated with the AAO of cancer. We also found that three additional SNPs (rs3110649, rs757210, and rs4239217) were associated with the AAO of cancer.
Studies suggest that type 2 diabetes (T2D) and prostate cancer may share genetic susceptibility. The HNF1A S319 variant was reported to be associated with an earlier AAO of T2D in women (Hegele et al., 2000). Two HNF1B SNPs (rs7501939 and rs4430796) were reported to be associated with T2D in Chinese as well as in Caucasian populations (Gudmundsson et al., 2007). Recently, the results of several GWAS have supported a shared genetic contribution to the risk of T2D and prostate cancer. For example, in the study by Gudmundsson et al. (2007), the A allele of rs4430796 and the C allele of rs7501939 in HNF1B/TCF2 were positively associated with prostate cancer (OR > 1.0) but were protective against T2D (OR < 1.0). A later study confirmed the association of rs4430796 with T2D and prostate cancer and suggested that T2D had a protective effect on prostate cancer risk (Pierce et al., 2010). A meta-analysis of the two variants (rs4430796 and rs7501939) found that they had pleiotropic effects on T2D and prostate cancer (Elliott et al., 2010). The present study adds evidence that the HNF1B gene may play a role in cancer development, including its age at onset.
There are some limitations in this study. First, the definition of cancer status in the Marshfield sample was broad (any diagnosed cancer, excluding minor skin cancers), which may introduce genetic and phenotypic heterogeneity into the genetic association analysis. It would be more informative to investigate the association of HNF1B with specific types of cancer. Second, our findings might be subject to type I error and need to be replicated in additional samples. In addition, we used only the standard Cox regression model and parametric survival models, whereas investigators have extended both. For example, Li et al. (2016) recently developed the proportional generalized odds (PGO) model, which covers the proportional odds (PO) model (Bennett, 1983; Pettitt, 1984) and the generalized proportional odds (GPO) model (Dabrowska and Doksum, 1988). Mustafa et al. (2016) proposed a new four-parameter Weibull model, the Weibull Generalized Flexible Weibull extension (WGFWE) distribution, and Alkarni (2016) introduced a new family of models for lifetime data, the generalized extended Weibull power series family of distributions, by compounding generalized extended Weibull distributions and power series distributions; Pu et al. (2016) proposed a new five-parameter distribution, the gamma-exponentiated or generalized modified Weibull (GEMW) distribution. In the present study, we tested only the two-parameter Weibull model. Future studies could test and apply these extended survival models in genetic association analyses of the AAO of complex diseases.
There are also several strengths in this study. First, our sample size was relatively large for this type of study. Second, we compared semi-parametric Cox regression and parametric models in the genetic association analysis of AAO of cancer. Third, we examined 23 SNPs within the HNF1B gene and identified two SNPs previously associated with cancer and T2D (rs4430796 and rs7501939) that influence the AAO of cancer.
In conclusion, based on AIC and BIC, the parametric Weibull model fit better than Cox regression and the other parametric models (exponential, log-normal, log-logistic, and gamma) for the genetic association of AAO of cancer. Furthermore, this study provides evidence of several genetic variants within the HNF1B gene influencing the AAO of cancer. These findings may serve as a resource for replication in other populations. Future functional studies of this gene may help to better characterize the genetic architecture of the AAO of cancer.

APPENDIX
The following PROC PHREG program models the association of one SNP (rs4239217), sex, alcohol use, smoking status, and obesity with the AAO of cancer. The SNP rs4239217 has three genotypes (A_A, A_G, and G_G), and the G_G genotype was used as the reference.
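For readers reproducing the genotype coding outside SAS, the reference coding described above can be sketched in Python. Each non-reference genotype gets its own 0/1 indicator, so the exponentiated coefficient of the A_A indicator is the HR (Cox) or TR (Weibull) for A_A versus G_G. This is an illustrative sketch with a hypothetical function name, not the SAS program itself.

```python
def reference_coding(value, levels, reference):
    """One 0/1 indicator per non-reference level; the reference is all zeros."""
    if value not in levels:
        raise ValueError(f"unknown level {value!r}; expected one of {levels}")
    return tuple(int(value == level) for level in levels if level != reference)


# rs4239217 genotypes as labelled in this Appendix, with G_G as the reference
LEVELS = ("A_A", "A_G", "G_G")
for g in LEVELS:
    print(g, reference_coding(g, LEVELS, reference="G_G"))
```

The same function applies to the three-level smoking status covariate once a reference category is chosen.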

Table 2: Results of the Cox Regression and Parametric Models in the Multivariate Survival Analysis of AAO of Cancer
a,b AIC and BIC for rs3110649 adjusted for sex, alcohol use, smoking status, and obesity; c,d AIC and BIC for rs757210 adjusted for sex, alcohol use, smoking status, and obesity; e,f AIC and BIC for rs4430796 adjusted for sex, alcohol use, smoking status, and obesity; g,h AIC and BIC for rs4239217 adjusted for sex, alcohol use, smoking status, and obesity; i,j AIC and BIC for rs7501939 adjusted for sex, alcohol use, smoking status, and obesity.

Table 3: Survival Analysis of the 5 SNPs Associated with AAO Using the Weibull Model
a Minor allele; b Minor allele frequency; c Hardy-Weinberg equilibrium test p-value; d Regression coefficient for AAO of cancer based on the Weibull model; e 95% CI of the regression coefficient for AAO of cancer based on the Weibull model; f p-value for AAO of cancer based on the Weibull model; g Time ratio (TR) for the genotype compared with the reference; h 95% CI of the TR for the genotype compared with the reference.