MINIMUM PROFILE HELLINGER DISTANCE ESTIMATION FOR A TWO-SAMPLE LOCATION-SHIFTED MODEL

Minimum Hellinger distance estimation (MHDE) for a parametric model is obtained by minimizing the Hellinger distance between an assumed parametric model and a nonparametric estimate of the model. MHDE has received increasing attention for its efficiency and robustness. Recently, it has been extended from parametric models to semiparametric models. This manuscript considers a two-sample semiparametric location-shifted model in which two independent samples are generated from two identical symmetric distributions with different location parameters. We propose to use a profiling technique in order to utilize the information from both samples to estimate the unknown symmetric function. With the profiled estimate of the function, we propose a minimum profile Hellinger distance estimation (MPHDE) for the two unknown location parameters. This MPHDE is similar to but different from the one introduced in Wu and Karunamuni (2015), and thus the results presented in this work are not a trivial application of their method. The difference is due to the two-sample nature of the model, and thus we use different approaches to study its asymptotic properties such as consistency and asymptotic normality. The efficiency and robustness properties of the proposed MPHDE are evaluated empirically through simulation studies. A real data set from a breast cancer study is analyzed to illustrate the use of the proposed method.


Introduction
Minimum distance estimation of unknown parameters in a parametric model is obtained by minimizing the distance between a nonparametric distribution estimate (such as the empirical distribution or a kernel estimate) and an assumed parametric model. Some well-known examples of minimum distance estimation include least-squares estimation and minimum Chi-square estimation. Among different minimum distance estimations, minimum Hellinger distance estimation (MHDE) has received increasing attention for its superior properties in efficiency and robustness. The idea of estimation using the Hellinger distance was first introduced by Beran (1977) for parametric models. Simpson (1987) examined the MHDE for discrete data. Yang (1991) and Ying (1992) studied censored data in survival analysis by using the MHDE. Woo and Sriram (2006) and Woo and Sriram (2007) employed the MHDE method to investigate mixture complexity in finite mixture models. The MHDEs for mixture models were also studied by, for example, Lu et al. (2003) and Xiang et al. (2008). For other applications of the MHDE method, see Takada (2009), N'drin and Hili (2013) and Prause et al. (2016).
For any given θ, since X1 − θ0, ..., Xn0 − θ0, Y1 − θ1, ..., Yn1 − θ1 are i.i.d. random variables from f, we can estimate the unknown f using the kernel density estimator f̂ based on the pooled sample, given in (2), where ρ0 = n0/n, ρ1 = 1 − ρ0 = n1/n, the kernel function K is a symmetric density function, the bandwidth bn is a sequence of positive constants such that bn → 0 as n → ∞, and f̂0 and f̂1 are kernel density estimators of f0 and f1, respectively. To be specific, f̂0 and f̂1 have the forms given in (3) and (4). Even though ρ0 and ρ1 depend on n, we suppress this dependence for notational simplicity. We generally require that ni/n → ρi as n → ∞ with ρi ∈ (0, 1), i = 0, 1. Based on (2), f0 and f1 can also be estimated respectively by f̂(· − θ0) and f̂(· − θ1). To obtain the MPHDE of θ, we first profile the unknown nuisance parameter f out by minimizing the sum of the squared Hellinger distances for the two samples, which gives the estimator θ̂ in (5), where in the last equality we represent θ̂ as a functional T which only depends on f̂0 and f̂1.
As there is no explicit expression for the solution to the optimization in (5), θ̂ has to be calculated numerically. In this manuscript, the computation was implemented with the R function "nlm", with the medians of the Xi and the Yi as the initial values of θ0 and θ1, respectively. The numerical optimization leads to satisfactory results in our simulation and data application studies, and convergence was achieved in all of them.
... model is identifiable if ρ0 ∈ (0, 0.5) ∪ (0.5, 1). If f is unimodal, then this mixture model is identifiable even when ρ0 = 0.5. Therefore identifiability is not a problem for the MPHDE, and we will assume from now on that the mixture model is identifiable for the sake of simplicity.
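The numerical computation of the MPHDE can be sketched in Python rather than R. This is a minimal sketch under stated assumptions, not the implementation used in this manuscript: the pooled estimate of f is symmetrized about zero so that the symmetry of f is used, the two squared Hellinger distances are summed with equal weights, the integrals are approximated by a midpoint rule, and a simple coordinate search started at the sample medians stands in for "nlm". The function names (biweight, kde, profile_objective, mphde_sketch) and the toy samples are hypothetical.

```python
import math
import statistics


def biweight(t):
    """Biweight kernel, a symmetric density supported on [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - t * t) ** 2 if abs(t) <= 1.0 else 0.0


def kde(x, sample, bn):
    """Kernel density estimate at x with bandwidth bn."""
    return sum(biweight((x - s) / bn) for s in sample) / (len(sample) * bn)


def profile_objective(theta, xs, ys, bn, grid_size=240):
    """Sum of two squared Hellinger distances for theta = (theta0, theta1).

    Assumption of this sketch: f is estimated from the pooled residuals
    X_i - theta0 and Y_j - theta1 and then symmetrized about zero.
    """
    pooled = [x - theta[0] for x in xs] + [y - theta[1] for y in ys]

    def f_hat(u):
        return 0.5 * (kde(u, pooled, bn) + kde(-u, pooled, bn))

    lo, hi = min(xs + ys), max(xs + ys)
    pad = hi - lo  # wide enough while theta stays inside the data range
    a, b = lo - pad, hi + pad
    h = (b - a) / grid_size
    total = 0.0
    for k in range(grid_size):
        x = a + (k + 0.5) * h  # midpoint rule
        total += (math.sqrt(kde(x, xs, bn)) - math.sqrt(f_hat(x - theta[0]))) ** 2
        total += (math.sqrt(kde(x, ys, bn)) - math.sqrt(f_hat(x - theta[1]))) ** 2
    return total * h


def mphde_sketch(xs, ys, bn, step=0.5, tol=1e-3):
    """Coordinate search for the minimizer, started at the sample medians."""
    theta = [statistics.median(xs), statistics.median(ys)]
    best = profile_objective(theta, xs, ys, bn)
    while step > tol:
        improved = False
        for i in (0, 1):
            for delta in (step, -step):
                cand = list(theta)
                cand[i] += delta
                value = profile_objective(cand, xs, ys, bn)
                if value < best:
                    theta, best, improved = cand, value, True
        if not improved:
            step /= 2.0  # refine the search once no neighbour improves
    return theta


xs = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]  # toy first sample, centred at 0
ys = [x + 1.0 for x in xs]                    # toy second sample, centred at 1
bn = (len(xs) + len(ys)) ** (-1.0 / 5.0)      # bandwidth of order n^(-1/5)
theta_hat = mphde_sketch(xs, ys, bn)
```

Without the symmetrization step, an objective built only from the pooled residuals would depend on θ only through θ1 − θ0, which is why the sketch imposes symmetry explicitly. A general-purpose optimizer could replace the coordinate search.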
Remark 3. For the one-sample location model f(· − θ), the Hellinger distance is between the location model, involving both f and θ together, and its nonparametric estimate. For this two-sample model, in order to use the information about the nuisance parameter f contained in both the first and second samples, the Hellinger distance is between f and its estimate, which involves the nuisance density estimate and the location parameters of interest.
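To make the distance in this remark concrete, the following sketch evaluates a squared Hellinger distance numerically for a one-sample location model with known f. It assumes the convention ‖f^{1/2} − g^{1/2}‖² without a factor 1/2, takes f to be the standard normal density purely for illustration, and uses hypothetical function names; the closed form used as a check is the standard one for two normal densities with equal variance.

```python
import math


def normal_pdf(x, mu, sigma=1.0):
    """Density of the normal distribution with mean mu and sd sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def squared_hellinger(f, g, a, b, grid_size=4000):
    """Midpoint-rule approximation of the integral of (sqrt(f) - sqrt(g))^2 on [a, b]."""
    h = (b - a) / grid_size
    total = 0.0
    for k in range(grid_size):
        x = a + (k + 0.5) * h
        total += (math.sqrt(f(x)) - math.sqrt(g(x))) ** 2
    return total * h


theta = 1.0  # location shift between the two densities
numeric = squared_hellinger(
    lambda x: normal_pdf(x, 0.0),
    lambda x: normal_pdf(x, theta),
    -12.0,
    13.0,
)
# Closed form for equal-variance normals: 2 * (1 - exp(-theta^2 / (8 * sigma^2)))
closed_form = 2.0 * (1.0 - math.exp(-theta ** 2 / 8.0))
```

The distance is zero only when the shift is zero and grows with |θ|, which is the property a minimum Hellinger distance estimator exploits.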

Asymptotic Properties
In this section, we discuss the asymptotic distribution of the MPHDE θ̂ given in (5) for the two-sample semiparametric location-shifted model (1). Note that θ̂ given in (5) is somewhat different from the MPHDE defined in Wu and Karunamuni (2015) for general semiparametric models, in the sense that the former incorporates the model assumption in the nonparametric estimation of f, while the latter uses a completely nonparametric estimate of f that does not depend on the model at all. For this reason, we cannot apply the asymptotics obtained in Wu and Karunamuni (2015) to our model (1).
Instead we will directly derive below the existence, consistency and asymptotic normality of θ̂. Let F be the set of all densities with respect to (w.r.t.) Lebesgue measure on the real line. We first give in the next theorem the existence and uniqueness of the MPHDE θ̂.
The following theorem, which is a consequence of Theorem 1, gives the consistency of the MPHDE θ̂ defined in (5).
Theorem 2. Suppose that the kernel K in (3) and (4) is absolutely continuous and has compact support and bounded first derivative, and the bandwidth bn satisfies bn → 0 and ..., and furthermore the MPHDE θ̂ → θ.
Theorem 3 below gives an expression for the difference θ̂ − θ, which will be used to establish the asymptotic normality of θ̂ in Theorem 4.
Theorem 3. Assume that the conditions in Theorem 2 are satisfied. Further suppose that f has a uniformly continuous first derivative. Then the expression (6) for θ̂ − θ holds.
With (6) and some regularity conditions we can immediately derive the asymptotic distribution of θ̂ − θ, given in the next theorem.

Simulation Studies
We assess the empirical performance of the MPHDE proposed in Section 2 for the two-sample location-shifted model. Five hundred simulations are run for each parameter configuration. We consider the parameter setting (θ0, θ1)⊤ = (0, 1)⊤ and simulate four different distributions for f(x): normal, Student's t, triangular and Laplace. We set the standard deviation to 1 for the normal distribution and the degrees of freedom to 4 for the t distribution.
The simulation results in Figure 1 are for θ0, n0 = n1 = 20 and the case in which the first sample is contaminated. The results for θ1, for n0 = n1 = 50, or for the case in which the second sample is contaminated are very similar to those in Figure 1 and are thus omitted to save space.
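The simulation design can be sketched as follows for the simplest of the three estimators, the LSE (sample mean) of θ0. This is an illustrative sketch only: the seed, the function names, the normal-approximation confidence interval, and the form of the triangular sampler (symmetric on [−1, 1]) are assumptions. The MPHDE or MLE would be plugged in where the sample mean is computed.

```python
import math
import random
import statistics

rng = random.Random(2024)  # arbitrary seed, for reproducibility only

# Samplers for the symmetric distribution f, centred at zero.
samplers = {
    "normal": lambda: rng.gauss(0.0, 1.0),
    # t with 4 df: Z / sqrt(chi2_4 / 4), where chi2_4 is Gamma(shape=2, scale=2)
    "t4": lambda: rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(2.0, 2.0) / 4.0),
    # symmetric triangular on [-1, 1] (assumed form)
    "triangular": lambda: rng.triangular(-1.0, 1.0, 0.0),
    # Laplace with scale 1 as a difference of two unit exponentials
    "laplace": lambda: rng.expovariate(1.0) - rng.expovariate(1.0),
}


def simulate_lse(draw, theta0=0.0, n0=20, reps=500):
    """Bias, RMSE, empirical SE and 95% coverage rate of the sample mean."""
    estimates = []
    covered = 0
    for _ in range(reps):
        sample = [theta0 + draw() for _ in range(n0)]
        est = statistics.mean(sample)
        se_hat = statistics.stdev(sample) / math.sqrt(n0)
        if abs(est - theta0) <= 1.96 * se_hat:  # normal-approximation interval
            covered += 1
        estimates.append(est)
    bias = statistics.mean(estimates) - theta0
    rmse = math.sqrt(statistics.mean([(e - theta0) ** 2 for e in estimates]))
    se = statistics.stdev(estimates)
    return {"Bias": bias, "RMSE": rmse, "SE": se, "CR": covered / reps}


results = {name: simulate_lse(draw) for name, draw in samplers.items()}
```

Each entry of `results` holds the four summary measures for one choice of f; the numbers it produces are from this sketch and are not the values reported in the paper's tables.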
Figure 1 presents the average α-IFs over 500 simulation runs for the MPHDE, MLE and LSE of θ0 under normal, t, triangular and Laplace distributions. Regardless of the population distribution, the α-IFs of the MPHDE are bounded and converge to the same small constant as the value of the outlying observation gets larger on either side, while the α-IFs of the MLE and LSE are unbounded in general. Therefore, compared with the MLE and LSE methods, the MPHDE has slightly lower efficiency, but this limitation is compensated by its excellent robustness. In summary, the MPHDE method always gives reasonable estimates whether or not the data are contaminated, whereas the MLE and LSE methods lead to significantly biased estimates under contaminated data.
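The α-IF computation can be illustrated with two simple estimators. The sketch below replaces the last observation by z and rescales the change in the estimate by the sample size, that is, by the inverse of the contamination rate. It uses the sample mean (the LSE) and the sample median (the MLE under a Laplace model) on a fixed toy sample; the MPHDE is not computed here, and the function name alpha_if is hypothetical.

```python
import statistics


def alpha_if(estimator, sample, z):
    """n * (estimate with the last observation replaced by z - original estimate)."""
    contaminated = sample[:-1] + [z]
    return len(sample) * (estimator(contaminated) - estimator(sample))


sample = [i / 10.0 for i in range(-10, 10)]  # toy sample of size 20, not simulated data
zs = [float(z) for z in range(-20, 21)]      # outlier values from -20 to 20

if_mean = [alpha_if(statistics.mean, sample, z) for z in zs]      # grows linearly in z
if_median = [alpha_if(statistics.median, sample, z) for z in zs]  # stays bounded
```

For the mean, the α-IF equals z minus the replaced observation, so it is unbounded in z, whereas for the median it never exceeds 2 in absolute value on this toy sample.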

Data Applications
In this section, we demonstrate the use of the proposed MPHDE method by analyzing a breast cancer data set collected in Calgary, Canada (Feng et al., 2016). Breast cancer is regarded as the most common cancer and the second leading cause of cancer death for females in North America.
Existing studies suggest that it would be more informative to use some protein expression levels as indicators of biological behavior (Feng et al., 2015). These biomarkers could reflect genetic properties in cancer formation and cancer aggressiveness. Our data set has 316 patients diagnosed with breast cancer between 1985 and 2000. Two biomarkers of interest measured on these patients are Ataxia telangiectasia mutated (ATM) and Ki67. ATM is a protein that helps maintain genomic stability. Compared with normal breast tissue, ATM could be significantly reduced in tissue with breast cancer. Ki67 is a protein expressed exclusively in proliferating cells. It is often used as a prognostic marker in breast cancer.
Let θ^(1) and θ^(2) denote the location parameters in the distributions of the protein expression levels of the ATM and Ki67 biomarkers, respectively. Our research focuses on the comparison of the protein expression levels across both cancer stages (Stage) and lymph node (LN) statuses.
To compare the two biomarkers ATM and Ki67, we calculate the MPHDEs θ̂0^(k) and θ̂1^(k) for both k = 1 and k = 2. The parameter estimates (Est.), estimated standard errors (SE), 95% confidence intervals (CI) and p-values are reported in Table 3.

Concluding Remarks
In this paper, we propose to use the MPHDE for inference in the two-sample semiparametric location-shifted model. Compared with the commonly used least-squares and maximum likelihood approaches, the proposed method leads to robust inference. Simulation results demonstrate satisfactory performance, and the analysis of the breast cancer data exemplifies its utility in practice.

Appendix
The proofs of Theorems 1, 2, 3 and 4 are presented in this section. The techniques used in the proofs are similar to those in Karunamuni and Wu (2009).

Remark 3. Distributions satisfying ∫ f″(x) dx = 0 include those with support on the whole real line, such as the normal and t distributions. Distributions satisfying ∫ f″(x) dx ≠ 0 include those with finite support whose first derivative evaluated at the boundary of the support is non-zero, such as f(x) = ...
Remark 4. If the two samples in (1) are actually a single sample from the mixture ρ0 f(· − θ0) + ρ1 f(· − θ1) with known classification for each data point, then by comparing the lower bound of the asymptotic variance described in Wu and Karunamuni (2015) with the results in our Theorem 4, we can conclude that the proposed MPHDE θ̂ defined in (5) is efficient, in the semiparametric sense, for any f. In addition, if ∫ f″(x) dx = 0, then this semiparametric model is an adaptive model and the proposed MPHDE θ̂ is an adaptive estimator.
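As a worked check of the condition ∫ f″(x) dx = 0 for a density supported on the whole real line, consider the standard normal density. The short derivation below assumes that f′ is absolutely continuous, so that the fundamental theorem of calculus applies; it is an illustration and not part of the original proofs.

```latex
\int_{-\infty}^{\infty} f''(x)\,dx
  = \lim_{x\to\infty} f'(x) - \lim_{x\to-\infty} f'(x),
\qquad
f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^{2}/2}
\;\Rightarrow\;
f'(x) = -x f(x) \to 0 \ \text{as } |x|\to\infty
\;\Rightarrow\;
\int_{-\infty}^{\infty} f''(x)\,dx = 0 .
```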

For the triangular distribution we set c = 1.
The Laplace distribution has density function f(x) = (1/(2b)) exp(−|x|/b), and we set b = 1. The bandwidth bn is chosen to be bn = n^(−1/5) according to the bandwidth requirement in Theorem 4. The biweight kernel K(t) = (15/16)(1 − t²)² for |t| ≤ 1 is employed in the simulation studies. We consider both smaller sample sizes n0 = n1 = 20 and larger sample sizes n0 = n1 = 50.
As a comparison, we also give both the least-squares estimation (LSE) and the maximum likelihood estimation (MLE). For the two-sample location-shifted model (1) under consideration, a simple calculation shows that the LSEs of θ0 and θ1 are the sample means X̄ and Ȳ, respectively. With f assumed known, a straightforward calculation shows that the MLEs of θ0 and θ1 are the sample means in the normal case and the sample medians in the Laplace case, while there is no explicit expression for the MLEs for the Student's t and triangular populations.
Tables 1 and 2 display the simulation results of the MPHDE, LSE and MLE methods for sample sizes n0 = n1 = 20 and n0 = n1 = 50, respectively. In the tables, Bias represents the average of the biases over the 500 repetitions; RMSE and SE are the root mean squared error and the empirical standard error, respectively; and CR represents the empirical coverage rate of the 95% confidence intervals. From Tables 1 and 2 we can see that all three estimation approaches have fairly small bias. In terms of standard errors, the MPHDE performs worse than the LSE and the MLE regardless of sample size.
To investigate the robustness properties of the proposed MPHDE and make comparisons, we examine the performance of the three methods under data contamination. In this simulation, the data from model (1) are intentionally contaminated by a single outlying observation. This is implemented, say for n0 = n1 = 20, by replacing the last observation X20 with an integer z varying from −20 to 20. To quantify the robustness, the α-influence function (α-IF) discussed by Lu et al. (2003) is used. The α-IF for parameter θi, i = 0, 1, is defined as IF(z) = ni (θ̂i^z − θ̂i), where θ̂i^z represents the estimate based on the contaminated data with outlying observation X20 = z and θ̂i denotes the estimate based on the uncontaminated data.

As for cancer stage, θ0^(k) and θ1^(k) (k = 1, 2) denote the location parameters in the distributions of the protein expression level for Stage I and Stage II/III patients, respectively. Regarding LN status, θ0^(k) and θ1^(k) (k = 1, 2) denote the location parameters in the distributions of the protein expression level for negative LN (LN−) and positive LN (LN+) patients, respectively. Figure 2 displays the boxplots for ATM and Ki67 expression levels across cancer stages and LN statuses. From this figure we do see differences in location for both ATM and Ki67 across cancer stages and LN statuses, especially for Ki67 considering its smaller variation in expression level.

Figure 1: The average α-IFs under (a) normal distribution, (b) Student's t distribution, (c) triangular distribution and (d) Laplace distribution. The thin solid line represents the zero horizontal baseline, and the thick solid, dot-dashed and dashed lines represent the MPHDE, LSE and MLE approaches, respectively.
The α-IF is calculated as the change in the estimate before and after contamination divided by the contamination rate, i.e. 1/ni. We can similarly calculate the α-IF when outlying observations contaminate the second sample. The simulation results are presented in Figure 1.

Table 3: Breast cancer data analysis results based on MPHDE.
... level in the negative LN group than in the positive LN group (p = 0.019), while Ki67 has a lower expression level in the negative LN group than in the positive LN group (p < 0.001).
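Confidence intervals and p-values of the kind reported alongside estimates and standard errors can be obtained from a normal approximation. The sketch below assumes a Wald-type construction based on asymptotic normality; it is not confirmed to be the exact procedure behind Table 3, the function names are hypothetical, and the numbers are illustrative only.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def wald_summary(est, se, null_value=0.0):
    """95% normal-approximation confidence interval and two-sided p-value."""
    z = (est - null_value) / se
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    half_width = 1.959964 * se  # 97.5% quantile of the standard normal
    return (est - half_width, est + half_width), p_value


# Hypothetical numbers for illustration only, not values from Table 3.
ci, p_value = wald_summary(est=0.30, se=0.12)
```

Applied to a difference between two location estimates, the same function requires the standard error of that difference as an input.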