Bandwidth selection for kernel based interval estimation of a density

It is always useful to have a confidence interval, along with a single estimate of the parameter of interest. We propose a new algorithm for kernel based interval estimation of a density, with an aim to minimize the coverage error. The bandwidth used in the estimator is chosen by minimizing a bootstrap estimate of the absolute value of the coverage error. The resulting confidence interval seems to perform well, in terms of coverage accuracy and length, especially for large sample size. We illustrate our methodology with data on the eruption durations for the Old Faithful geyser in USA. It seems to be the first bandwidth selector in the literature for kernel based interval estimation of a density.


Introduction
We consider the problem of construction of a two sided confidence interval for ( 0 ), where  is the unknown density generating the given data and  0 is a given design point.A density function may be arbitrarily specified at a point  0 .This technical difficulty is overcome by assuming that  is a continuous function.In the sequel we assume that  is continuous and  0 is an interior point of the support of .
One of the most well known estimators of  is a kernel density estimator (KDE) defined as follows.
 Corresponding author.
In contrast, far less seems to be known regarding the choice of ℎ for constructing a confidence interval for ( 0 ) using  ̂( 0 ).For instance, in [1] the authors mention that there seems to be no automatic method for practical interval estimation for ( 0 ) available in the literature.From the simulation study in [6] we see that the bandwidth which is appropriate (in terms of coverage accuracy) for confidence interval construction is not easy to determine.No data based method for selecting such a ℎ was suggested by the author.Chen proposed empirical likelihood confidence intervals for density estimation, but again no bandwidth selection method was provided (see [2]).Fiorio has discussed two programs, viz."asciker" and "bsciker" in Stata, to compute asymptotic and bootstrap confidence intervals for kernel density estimation.However these programs assume that the search for the correct bandwidth has been performed beforehand (see page 173 in [8]).Therefore these algorithms cannot be used to determine the appropriate amount of smoothing for kernel based interval estimation.In this paper we propose an algorithm for data based choice of h with an aim to minimize the coverage error of the resulting confidence interval.
The bias b is not negligible even for a bandwidth minimizing the mean squared error.There are two approaches to tackle the bias b, viz.either to estimate the bias explicitly, or to reduce it substantially by under smoothing (see [6]).In [6], the author showed that under smoothing method produces confidence intervals with greater coverage accuracy than those obtained by explicit bias correction.There are several other practical advantages of the under smoothing method.For instance, in the under smoothing approach no estimator of the bias is required.
In the under smoothing approach we essentially construct a confidence interval for E( ̂( 0 )) using a small value of the bandwidth, such that the same interval can be used to perform inference on ( 0 ) Horowitz suggested to perform under smoothing with n − , k>1/(2r+1), where r is the kernel order (see [7]).From [6] we see that the bandwidth which minimizes the asymptotic coverage error of a two sided under smoothed interval is of the form ℎ =   −1/(+1) , where H is a constant depending on  () .However, Hall pointed out that substantial under smoothing is not a practical proposition.He suggested to use h=c1.05\γ̂−1/5 for under smoothing  ̂( 0 ), where 0< c < 1. γ ̂ is the sample standard deviation.The values of c which give good coverage accuracy for given  0 , n and distribution are not easy to determine empirically (see numerical results in [6]).We provide a solution to this problem.
Given  1 ,....,   and a bandwidth h, a two sided under-smoothed bootstrap 1 −α confidence interval of ( 0 ) is defined as where  ̂ is the αth quantile of a bootstrap approximation of the sampling distribution of S. It is in fact the αth quantile of the conditional distribution of S*= ( ̂ *  ( 0 , ℎ) −  ̂( 0 , ℎ))/σ ̂ * , given  1 ,....,   . ̂ *  ( 0 ), σ ̂ * are the versions of  ̂( 0 ), and σ ̂, obtained by replacing  1 ,....,   by Efron's (1979)  The exact coverage probability (1− )(ℎ ̂), of the confidence interval using ℎ ̂, is hard to compute.However for any given , we can approximate the coverage probability using Monte-Carlo simulations.In a simulation study, in Section 3, we compute the Monte-Carlo estimates of (1− )(ℎ ̂) for different choices of ,  0 and n.We also report the average width and the variance of the widths of the confidence intervals.These results are compared with the findings in [6].where   is a compact interval with endpoints equal to scale invariant bandwidths, which are smaller than the bandwidth minimizing the MISE.As mentioned earlier, Hall suggested to use ℎ ̂= 1.05cγ ̂  −1/5 , 0< c ≤ 1, for under smoothing (see [6])}.Motivated by this proposal we use

Our proposal
Hall considered a wide range of values of c varying from 0.1 to 1, and showed that widely different values of c are appropriate under different circumstances (see Table 1 in page 687 in [6]).Motivated by this, we use  1 =0.1 and  2 =1.With these choices of  1 ,  2 ,   covers all the under smoothing bandwidths considered by Hall in the simulation study in [6].
The proposed two sided under smoothed bootstrap 1− α confidence interval of ( 0 ) is defined as 2.1 Some computational details
We draw B 2 bootstrap re-samples.For each re-sample we compute  * .There are B 2 values of  * corresponding to the re-samples.Now  ̂ is the αth sample quantile based on these B 2 values.
1.As mentioned earlier  * (1 − ) is a two sided confidence interval for E( ̂( 0 )).The above mentioned algorithm essentially imitates the Mone-Carlo (MC) method of approximating the exact coverage probability of (1 − α)(ℎ), for any given  and h.In the MC method we draw random samples from a given distribution, and for each sample we compute I(1 − ) by the re-sampling method described earlier.The MC estimate of (1 − α)(ℎ) is the number of the intervals containing E( ̂( 0 )) divided by the number of random samples drawn.We imitate this procedure, replacing the actual distribution by the empirical distribution.
We note that  ̂( 0 )=  * (  ̂ * ( 0 )), where  * denotes the expectation with respect to the empirical distribution.So the bootstrap version of I(1 − ) is a confidence interval for  ̂( 0 ), given  1 ,....,   .In our method the 1st stage re-samples, drawn from the empirical distribution, mimic the role played by the random samples drawn from the actual distribution in the MC method.
2. We use the same 1st stage re-samples and 2nd stage re-samples (obtained by resampling each 1st stage re-sample in step [ii] of the above algorithm) to compute  * (1− α) (h) for different values of h, as required in a numerical minimization algorithm.This feature reduces the computational burden.
3. Given a confidence interval, Monte-Carlo approximation of its coverage probability essentially involves estimating an average of a random function using Monte-Carlo simulations.From [5] we see that much larger number of Monte-Carlo re-samples are required for approximating a bootstrap quantile estimator accurately, than the same required for approximating a bootstrap estimator of the expectation of some random function.Therefore we use different number of re-samples, viz.B 2 and B 1 , to approximate the bootstrap estimators of the quantiles and the coverage probability by Monte-Carlo method.

Monte Carlo sample size for bootstrap-resampling
From [10] we see that the selection of appropriate B 1 and B 2 are not easy problems.As a rule of thumb, [5] suggested that for Monete-Carlo approximation of bootstrap moment estimators the number of bootstrap re-samples should be 50 to 200.For approximating bootstrap quantile estimators the number of bootstrap re-samples should be at least 1000 (see [5] ) .We use this rule of thumb, and use B 1 =200, B 2 =1000.

Simulation
Hall conducted simulations to study the effect of the choice of h on the cover-age probability of an under smoothed bootstrap confidence interval (1 -α)(ℎ)was examined for six combinations of  and  0 (see [6]).The author used h = c1.05γ̂n−1/5 , where 0 < c ≤ 1, for under smoothing the density estimator.In his simulations  equals to the N(0, 1) density and the (1/2)N(0, 1)+(1/2)N(3, 1) density, and  0 equal to 0, 0.75 and 1.5.The notation pN(µ 1 ,  1 2 ) + (1 2 ) represents a two component mixed normal distribution, where µ  , σ  2 are the mean and variance of the ith mixing component.For both these test den-sities,  0 = 0 is the peak of the density.Hall reported the Monte Carlo estimates of the exact coverage probability (1 − α)(ℎ), along with the average and stan-dard deviation of the interval length.It was observed that the coverage accuracy of the confidence interval for f at the peak was less than the same at other point.
In [1], the authors considered the problem of interval estimation of (0), where  is a standard normal density.From their simulations (page 513, in [1] we see that neither the coverage error nor the length of their 95 percent interval seem to decrease as n is increased more than two times.This is perhaps due to the fact that random bandwidth proposed by Chan Lee and Peng is suitable for point estimation of  at  0 .In [7], the author pointed out that nonparametric point estimation and interval estimation are different tasks that require different degrees of smoothing. In this section we study effect of the proposed random bandwidth ℎ ̂on the coverage probability and the average length of I(1 − α), for different choices of  and  0 and α =0.05.We consider the above mentioned choices of  and  0 as in [6].Both these densities are unimodal, with peak at  0 = 0.In addition we consider two more test densities, viz. equal to the (1/2)N(−1, 1/2)+(1/2)N(1, 1/2) density and the gamma(2,1) density.For the (1/2)N(−1, 1/2) + (1/2)N(1, 1/2) density there are two peaks of same height at −1 and 1, and a trough at 0. We estimate this density at  0 equal to 0 and 1.For the gamma density peak occurs at 1.We estimate the height of the gamma density and  0 equal to 1 and 4.474, which is the 95th percentile.To compute the Monte-Carlo estimate of the coverage probability of a confidence interval we draw m random samples of a specific size from a test distribution, and compute the confidence interval for each sample.So there are m such intervals.The Monte-Carlo estimate of the coverage probability is equal to number of intervals containing (0), divided by m.In Table 1 we use  1 =0.1 and c 2 = 1.
In Table 2 we report the Monte-Carlo estimates of the coverage probability, average length and variance of the confidence intervals using ℎ = 1.05γ̂n−1/5 , for different choices of c and  equal to the (1/2)N(−1, 1/2) + (1/2)N(1, 1/2) density and the gamma(2,1) density.If the mean or the variance of the length of the confidence interval exceeds 100, we write "large".
In Table 1 we report the Monte-Carlo estimate of the coverage probability, average length and variance of the proposed confidence interval I(1 − α)(ℎ ̂), in (2.2), for 10 combinations of  and  0 .We compute each estimate for n = 50 and n = 100.To compute Monte-Carlo estimate we draw m = 300 samples from each test density.We have the following observations.Table 1: Monte Carlo estimates of (1 − α)(ℎ ̂) for h eaual to ℎ ̂ and α = 0.05 (i) The confidence interval I(1 − α)(ℎ ̂), using the proposed random bandwidth ℎ ̂ in (2.1), seems to perform consistently.The coverage error, the mean and the variance of the interval length seem to reduce as sample size is increased for all choices of  and  0 .
(ii) From the simulation study in [6] and our Table 2, we see that the coverage probability and length of the confidence intervals using ℎ = 1.05γ̂n−1/5 ,\ 0 < c ≤ 1, can vary widely depending on estimation point  0 and c.
(iii) In contrast, the simulations in Table 1 indicate that for a given distribution the coverage accuracy of the confidence interval using ℎ ̂ does not seem to vary drastically with the change in  0 , especially for n=100.This is due to the fact that proposed bandwidth selector is a function of the estimation point  0 , and so the resulting bandwidth ℎ ̂ automatically adjusts the amount of smoothing depending on  0 .
(iv) From the simulations in [6] we see that for  equal to the (1/2)N(0, 1) + (1/2)N(3, 1) density and  0 equal to the peak, the coverage probability of the under smoothed confidence interval is poor especially for c > 0.5 in ℎ = 1.05γ̂n−1/5 .From our Table 2 we see that a similar observation is also true for  0 equal the trough between the two peaks of the (1/2)N(-1, 1/2) + (1/2)N(1, 1/2) density.Hall pointed out that the coverage error of confidence interval for estimation  at the peak is in general higher than the same at other points, as the bias in a kernel density estimator is more pronounced at a peak.We observe that the same argument is also true for  0 equal to a trough.Moreover from Table 2 we see that while estimating the gamma density at the peak the under smoothed confidence interval using ℎ = 1.05γ̂n−1/5 performs poorly for every choice c.However, simulations in Table 1 suggest that the proposed confidence interval I(1 − α)(ℎ ̂)performs well in estimating  at the peak as well as the trough, in terms of the coverage accuracy, especially for n=100 and irrespective of .
(v) From the simulations in [6] and our Tables 1 and 2, we see that the mean and the variance of the length proposed confidence interval compares well with the lengths of the corresponding confidence intervals using ℎ = 1.05γ̂n−1/5 in [6].

Faithful data analysis
A well known data set in the context of density estimation is the data on the durations (in minutes) of eruptions for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA.It is available in the R software (see data set ``faithful" in R).The histogram based on the frequency density of the raw data is plotted in Figure 1.We construct the 95 percent confidence intervals at 30 equidistant grid points using the proposed method.These upper and lower limits of the confidence intervals are marked as red and blue ``bubbles" in Figure 1.We also plot the kernel density estimates using the plug-in bandwidths proposed by Sheather and Jones (see [9]) and the least square cross validation bandwidth.The cross validation and plug-in density estimates are numbered as 1 and 2 in Figure 1.We observe the following.
The data is strongly bi-modal.The upper limits of the 95 percent confidence intervals seem to close to the frequency density of the raw data at the grid points (see the red "bubbles" in Figure 1").The left-peak in the cross validation based curve is taller than the same in the plug-in curve.Both the density estimates are within the 95 percent confidence interval near the two peaks.It seems that the left peak of the underlying density can be taller than the same in the plug-in based curve.The cross validation based density estimate seems to be reasonable near the left-peak.The cross validation curve is always within the confidence intervals at the grid points.The plug-in curve seems to lie outside confidence interval at the grid points in the left tail region.So for this data set the cross validation density estimate seems to be a more reasonable fit.
Clearly the confidence intervals, along with the point estimates of the density, enable a more detailed analysis of the data than that based on a single density estimate.

Final Remarks
From the above simulation study it appears that the confidence interval I(1 − α)(ℎ ̂) in (2.2) performs well for all the test densities, especially for n=100.Simulations in our Table 2 suggest that if  is a density with positive support and  0 is the peak, the under smoothed confidence interval for ( 0 ) using ℎ = 1.05γ̂n−1/5 performs poorly for all the different choices of c mentioned in [6].In contrast, the coverage error or the average length of I(1 − α)(ℎ ̂) does not seem to vary drastically for different choices of  0 .So the proposed bandwidth selector can be recommended safely for interval estimation of ( 0 ), especially for large sample size.We observe that using a confidence band in conjunction with the usual global density estimates more detailed information can be extracted from the faithful eruption duration data, than that obtained using a single density estimate.
classical bootstrap re-sample  1 * ,....,  * . in (1.1) and (1.2).The (exact) coverage probability of I(1 -α) is defined as [6] suggested to select h with an aim to minimize the absolute value of the coverage error, viz.CE=|β(1− α)-(1− α)|.However β(1−α) is a function of the unknown .So for practical data based choice of h, CE has to be estimated based on  1 ,....,   .Using Efron's (1979) classical bootstrap method we propose an estimate of the CE and it is minimized (with respect to h) for data based choice of the bandwidth.Let ℎ ̂ denote the proposed data based bandwidth.The details of our proposal are given in Section 2.

Figure 1 :
Figure 1: Fig 1: the 95 percent confidence bands and the density estimates using plug-in bandwidth (the red curve) and the least square cross validation bandwidth (the blue curve) based on the eruption durations data.