Nonparametric inference for $P(X

We propose two classes of nonparametric point estimators of $\theta=P(X<Y)$ in the case where $(X,Y)$ are paired, possibly dependent, absolutely continuous random variables. The proposed estimators are based on nonparametric estimators of the joint density of $(X,Y)$ and the distribution function of $Z=Y - X$. We explore the use of several density and distribution function estimators and characterise the convergence of the resulting estimators of $\theta$. We consider the use of bootstrap methods to obtain confidence intervals. The performance of these estimators is illustrated using simulated and real data. These examples show that not accounting for pairing and dependence may lead to erroneous conclusions about the relationship between $X$ and $Y$.


Introduction
The study of stress-strength models have received considerable attention for many years due to its applicability in diverse areas. The main interest in this kind of models is the quantity θ = P (X < Y ), where X and Y are random variables. In medicine for example, if X and Y are the outcomes of a control and an experimental treatment respectively, the parameter θ can be interpreted as the effectiveness of treatment Y (Ventura et al., 2011). This quantity is also related to the Receiver Operating Characteristic (ROC) curves, where θ is interpreted as an index of accuracy (Zhou, 2008). In engineering and reliability studies θ is also a quantity of interest because it may represent the probability that the strength of a component (Y ) exceeds the stress (X) coming from external factors (Kotz et al., 2003).
Stress-strength models were introduced by Birnbaum (1956) who proposed a nonparametric estimator of θ based on the Mann-Whitney statistic for the case where X and Y are independent. There is a large amount of literature related to the study of point and interval estimation of θ using different approaches (see Kotz et al., 2003 for a good survey on this). For instance, in the case where X and Y are independent, Sun et al. (1998) proposes a Bayesian approach using reference priors; Baklizi and Eidous (2006) propose an estimator based on kernel estimators of the densities of X and Y (which can be straightforwardly generalised to the use of other nonparametric density estimators); Zhou (2008) proposes the use of bootstrap and asymptotic intervals; Jing et al. (2009) estimate θ using the empirical likelihood; Montoya (2008) and Díaz-Francés and Montoya (2013) propose the use of the profile likelihood for conducting inference about θ; and Ventura et al. (2011) propose the use of Bayesian inference with Jeffreys and matching priors as well as modified profile likelihoods for the cases where X and Y are normal or exponential random variables.
It is important to mention that the parameter θ may not be available in a closed form in many cases (see Azzalini andChiogna, 2004 andBrown, 2001 for an example of this).
This makes difficult (if at all feasible) to find a reparameterisation involving θ, which complicates the use of the classical approach. In particular, the use of the profile likelihood might be difficult if this reparameterisation is not available (Díaz-Francés and Montoya, 2013). Alternative inferential approaches that overcome this difficulty are Bayesian inference, nonparametric estimation, and the use of bootstrap methods, which allow for obtaining confidence and credible intervals for the parameter of interest (Baklizi and Eidous, 2006;Zhou, 2008;Rubio and Steel, 2013).
New interest has been focused on the estimation of θ in the case where X and Y are dependent random variables. For example Barbiero (2011) assumes that (X, Y ) are jointly normally distributed; Rubio and Steel (2013) suppose that X and Y are marginally distributed as skewed scale mixture of normals and construct the corresponding joint distribution using a Gaussian Copula; Domma and Giordano (2012a) construct the joint distribution of (X, Y ) using a Farlie-Gumbel-Morgenstern copula with marginal distributions belonging to the Burr system; Domma and Giordano (2012b) consider Dagum distributed marginals and construct their joint distribution using a Frank copula; among others (Nadarajah, 2005;Gupta et al., 2012). In these papers, the importance of taking the assumption of dependence between X and Y into consideration is illustrated using simulated and real data sets.
We propose two classes of nonparametric estimators of θ for the case where (X, Y ) are paired, possibly dependent, continuous random variables. This scenario is of interest since paired observations are produced in many experimental designs (see e.g. Sprott, 2000 andCox andReid, 2000 for examples of this). The estimators proposed here are based on nonparametric estimators of the density of (X, Y ) and the distribution function of Z = Y − X. This approach avoids making distributional assumptions over (X, Y ) and allows for interval esti-mation of θ via nonparametric bootstrap. In addition, this method can be easily implemented in R using already existing packages. In Section 2 we introduce these estimators and prove some asymptotic properties for the choice of several nonparametric estimators. We also detail how to combine kernel density estimation (KDE) with the methods proposed here. In Section 3 we present two examples, using simulated and real data, which illustrate the importance of accounting for pairing and dependence of the observations when conducting inference about θ.

Nonparametric estimators of θ
Let (X, Y ) be a pair of absolutely continuous random variables with joint density f X,Y : R 2 → R + . By definition, we have that (1) Alternatively, by defining the variable Z = Y − X we obtain where f Z , F Z and S Z are the density function, the cumulative distribution function, and the survival function of Z, respectively. These equivalent expressions suggest the following nonparametric methods for estimating the parameter θ.

Estimator I
Let (x, y) be a sample from (X, Y ) of size n and suppose that these observations are collected in couples (x i , y i ), i = 1, . . . , n. The first proposed estimator, based on expression (1), consists of substituting the density f X,Y by a nonparametric density estimator as follows.

Algorithm 1
1: Using the sample (x, y) construct a nonparametric estimatorf X,Y of the density f X,Y . 2: Define the estimatorθ = Note that Algorithm 1 involves both a two-dimensional density estimation and the calculation of a double integral. Several nonparametric density estimators can be employed for this purpose such as kernel density estimators (Parzen, 1962), shape-restricted estimators (Cule et al., 2010), among others (Scott, 1992). This choice has, of course, implications on the performance of the estimators. In Section 2.4 we present some asymptotic properties ofθ for different choices off X,Y . The integration step can be conducted using quadrature or Monte Carlo methods.

Estimator II
Again, let (x, y) be a sample from (X, Y ) of size n and suppose that these observations are collected in couples (x i , y i ), i = 1, . . . , n. Define the vector of differences z = y − x. The second proposed estimator, based on expression (2), is constructed as follows.
2: Using the sample z construct a nonparametric estimatorF Z of the distribution function of Z . 3: Define the estimatorθ = 1 −F Z (0).
For the nonparametric distribution estimatorF Z in step 2 we can employ the empirical cumulative distribution function (ECDF) or the induced distribution estimators obtained by integrating a nonparametric density estimatorf Z , which lead toθ = ∞ 0f Z (z)dz. In this line, several univariate nonparametric estimators off Z can be considered such as kernel density estimators (Parzen, 1962), shape-restricted density estimators (Cule et al., 2010) and smooth shaperestricted estimators (Dümbeng and Rufibach, 2009;Cule et al., 2010;Dümbeng and Rufibach, 2011).
Note that the use of both, Estimator I and Estimator II, avoids making assumptions on the distribution of (X, Y ) and the sort of dependence between the variables X and Y . The relationship between these variables, which can be either dependent or independent, is implicitly included in the nonparametric estimators of the density (distribution). In addition, the use of nonparametric bootstrap coupled with either Algorithm 1 or Algorithm 2 allows for obtaining a variety of bootstrap confidence intervals for these estimators (DiCiccio and Efron, 1996).

Use of Estimator I and Estimator II with kernel density estimators
In this section we discuss the use of KDE in Algorithms I and II. Recall that the use of KDE involve the choice of two elements: a kernel function and a bandwidth parameter (or bandwidth matrix in a multivariate framework). Here, we present a brief discussion on appropriate choices for these elements in our context.

Estimator I
Let H be a symmetric, positive definite, 2 × 2 bandwidth matrix and k 2 be a two-dimensional kernel function (Parzen, 1962). Define also k H (t) = (det H) − 1 2 k 2 (H − 1 2 t), t ∈ R 2 . If we consider the use of a KDE in step 1 of Algorithm 1, then the estimatorθ can be written as which can be calculated using quadrature or Monte Carlo methods. The implementation of this estimator requires the specification of the kernel function k 2 and the bandwidth matrix H. A natural first choice is the use of a bivariate Gaussian kernel φ 2 = k 2 . The choice of the bandwidth matrix H can be crucial for the performance of KDE, which has fostered an extensive study of several bandwidth matrix estimators (see Duong and Hazelton, 2005 for a good survey on this). However, appropriate bandwidth matrices for estimating the distribution involved in (3) seem to have been little studied to our knowledge. Nevertheless, as a first approach one can consider bandwidth matrix estimators employed in KDE such as the plugin and cross-validation bandwidth estimators, which are implemented in the R package 'ks' (Duong, 2011).

Estimator II
Let k 1 be a one-dimensional kernel and h > 0 be the corresponding bandwidth (also termed smoothing parameter). If we consider the use of a univariate KDE in step 2 of Algorithm 2, then the estimatorθ can be written aŝ where Again, a natural first choice is the Gaussian kernel Φ = K 1 . The choice of the bandwidth h in the context of density estimation has been extensively studied, we refer the reader to Jones et al. (1996) for a good survey on this. However, the choice of this parameter in the context of kernel distribution function estimation has received less attention. Quintela-del-Río and Estévez-Pérez (2012) present a compendium of appropriate bandwidth parameters in the context of kernel distribution estimator, they also implement these in the R package 'kerdiest'.

Results on the convergence of the proposed estimators
The convergence of Estimator I coupled with KDE is difficult to assess given the limited literature about the choice of appropriate bandwidth matrices for estimating a bivariate distribution.
Despite this limitation, one can expect a good performance of this estimator for moderate or large samples and the use of any reasonable bandwidth matrix since kernel estimators converge in terms of the mean square and mean absolute errors to the true density.
The following result shows that, even using a diagonal bandwidth matrix, the resulting estimator of θ is weakly consistent under rather mild conditions. The use of more appropriate bandwidth matrices is therefore expected to produce better estimators.
for u ≥ 0. Let {h n } ∞ n=1 be a sequence of positive numbers such that lim n→∞ h n = 0 and lim n→∞ nh 2 n = ∞. Define the sequence of bandwidth matrices H n = diag(h n ). Suppose also that one of the following conditions holds (i) ||t|| 2 k 2 (t) → 0 as ||t|| → ∞ and f X,Y is almost surely continuous. ( Then,θ is a weakly consistent estimator of θ, this is,θ P → θ, as n → ∞.

Proof. First, note that
where MAE denotes the mean absolute error which is also the L 1 distance. Under the stated assumptions we have that lim n→∞ MAE(f X,Y , f X,Y ) = 0, in probability, by the Theorem in Devroye and Wagner (1979).
Although the use of the shape-restricted density estimator in Cule et al. (2010) does not involve a tuning parameter, a study of the asymptotic properties of the induced distribution estimator seems not to have been done yet. However, since this density estimator has smaller mean integrated squared error than those obtained with KDE methods (Cule et al., 2010), the use of this method in Algorithm 1 is also expected to produce good estimators of θ for moderate or large samples.
On the other hand, given the immediate relationship between the Estimator II and the estimation of the distribution F Z , it follows that the asymptotic properties ofθ are inherited from those of the estimatorF Z evaluated at 0. Some specific asymptotic results are presented below for different estimators ofF Z (0).
The following result shows that the use of the empirical distribution for estimatingF Z (0) produces consistent and asymptotically normal estimators of θ.
(ii) The estimatorθ is asymptotically normal, this is as n → ∞.
Proof. The results follow by the law of large numbers and the central limit theorem (van der Vaart, 1998).
The use of kernel distribution estimators can also produce consistent and asymptotically normal estimators of θ under certain conditions as indicated in the following theorem. Then, it follows that 1.θ is a strongly consistent estimator of θ.  Fernholz (1991)

and the law of large numbers. (ii) The asymptotic normallity ofθ follows by Corollary 2.4 from Fernholz (1991).
By relaxing the assumptions of the previous theorem it is possible to prove that the use of kernel distribution estimators also produces weakly consistent estimators of θ.

Theorem 4 Suppose that k 1 is bounded in R with
for u ≥ 0. Let {h n } ∞ n=1 be a sequence of positive bandwidths such that lim n→∞ h n = 0 and lim n→∞ nh n = ∞. Suppose also that one of the following conditions holds (i) |t|k 1 (t) → 0 as |t| → ∞ and f Z is almost surely continuous.
Then,θ is a weakly consistent estimator of θ.

Proof. First, note that
where MAE denotes the mean absolute error. Under the stated assumptions we have that lim n→∞ MAE(f Z , f Z ) = 0, in probability, by the Theorem in Devroye and Wagner (1979).
Note that Theorems 1 and 4 simply require a well-behaved kernel function and the boundedness of the target density. The assumptions on the bandwidth parameters are also rather mild since most of the popular choices satisfy these conditions. The use of shape-restricted estimators, by their nature itself, require additional assumptions on the target density. The following result presents such conditions that produce consistent estimators of θ.
Theorem 5 LetF Z be the shape-restricted nonparametric estimator of F Z proposed in Dümbeng and Rufibach (2009) and suppose that the log-density log(f Z ) is Lipschitz continuous and log(f Z ) ′ is Hölder continuous of order β ∈ [1, 2] on a compact interval I ⊂ R. Then,θ is a weakly consistent estimator of θ.

Proof. The result is a consequence of Corollary 4.2 from Dümbeng and Rufibach (2009).
The results presented in this section show that both estimators have good asymptotic performance under mild conditions. An important difference between Estimator I and Estimator II is that the former involves a two-dimensional density (distribution) estimation while the latter involves a one-dimensional density (distribution) estimation. This represents an advantage of Estimator II over Estimator I since the convergence rate of the resulting estimators as well as the ease of implementation is tied to the dimensionality of the problem. However, an interesting feature of Estimator I is that it can be implemented in the context of censored and missing observations since the use of KDE in these contexts has been studied, for example, in Titterington and Mill (1983) and Wells and Yeo (1996).

Examples
In this section, we present two examples that illustrate the implementation of the estimators proposed in Section 2. In the first example we use a sample simulated from a bivariate sinh-arcsinh distribution (Jones and Pewsey, 2009). As detailed in Jones and Pewsey (2009) (i) Kernel 2D. Based on Algorithm 1, this estimator employs a two-dimensional Gaussian kernel density estimator with the bandwidth matrix Hscv implemented in the R package 'ks' (Duong, 2011). The required integration step is conducted using quadrature methods.
(ii) MLE 2D. This estimator is based on Algorithm 1. The corresponding two-dimensional density estimation is conducted using the shape-restricted estimator from Cule et al. (2010).
This estimator is also implemented using the command dlcd from the R package 'Logcon-cDEAD' (Cule et al., 2009). The integration of this density is conducted using a Monte Carlo method.
(iii) SMLE 2D. This estimator is based on Algorithm 1. The corresponding two-dimensional density estimation is conducted using the smooth shape-restricted estimator (Cule et al., 2010) implemented in the command dslcd from the R package 'LogconcDEAD' (Cule et al., 2009). The integration of this density is conducted using a Monte Carlo method.
(iv) MLE 1D. EstimatesF Z in Algorithm 2 by integrating the shape-restricted density estimator proposed in Cule et al. (2010). The density estimation is implemented using the command dlcd from the R package 'LogconcDEAD' (Cule et al., 2009).
(vii) ECDF. This estimator employs the empirical distribution function for estimatingF Z in Algorithm 2.
In order to assess the impact of the assumptions of pairing and dependence in the estimation of θ, we also consider the following estimators: (viii) Independent. (Baklizi and Eidous, 2006) This estimator assumes that X and Y are independent variables and that the corresponding samples are unpaired. The estimator is defined as wheref X andf Y are Gaussian kernel density estimators obtained with samples of X and Y respectively. For both KDE we employ the bandwidth h = 4σ 5 3n 1 5 , whereσ is the sample standard deviation and n is the sample size. This bandwidth is known as the Silverman's rule of thumb.
(ix) Paired. (Baklizi and Eidous, 2006). This estimator is the same as (5) but assuming that the samples of X and Y are paired. This additional assumption is taken into consideration in the bootstrap methods used to calculate confidence intervals for θ ⋆ .
Bootstrap samples and bootstrap confidence intervals (Normal, Basic, Percentile and BCa) are obtained using the R packages 'boot' (Canty and Ripley, 2012) and 'simpleboot' (Peng, 2008). R source code for these examples is available upon request.

Simulated data
In this example we use a simulated sample of size n = 100 from a bivariate sinh-arcsinh distribution (Jones and Pewsey, 2009) with parameters (σ 1 , σ 2 , ρ, ǫ 1 , ǫ 2 , δ 1 , δ 2 ) = (1, 1, 0.75, 0, 1, 1, 2).  Figure 1b shows the bootstrap distribution of the estimators of θ previously described. We can observe a considerable influence of the assumptions of pairing and dependence in the location and spread of the bootstrap distributions of the estimators of θ. We can also notice the influence of these assumptions in the point estimators and bootstrap confidence intervals shown in Table 1. In this case, not including these assumptions leads to underestimating θ. Finally, we can observe that the estimator ECDF is slightly larger than the others, which seems to be a result of its discrete nature.

Real data
In this section we study the data set presented in Venkatraman and Begg (1996), which contains 72 lesion scores obtained using both a clinical scheme without a dermoscope (X Test), and a dermoscopic scoring scheme (Y Test). Their main interest is to assess the information provided by the use of the dermoscope. Here, we analyse the subset of 51 non-diseased patients (diagnosed using a biopsy) and compare the nonparametric inferences for θ obtained using the estimators described in the introduction of this section. It is important to note that the population correlation coefficient of this sample is 0.794, which suggests that the entries are correlated. Table 2 shows point estimators and four types of bootstrap confidence intervals of θ. Figure   2 shows the bootstrap distributions of the estimators of θ. We can note a discrepancy of the point estimators under the assumptions of dependence and independence of the tests. Interval inference is also different; in the cases where pairing and dependence are not considered we can observe that the value θ = 0.5 is included in some of the bootstrap confidence intervals, leading to different conclusions about the relationship of the tests. This is in line with the conclusions in Rubio and Steel (2013)

Discussion
We introduced two classes of nonparametric estimators of θ = P (X < Y ) for the case of paired, possibly dependent, observations. The proposed estimators avoid making assumptions on the distribution and the dependence structure of (X, Y ) which are implicitly considered by estimating nonparametrically either the joint distribution of (X, Y ) or the distribution of the difference Z = Y − X. We proved that the combination of the proposed approach with several nonparametric distribution estimators produce estimators of θ with appealing asymptotic properties. In addition, we have shown that confidence intervals for θ, based on these estimators, can be obtained using bootstrap methods that are easy to implement using already existing R packages. The nonparametric distribution estimators explored in the context of Estimator I perform similarly. They are also comparable in terms of their ease of implementation and the required CPU usage. In the context of Estimator II, we empirically found that the estimators of θ based on smooth distribution estimators exhibit a better performance than those based on discrete distribution estimators such as the empirical distribution. The example presented in Section 3.2 show that not accounting for dependence between X and Y may lead to opposite conclusions about θ = 0.5, and consequently about the relationship between these variables.