Improved Tolerance Limits by Combining Analytical and Experimental Data: An Information Integration Methodology

We propose a coherent methodology for integrating different sources of information on a response variable of interest, in order to accurately predict percentiles of its distribution. Under the assumption that one of the sources is more reliable than the other(s), the approach combines factors formed from the data into an additive linear regression model. Quantile regression, designed for quantifying the goodness of fit precisely at a desired quantile, is used as the optimality criterion in model-fitting. Asymptotic confidence interval construction methods for the percentiles are adopted to compute statistical tolerance limits for the response. The approach is demonstrated on a materials science case study that pools together information on failure load from physical tests and computer model predictions. A small simulation study assesses the precision of the inferences. The methodology gives plausible percentile estimates. Resulting tolerance limits are close to nominal coverage probability levels.


Introduction
The aim of this paper is to propose a method for integrating different sources of information on a response variable of interest, in order to make inferences on the percentiles of its distribution. The approach is general enough to be applicable in a variety of fields. To set the stage, consider the basic problem in engineering design of how to manage variability and risk in the properties of materials. Albeit costly, physical testing of the materials has traditionally been the primary method of quantifying uncertainty in vital characteristics like failure load. With recent advances in the sophistication of analytical physics-based engineering models and the associated hardware/software technology, such models are steadily becoming important contributors in this endeavor. Although potentially less precise, the lower cost of the latter (the indirect method) makes it an attractive contender to the former (the direct method), since it does not involve physical manipulation of the material. A sound uncertainty management strategy in this context might arguably be the effective integration of the information obtained from each method. This paper proposes a coherent methodology for that purpose.
Fields as diverse as archaeology, astronomy, engineering, geophysics, medicine, military operations, and psychology have developed and applied statistical methods to integrate heterogeneous sources of data (Draper et al., 1992). The concept is related to meta-analysis, with an early effort by DuMouchel and Harris (1983) employing a Bayesian approach to combine the results from different studies. (See Hedges and Olkin, 1987, for a comprehensive account of meta-analysis.) Craig et al. (2001) present a Bayesian forecasting approach that integrates data from computer models and expert opinions. More recently, a comprehensive methodology has been advanced by Reese et al. (2004), who show how information from computer models, expert opinions, and physical experiments can be integrated via a multi-stage hierarchical Bayesian regression framework.
By their very nature, Bayesian approaches tend to suffer from prior sensitivity problems. Opting for a nonparametric frequentist stance, we propose a novel approach to the integration of heterogeneous sources of information, combining data via a regression factor model. Our method is motivated by the need to analyze a specific setup where only test data and computer prediction data, the latter in the form of a range of values, are available. In this context we seek to make inferences on specific extreme percentiles of the response. Selecting quantile regression (Koenker and Bassett, 1978) as our model-fitting criterion provides an inferential mechanism that targets precisely the desired quantile. Associated confidence interval construction methods can then be used to calculate tolerance limits such as A basis and B basis values (see Definition 1) for the quantile(s) in question.
Although conceived in the context of a materials science problem, our approach has obvious potential applications in any field where percentile inference (extreme or otherwise) is the main goal. Nuclear Engineering safety assessment places a premium on quantifying the risk of extreme events such as reactor core meltdown. Environmental Engineering seeks to measure probabilities of high toxicity levels by combining data from various geobiotic sites. Civil Engineering, Coastal Engineering, the insurance industry, and several others are all in the business of establishing safety margins to avert catastrophic events. It is often the case that such assessments involve the integration of information from more than one source. This article is a refinement and condensation of the main findings in Trindade and Uryasev (2006), where specific connections with financial measures of risk are alluded to. The remainder of our paper is organized as follows. Section 2 introduces the regression modeling framework under which the data are to be combined. The model-fitting criterion, quantile regression, and associated inference techniques are covered in Section 3. The methodology is illustrated in Section 4 by applying it to a case study from the aerospace industry involving material failure loads. We conclude in Section 5 with a Monte Carlo study on the precision of the inferences derived from the methodology.

The Modeling Framework
The setup for the problem we wish to address is as follows. Data are collected on a response variable of interest obtained through two mechanisms: direct observation and indirect observation. Assume that the direct data are more precise than the indirect data, that is to say, the direct measurements are believed to be closer to the truth. In the case study we will shortly consider, the response is material strength, and the direct data are obtained by applying increasing loads to samples of such material until failure is observed. In addition, failure load predictions obtained via methods that do not involve physical testing of the material are also available, and comprise the indirect data. The latter can encompass a broad spectrum of methods, from analytical physics-based computer models and simulations to subjective expert opinions. Primary questions of interest are: how to effectively integrate direct and indirect data in a coherent manner so as to use all available information, and how to quantify the contribution of the indirect data in the presence of the direct data. In the case study, the hope is that indirect model data, which are cheaper to generate, can contribute to accurate prediction of failure loads, reducing the need for more expensive direct test data. The material may be available in different formulations, with changes in its composition and/or structure deliberately introduced in the manufacturing process so as to affect the response.
We consider only the simplest case, in which direct data and one type of indirect data are available. The basic idea underlying the approach is to use a regression framework, with each of the direct data values serving as the response in turn, and the remainder together with all the indirect data serving as explanatory variables (covariates). Let Y_ij denote the response obtained from the jth direct data value corresponding to the ith formulation, i = 1, ..., I, and j = 1, ..., N_i. We condense the data from each source, direct and indirect, into a pair of summary statistics such as the mean and standard deviation. Other data reduction measures of location (e.g. median) and dispersion (e.g. lower semi-deviation) could also be used for more skewed data. Let (m_i(j), s_i(j)) denote respectively the sample mean and standard deviation for the direct data in formulation i, obtained from Y_i,1, ..., Y_i,j-1, Y_i,j+1, ..., Y_i,N_i. Letting (mu_i, sigma_i) denote the indirect data mean and standard deviation in formulation i, we fit the regression model

Y_ij = c_0 + c_1 mu_i + c_2 sigma_i + c_3 m_i(j) + c_4 s_i(j) + eps_ij,    (2.1)

where c_0, ..., c_4 are unknown regression coefficients to be estimated from the data, and eps_ij is the residual error corresponding to the jth direct data value in the ith formulation. Evidently, if only the indirect data mean mu_i is available, then the c_2 sigma_i term will be absent. This basic setup can easily be extended to accommodate data from several (K, say) indirect data sources (mu_i^(k), sigma_i^(k)), k = 1, ..., K, which can occur if competing approaches are being considered, e.g. two numerical models plus expert opinions (K = 3). Extensions to additional sources of variability are likewise immediate. Of course, the basic linear model can also be made more flexible by the inclusion of additional terms. The motivation behind model (2.1) is two-fold and can be heuristically argued as follows. Firstly, it provides a way to combine the direct and indirect information; leaving Y_ij out of the (m_i(j), s_i(j)) calculation in a sense justifies its use as the (obvious) response at that level of i and j. Secondly, the approach "pools" together information from all sources when making inferences on a particular source, in much the same way as the Stein effect improves estimation by using information from all coordinates when estimating each coordinate (Berger, 1982).
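The leave-one-out summary statistics and the stacked design implied by model (2.1) can be sketched as follows. This is a minimal illustration; the function and variable names are ours, not the paper's.

```python
import numpy as np

def loo_mean_sd(y):
    """Leave-one-out mean and sd: m[j], s[j] summarize y with y[j] removed."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    m, s = np.empty(n), np.empty(n)
    for j in range(n):
        rest = np.delete(y, j)
        m[j] = rest.mean()
        s[j] = rest.std(ddof=1)  # sample (n-1 denominator) standard deviation
    return m, s

def build_design(direct, indirect):
    """Stack rows [1, mu_i, sigma_i, m_i(j), s_i(j)] of model (2.1).

    direct:   list of 1-D arrays, one per formulation (the Y_ij)
    indirect: list of (mu_i, sigma_i) pairs (indirect-data summaries)
    Returns the design matrix X and response vector y.
    """
    rows, resp = [], []
    for y_i, (mu, sigma) in zip(direct, indirect):
        m, s = loo_mean_sd(y_i)
        for j, y_val in enumerate(y_i):
            rows.append([1.0, mu, sigma, m[j], s[j]])
            resp.append(y_val)
    return np.array(rows), np.array(resp)
```

Each direct observation appears once as a response, paired with summary statistics computed from the remaining observations in its formulation, exactly as the leave-one-out construction above describes.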
In reliability applications where failure load is the response, accurate inference on low quantiles of its distribution is often desired. To ensure ample safety margins, engineering specifications usually call for estimates of the 1st and 10th percentiles, along with associated 95% lower confidence bounds. The resulting tolerance limits are respectively known as A basis and B basis. Precise definitions of these terms are as follows.

Definition 1 (A basis and B basis)
Let {X_1, ..., X_n} be a random sample of observations from a random variable X with continuous distribution function F, and let xi_tau = xi_tau(X_1, ..., X_n) be a statistic satisfying

P( xi_tau <= F^{-1}(tau) ) = 0.95.    (2.2)

Equivalently, representation (2.2) means that xi_tau is a lower 95% tolerance bound with content 1 - tau: with 95% confidence, at least 100(1 - tau)% of the population exceeds xi_tau. An A basis value is defined to be xi_0.01, and a B basis value, xi_0.10. Thus in repeated sampling, the calculated A basis (B basis) value would fall below the true 1st (10th) percentile 95% of the time.
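For intuition on Definition 1, and as a distribution-free point of reference separate from the regression-based constructions developed later, a one-sample lower tolerance bound can be taken as an order statistic whose index is chosen from the binomial distribution. The sketch below is ours, under the standard order-statistic argument.

```python
import numpy as np
from scipy.stats import binom

def nonparam_lower_bound_index(n, tau, conf=0.95):
    """Largest k such that the k-th order statistic X_(k) (1-indexed) is a
    100*conf% lower confidence bound for the tau-th quantile:
    P(X_(k) <= x_tau) = P(Binomial(n, tau) >= k) >= conf."""
    ks = np.arange(1, n + 1)
    coverage = 1.0 - binom.cdf(ks - 1, n, tau)  # P(Bin(n, tau) >= k)
    valid = ks[coverage >= conf]
    if len(valid) == 0:
        raise ValueError("sample too small for this (tau, conf) pair")
    return int(valid.max())

# B basis index for a sample of n = 100: the 5th smallest observation.
k = nonparam_lower_bound_index(100, 0.10)
```

Note that no distribution-free A basis value exists at n = 100: even the sample minimum satisfies P(X_(1) <= x_0.01) = 1 - 0.99^100, which is about 0.63, well short of 0.95. This is one motivation for the model-based bounds of Section 3.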
With inference on such extreme quantiles of the response in mind, minimization of residual error criteria such as squared or absolute deviations, with their focus on measures of central tendency, may be inappropriate. Quantile regression, discussed next, provides a promising alternative.

Quantile Regression

Consider observations (x_i, y_i), for i = 1, ..., n, from the linear regression model

y_i = x_i' beta + eps_i,

where x_i = [1, x_i1, ..., x_ip]' is the design vector, beta = [beta_0, beta_1, ..., beta_p]' is the coefficient vector, and the errors {eps_i} are independent and identically distributed (i.i.d.) with cdf F_eps and pdf f_eps. The conditional quantile function of Y given x and the quantile function of eps are related according to

Q_Y(tau | x) = x' beta(tau),  with beta(tau) = [beta_0 + F_eps^{-1}(tau), beta_1, ..., beta_p]'.

Koenker and Bassett (1978) proposed that a natural estimator of beta(tau) can be obtained by minimizing the criterion

V(b) = sum_{i=1}^n rho_tau(y_i - x_i' b),  where rho_tau(u) = u (tau - I{u < 0}),    (3.4)

over all b in R^{p+1}. The resulting estimators, beta_hat(tau) = argmin_b V(b), were called regression quantiles by Koenker and Bassett (1978), and the methodology that produces them is called quantile regression. Using it to fit model (2.1) leads to the following estimated equation for the tau-th quantile of the failure load, as a function of the generic design vector x = [1, mu, sigma, m, s]':

Q_hat_Y(tau | x) = c_hat_0(tau) + c_hat_1(tau) mu + c_hat_2(tau) sigma + c_hat_3(tau) m + c_hat_4(tau) s.    (3.5)

A basis and B basis values are obtained as a 95% lower confidence bound for the respective quantile, tau = 0.01 or tau = 0.10. Several methods are available for the construction of such a confidence bound; we present three of the most promising.
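The Koenker-Bassett criterion (3.4) can be minimized exactly via its standard linear-programming form. The sketch below is illustrative (in practice one would use a dedicated quantile regression package); the names are ours.

```python
import numpy as np
from scipy.optimize import linprog

def rho(u, tau):
    """Koenker-Bassett check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def quantile_regression(X, y, tau):
    """Minimize sum_i rho_tau(y_i - x_i'b) via the standard LP:
        min  tau * 1'u + (1 - tau) * 1'v
        s.t. X b + u - v = y,  u >= 0,  v >= 0,  b free.
    Returns the fitted coefficient vector b."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]
```

With an intercept-only design, the minimizer reduces to the tau-th sample quantile of y, which is the analogy motivating (3.4).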
(i) Studentization. This is due to Koenker (1994), and is based on the asymptotic distribution of the regression quantiles derived by Koenker and Bassett (1978) under mild regularity conditions. With Omega = X'X and z_.05 denoting the 5th percentile of the standard normal distribution, the studentization method results in the A basis (tau = 0.01) or B basis (tau = 0.10) value

x' beta_hat(tau) + z_.05 s_hat_n(tau) [tau (1 - tau) x' Omega^{-1} x]^{1/2},    (3.6)

where s_hat_n(tau) is an estimate of the sparsity function s(tau) = 1 / f_eps(F_eps^{-1}(tau)), obtained for example by using the Hall-Sheather bandwidth.
(ii) Direct. A nonparametric approach making direct use of the empirical quantile function, proposed by Zhou and Portnoy (1996). This results in the A basis (tau = 0.01) or B basis (tau = 0.10) value

x' beta_hat( tau + z_.05 [tau (1 - tau) x' Omega^{-1} x]^{1/2} ).    (3.7)

(iii) Resampling. Various bootstrap methods have also been suggested by Koenker (1994) and Zhou and Portnoy (1996). Generally speaking, the bootstrap B basis (A basis) value would be the 5th empirical quantile of the sampling distribution of the estimator of the 10th (1st) percentile.
Approaches (i) and (ii) were originally presented in the context of homoscedastic errors, that is, all eps_i having the same variance. By introducing weights w_1, ..., w_n in the criterion function (3.4), which becomes

V_w(b) = sum_{i=1}^n w_i rho_tau(y_i - x_i' b),    (3.8)

Zhou and Portnoy (1998) generalize (i) and (ii) to accommodate heteroscedasticity (different error variances). The only adjustment needed in (3.6) and (3.7) is to redefine Omega = X' Diag(w_1, ..., w_n) X, with the weights vector w = [w_1, ..., w_n]' estimated either via least absolute deviations or least squares. The latter permits a closed-form representation for the estimator as w_hat = X (X'X)^{-1} X' |xi_hat|, where xi_hat is the vector of quantile regression residuals resulting from a median quantile regression fit (tau = 0.5).
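Taking the sparsity estimate s_hat_n(tau) as given, the studentized bound (3.6) with the weighted Omega, together with the closed-form least-squares weight estimate quoted above, can be sketched as follows (names are ours; this is a sketch, not a full implementation with sparsity estimation).

```python
import numpy as np
from scipy.stats import norm

def ls_weights(X, resid_median_fit):
    """Closed-form least-squares weight estimate w = X (X'X)^{-1} X' |xi|,
    with xi the residuals from a tau = 0.5 quantile regression fit."""
    H = X @ np.linalg.inv(X.T @ X) @ X.T  # projection ("hat") matrix
    return H @ np.abs(resid_median_fit)

def studentized_lower_bound(x, beta_hat, X, tau, s_hat, w=None, conf=0.95):
    """Asymptotic 100*conf% lower bound for the tau-th conditional quantile
    at design point x:
        x'beta_hat - z_conf * s_hat * sqrt(tau (1 - tau) x' Omega^{-1} x),
    with Omega = X' Diag(w) X in the heteroscedastic case (w = None gives
    the unweighted Omega = X'X).  s_hat estimates the sparsity function."""
    W = np.eye(len(X)) if w is None else np.diag(w)
    Omega_inv = np.linalg.inv(X.T @ W @ X)
    half_width = norm.ppf(conf) * s_hat * np.sqrt(tau * (1 - tau) * x @ Omega_inv @ x)
    return x @ beta_hat - half_width
```

Subtracting z_conf times the standard error is the same operation as adding the negative percentile z_.05 in (3.6).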
Finally, and in analogy with R^2 of ordinary regression, Koenker and Machado (1999) point out that a measure of goodness of fit relative to the intercept-only model can also be defined in the framework of quantile regression. That is, if beta_hat_0(tau) denotes the estimated regression quantile in the intercept-only model, and beta_hat(tau) the estimated regression quantiles in the full model (3.5), we define

R^1(tau) = 1 - V(beta_hat(tau)) / V(beta_hat_0(tau)).    (3.9)

The R^1(tau) criterion thus assesses the appropriateness of the fitted conditional quantile function of Y at tau, on a scale of 0 to 1. Values close to 0 (1) suggest a poor (good) fit.
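The R^1(tau) statistic, and by extension the partial version used in the case study below, reduces to a ratio of check-loss sums; a minimal sketch (names ours):

```python
import numpy as np

def check_loss(u, tau):
    """Summed Koenker-Bassett check loss, sum_i rho_tau(u_i)."""
    u = np.asarray(u, dtype=float)
    return float(np.sum(u * (tau - (u < 0))))

def r1(y, fitted_full, fitted_restricted, tau):
    """R^1(tau) = 1 - V(full) / V(restricted).  With the restricted model
    taken as intercept-only this is the Koenker-Machado measure; with a
    subset model it plays the role of a partial R^1."""
    v_full = check_loss(np.asarray(y) - np.asarray(fitted_full), tau)
    v_restr = check_loss(np.asarray(y) - np.asarray(fitted_restricted), tau)
    return 1.0 - v_full / v_restr
```

A perfect fit gives R^1(tau) = 1, and a model no better than the restricted one gives R^1(tau) = 0, matching the 0-to-1 interpretation above.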
Remark. Trindade and Uryasev (2006) demonstrate that whereas the fitted quantile regression criterion V(beta_hat(tau)) is not easily interpretable, a suitably rescaled criterion V_C(beta_hat(tau)) can be visualized as the distance between the mean and the tail mean (conditional value-at-risk) beyond the smaller of the tau-th or (1 - tau)th quantile of the fitted residual error distribution.

Integrating Data on Composite Material Strength: A Case Study from the Aerospace Industry
In this section we apply the data integration and quantile regression methodology to a dataset supplied by The Boeing Company involving the strength of a composite material available in several different formulations. The information available within each formulation consisted of: (i) an upper and lower failure load prediction limit for each formulation stemming from an analytical model (model data); and (ii) actual observed failure loads obtained through physical testing of formulation samples (test data). The test and model data comprise the direct and indirect data respectively, and we refer to them as such in the context of this case study. The dataset considered consisted of 18 formulations with exactly 5 test points per formulation.
Figure 1 shows the case study data augmented with 10th and 90th percentile lines, estimated via weighted quantile regression (least squares weights). If U_i and L_i denote respectively the upper and lower model failure load prediction limits in the ith formulation, the model mean and standard deviation (mu_i, sigma_i) are computed from this pair of limits. As is evident, the method gives plausible percentile estimates. Information is pooled from all formulations when making inferences on a particular formulation.
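This extract does not preserve the exact formula mapping (L_i, U_i) to (mu_i, sigma_i). One simple convention, shown purely as an illustrative assumption and not as the paper's formula, reads the limits as the midpoint plus or minus a fixed number of standard deviations:

```python
def model_summaries(lower, upper, k=2.0):
    """Hypothetical mapping from prediction limits to (mean, sd):
    mean as the midpoint of the limits, sd as the half-range divided by k
    (k = 2 reads the limits as mean +/- 2 sd).  This is an illustrative
    assumption; the paper's actual formula is not preserved in this extract."""
    mu = 0.5 * (lower + upper)
    sigma = (upper - lower) / (2.0 * k)
    return mu, sigma
```

Any monotone location/scale summary of the interval would slot into model (2.1) the same way, since the regression coefficients absorb the scaling.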
In direct analogy with ordinary regression, and to quantify the contribution from a subset of covariates given a subset already in the model, we can compute a partial R^1(tau). Simplify the notation by suppressing dependence on the fitted parameters c_hat(tau), and let the subscripts m, t, and mt denote respectively results of weighted quantile regression runs using as covariates model-only data, test-only data, and both model and test data. For example, V_C,mt(tau) denotes the weighted quantile regression goodness-of-fit criterion attained under model (2.1), while V_C,t(tau) denotes the same attained under the subset model with only the test data covariates. The partial R^1(tau) of the model data given the test data can then be defined as

R^1_{m|t}(tau) = 1 - V_C,mt(tau) / V_C,t(tau).
Figure 2 shows a schematic of all the partial R^1(0.10) values, as a function of the terms present in the quantile regression model.

The diagram in Figure 2 suggests that the model data contribute little (1.8%) in the presence of the test data, whereas the test data contribute 19.1% in the presence of the model data. Table 1 shows the change in the fitted coefficients (and standard errors) as we add more covariate terms to the 10th quantile regression model. The signs of the coefficients remain consistent throughout: positive (negative) for the mean (standard deviation) terms, agreeing with the intuitive fact that quantile estimates should increase with increasing mean failure loads, and decrease with increasingly variable failure loads. Only the mean terms in the subset models (model only and test only) are significant; sample sizes are generally too small to accommodate a more complex regression model.
Table 1: Fitted coefficient values under 10th quantile regression modeling with different sets of covariates. The first set includes only model mean (c_1) and standard deviation (c_2) data; the second only test mean (c_3) and standard deviation (c_4) data; the third both model and test data. Standard errors appear in parentheses.

A broader perspective of the extent of the goodness of fit attained for these data is presented in Figure 3, which shows the value of R^1(tau) plotted as a function of tau. These values are obtained as in Figure 2, for all values of 0 < tau < 1. The different line types indicate whether only test, only model, or both test and model data are used as covariates. The small difference in fits obtained by using test only vs. model and test as covariates, with model-only fits lagging about 10% below these, is consistent with our earlier statements. Slightly better fits for all are attained in the right tail (tau > 0.5).

Assessing the Precision of the Inferences on Simulated Data
As a check on the precision of the results, the performance of the methodology was assessed in a controlled environment consisting of failure data simulated from two-parameter Weibull distributions. The choice of shape (alpha) and scale (beta) parameters, constituting a formulation, is made randomly and uniformly over the rectangle 10 < alpha < 80 and 40 < beta < 120. (This region encompasses plausible values of the Weibull parameters for the case study data, determined by a maximum likelihood fit to the test data within each formulation.) A random draw of fixed sample size N_i = n is then made in each of I formulations, for all combinations of n = 5, 10, 20, 30 and I = 5, 10, 20, 50. The n points constitute the test data used to form (m_i(j), s_i(j)). The true mean and standard deviation of the formulation Weibull distribution, beta Gamma(1 + 1/alpha) and beta [Gamma(1 + 2/alpha) - Gamma(1 + 1/alpha)^2]^{1/2}, form the model data (mu_i, sigma_i). Regression model (2.1) is then fitted via weighted quantile regression, and the B basis calculated using the studentization approach. This is replicated 1,000 times for each (n, I) combination. The percentage of times that the calculated B basis fell below the true 10th percentile within each formulation is recorded.
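The simulation setup above, generating formulations and pairing each simulated test sample with the true Weibull mean and standard deviation as model data, can be sketched as follows (a sketch of the data-generation step only; function names are ours):

```python
import numpy as np
from math import gamma

def weibull_mean_sd(shape, scale):
    """True mean and sd of a two-parameter Weibull:
    mean = scale * Gamma(1 + 1/shape),
    var  = scale^2 * (Gamma(1 + 2/shape) - Gamma(1 + 1/shape)^2)."""
    g1 = gamma(1 + 1 / shape)
    g2 = gamma(1 + 2 / shape)
    return scale * g1, scale * np.sqrt(g2 - g1 ** 2)

def simulate_formulations(I, n, rng):
    """Draw (shape, scale) uniformly over 10 < alpha < 80, 40 < beta < 120,
    then n failure loads per formulation; the true (mean, sd) pair plays
    the role of the indirect (model) data (mu_i, sigma_i)."""
    data = []
    for _ in range(I):
        a = rng.uniform(10, 80)
        b = rng.uniform(40, 120)
        y = b * rng.weibull(a, size=n)  # numpy's standard Weibull times scale
        data.append((y, weibull_mean_sd(a, b)))
    return data
```

Each replicate would then be passed through the design construction of Section 2 and the studentized B basis computation, with coverage tallied against the true 10th percentile beta * (-ln 0.9)^(1/alpha).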
Averaging these percentages across all formulations gives the B basis percent coverage probabilities presented in Table 2.
For small formulation and test point numbers, coverages are well below the nominal 95% level. The nominal level is attained at about 20 formulations, each with 20 test points. The direct method was found to be unsuitable for extreme quantiles like the 10th: the value in parentheses in equation (3.7) was often negative. Because of the reasonable studentization coverages obtained in the range of sample sizes present in the study, the computationally intensive resampling methods were not implemented.
Figure 4 shows the resulting B basis line for the case study data, computed via this studentized approach at each of the fitted 10th percentile lines. The large variability present in the estimates is reflected in the distance between the two lines. This is not, however, inconsistent with the small amount of data available, especially test points within formulations. One would naturally expect better results in larger studies.

Summary
We have shown how predictions on material strength from computer or analytical models can be coherently integrated with physical test data, in order to improve the accuracy of estimates for percentiles of the failure load distribution. The method is general enough to be applied to any situation where one seeks to combine quantitative data on a particular response variable from a more trustworthy source with data obtained from one or more less reliable sources. The information integration is accomplished in the framework of a regression model, the parameters of which can be estimated via weighted quantile regression, a criterion specifically designed to achieve a good fit at a chosen percentile. In the case of Weibull-distributed failure loads, tolerance limits calculated via a one-sided asymptotic confidence bound were found to achieve good coverage probabilities at moderate sample sizes. An argument analogous to the partial R^2 of ordinary regression indicates that the model information by itself explains about 65% of the "variability" seen in percentiles of the failure load distribution. The incorporation of the test data in the form of summary statistics serving as covariates accounts for approximately 20% of the remaining variability.

Appendix: The General Data-Combination Scheme

The direct data in each formulation i are partitioned into p_i pairs of disjoint sets consisting of r and n_i = N_i - r values, respectively, where 0 <= r <= N_i is constant for all i. The r values will be used to form the covariates (the covariate set), while the n_i values will be used as responses (the response set).
(b) Let Y^(q)_i(1), ..., Y^(q)_i(n_i) denote the direct values in the qth response set, q = 1, ..., p_i. Form y_i, the vector of length n_i p_i consisting of the responses stacked set by set.

(c) Let Y^(q)_i(n_i + 1), ..., Y^(q)_i(N_i) denote the direct data values in the qth covariate set, q = 1, ..., p_i. Form m^(q)_i and s^(q)_i, the sample mean and standard deviation of the direct data values in the qth covariate set. If r = 0, the direct data are not used as covariates (only indirect data); if r = 1, then s^(q)_i is undefined and only m^(q)_i is used.

(e) Form the design matrix X_i for formulation i. This consists of the direct and indirect data summary statistics arranged in a matrix of n_i p_i rows and 2K + 3 columns (the first column consists of ones). The rows correspond to those in y_i. If ceil(v/n_i) denotes the smallest integer greater than or equal to v/n_i, the vth row of X_i is

[1, mu^(1)_i, sigma^(1)_i, mu^(2)_i, sigma^(2)_i, ..., mu^(K)_i, sigma^(K)_i, m^(ceil(v/n_i))_i, s^(ceil(v/n_i))_i].

3. Form the regression design matrix X, with n rows and l + 1 = 2K + 3 columns, comprised of the vertical concatenation of the formulation design matrices X_i: X = [X_1', ..., X_I']'.
Note that throughout the paper we used r = 4.
(Here beta_0(tau) = beta_0 + F_eps^{-1}(tau) and beta_j(tau) = beta_j, for j = 1, ..., p.) Viewed as a function of the design point x, Q_Y(tau | x) describes how the tau-th quantile surface of Y varies as a function of the factors X_1, ..., X_p. Let y = [y_1, ..., y_n]' denote the observed vector of responses, so that y = X beta + eps. For any real number z, define (z)^+ = max{0, z} and (z)^- = min{0, z}. The criterion (3.4) is motivated by analogy with the fact that any number b(tau) satisfying

b(tau) = argmin_{b in R} (1/n) sum_{i=1}^n rho_tau(y_i - b)

is a tau-th sample quantile of y_1, ..., y_n.

Figure 1 :
Figure 1: The Case Study Data With Estimated 10th and 90th Percentiles. Each formulation has an upper and lower failure load prediction limit stemming from an analytical model (model points), and actual observed failure loads obtained through physical testing of formulation samples (test points). The percentile lines are fitted via weighted quantile regression, using both model and test data as covariates.

Figure 2 :
Figure 2: Partial R^1(0.10) values for the case study data. Values are computed from weighted quantile regression fits using test only (t), model only (m), and both test and model (mt) data as covariates.

Figure 3 :
Figure 3: The entire quantile regression process for the case study data. The goodness-of-fit criterion R^1(tau) from weighted quantile regression fits, plotted as a function of tau. The line types indicate whether only test, only model, or both test and model data are used as covariates.

Figure 4 :
Figure 4: B basis values for the case study data. The estimated 10th percentile and B basis lines are based on quantile regression fits, with both test and model data as covariates.
If r = N_i, we use all the direct data as both responses and covariates. Let (mu^(k)_i, sigma^(k)_i) denote the sample mean and standard deviation of the indirect data from the kth source in formulation i, for all k = 1, ..., K.
Form y, the regression response vector of length n = sum_{i=1}^I n_i p_i, comprised of the vertical concatenation of the formulation response vectors, y = [y_1', ..., y_I']'.

Table 2 :
Percent coverages for B basis values computed via studentized weighted quantile regression. The data within a formulation are simulated from a Weibull distribution, with shape (alpha) and scale (beta) parameters selected randomly and uniformly over the rectangle 10 < alpha < 80, 40 < beta < 120. Coverages are based on 1,000 replications. The nominal level is 95%.