Abstract: Image de-noising is the process of removing noise from an image that has been corrupted by it. Wavelet methods are among the various methods for recovering infinite-dimensional objects such as curves, densities and images. Wavelet techniques are very effective at removing noise because of their ability to capture the energy of a signal in a few transform values. Wavelet de-noising methods are based on shrinking the wavelet coefficients in the wavelet domain. This paper concentrates on selecting a threshold for wavelet function estimation. A new threshold value is proposed for shrinking the wavelet coefficients obtained by wavelet decomposition of a noisy image, under the assumption that the subband coefficients follow a generalized Gaussian distribution. The proposed threshold value is based on the power of 2 in the 2^J × 2^J size of the data and can be computed efficiently. Experiments have been conducted on various test images to compare with established threshold parameters. The results show that the proposed threshold value removes the noise significantly.
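To make the shrinkage step concrete, the sketch below applies soft thresholding to the detail subbands of a 2-D wavelet decomposition using the PyWavelets library. The universal threshold sigma * sqrt(2 log n) is used only as a stand-in; the paper's proposed threshold, based on the power of 2 in the 2^J × 2^J data size, is not reproduced here.

    # Minimal sketch of wavelet shrinkage de-noising (not the paper's exact threshold).
    import numpy as np
    import pywt

    def denoise(image, wavelet="db4", level=3):
        coeffs = pywt.wavedec2(image, wavelet, level=level)
        # Estimate the noise level from the finest diagonal subband (robust MAD rule).
        sigma = np.median(np.abs(coeffs[-1][-1])) / 0.6745
        # Stand-in threshold: the universal threshold sigma * sqrt(2 log n).
        thresh = sigma * np.sqrt(2.0 * np.log(image.size))
        # Soft-threshold every detail subband; leave the approximation coefficients untouched.
        shrunk = [coeffs[0]] + [
            tuple(pywt.threshold(d, thresh, mode="soft") for d in detail)
            for detail in coeffs[1:]
        ]
        return pywt.waverec2(shrunk, wavelet)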
Abstract: The power function distribution is a flexible lifetime distribution with applications in finance and economics. It is also used to model reliability growth of complex systems or the reliability of repairable systems. A new weighted power function distribution is proposed using a logarithmic weight function. Statistical properties of the weighted power function distribution are obtained and studied. Location measures such as the mode, median and mean, reliability measures such as the reliability function, hazard and reversed hazard functions, and the mean residual life are derived. Shape indices such as the skewness and kurtosis coefficients, as well as order statistics, are obtained. Parametric estimation is performed to obtain estimators for the parameters of the distribution using three different methods, namely maximum likelihood, L-moments and the method of moments. A numerical simulation is carried out to validate the robustness of the proposed distribution.
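For orientation, a weighted distribution is built from a baseline density f via f_w(x) = w(x) f(x) / E[w(X)]. The abstract does not state the exact weight, so the following is an assumed illustration: taking the power function baseline and the logarithmic weight w(x) = log(β/x),

    f(x) = \frac{\alpha x^{\alpha-1}}{\beta^{\alpha}}, \quad 0 < x < \beta;
    \qquad
    f_w(x) = \frac{w(x)\, f(x)}{E[w(X)]}
           = \alpha^{2}\, \frac{x^{\alpha-1}}{\beta^{\alpha}} \log\!\left(\frac{\beta}{x}\right),

where E[w(X)] = 1/α because log(β/X) is exponentially distributed with rate α under the baseline model. The paper's actual weight function may differ.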
Abstract: Among the many statistical methods for linear models with the multicollinearity problem, partial least squares regression (PLSR) has become increasingly popular in recent years and is very often the best choice. However, while dealing with a prediction problem from the automobile market, we noticed that the results from PLSR appear unstable, even though it is still the best among several standard statistical methods. This instability is likely due to the impact of information contained in the explanatory variables that is irrelevant to the response variable. Based on the algorithm of PLSR, this paper introduces a new method, modified partial least squares regression (MPLSR), to emphasize the impact of the relevant information in the explanatory variables on the response variable. With the MPLSR method, satisfactory prediction results are obtained for the above practical problem. The performance of MPLSR, PLSR and some standard statistical methods is compared by a set of Monte Carlo experiments. This paper shows that MPLSR is the most stable and accurate method, especially when the ratio of the number of observations to the number of explanatory variables is low.
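As a baseline for comparison, standard PLSR is available in common libraries; the sketch below fits it with scikit-learn in the regime the paper targets (few observations, many collinear predictors). MPLSR is the paper's own contribution and is not part of any standard library, so only the unmodified algorithm is shown and the data are synthetic.

    # Baseline PLSR fit (MPLSR, the paper's modification, is not shown here).
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 100))       # 30 observations, 100 collinear-prone predictors
    y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=30)

    pls = PLSRegression(n_components=3)  # number of latent components to extract
    pls.fit(X, y)
    y_hat = pls.predict(X)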
Abstract: The receiver operating characteristic (ROC) curve is an effective and widely used method for evaluating the discriminating power of a diagnostic test or statistical model. As it is a useful statistical method, a wealth of literature on its theory and computational methods has been established. Research on ROC curves, however, has focused mainly on cross-sectional designs. Very little research on estimating ROC curves and their summary statistics, especially significance testing, has been conducted for repeated measures designs. Due to the complexity of estimating the standard error of a ROC curve, there is currently no established statistical method for testing the significance of ROC curves under a repeated measures design. In this paper, we estimate the area under a ROC curve for a repeated measures design through a generalized linear mixed model (GLMM), using the predicted probability of a disease or positivity of a condition, and propose a bootstrap method to estimate the standard error of the area under the curve for such designs. Statistical significance testing of the area under the ROC curve is then conducted using the bootstrapped standard error. The validity of the bootstrap approach and of the statistical testing of the area under the ROC curve was verified through simulation analyses. Dedicated statistical software written in SAS/IML/MACRO v8 was also created to implement the bootstrapping algorithm and carry out the calculations and statistical testing.
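The bootstrap step can be sketched as follows: for repeated measures, subjects (not individual observations) are resampled with replacement, the AUC is recomputed on each resample from the model-predicted probabilities, and the standard deviation of the bootstrap AUCs estimates the standard error. The sketch below is a Python rendering under assumed variable names; the paper's SAS/IML implementation and its GLMM fitting step are not reproduced.

    # Cluster (subject-level) bootstrap for the SE of the AUC; assumes a GLMM has
    # already produced a predicted probability p_hat for each observation.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    def bootstrap_auc_se(subject_id, y, p_hat, n_boot=2000, seed=0):
        rng = np.random.default_rng(seed)
        subjects = np.unique(subject_id)
        aucs = []
        for _ in range(n_boot):
            # Resample whole subjects to respect the repeated measures structure.
            sample = rng.choice(subjects, size=len(subjects), replace=True)
            idx = np.concatenate([np.flatnonzero(subject_id == s) for s in sample])
            if y[idx].min() == y[idx].max():
                continue  # degenerate resample: only one outcome class present
            aucs.append(roc_auc_score(y[idx], p_hat[idx]))
        return np.std(aucs, ddof=1)  # bootstrap estimate of the standard error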
Abstract: Particulate matter smaller than 2.5 microns (PM2.5) is a commonly measured parameter in ground-based sampling networks designed to assess short- and long-term air quality. The measurement techniques for ground-based PM2.5 are relatively accurate and precise, but monitoring locations are spatially too sparse for many applications. Aerosol Optical Depth (AOD) is a satellite-based air quality measurement that can be computed for more spatial locations, but it measures light attenuation by particulates throughout the entire air column, not just near the ground. The goal of this paper is to better characterize the spatio-temporal relationship between the two measurements. An informative relationship will aid in imputing PM2.5 values for health studies in a way that accounts for the variability in both sets of measurements, something physics-based models cannot do. We use a data set of Chicago air quality measurements taken during 2007 and 2008 to construct a weekly hierarchical model. We also demonstrate that AOD measurements and a latent spatio-temporal process aggregated weekly can be used to aid in the prediction of PM2.5 measurements.
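A minimal version of the weekly alignment step can be sketched with pandas: both series are aggregated to weekly means and PM2.5 is regressed on AOD. The full paper fits a hierarchical model with a latent spatio-temporal process; this sketch captures only the temporal aggregation and the marginal relationship, and the column names are assumptions.

    # Weekly aggregation and a naive PM2.5 ~ AOD fit (column names are assumed).
    import numpy as np
    import pandas as pd

    def weekly_fit(df):
        # df: one monitoring location, with datetime column 'date' and
        # numeric columns 'pm25' and 'aod'.
        weekly = (df.set_index("date")[["pm25", "aod"]]
                    .resample("W").mean().dropna())
        # Least-squares line through the weekly means; polyfit returns [slope, intercept].
        slope, intercept = np.polyfit(weekly["aod"], weekly["pm25"], deg=1)
        return slope, intercept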
Abstract: The primary advantage of panel over cross-sectional regression stems from its control for the effects of omitted variables, or "unobserved heterogeneity". However, panel regression is based on the strong assumptions that measurement errors are independently and identically distributed (i.i.d.) and normal. These assumptions are evaded by design-based regression, which dispenses with measurement errors altogether by regarding the response as a fixed real number. The present paper establishes a middle ground between these extreme interpretations of longitudinal data. The individual is now represented as a panel of responses containing dependently and non-identically distributed (d.n.d.) measurement errors. Modeling the expectations of these responses preserves the Neyman randomization theory, rendering panel regression slopes approximately unbiased and normal in the presence of arbitrarily distributed measurement errors. The generality of this reinterpretation is illustrated with German Socio-Economic Panel (GSOEP) responses that are discretely distributed on a 3-point scale.
Abstract: This article extends the recent work of Vännman and Albing (2007) regarding the new family of quantile-based process capability indices (qPCI) C_MA(τ, v). We develop both asymptotic parametric and nonparametric confidence limits and testing procedures for C_MA(τ, v). A kernel density estimator of the process is proposed to find a consistent estimator of the variance of the nonparametric consistent estimator of C_MA(τ, v); the proposed procedure is therefore ready for practical implementation to any process. Illustrative examples are also provided to show the steps of applying the proposed methods directly to real-life problems. We also present a simulation study on the sample size required for using the asymptotic results.
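In the nonparametric case, the pattern is to plug empirical quantiles into the index and to evaluate a kernel density estimate at those quantiles, since the asymptotic variance of a sample quantile involves the density there. The sketch below shows only this plug-in pattern; the exact C_MA(τ, v) formula of Vännman and Albing is not reproduced, so the index function is a user-supplied placeholder.

    # Plug-in estimation of a quantile-based capability index (illustrative only;
    # index_fn is a placeholder for the actual C_MA(tau, v) formula).
    import numpy as np
    from scipy.stats import gaussian_kde

    def qpci_estimate(x, index_fn, probs=(0.5, 0.9973)):
        """index_fn maps the tuple of estimated quantiles to the index value."""
        q = np.quantile(x, probs)   # empirical quantiles of the process data
        kde = gaussian_kde(x)       # kernel density estimate of the process
        dens = kde(q)               # density at the quantiles, used in the
                                    # asymptotic variance of the quantile estimator
        return index_fn(q), dens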
Abstract: In this paper, we introduce a new generalized family of distributions based on the bounded support (0, 1), namely, the Topp-Leone-G family. Some mathematical properties of the proposed family are studied. The new density function can be symmetrical, left-skewed, right-skewed or reverse-J shaped. Furthermore, the hazard rate function can be constant, increasing, decreasing, J-shaped or bathtub-shaped. Three special models are discussed. We obtain simple expressions for the ordinary and incomplete moments, quantile and generating functions, mean deviations and entropies. The method of maximum likelihood is used to estimate the model parameters. The flexibility of the new family is illustrated by means of three real data sets.
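For reference, the Topp-Leone-G family is commonly defined by applying the Topp-Leone generator, whose support is (0, 1), to a baseline cdf G; differentiating the cdf gives the density:

    F(x) = \left\{ 1 - [1 - G(x)]^{2} \right\}^{\alpha},
    \qquad
    f(x) = 2\alpha\, g(x)\, [1 - G(x)] \left\{ 1 - [1 - G(x)]^{2} \right\}^{\alpha - 1},

where G and g are the baseline cdf and pdf and α > 0 is the shape parameter. This is the form usually given for the family in the literature; the paper's own parameterization may differ in details.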
Abstract: In the United States, diabetes is common and costly. Programs to prevent new cases of diabetes are often carried out at the level of the county, a unit of local government. Thus, efficient targeting of such programs requires county-level estimates of diabetes incidence, the fraction of the non-diabetic population who received their diagnosis of diabetes during the past 12 months. Previously, only estimates of prevalence, the overall fraction of the population who have the disease, have been available at the county level. Counties with high prevalence might or might not be the same as counties with high incidence, due to spatial variation in mortality and relocation of persons with incident diabetes to another county. Existing methods cannot be used to estimate county-level diabetes incidence, because the fraction of the population who receive a diabetes diagnosis in any year is too small. Here, we extend previously developed methods of Bayesian small-area estimation of prevalence, using diffuse priors, to estimate diabetes incidence for all U.S. counties based on data from a survey designed to yield state-level estimates. We found high incidence in the southeastern United States, the Appalachian region, and in scattered counties throughout the western U.S. Our methods might be applicable in other circumstances in which all cases of a rare condition must also be cases of a more common condition (in this analysis, "newly diagnosed cases of diabetes" and "cases of diabetes"). If appropriate data are available, our methods can be used to estimate the proportion of the population with the rare condition at greater geographic specificity than the data source was designed to provide.
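A minimal version of the estimation step, under a conjugate simplification: with a diffuse Beta prior on a county's incidence and y newly diagnosed cases observed among n non-diabetic respondents, the posterior is Beta(a + y, b + n - y). The paper's full model pools information across counties through a hierarchical prior; the sketch below shows only a single-county update, with assumed variable names.

    # Single-county Beta-binomial update (the paper's hierarchical model pools
    # information across counties; this conjugate version is a simplification).
    from scipy.stats import beta

    def incidence_posterior(y, n, a=0.5, b=0.5):
        """y new diagnoses among n non-diabetic respondents; diffuse Beta(a, b) prior."""
        post = beta(a + y, b + n - y)
        return post.mean(), post.interval(0.95)  # posterior mean and 95% credible interval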