In the present paper, we propose the new Janardan-Power Series (JPS) class of distributions, obtained by compounding the Janardan distribution of Shanker et al. (2013) with the family of power series distributions. We examine the fundamental properties of this class, including the survival, hazard and reversed hazard functions, the limiting behavior of the cdf and pdf, the quantile function, moments and the distribution of order statistics. Moreover, particular cases of the JPS distribution, such as the Janardan-Binomial (JB), Janardan-Geometric (JG), Janardan-Poisson (JP) and Janardan-Logarithmic (JL) distributions, are introduced. The JP distribution is then analyzed in detail. Finally, the proposed class is applied to a real data set.
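As a sketch of the standard compounding construction behind such classes (the paper's exact parametrization may differ, so this is an illustration, not the authors' definition): let $N$ follow a zero-truncated power series distribution with pmf $P(N=n) = a_n\theta^n / C(\theta)$, $n = 1, 2, \dots$, where $C(\theta) = \sum_{n\ge 1} a_n \theta^n$, and let $X_1, X_2, \dots$ be i.i.d. Janardan lifetimes with cdf $G(x)$, independent of $N$. If the class is defined through $X = \min(X_1, \dots, X_N)$, then

\[
P(X > x) = E\!\left[(1 - G(x))^N\right] = \frac{C\big(\theta\,(1 - G(x))\big)}{C(\theta)},
\qquad
F(x) = 1 - \frac{C\big(\theta\,(1 - G(x))\big)}{C(\theta)}.
\]

Choosing $a_n$ so that $C(\theta)$ is the binomial, geometric, Poisson or logarithmic series recovers the JB, JG, JP and JL special cases, respectively.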
Abstract: We have developed an automated scheme for linking PUBMED citations with GO terms using SVM (Support Vector Machine), a classification algorithm. The PUBMED database, with over 12 million citations, has been essential to life science researchers. More recently, GO (Gene Ontology) has provided a graph structure for the biological process, cellular component, and molecular function of genomic data. By text-mining the textual content of PUBMED citations and associating them with GO terms, we have built an ontological map for these databases, so that users can search PUBMED via GO terms and, conversely, GO entries via PUBMED classification. Consequently, some interesting and unexpected knowledge may be captured from them for further data analysis and biological experimentation. This paper reports our results on the SVM implementation and the need to parallelize the training phase.
Abstract: Image de-noising is the process of removing noise from an image that has been corrupted by it. The wavelet method is one of various methods for recovering infinite-dimensional objects such as curves, densities and images. Wavelet techniques are very effective at removing noise because of their ability to capture the energy of a signal in a few transform values. Wavelet methods are based on shrinking the wavelet coefficients in the wavelet domain. This paper concentrates on selecting a threshold for wavelet function estimation. A new threshold value is proposed for shrinking the wavelet coefficients obtained by wavelet decomposition of a noisy image, under the assumption that the sub-band coefficients have a generalized Gaussian distribution. The proposed threshold value is based on the power of 2 in the size 2^J × 2^J of the data and can be computed efficiently. Experiments have been conducted on various test images to compare with established threshold parameters. The results show that the proposed threshold value removes the noise significantly.
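The shrinkage step this abstract refers to can be illustrated with the standard soft-thresholding rule. The paper's specific threshold, derived from the 2^J × 2^J image size, is not reproduced here; the cutoff `t` below is a hypothetical placeholder, and `soft_threshold` is an illustrative name.

```python
import numpy as np

def soft_threshold(coeffs, t):
    """Soft-threshold detail coefficients: shrink each toward zero by t,
    setting coefficients with magnitude below t (likely noise) to zero."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)

# Illustrative detail coefficients from one noisy sub-band
d = np.array([-3.0, -0.5, 0.2, 1.5, 4.0])
print(soft_threshold(d, 1.0))  # small coefficients are zeroed, large ones shrunk
```

In a full de-noising pipeline this rule would be applied sub-band by sub-band to the wavelet decomposition, followed by the inverse transform.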
The power function distribution is a flexible lifetime distribution with applications in finance and economics. It is also used to model the reliability growth of complex systems or the reliability of repairable systems. A new weighted power function distribution is proposed using a logarithmic weight function. Statistical properties of the weighted power function distribution are obtained and studied. Location measures such as the mode, median and mean, reliability measures such as the reliability function, the hazard and reversed hazard functions, and the mean residual life are derived. Shape indices such as the skewness and kurtosis coefficients, as well as order statistics, are obtained. Parametric estimation is performed to obtain estimators for the parameters of the distribution using three estimation methods, namely the maximum likelihood method, the L-moments method and the method of moments. A numerical simulation is carried out to validate the robustness of the proposed distribution.
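As a sketch of the general weighted-distribution construction underlying this proposal: if $X$ has pdf $f(x)$ and $w(x) \ge 0$ is a weight function with $0 < E[w(X)] < \infty$, the weighted version has pdf

\[
f_w(x) = \frac{w(x)\, f(x)}{E[w(X)]}.
\]

For the power function base distribution with pdf $f(x) = \alpha x^{\alpha-1}/\beta^{\alpha}$ on $0 < x < \beta$, a logarithmic weight such as $w(x) = \log(1+x)$ (a hypothetical choice for illustration; the paper's exact weight is not specified here) gives $f_w(x) \propto \log(1+x)\, x^{\alpha-1}$ on $(0, \beta)$.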
Abstract: Among the many statistical methods for linear models with the multicollinearity problem, partial least squares regression (PLSR) has in recent years become increasingly popular and, very often, the best choice. However, while dealing with a prediction problem from the automobile market, we noticed that the results from PLSR appear unstable, though it is still the best among some standard statistical methods. This instability is likely due to the impact of information contained in the explanatory variables that is irrelevant to the response variable. Based on the PLSR algorithm, this paper introduces a new method, modified partial least squares regression (MPLSR), to emphasize the impact of the relevant information in the explanatory variables on the response variable. With the MPLSR method, satisfactory prediction results are obtained in the above practical problem. The performance of MPLSR, PLSR and some standard statistical methods is compared through a set of Monte Carlo experiments. This paper shows that MPLSR is the most stable and accurate method, especially when the ratio of the number of observations to the number of explanatory variables is low.
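For readers unfamiliar with the PLSR algorithm that MPLSR modifies, a minimal one-component PLS1 step (NIPALS form) can be sketched as follows. This illustrates only standard PLSR, not the paper's modification, and the function names are illustrative.

```python
import numpy as np

def pls1_one_component(X, y):
    """One-component PLS1 (NIPALS): project X onto the direction most
    covariant with y, then regress y on the resulting score vector."""
    xm, ym = X.mean(axis=0), y.mean()
    Xc, yc = X - xm, y - ym          # center predictors and response
    w = Xc.T @ yc                    # weight vector ~ cov(X_j, y)
    w /= np.linalg.norm(w)
    t = Xc @ w                       # latent score
    q = (t @ yc) / (t @ t)           # regress y on the score
    return w, q, xm, ym

def pls1_predict(model, Xnew):
    w, q, xm, ym = model
    return ym + ((Xnew - xm) @ w) * q
```

Because the latent score `t` stays well defined even when the columns of `X` are nearly collinear, this construction avoids the ill-conditioning that plagues ordinary least squares under multicollinearity, which is the property the abstract alludes to.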
Abstract: The receiver operating characteristic (ROC) curve is an effective and widely used method for evaluating the discriminating power of a diagnostic test or statistical model. As a useful statistical method, a wealth of literature on its theory and computational methods has been established. Research on ROC curves, however, has focused mainly on cross-sectional designs. Very little research on estimating ROC curves and their summary statistics, especially significance testing, has been conducted for repeated measures designs. Due to the complexity of estimating the standard error of a ROC curve, there is no currently established statistical method for testing the significance of ROC curves under a repeated measures design. In this paper, we estimate the area under a ROC curve for a repeated measures design through a generalized linear mixed model (GLMM), using the predicted probability of a disease or positivity of a condition, and propose a bootstrap method to estimate the standard error of the area under the ROC curve for such designs. Statistical significance testing of the area under the ROC curve is then conducted using the bootstrapped standard error. The validity of the bootstrap approach and of the statistical testing of the area under the ROC curve was confirmed through simulation analyses. A dedicated statistical program written in SAS/IML/MACRO v8 was also created to implement the bootstrapping algorithm, carry out the calculations and conduct the statistical testing.
Abstract: Particulate matter smaller than 2.5 microns (PM2.5) is a commonly measured parameter in ground-based sampling networks designed to assess short- and long-term air quality. The measurement techniques for ground-based PM2.5 are relatively accurate and precise, but the monitoring locations are spatially too sparse for many applications. Aerosol Optical Depth (AOD) is a satellite-based air quality measurement that can be computed for more spatial locations, but it measures light attenuation by particulates throughout the entire air column, not just near the ground. The goal of this paper is to better characterize the spatio-temporal relationship between the two measurements. An informative relationship will aid in imputing PM2.5 values for health studies in a way that accounts for the variability in both sets of measurements, something physics-based models cannot do. We use a data set of Chicago air quality measurements taken during 2007 and 2008 to construct a weekly hierarchical model. We also demonstrate that AOD measurements and a latent spatio-temporal process, aggregated weekly, can be used to aid in the prediction of PM2.5 measurements.