Abstract: Fisher’s exact test (FET) is a conditional method that is frequently used to analyze data in a 2 × 2 table for small samples. This test is conservative, and attempts have been made to modify it to make it less conservative. For example, Crans and Shuster (2008) proposed adding more points to the rejection region to make the test more powerful. We provide another way to make the test less conservative by using two independent binomial distributions as the reference distribution for the test statistic. We compare our new test with several methods and show that it has advantages over existing methods in terms of control of the type I and type II errors. We reanalyze results from an oncology trial using our proposed method and our software, which is freely available to the reader.
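The conservativeness mentioned in this abstract can be checked numerically. The following is a minimal sketch, not the authors' proposed test: it computes the standard two-sided FET p-value from the hypergeometric distribution and then evaluates the test's actual rejection probability under two independent binomials with a common success probability. The function names, the sample sizes (10 per group), and the value p = 0.5 are assumptions chosen only for illustration.

```python
from math import comb


def hypergeom_pmf(x, row1, row2, col1):
    """P(top-left cell = x) for a 2x2 table with all margins fixed."""
    return comb(row1, x) * comb(row2, col1 - x) / comb(row1 + row2, col1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided FET p-value for the table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = hypergeom_pmf(a, row1, row2, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = hypergeom_pmf(x, row1, row2, col1)
        if p <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            total += p
    return min(total, 1.0)


def binom_pmf(x, n, p):
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def actual_size(n1, n2, p, alpha=0.05):
    """Unconditional rejection probability of the FET under H0.

    Both groups are independent binomials sharing success probability p.
    """
    size = 0.0
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            if fisher_exact_two_sided(x1, n1 - x1, x2, n2 - x2) <= alpha:
                size += binom_pmf(x1, n1, p) * binom_pmf(x2, n2, p)
    return size


p_value = fisher_exact_two_sided(3, 1, 1, 3)
size = actual_size(10, 10, 0.5)
print(f"p-value for [[3, 1], [1, 3]]: {p_value:.4f}")
print(f"actual size at nominal 0.05 (n1 = n2 = 10, p = 0.5): {size:.4f}")
```

For the table [[3, 1], [1, 3]] the p-value is 34/70, about 0.4857. The printed size stays below the nominal 0.05, which is the gap that less conservative modifications try to close.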
Abstract: The motivation behind this paper is to investigate the use of the Softmax model for classification. We show that the Softmax model is a nonlinear generalization of logistic discrimination that can approximate the posterior probabilities of classes, an ability that other artificial neural network (ANN) models lack. We show that the Softmax model is more flexible than logistic discrimination in terms of correct classification. To demonstrate the performance of the Softmax model, a medical data set on thyroid gland state is used. The results indicate that the Softmax model may suffer from overfitting.
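The link between the softmax output function and logistic discrimination can be seen in the two-class case, where softmax reduces to the logistic function of the score difference. The sketch below illustrates only this identity; it is not the paper's network, and the scores `z1` and `z2` are arbitrary values chosen for illustration.

```python
import math


def softmax(scores):
    """Map real-valued class scores to probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def logistic(z):
    """Standard logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-z))


# Arbitrary illustrative scores for two classes
z1, z2 = 1.3, -0.4
probs = softmax([z1, z2])

# With two classes, P(class 1) equals the logistic of the score difference
print(probs[0], logistic(z1 - z2))
```

With more than two classes the same function yields a full vector of estimated posterior probabilities, which is the property the abstract emphasizes.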
Abstract: While conducting a social survey on stigmatized/sensitive traits, obtaining efficient (truthful) data is an intricate issue, and estimates are generally biased in such surveys. To obtain trustworthy data and to reduce false response bias, a technique known as the randomized response technique is now being used in many surveys. In this study, we perform a Bayesian analysis of a general class of randomized response models. A simple Beta prior and a mixture of Beta priors are used in a common prior structure to obtain the Bayes estimates for the proportion of a stigmatized/sensitive attribute in the population of interest. We also extend our proposal to stratified random sampling. The Bayes and the maximum likelihood estimators are compared. For further understanding of variability, we also compare the prior and posterior distributions for different values of the design constants through graphs and credible intervals. The condition for developing a new randomized response model is also discussed.
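To make the comparison between Bayes and maximum likelihood estimators concrete, here is a minimal sketch for Warner's classic randomized response model, used here as one well-known special case and not as the general class studied in the paper. Each respondent answers the sensitive question with probability `p_design` and its complement otherwise, so P(yes) = p_design·π + (1 − p_design)(1 − π). The function names, the grid-based posterior mean, and the counts (40 "yes" out of 100, `p_design` = 0.7) are assumptions made only for illustration.

```python
def warner_yes_prob(pi, p_design):
    """Probability of a 'yes' answer under Warner's model."""
    return p_design * pi + (1 - p_design) * (1 - pi)


def warner_mle(yes, n, p_design):
    """Maximum likelihood estimate of pi, truncated to [0, 1] (needs p_design != 0.5)."""
    theta_hat = yes / n
    pi_hat = (theta_hat - (1 - p_design)) / (2 * p_design - 1)
    return min(1.0, max(0.0, pi_hat))


def warner_bayes_mean(yes, n, p_design, a=1.0, b=1.0, grid=2000):
    """Posterior mean of pi under a Beta(a, b) prior, by midpoint-grid integration."""
    num = den = 0.0
    for k in range(grid):
        pi = (k + 0.5) / grid
        theta = warner_yes_prob(pi, p_design)
        weight = (
            pi ** (a - 1) * (1 - pi) ** (b - 1)
            * theta**yes * (1 - theta) ** (n - yes)
        )
        num += pi * weight
        den += weight
    return num / den


# Hypothetical survey: 40 'yes' answers out of 100, design constant 0.7
mle = warner_mle(40, 100, 0.7)
bayes = warner_bayes_mean(40, 100, 0.7)  # uniform Beta(1, 1) prior
print(f"MLE: {mle:.3f}, Bayes posterior mean: {bayes:.3f}")
```

Changing `a` and `b` shows how prior information shifts the Bayes estimate relative to the MLE for a given design constant.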
Abstract: Panel data transcends cross-sectional data by tapping pooled inter- and intra-individual differences, along with between- and within-individual variation separately. In the present study, these micro variations in ill-being are predicted by psychological indicators constructed from the British Household Panel Survey (BHPS). Panel regression effects are corrected for errors-in-variables, which attenuate slopes estimated by traditional panel regressions. These corrections reveal that unhappiness and life dissatisfaction are distinct variables that have different psychological causations.
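As a point of reference for the attenuation mentioned in this abstract, the textbook result for a single regressor under classical measurement error is shown below. This is a simplified sketch and not the paper's panel correction. The symbols are assumptions introduced here: $x^*$ is the true regressor, $x$ its observed version, and $u$ a measurement error uncorrelated with both $x^*$ and $\varepsilon$.

```latex
x = x^* + u, \qquad y = \beta x^* + \varepsilon, \qquad
\operatorname{plim}\,\hat{\beta}_{\mathrm{OLS}}
  = \beta \cdot \frac{\sigma^2_{x^*}}{\sigma^2_{x^*} + \sigma^2_{u}}
```

Because the ratio is below 1 whenever $\sigma^2_u > 0$, the estimated slope is biased toward zero, and dividing by an estimate of this reliability ratio undoes the attenuation.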
Abstract: In this article, a group acceptance sampling plan (GASP) for lot resubmitting is developed to ensure quality of the product lifetime, assuming that the product’s lifetime follows the half logistic distribution. The parameters of the GASP are determined by satisfying the specified producer’s and consumer’s risks according to the experiment termination time and the number of testers. A comparison between the proposed group sampling plan and the ordinary group sampling plan is discussed. The proposed plan is justified with an illustration.
Abstract: A new extension of the generalized gamma distribution with six parameters, called the Kummer beta generalized gamma distribution, is introduced and studied. It contains at least 28 special models, such as the beta generalized gamma, beta Weibull, beta exponential, generalized gamma, Weibull and gamma distributions, and thus could be a better model for analyzing positively skewed data. The new density function can be expressed as a linear combination of generalized gamma densities. Various mathematical properties of the new distribution are derived, including explicit expressions for the ordinary and incomplete moments, generating function, mean deviations, entropy, and the density function of the order statistics and their moments. The elements of the observed information matrix are provided. We discuss the method of maximum likelihood and a Bayesian approach to fit the model parameters. The superiority of the new model is illustrated by means of three real data sets.
Abstract: Medical data and biomedical studies are often imbalanced, with a majority of observations coming from healthy or normal subjects. In the presence of such imbalances, agreement among multiple raters based on Fleiss’ Kappa (FK) produces counterintuitive results. Simulations suggest that the degree of FK’s misrepresentation of the observed agreement may be directly related to the degree of imbalance in the data. We propose a new method for evaluating agreement among multiple raters that is not affected by imbalances, A-Kappa (AK). The performance of AK and FK is compared by simulating various degrees of imbalance, and the use of the proposed method is illustrated with real data. The proposed index of agreement may provide some insight by relating its magnitude to a probability scale, whereas existing indices are interpreted arbitrarily. This new method not only provides a measure of overall agreement but also provides an agreement index for an individual item. Computation of both AK and FK may shed further light on the data and be useful in interpreting and presenting the results.
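The counterintuitive behavior of Fleiss' Kappa under imbalance can be reproduced with a small calculation. The sketch below implements the standard FK formula only; A-Kappa is not defined in this abstract and is not attempted here. The two rating tables are made-up illustrations with 10 subjects, 3 raters, and 2 categories (normal, abnormal), and they share the same observed agreement.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of per-subject category counts.

    counts[i][j] is the number of raters who assigned subject i to
    category j; every subject must have the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Observed agreement: mean proportion of agreeing rater pairs per subject
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects

    # Chance agreement from the overall category proportions
    props = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in props)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical data: raters disagree on one subject in both tables
imbalanced = [[3, 0]] * 9 + [[2, 1]]             # almost everyone 'normal'
balanced = [[3, 0]] * 5 + [[0, 3]] * 4 + [[2, 1]]

kappa_imbalanced = fleiss_kappa(imbalanced)
kappa_balanced = fleiss_kappa(balanced)
print(f"imbalanced: {kappa_imbalanced:.3f}, balanced: {kappa_balanced:.3f}")
```

Both tables have an observed agreement of about 0.93, yet FK is slightly negative (−1/29) for the imbalanced table and about 0.86 for the balanced one.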
Abstract: Constrained general linear models (CGLMs) have wide applications in practice. As in other data analyses, identifying influential observations that may be potential outliers is an important step in CGLMs. We develop a local influence approach for detecting influential observations in CGLMs. The procedure makes use of the normal curvature and the direction achieving the maximum curvature to assess the local influence of minor perturbations of CGLMs. An illustrative example with a real data set is also reported.
Abstract: Support vector machines (SVMs) constitute one of the most popular and powerful classification methods. However, SVMs can be limited in their performance on highly imbalanced datasets. A classifier trained on an imbalanced dataset can produce a model biased towards the majority class and result in a high misclassification rate for the minority class. For many applications, especially medical diagnosis, it is of high importance to accurately distinguish false negative from false positive results. The purpose of this study is to evaluate the performance of a classifier while keeping the correct balance between sensitivity and specificity, in order to enable successful trauma outcome prediction. We compare the standard (or classic) SVM (C SVM) with resampling methods and a cost-sensitive method called Two Cost SVM (TC SVM), which constitute widely accepted strategies for imbalanced datasets. The results are discussed in terms of sensitivity analysis and receiver operating characteristic (ROC) curves.
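The reason sensitivity and specificity matter more than raw accuracy on imbalanced data can be shown with a short calculation. This sketch does not train an SVM; it only scores a trivial classifier that always predicts the majority class. The labels (5 positives among 100 cases) are made up for illustration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for binary labels 0/1.

    Assumes y_true contains at least one positive and one negative case.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


# Hypothetical imbalanced outcome: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
y_majority = [0] * 100  # classifier that always predicts the majority class

sens, spec, acc = sensitivity_specificity(y_true, y_majority)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, accuracy={acc:.2f}")
```

The majority-class rule reaches 95% accuracy while missing every positive case, which is the failure mode that resampling and cost-sensitive methods aim to avoid.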
Abstract: Ranked set sampling and some of its variants have been applied successfully in different areas of application such as industrial statistics, economics, environmental and ecological studies, biostatistics, and statistical genetics. Ranked set sampling is a sampling method that is more efficient than simple random sampling. Also, it is well known in parametric inference that the Fisher information of a ranked set sample (RSS) about the unknown parameter of the underlying distribution is larger than that of a simple random sample (SRS) of the same size. In this paper, we consider the Farlie-Gumbel-Morgenstern (FGM) family and study information measures such as Shannon’s entropy, Rényi entropy, mutual information, and Kullback-Leibler (KL) information of RSS data. We also investigate their properties and compare them with those of SRS data.
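The efficiency claim for ranked set sampling can be illustrated with a small simulation. This sketch does not compute the information measures studied in the paper; it only compares the variance of the sample mean under RSS and SRS. The assumptions are perfect ranking, a standard normal population, a set size of 3, and a fixed random seed, all chosen for illustration.

```python
import random
import statistics


def srs_mean(rng, m):
    """Mean of a simple random sample of m standard normal draws."""
    return statistics.fmean(rng.gauss(0, 1) for _ in range(m))


def rss_mean(rng, m):
    """Mean of one ranked set sample cycle with set size m.

    For i = 1..m, draw a fresh set of m units, rank them (perfect
    ranking assumed), and measure only the i-th smallest unit.
    """
    measured = []
    for i in range(m):
        ranked = sorted(rng.gauss(0, 1) for _ in range(m))
        measured.append(ranked[i])
    return statistics.fmean(measured)


rng = random.Random(0)  # fixed seed so the run is reproducible
reps, set_size = 5000, 3

var_srs = statistics.pvariance([srs_mean(rng, set_size) for _ in range(reps)])
var_rss = statistics.pvariance([rss_mean(rng, set_size) for _ in range(reps)])
print(f"Var(SRS mean) = {var_srs:.4f}, Var(RSS mean) = {var_rss:.4f}")
```

Both estimators use 3 measured units per replicate, but the RSS mean shows a clearly smaller variance because each measured unit comes from a different rank stratum.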