Abstract: Frequentist and Bayesian hypothesis testing are often viewed as “two separate worlds” by practitioners. While theoretical relationships of course exist, our goal here is to demonstrate a practical example where one must be careful conducting frequentist hypothesis testing, and in that context illustrate a practical equivalence between Bayesian and frequentist testing. In particular, if the sample size is random (hardly unusual in practical problems where the sample size may be “all available experimental units”), then choosing an α level in advance such as 0.05 and using it for every possible sample size is inadmissible. In other words, one can find a different overall procedure which has the same overall type I error but greater power. Not coincidentally, this alternative procedure is based on Bayesian testing procedures.
Abstract: In this work we present a combined approach to contingency table analysis using correspondence analysis and log-linear models. Several investigators have recognized relations between these methodologies in the past. By combining them we may obtain a better understanding of the structure of the data and a more favorable interpretation of the results. As an application, we applied both methodologies to an epidemiological database (CARDIO2000) regarding coronary heart disease risk factors.
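The correspondence-analysis side of such a combined approach can be sketched with plain linear algebra; the toy contingency table below is invented for illustration. The singular values of the standardised residual matrix give the principal inertias, and their total, multiplied by the sample size, equals the Pearson chi-square statistic of the independence model that also underlies the log-linear analysis:

```python
import numpy as np

# Toy contingency table (invented counts, e.g. risk factor x disease status)
N = np.array([[20, 10, 5],
              [10, 25, 10],
              [5, 10, 30]], dtype=float)

P = N / N.sum()                      # correspondence matrix
r = P.sum(axis=1)                    # row masses
c = P.sum(axis=0)                    # column masses

# standardised residuals from the independence model r_i * c_j
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# SVD of S: singular values squared are the principal inertias
U, sv, Vt = np.linalg.svd(S, full_matrices=False)
inertia = sv ** 2

# total inertia times n recovers the Pearson chi-square statistic
chi2_total = N.sum() * inertia.sum()
```

This identity (total inertia = χ²/n) is the formal bridge between correspondence analysis and the chi-square machinery of log-linear modelling that the abstract alludes to.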
Abstract: In this paper, freight transportation is taken into account. One of the models used for modelling “Origin-Destination” freight flows is the log-regression model obtained by applying a log-transformation to the traditional gravity model. Freight flows between ten provinces of Turkey are analyzed using the generalized maximum entropy estimator of the log-regression model for freight flow. The data set is gathered from the axle load survey performed by the Turkish Directorate of Highways and from other socioeconomic and demographic variables related to the provinces of interest. Relations between the considered socioeconomic and demographic variables and freight flows are identified and the results are discussed.
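A minimal sketch of the log-transformed gravity model described above, with invented data and parameter values; note that the paper estimates this specification by generalized maximum entropy, whereas the sketch below uses ordinary least squares on the log-linear form:

```python
import numpy as np

# The traditional gravity model  T_ij = k * O_i^a * D_j^b / d_ij^c
# becomes linear after a log-transformation:
#   log T_ij = log k + a*log O_i + b*log D_j - c*log d_ij
rng = np.random.default_rng(0)
n = 200
O = rng.uniform(1e4, 1e6, n)      # origin "mass" (e.g. population) - invented
D = rng.uniform(1e4, 1e6, n)      # destination mass - invented
d = rng.uniform(10, 800, n)       # distance between provinces (km) - invented
k, a, b, cc = 2.0, 0.8, 0.7, 1.5  # assumed true parameters for the demo

logT = k + a * np.log(O) + b * np.log(D) - cc * np.log(d) \
       + rng.normal(0, 0.1, n)    # log flows with noise

# OLS on the log-linear form (the paper's GME estimator additionally
# handles ill-posed or ill-conditioned designs)
X = np.column_stack([np.ones(n), np.log(O), np.log(D), -np.log(d)])
coef, *_ = np.linalg.lstsq(X, logT, rcond=None)
```

With well-conditioned simulated data the OLS coefficients recover the assumed elasticities closely; the appeal of GME arises when the design matrix is poorly conditioned or the sample is small.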
Abstract: Normally, one may think that the distribution of the closed birth interval of any specific order is the same as the distribution of the most recent closed birth interval of the same order. But this is not true. Here the distinction between the distribution of a specific order of the usual closed birth interval and the most recent closed birth interval of the same order is examined. In this context, we first demonstrate the distinction between the most recent closed birth interval and the usual closed birth interval empirically by considering a real data set. Further, the distinction between these distributions is demonstrated theoretically, by taking certain hypothetical values of the fertility parameters involved in the stochastic model proposed for the purpose.
Abstract: State lotteries employ sales projections to determine appropriate advertised jackpot levels for some of their games. This paper focuses on prediction of sales for the Lotto Texas game of the Texas Lottery. A novel prediction method is developed in this setting that utilizes functional data analysis concepts in conjunction with a Bayesian paradigm to produce predictions and associated precision assessments.
Efficiency analysis is very useful and important for measuring the performance of firms in the competitive market of a rapidly developing country like Bangladesh. The more efficient firms, or decision-making units (DMUs), are usually referred to as benchmarking units for development. In this study, efficiency scores are obtained using the non-parametric Data Envelopment Analysis (DEA) technique for 1007 manufacturing firms in Bangladesh from enterprise survey data. DEA is used to calculate weights for inputs and outputs by assigning the maximum efficiency score to the DMU under evaluation. In total, 29 firms are found to be efficient under the variable returns to scale assumption. The significant determinants of inefficiency found in this analysis mainly include firm size, the manager's experience in the respective sector, annual losses due to power outages, and the number of production workers.
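The input-oriented DEA model under variable returns to scale can be sketched as one linear program per DMU, here in its envelopment form (the multiplier form, which yields the input/output weights mentioned above, is its LP dual). The firm data below are invented for illustration:

```python
import numpy as np
from scipy.optimize import linprog

def dea_bcc_input(x, y):
    """Input-oriented DEA efficiency scores under variable returns to
    scale (BCC model), envelopment form.  x: (n_dmu, n_inputs),
    y: (n_dmu, n_outputs).  A sketch, not the paper's exact code."""
    n, m = x.shape
    s = y.shape[1]
    scores = []
    for k in range(n):
        # decision variables: [theta, lambda_1, ..., lambda_n]
        c = np.r_[1.0, np.zeros(n)]                 # minimise theta
        A_ub, b_ub = [], []
        for i in range(m):                          # sum_j l_j x_ij <= theta * x_ik
            A_ub.append(np.r_[-x[k, i], x[:, i]]); b_ub.append(0.0)
        for r in range(s):                          # sum_j l_j y_rj >= y_rk
            A_ub.append(np.r_[0.0, -y[:, r]]); b_ub.append(-y[k, r])
        A_eq = [np.r_[0.0, np.ones(n)]]             # VRS: sum_j lambda_j = 1
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(None, None)] + [(0, None)] * n)
        scores.append(res.x[0])
    return np.array(scores)

# invented toy data: 4 firms, one input, one output;
# firms on the efficient frontier score exactly 1
x = np.array([[2.], [4.], [3.], [5.]])
y = np.array([[1.], [3.], [2.], [2.]])
scores = dea_bcc_input(x, y)
```

Dropping the `A_eq` convexity constraint turns this into the constant-returns-to-scale (CCR) model, which the study contrasts with the VRS results.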
Abstract: Existing indices of observer agreement for continuous data, such as the intraclass correlation coefficient or the concordance correlation coefficient, measure the total observer-related variability, which includes the variabilities between and within observers. This work introduces a new index that measures the interobserver variability, which is defined in terms of the distances among the ‘true values’ assigned by different observers on the same subject. The new coefficient of interobserver variability (CIV) is defined as the ratio of the interobserver and the total observer variability. We show how to estimate the CIV and how to use bootstrap and ANOVA-based methods for inference. We also develop a coefficient of excess observer variability, which compares the total observer variability to the expected total observer variability when there are no differences among the observers. This coefficient is a simple function of the CIV. In addition, we show how the value of the CIV, estimated from an agreement study, can be used in the design of measurement studies. We illustrate the new concepts and methods with two examples, where (1) two radiologists used calcium scores to evaluate the severity of coronary artery arteriosclerosis, and (2) two methods were used to measure knee joint angle.
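A rough sketch of how a CIV-type ratio could be estimated from ANOVA mean squares. The model, the parameter values, and the four-observer design are assumptions for illustration (the paper's examples use two observers), not the authors' exact estimator:

```python
import numpy as np

# Assumed two-way model without replication: y_ij = mu + s_i + o_j + e_ij,
# where o_j are observer 'true value' shifts (interobserver variance
# sigma_o^2) and e_ij is within-observer error (sigma_e^2).
# CIV = sigma_o^2 / (sigma_o^2 + sigma_e^2): interobserver variability
# as a fraction of the total observer-related variability.
rng = np.random.default_rng(1)
n_sub, n_obs = 200, 4
s = rng.normal(0, 2.0, (n_sub, 1))       # subject effects
o = rng.normal(0, 1.0, (1, n_obs))       # observer shifts (invented scale)
e = rng.normal(0, 0.5, (n_sub, n_obs))   # within-observer error
y = 10 + s + o + e

# variance components via method of moments on ANOVA mean squares
grand = y.mean()
ms_obs = n_sub * ((y.mean(axis=0) - grand) ** 2).sum() / (n_obs - 1)
resid = y - y.mean(axis=0) - y.mean(axis=1, keepdims=True) + grand
ms_err = (resid ** 2).sum() / ((n_sub - 1) * (n_obs - 1))

sigma_o2 = max((ms_obs - ms_err) / n_sub, 0.0)   # truncate at zero
civ = sigma_o2 / (sigma_o2 + ms_err)
```

A value of `civ` near 0 means observer disagreement contributes little beyond within-observer noise; the paper additionally provides bootstrap and ANOVA-based inference for this ratio.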
Abstract: Clustering is an extremely important task in a wide variety of application domains, especially in management and social science research. In this paper, an iterative clustering procedure based on multivariate outlier detection is proposed, using the well-known Mahalanobis distance. First, the Mahalanobis distance is calculated for the entire sample; then, using the T²-statistic, an upper control limit (UCL) is fixed. Observations above the UCL are treated as outliers and grouped into an outlier cluster, and the same procedure is repeated for the remaining inliers until the variance-covariance matrix of the variables in the last cluster becomes singular. At each iteration, a multivariate test of means is used to check the discrimination between the outlier clusters and the inliers. Moreover, multivariate control charts are used to graphically visualize the iterations and the outlier clustering process. Finally, a multivariate test of means helps to firmly establish the cluster discrimination and validity. This paper employs this procedure to cluster 275 customers of a popular two-wheeler brand in India based on 19 different attributes of the two-wheeler and its company. The results of the proposed technique confirm that there exist 5 and 7 outlier clusters of customers in the entire sample at the 5% and 1% significance levels, respectively.
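The iterative procedure described above can be sketched as follows; the chi-square upper control limit is used here as a simple stand-in for the paper's T²-based UCL, and the toy data are invented:

```python
import numpy as np
from scipy.stats import chi2

def outlier_clusters(X, alpha=0.05, max_iter=20):
    """Iterative clustering via multivariate outlier detection (a sketch).
    On each pass, squared Mahalanobis distances are compared with a
    chi-square UCL; points above it form one outlier cluster, and the
    remaining inliers are re-examined until no outliers remain or the
    covariance matrix becomes singular."""
    p = X.shape[1]
    ucl = chi2.ppf(1 - alpha, df=p)          # upper control limit
    clusters, rest = [], X.copy()
    for _ in range(max_iter):
        if rest.shape[0] <= p:
            break
        mu = rest.mean(axis=0)
        cov = np.cov(rest, rowvar=False)
        if np.linalg.matrix_rank(cov) < p:   # stopping rule: singular covariance
            break
        diff = rest - mu
        d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
        out = d2 > ucl
        if not out.any():
            break
        clusters.append(rest[out])           # next outlier cluster
        rest = rest[~out]                    # inliers carried forward
    return clusters, rest

# invented toy data: 200 inliers around the origin plus 10 far-away points
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(10, 0.5, (10, 2))])
clusters, inliers = outlier_clusters(data)
```

The planted far-away group lands in the first outlier cluster; the multivariate tests of means and control charts mentioned in the abstract would then be applied to each cluster-versus-inliers split.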
Abstract: Scientific interest often centers on characterizing the effect of one or more variables on an outcome. While data mining approaches such as random forests are flexible alternatives to conventional parametric models, they suffer from a lack of interpretability because variable effects are not quantified in a substantively meaningful way. In this paper we describe a method for quantifying variable effects using partial dependence, which produces an estimate that can be interpreted as the effect on the response for a one unit change in the predictor, while averaging over the effects of all other variables. Most importantly, the approach avoids problems related to model misspecification and challenges to implementation in high dimensional settings encountered with other approaches (e.g., multiple linear regression). We propose and evaluate through simulation a method for constructing a point estimate of this effect size. We also propose and evaluate interval estimates based on a non-parametric bootstrap. The method is illustrated on data used for the prediction of the age of abalone.
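A minimal sketch of the partial-dependence effect size on simulated data. A k-nearest-neighbour predictor stands in for the random forest, and all data and parameter values are illustrative assumptions:

```python
import numpy as np

# invented data: true effect of x0 on y is 2 per unit, x1 enters nonlinearly
rng = np.random.default_rng(0)
n = 400
X = rng.uniform(-2, 2, (n, 2))
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 0.2, n)

def knn_predict(Xq, Xtr=X, ytr=y, k=20):
    """Stand-in flexible learner (k-NN); the paper uses random forests."""
    d = ((Xq[:, None, :] - Xtr[None, :, :]) ** 2).sum(axis=-1)
    idx = np.argpartition(d, k, axis=1)[:, :k]
    return ytr[idx].mean(axis=1)

def partial_dependence(predict, X, j, grid):
    """PD(v) = average prediction with feature j forced to v for all rows,
    thereby averaging over the effects of all other variables."""
    pd_vals = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v
        pd_vals.append(predict(Xv).mean())
    return np.array(pd_vals)

# effect size: least-squares slope of the PD curve, interpretable as the
# change in the response per one-unit change in x0
grid = np.quantile(X[:, 0], np.linspace(0.1, 0.9, 9))
pd_curve = partial_dependence(knn_predict, X, 0, grid)
slope = np.polyfit(grid, pd_curve, 1)[0]
```

The slope recovers the assumed per-unit effect of roughly 2 despite the black-box learner; interval estimates would come from repeating this over bootstrap resamples, as the abstract proposes.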
Abstract: Spread of airborne plant diseases from a propagule source is classically assessed by fitting a gradient curve to aggregated data coming from field experiments. But aggregating data decreases information about the processes involved in disease spread. To overcome this problem, individual count data can be collected; this was done in the case of short-distance spread of wheat brown rust. However, for such data, the gradient curve is a limited model since the heterogeneity of hosts is ignored and, consequently, overdispersion occurs. We therefore propose a parametric frailty model in which the frailties represent the propensities of hosts to be infected. The model is used to assess dispersal of propagules and heterogeneity of hosts.
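The overdispersion mechanism behind such a frailty model can be sketched by simulation; the parametrisation below (a mean-one gamma frailty multiplying a Poisson intensity with an exponential distance gradient) is an assumption for illustration, not the paper's fitted model:

```python
import numpy as np

# Assumed model: host i at distance d from the propagule source receives
# Poisson counts with mean Z_i * a * exp(-b * d), where the frailty
# Z_i ~ Gamma(shape=1/phi, scale=phi) has mean 1 and variance phi.
# Marginally the counts are negative binomial: overdispersed vs Poisson.
rng = np.random.default_rng(2)
a, b, phi = 50.0, 0.8, 0.5       # invented gradient and frailty parameters
d = 1.0                          # fixed distance from the source

z = rng.gamma(shape=1 / phi, scale=phi, size=10_000)   # host frailties
counts = rng.poisson(z * a * np.exp(-b * d))           # infection counts

m, v = counts.mean(), counts.var()
# a pure Poisson model would give v close to m; the gamma frailty inflates
# the variance to roughly m + phi * m**2
```

This is why ignoring host heterogeneity makes the plain gradient-curve (Poisson) model fit poorly: the frailty variance `phi` absorbs exactly the overdispersion the abstract describes.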