Abstract: In this paper we endeavour to provide a largely non-technical description of the issues surrounding unbalanced factorial ANOVA and review the arguments made for and against the use of Type I, Type II and Type III sums of squares. Though the issue of which is the `best' approach has been debated in the literature for decades, to date confusion remains around how the procedures differ and which is most appropriate. We ultimately recommend use of the Type II sums of squares for the analysis of main effects because, when no interaction is present, it tests meaningful hypotheses and is the most statistically powerful alternative.
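As a minimal illustration of the Type II approach (the data, factor coding and least-squares solver below are our own invention, not the paper's), the Type II sum of squares for a main effect A can be obtained by comparing residual sums of squares of nested models fitted without the interaction:

```python
def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def rss(X, y):
    """Residual sum of squares of an OLS fit via the normal equations."""
    p, n = len(X[0]), len(X)
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(XtX, Xty)
    return sum((y[i] - sum(X[i][a] * beta[a] for a in range(p))) ** 2 for i in range(n))

# Unbalanced 2x2 layout: factor levels coded 0/1, unequal cell sizes (made-up data).
A = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
B = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0]
y = [4.1, 3.9, 6.2, 6.0, 5.1, 7.9, 8.1, 8.3, 5.0, 4.8]

# Type II SS for A: compare the B-only model against the additive A+B model.
ss_A = rss([[1, b] for b in B], y) - rss([[1, a, b] for a, b in zip(A, B)], y)
```

Because the models are nested, the reduction in residual sum of squares is always non-negative, and it is zero only when factor A explains nothing beyond B.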
Abstract: It is always useful to have a confidence interval to accompany a point estimate of the parameter of interest. We propose a new algorithm for kernel-based interval estimation of a density, with the aim of minimizing the coverage error. The bandwidth used in the estimator is chosen by minimizing a bootstrap estimate of the absolute value of the coverage error. The resulting confidence interval performs well in terms of coverage accuracy and length, especially for large sample sizes. We illustrate our methodology with data on the eruption durations of the Old Faithful geyser in the USA. This appears to be the first bandwidth selector in the literature for kernel-based interval estimation of a density.
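A rough sketch of the idea in Python, under assumptions of our own (Gaussian kernel, normal-theory intervals, a simulated sample; the paper's actual algorithm may differ): the bandwidth is picked from a grid to minimize a bootstrap estimate of the absolute coverage error at a point x0.

```python
import math
import random

def kde(x, data, h):
    """Gaussian kernel density estimate at the point x."""
    n = len(data)
    return sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data) / (n * h * math.sqrt(2 * math.pi))

def coverage_error(data, h, x0, level=0.95, B=200):
    """Bootstrap estimate of |coverage - nominal| for a normal-theory CI of f(x0)."""
    rng = random.Random(0)
    target = kde(x0, data, h)  # full-sample estimate plays the role of the truth
    z, n, hits = 1.96, len(data), 0
    for _ in range(B):
        boot = [rng.choice(data) for _ in range(n)]
        est = kde(x0, boot, h)
        # asymptotic variance of a Gaussian KDE: f(x) * R(K) / (n h), with R(K) = 1/(2 sqrt(pi))
        se = math.sqrt(max(est, 1e-12) / (2 * math.sqrt(math.pi) * n * h))
        if est - z * se <= target <= est + z * se:
            hits += 1
    return abs(hits / B - level)

rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(100)]
grid = [0.2, 0.3, 0.4, 0.5, 0.7]
best_h = min(grid, key=lambda h: coverage_error(data, h, x0=0.0))
```

The grid, sample and bootstrap size here are deliberately small; the selected bandwidth is whichever grid value gives bootstrap coverage closest to the nominal 95% level.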
Abstract: Friedman’s test is a rank-based procedure that can be used to test for differences among t treatment distributions in a randomized complete block design. It is well known that the test has reasonably good power under location-shift alternatives to the null hypothesis of no difference in the t treatment distributions. However, the power of Friedman’s test when the alternative hypothesis consists of a non-location difference in treatment distributions can be poor. We develop the properties of an alternative rank-based test that has greater power than Friedman’s test in a variety of such circumstances. The test is based on the joint distribution of the t! possible permutations of the treatment ranks within a block (assuming no ties). We show when our proposed test will have greater power than Friedman’s test, and provide results from extensive numerical work comparing the power of the two tests under various configurations for the underlying treatment distributions.
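For reference, Friedman's statistic itself is easy to compute from within-block ranks; a plain-Python sketch on toy data of our own:

```python
def ranks(row):
    """Average ranks (1-based), with ties sharing the mean rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    r = [0.0] * len(row)
    i = 0
    while i < len(row):
        j = i
        while j + 1 < len(row) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def friedman_stat(blocks):
    """Friedman chi-square statistic for n blocks of t responses each."""
    n, t = len(blocks), len(blocks[0])
    R = [0.0] * t
    for row in blocks:
        for j, r in enumerate(ranks(row)):
            R[j] += r
    return 12.0 / (n * t * (t + 1)) * sum(r * r for r in R) - 3 * n * (t + 1)

# Three blocks, three treatments; treatment 3 is always ranked highest.
blocks = [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
stat = friedman_stat(blocks)
```

With identical rankings in every block the statistic reaches its maximum, n(t - 1) = 6 here, which is compared to a chi-square distribution with t - 1 degrees of freedom.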
Abstract: In alcohol studies, drinking outcomes such as the number of days of any alcohol drinking (DAD) over a period of time do not precisely capture the differences among subjects in a study population of interest. For example, a value of 0 on DAD could mean that the subject was continually abstinent from drinking (such as a lifetime abstainer), or that the subject was alcoholic but happened not to use any alcohol during the period of interest. In statistics, zeros of the first kind are called structural zeros, to distinguish them from the sampling zeros of the second kind. As the example indicates, the structural and sampling zeros represent two groups of subjects with quite different psychosocial outcomes. In the literature on alcohol use, although many recent studies have begun to explicitly account for the differences between the two types of zeros when modeling drinking variables as a response, none has acknowledged the implications of the different types of zeros when such drinking variables are used as a predictor. This paper serves as the first attempt to tackle the latter issue and illustrate the importance of disentangling the structural and sampling zeros by using simulated as well as real study data.
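The distinction can be seen in a small simulation (the parameters are our own illustrative choices, not taken from the paper): a zero-inflated sample mixes structural zeros with Poisson counts, so the observed zero fraction exceeds what a plain Poisson model would imply.

```python
import math
import random

rng = random.Random(7)
pi_struct, lam, n = 0.3, 2.0, 10000  # hypothetical abstainer proportion and Poisson mean

def poisson(lam, rng):
    """Knuth's multiplication algorithm for a single Poisson draw."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

# Zero-inflated sample: structural zeros (abstainers) plus Poisson counts for
# drinkers, whose occasional zeros are sampling zeros.
sample = [0 if rng.random() < pi_struct else poisson(lam, rng) for _ in range(n)]
zero_frac = sum(v == 0 for v in sample) / n
poisson_zero = math.exp(-lam)  # zero probability implied by a plain Poisson model
```

Here roughly 39% of observations are zero, while a Poisson(2) model alone would predict only about 13.5%; ignoring the mixture conflates two very different groups of subjects.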
Abstract: The association between bivariate binary responses has been studied using Pearson’s correlation coefficient, the odds ratio, and the tetrachoric correlation coefficient. This paper introduces a copula to model the association. Numerical comparisons between the proposed method and the existing methods are presented. Results show that these methods are comparable. However, the copula method has a clearer interpretation and is easier to extend to bivariate responses with three or more ordinal categories. In addition, a goodness-of-fit test for the selection of a model is performed. Applications of the method to two real data sets are also presented.
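As a hedged sketch of the general idea (using the simple FGM copula for concreteness, not necessarily the copula family the paper adopts), joint cell probabilities for two Bernoulli margins follow from evaluating the copula at the marginal CDF values at 0:

```python
def fgm(u, v, theta):
    """Farlie-Gumbel-Morgenstern copula, theta in [-1, 1]."""
    return u * v * (1 + theta * (1 - u) * (1 - v))

# Hypothetical Bernoulli margins P(X=1)=px, P(Y=1)=py; the copula is evaluated
# at the CDF values at 0, i.e. 1-px and 1-py.
px, py, theta = 0.4, 0.3, 0.8
p00 = fgm(1 - px, 1 - py, theta)      # P(X=0, Y=0)
p01 = (1 - px) - p00                  # P(X=0, Y=1)
p10 = (1 - py) - p00                  # P(X=1, Y=0)
p11 = 1 - p00 - p01 - p10             # P(X=1, Y=1)
odds_ratio = (p00 * p11) / (p01 * p10)
```

The copula parameter theta directly controls the association; with theta > 0 the implied odds ratio exceeds 1, while theta = 0 reproduces independence.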
Abstract: Many nations’ defence departments use capability-based planning to guide their investment and divestment decisions. This planning process involves a variety of data that in its raw form is difficult for decision-makers to use. In this paper we describe how dimensionality reduction and partition clustering are used in the Canadian Armed Forces to create visualizations that convey how important military capabilities are in planning scenarios and how much capacity the planned force structure has to provide the capabilities. Together, these visualizations give decision-makers an overview of which capabilities may require investment or may be candidates for divestment.
Abstract: Crude oil, the primary source of energy, is unquestionably the main driving engine of every country in the world, whether an oil-producing or an oil-consuming economy. Crude oil, one of the key strategic products in the global market, may influence the economies of both exporting and importing countries. Iran is one of the major crude oil exporting partners of the Organization of the Petroleum Exporting Countries (OPEC). Analysis of the risk measures associated with Iranian oil price data is of strategic importance to the Iranian government and policy makers, in particular for short- and long-term planning in setting oil production targets. Oil price risk management focuses mainly on when and how an organization can best prevent costly exposure to price risk. Value-at-Risk (VaR) is the commonly accepted instrument of risk measurement and is evaluated by analysing the negative tail of the probability distribution of returns/profit and loss. Among the several approaches for calculating VaR, the most common are the variance-covariance approach, historical simulation and Monte Carlo simulation. Recently, copula functions have emerged as a powerful tool to model and simulate multivariate probability distributions. Copula applications have been noted predominantly in finance, actuarial science, economics, and health and clinical studies. In addition, copulas are useful devices for dealing with the non-normality and non-linearity issues frequently observed in financial time series data. In this paper we apply three copulas, namely the Frank, Clayton and Gumbel copulas, to analyse the time series of Iranian crude oil prices in relation to OPEC prices. The data considered are (i) monthly average prices for a barrel of Iranian and OPEC crude oil from January 1997 to December 2008, and (ii) the seasonal number of barrels of Iran’s crude oil exports from January 1997 to December 2008.
The results demonstrate that the copula-simulated data provide higher and lower relative change values in the upper and lower tails, respectively, compared with the original data.
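A minimal sketch of one ingredient, simulation from a Clayton copula via the conditional-inverse method (the parameter value and dependence summary are our own illustration, not the paper's data):

```python
import random

def clayton_pair(theta, rng):
    """One (u, v) draw from a Clayton copula, theta > 0, via the conditional inverse."""
    u = 1 - rng.random()  # uniforms on (0, 1]
    w = 1 - rng.random()
    v = (u ** (-theta) * (w ** (-theta / (1 + theta)) - 1) + 1) ** (-1 / theta)
    return u, v

rng = random.Random(1)
theta = 2.0  # Kendall's tau = theta / (theta + 2) = 0.5
pairs = [clayton_pair(theta, rng) for _ in range(5000)]

# Empirical correlation of the simulated uniforms as a crude dependence check.
n = len(pairs)
mu = sum(u for u, _ in pairs) / n
mv = sum(v for _, v in pairs) / n
cov = sum((u - mu) * (v - mv) for u, v in pairs) / n
su = (sum((u - mu) ** 2 for u, _ in pairs) / n) ** 0.5
sv = (sum((v - mv) ** 2 for _, v in pairs) / n) ** 0.5
corr = cov / (su * sv)
```

The Clayton family concentrates dependence in the lower tail, which is why it suits joint extreme losses; in a VaR application the simulated uniforms would be pushed through the marginal quantile functions of the two price series before computing the loss quantile.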
Abstract: In compositional data, an observation is a vector with non-negative components which sum to a constant, typically 1. Data of this type arise in many areas, such as geology, archaeology, biology, economics and political science, among others. The goal of this paper is to extend the taxicab metric and a newly suggested metric for compositional data by employing a power transformation. Both metrics can be used in the k-nearest neighbours algorithm regardless of the presence of zeros. Examples with real data are exhibited.
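A small sketch of the classification step (the power transformation and taxicab distance as we understand them; the data and parameter values are invented for illustration):

```python
def power_transform(x, alpha):
    """Power transformation for a composition: raises each part to alpha and
    re-closes the result onto the simplex. Zeros stay zero, so no imputation is needed."""
    powered = [xi ** alpha for xi in x]
    s = sum(powered)
    return [p / s for p in powered]

def taxicab(u, v):
    """Taxicab (L1) distance between two vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def knn_predict(train, labels, query, k=3, alpha=0.5):
    """Majority vote among the k nearest transformed compositions."""
    tq = power_transform(query, alpha)
    dists = sorted((taxicab(power_transform(x, alpha), tq), lab)
                   for x, lab in zip(train, labels))
    top = [lab for _, lab in dists[:k]]
    return max(set(top), key=top.count)

# Hypothetical 3-part compositions from two well-separated groups.
train = [[0.80, 0.10, 0.10], [0.75, 0.15, 0.10], [0.70, 0.20, 0.10],
         [0.10, 0.10, 0.80], [0.15, 0.10, 0.75], [0.10, 0.20, 0.70]]
labels = [0, 0, 0, 1, 1, 1]
pred = knn_predict(train, labels, [0.72, 0.18, 0.10])
```

Because x raised to a positive power is still non-negative, the transformed vector remains a valid composition even when some parts are exactly zero, which is the point of avoiding log-ratio transforms here.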
Abstract: In this paper we propose power weighted quantile regression (PWQR), which can effectively reduce the effect of heterogeneity in the conditional densities of the response and improve the efficiency of quantile regression. We also prove that the proportion of total weight carried by observations whose actual value lies below the PWQR estimate is very close to the corresponding quantile level. Finally, this article establishes the relationship between geomagnetic indices and GIC. Addressing the problems of secure power system operation, we construct a GIC risk value table; this table is practical to apply and provides important guidance for the secure operation of power systems.
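The building block of any such estimator is the quantile check (pinball) loss; a sketch with an optional observation-weight vector standing in for the paper's power weights (whose exact form we do not reproduce):

```python
def check_loss(r, tau):
    """Quantile (pinball) loss rho_tau(r) = r * (tau - 1{r < 0})."""
    return r * (tau - (1 if r < 0 else 0))

def weighted_quantile_objective(c, y, tau, w=None):
    """Weighted sum of check losses; w=None gives the ordinary quantile regression loss."""
    w = w or [1.0] * len(y)
    return sum(wi * check_loss(yi - c, tau) for wi, yi in zip(w, y))

# Minimizing the unweighted objective over candidate values recovers the
# sample tau-quantile; here tau = 0.5 picks out the median.
y = [1, 2, 3, 4, 5, 6, 7, 8, 9]
tau = 0.5
best = min(y, key=lambda c: weighted_quantile_objective(c, y, tau))
```

In a weighted variant, observations receive data-driven weights before the same objective is minimized; the property sketched in the abstract is that the weight fraction falling below the fitted value tracks tau.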
Abstract: We analyze the cross-correlation between logarithmic returns of 1108 stocks listed on the Shanghai and Shenzhen Stock Exchanges of China over the period 2005 to 2010. The results suggest that the estimated distribution of correlation coefficients is right-shifted during tumble periods of the Chinese stock market. Owing to the large share of the maximum eigenvalue, the principal correlation component in the Chinese stock market is dominant, and the other components have only trivial effects on the market condition. The same-signed elements of the corresponding eigenvector enable us to propose the maximum eigenvalue series as an indicator of collective behavior in the equity market. We provide evidence that the largest eigenvalue series can be used as an effective indicative parameter to describe the collective behavior of stock returns, which is found to be positively correlated with market volatility. Using time-varying windows, we find that this positive correlation diminishes when market volatility reaches either its highest or its lowest level. By defining a stability rate, we show that the collective behavior of stocks tends to be more homogeneous during crises than in regular times. This study has implications for the ongoing discussion of correlation risk.
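The eigenvalue computation can be sketched in plain Python (the factor-model returns below are simulated for illustration; power iteration extracts the dominant eigenvalue of the correlation matrix):

```python
import random

def corr_matrix(series):
    """Pearson correlation matrix of a list of equal-length return series."""
    n = len(series[0])
    means = [sum(s) / n for s in series]
    sds = [(sum((x - m) ** 2 for x in s) / n) ** 0.5 for s, m in zip(series, means)]
    def corr(i, j):
        return sum((series[i][t] - means[i]) * (series[j][t] - means[j])
                   for t in range(n)) / (n * sds[i] * sds[j])
    return [[corr(i, j) for j in range(len(series))] for i in range(len(series))]

def max_eigenvalue(M, iters=200):
    """Largest eigenvalue of a symmetric PSD matrix via power iteration."""
    v = [1.0] * len(M)
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(len(M))) for i in range(len(M))]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged vector gives the eigenvalue.
    return sum(v[i] * sum(M[i][j] * v[j] for j in range(len(M))) for i in range(len(M)))

# Hypothetical returns: a shared "market" factor plus idiosyncratic noise,
# which produces one dominant eigenvalue, as in the abstract.
rng = random.Random(3)
market = [rng.gauss(0, 1) for _ in range(250)]
returns = [[m + 0.5 * rng.gauss(0, 1) for m in market] for _ in range(4)]
lam_max = max_eigenvalue(corr_matrix(returns))
```

For a correlation matrix the eigenvalues sum to the number of assets, so a dominant largest eigenvalue (here well above 1 out of a total of 4) is exactly the signature of collective, market-wide co-movement the abstract describes.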