Abstract: This paper aims to generate multivariate random vectors with a prescribed correlation matrix using the Johnson system. Probability weighted moments (PWMs) are employed to estimate the parameters of the Johnson system. By equating the first four PWMs of the Johnson system with those of the target distribution, a system of equations for the parameters is established. With suitable initial values, solutions to the equations are obtained by a Newton iteration procedure. To allow for the generation of random vectors with a prescribed correlation matrix, approaches to accommodate the dependency are put forward. For the four transformation models of the Johnson system, nine cases are addressed. Analytical formulae are derived to determine the equivalent correlation coefficient in the standard normal space for six cases; the remaining three are handled by an interpolation method. Finally, several numerical examples are given to validate the proposed method.
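As a rough illustration of the kind of moment-matching solver the abstract describes, the sketch below implements a generic Newton iteration with a finite-difference Jacobian; the toy residual is a stand-in for the actual system "first four Johnson-system PWMs minus the four target PWMs", whose closed forms are in the paper, not here.

```python
import numpy as np

def newton_system(residual, x0, tol=1e-10, max_iter=50, h=1e-6):
    """Solve residual(x) = 0 by Newton iteration with a finite-difference Jacobian."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = np.asarray(residual(x), dtype=float)
        if np.linalg.norm(r) < tol:
            break
        J = np.empty((r.size, x.size))
        for j in range(x.size):
            xp = x.copy()
            xp[j] += h
            J[:, j] = (np.asarray(residual(xp)) - r) / h
        x = x - np.linalg.solve(J, r)  # Newton step
    return x

# Toy stand-in residual with four unknowns; the real one would return the
# first four PWMs of the fitted Johnson model minus the four target PWMs.
toy_residual = lambda p: np.array([p[0] - 1.0, p[1] ** 2 - 4.0,
                                   p[2] + p[3] - 0.6, p[3] - 0.1])
print(newton_system(toy_residual, np.array([0.5, 1.0, 0.0, 0.0])))
```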
Abstract: In any sports competition, there is strong interest in knowing which team will be the champion at the end of the championship. Besides this, the result of a match, the chance of a team qualifying for a specific tournament, the chance of being relegated, the best attack, the best defense, among others, are also of interest. In this paper we present a simple method with good predictive quality, easy implementation, and low computational effort that allows the calculation of all the quantities of interest above. Following Lee (1997), we estimate the average goals scored by each team by assuming that the number of goals scored by a team in a match follows a univariate Poisson distribution, but we consider linear models that express the sum and the difference of goals scored in terms of five covariates: the goal average in a match, the home-team advantage, the team's offensive power, the opponent team's defensive power, and a crisis indicator. The methodology is applied to the 2008-2009 English Premier League.
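A minimal sketch of the modelling idea, not the paper's exact specification: goals follow a Poisson distribution whose log-mean is linear in match covariates. The covariates and coefficients below are simulated stand-ins for the five covariates named in the abstract.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
# Hypothetical match-level covariates: home advantage, attacking and defensive strength.
home = rng.integers(0, 2, n)
attack = rng.normal(size=n)
defence = rng.normal(size=n)
goals = rng.poisson(np.exp(0.3 + 0.25 * home + 0.4 * attack - 0.3 * defence))

X = sm.add_constant(np.column_stack([home, attack, defence]))
fit = sm.GLM(goals, X, family=sm.families.Poisson()).fit()
print(fit.params)  # approximately recovers (0.3, 0.25, 0.4, -0.3)
```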
Abstract: Particulate matter smaller than 2.5 microns (PM2.5) is a commonly measured parameter in ground-based sampling networks designed to assess short- and long-term air quality. The measurement techniques for ground-based PM2.5 are relatively accurate and precise, but monitoring locations are spatially too sparse for many applications. Aerosol Optical Depth (AOD) is a satellite-based air quality measurement that can be computed for more spatial locations, but it measures light attenuation by particulates throughout the entire air column, not just near the ground. The goal of this paper is to better characterize the spatio-temporal relationship between the two measurements. An informative relationship will aid in imputing PM2.5 values for health studies in a way that accounts for the variability in both sets of measurements, something physics-based models cannot do. We use a data set of Chicago air quality measurements taken during 2007 and 2008 to construct a weekly hierarchical model. We also demonstrate that AOD measurements and a latent spatio-temporal process aggregated weekly can be used to aid in the prediction of PM2.5 measurements.
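The paper's weekly hierarchical spatio-temporal model is far richer than this, but a toy sketch conveys the core preprocessing step of aggregating daily AOD and PM2.5 to weekly resolution before relating one to the other; all data below are simulated.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
days = pd.date_range("2007-01-01", "2008-12-31", freq="D")
latent = np.cumsum(rng.normal(0, 0.1, days.size))          # latent temporal process
df = pd.DataFrame({"date": days,
                   "aod": 0.3 + latent + rng.normal(0, 0.05, days.size)})
df["pm25"] = 10 + 25 * df["aod"] + rng.normal(0, 2, days.size)

weekly = df.resample("W", on="date").mean()                 # aggregate days to weeks
print(smf.ols("pm25 ~ aod", data=weekly).fit().params)      # slope near the true 25
```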
Abstract: The modified autoregressive (mAR) index has been proposed as a description of the clustering of shots of similar duration in a motion picture. In this paper we derive robust estimates of the mAR index for high-grossing films at the US box office using a rank-based autocorrelation function resistant to the influence of outliers, and compare these to estimates obtained using the classical, moment-based autocorrelation function. The results show that (1) the classical mAR index underestimates both the level of shot clustering in a film and the variation in style among the films in the sample; (2) there is a decline in shot clustering from 1935 to the 1950s followed by an increase from the 1960s to the 1980s and a levelling off thereafter, rather than the monotonic trend indicated by the classical index, and this is mirrored in the trend of the median shot lengths and interquartile range; and (3) the rank mAR index identifies differences between genres overlooked when using the classical index.
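A hedged sketch of a rank-based autocorrelation function of the kind the abstract invokes: the Spearman correlation between the shot-length series and its lagged copy, which an extreme outlier barely perturbs. This is a generic construction, not necessarily the authors' exact estimator.

```python
import numpy as np
from scipy.stats import spearmanr

def rank_acf(x, max_lag=10):
    """Rank-based (Spearman) autocorrelation, resistant to outlying shot lengths."""
    x = np.asarray(x, dtype=float)
    return np.array([spearmanr(x[:-k], x[k:]).correlation
                     for k in range(1, max_lag + 1)])

rng = np.random.default_rng(1)
shots = rng.lognormal(1.5, 0.6, 300)   # toy shot-length series (seconds)
shots[100] = 500.0                     # one extreme outlier barely moves the ranks
print(rank_acf(shots, max_lag=5))
```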
Summary: Longitudinal binary data often arise in clinical trials when repeated measurements, positive or negative to certain tests, are made on the same subject over time. To account for the serial correlation within subjects, we propose a marginal logistic model which is implemented using the Generalized Estimating Equation (GEE) approach with working correlation matrices adopting some widely used forms. The aim of this paper is to seek robust working correlation matrices that give consistently good fit to the data. Model fit is assessed using the modified expected utility of Walker & Gutiérrez-Peña (1999). To evaluate the effect of the length of the time series and the strength of serial correlation on the robustness of various working correlation matrices, the models are demonstrated on three data sets containing, respectively, all short time series, all long time series, and time series of varying length. We identify factors that affect the choice of robust working correlation matrices and offer suggestions for different situations.
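A brief sketch of the GEE machinery the summary refers to, using statsmodels with two common working correlation structures on simulated longitudinal binary data; the data-generating model and the single covariate are illustrative only.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n_subj, n_time = 50, 4
ids = np.repeat(np.arange(n_subj), n_time)
time = np.tile(np.arange(n_time), n_subj)
b = rng.normal(0, 1, n_subj)             # shared subject effect -> serial correlation
y = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 0.3 * time + b[ids]))))
X = sm.add_constant(time)

for cov in (sm.cov_struct.Independence(), sm.cov_struct.Exchangeable()):
    fit = sm.GEE(y, X, groups=ids, family=sm.families.Binomial(),
                 cov_struct=cov).fit()
    print(type(cov).__name__, fit.params)  # marginal estimates under each structure
```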
Abstract: Count data often have excess zeros in many clinical studies. These zeros usually represent a "disease-free" state. Although disease (event) free at the time, some of these subjects might be at high risk of the putative outcome while others may be at low or no such risk. We postulate that these zeros are of one of two types, either 'low risk' or 'high risk' zeros for the disease process in question. Low-risk zeros can arise from the absence of risk factors for disease initiation/progression and/or from a very early stage of the disease. High-risk zeros can arise from the presence of significant risk factors for disease initiation/progression or, in rare situations, from misclassification, more specific diagnostic tests, or counts below the level of detection. We use zero-inflated models, which allow us to assume that zeros arise from one of two separate latent processes, one giving low-risk zeros and the other high-risk zeros, and subsequently propose a strategy to identify and classify them as such. To illustrate, we use data on the number of involved nodes in breast cancer patients. Of the 1152 patients studied, 38.8% were node-negative (zeros). The model predicted that about a third of the node-negative patients (11.4% of the total) are at "high risk" and the remaining (27.4%) are at "low risk" of nodal positivity. Posterior probability based classification was more appropriate than other methods. Our approach indicates that some node-negative patients may be re-assessed for their diagnosis of nodal positivity and/or for the future clinical management of their disease. The approach developed here is applicable to any scenario where the disease or outcome can be characterized by count data.
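A toy version of the zero-classification idea, assuming a standard zero-inflated Poisson model (statsmodels' ZeroInflatedPoisson): once π (the probability of a structural, 'low risk' zero) and λ (the Poisson mean) are estimated, the posterior probability that an observed zero is a 'high risk' (Poisson) zero is (1 − π)e^{−λ} / (π + (1 − π)e^{−λ}).

```python
import numpy as np
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(3)
n = 1000
structural = rng.binomial(1, 0.3, n)                # 'low risk': no disease process
counts = np.where(structural == 1, 0, rng.poisson(2.0, n))

fit = ZeroInflatedPoisson(counts, np.ones((n, 1))).fit(disp=False)
pi = 1 / (1 + np.exp(-fit.params[0]))               # inflation probability (logit link)
lam = np.exp(fit.params[1])                         # Poisson mean (log link)

# Posterior probability that an observed zero came from the Poisson ('high risk') process
p_high = (1 - pi) * np.exp(-lam) / (pi + (1 - pi) * np.exp(-lam))
print(round(pi, 2), round(lam, 2), round(p_high, 2))
```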
Abstract: The Polya tree, by embedding parametric families as a special case, is naturally suited to testing the goodness of fit of a parametric null against nonparametric alternatives. For this purpose, we present a new Polya tree construction for random probability measures, which aims to permit an easy multiple χ² test for goodness of fit. Examples of data analyses in simulation studies highlight the performance of the proposed methods.
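The paper's Polya tree construction is not reproduced here; the sketch below shows only the generic flavour of a multiple χ² goodness-of-fit test on nested dyadic partitions: under the null, the probability-integral transform of the data is uniform, so each level's cells are equiprobable.

```python
import numpy as np
from scipy import stats

def dyadic_chisq(x, null_cdf, levels=3):
    """Chi-square tests on nested dyadic partitions defined by the null's quantiles.
    Under H0, u = null_cdf(x) is uniform, so each level's 2**m cells are equiprobable."""
    u = null_cdf(np.asarray(x))
    return [stats.chisquare(np.histogram(u, bins=2 ** m, range=(0.0, 1.0))[0]).pvalue
            for m in range(1, levels + 1)]

x = np.random.default_rng(4).normal(0.3, 1.0, 500)  # data slightly off the N(0,1) null
print(dyadic_chisq(x, stats.norm.cdf))              # small p-values flag the misfit
```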
Abstract: This article considers hypothesis testing using the Bayes factor in the context of categorical data models represented in two-dimensional contingency tables. The study includes the multinomial model for a general I × J table. Other data characteristics, such as low as well as polarized cell counts and the size of the tables, are also considered. The objective is to investigate the sensitivity of the Bayes factor to these features so as to understand the performance of non-informative priors themselves. Consistency has been studied for different types of data using the Dirichlet prior with eight different parameter choices for the multinomial model, followed by a bootstrap simulation. The study emphasizes a reasonable choice of parameter values that represents the underlying physical phenomena, though these remain partially vague in nature.
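As one concrete instance of a Bayes factor for an I × J table under a Dirichlet prior, the sketch below compares the saturated multinomial model against row-column independence with symmetric Dirichlet(a) priors; the exact models and prior choices in the paper may differ.

```python
import numpy as np
from scipy.special import gammaln

def log_dirichlet_norm(a):
    # log of the multivariate beta function B(a)
    return gammaln(a).sum() - gammaln(a.sum())

def log_bf_independence(table, a=1.0):
    """log Bayes factor: saturated multinomial vs. row-column independence,
    with symmetric Dirichlet(a) priors on each simplex (the multinomial
    coefficient cancels between the two marginal likelihoods)."""
    n = np.asarray(table, dtype=float)
    full = log_dirichlet_norm(a + n.ravel()) - log_dirichlet_norm(np.full(n.size, a))
    rows = log_dirichlet_norm(a + n.sum(1)) - log_dirichlet_norm(np.full(n.shape[0], a))
    cols = log_dirichlet_norm(a + n.sum(0)) - log_dirichlet_norm(np.full(n.shape[1], a))
    return full - (rows + cols)

print(log_bf_independence([[30, 10], [10, 30]]))   # positive: evidence of association
print(log_bf_independence([[20, 20], [20, 20]]))   # negative: favours independence
```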
Abstract: We propose two classes of nonparametric point estimators of θ = P(X < Y) in the case where (X, Y) are paired, possibly dependent, absolutely continuous random variables. The proposed estimators are based on nonparametric estimators of the joint density of (X, Y) and the distribution function of Z = Y − X. We explore the use of several density and distribution function estimators and characterise the convergence of the resulting estimators of θ. We consider the use of bootstrap methods to obtain confidence intervals. The performance of these estimators is illustrated using simulated and real data. These examples show that not accounting for pairing and dependence may lead to erroneous conclusions about the relationship between X and Y.
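One of the simplest estimators in the spirit of the abstract can be sketched directly: since θ = P(X < Y) = P(Z > 0) for Z = Y − X, a kernel density estimate fitted to the paired differences yields θ̂, and a paired bootstrap gives a percentile interval. The Gaussian KDE and the simulated data below are illustrative choices, not the paper's full menu of estimators.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, 200)
y = x + rng.normal(0.5, 1.0, 200)          # paired and dependent by construction
z = y - x

def theta_hat(z):
    # smooth estimate of theta = P(Z > 0) from a KDE fitted to Z = Y - X
    return gaussian_kde(z).integrate_box_1d(0.0, np.inf)

boot = [theta_hat(rng.choice(z, size=z.size, replace=True)) for _ in range(500)]
print(theta_hat(z), np.percentile(boot, [2.5, 97.5]))   # point estimate and 95% CI
```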
Abstract: Background: A fixed effects meta-analysis of ten exercise training trials in heart failure patients was conducted. The aim of the current work was to compare different approaches to meta-analysis using the same dataset from the previous work on ten exercise training trials in heart failure patients. Methods: The following meta-analysis techniques were used to analyse the data and compare the effects of exercise training on BNP, NT-pro-BNP and peak VO2 before and after exercise training: (1) trial-level (traditional) meta-analysis: (i) follow-up (post-exercise training intervention) outcome only; (ii) baseline to follow-up difference; (2) patient-level meta-analysis by post-stage ANCOVA: (i) a naive model that does not take the trial level into account; (ii) single stage; (iii) two stage; (3) post outcome only: (i) single stage; (ii) pre-post outcome difference, single stage. Results: The individual patient data (IPD) analyses produced smaller effect sizes and 95% confidence intervals compared to conventional meta-analysis. The advantage of the one-stage model is that it allows sub-group analyses, while the two-stage model is considered more robust but limited for sub-analyses. Conclusions: Our recommendation is to use one-stage or two-stage ANCOVA analysis; the former allows sub-group analysis, while the latter is considered to be more technically robust.
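A minimal sketch of a one-stage IPD ANCOVA of the kind recommended in the conclusions: the post-intervention outcome regressed on baseline and treatment with a random intercept per trial, here via statsmodels' mixed model on simulated data (trial sizes, effect sizes and variable names are invented).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
rows = []
for trial in range(10):                      # ten hypothetical trials
    u = rng.normal(0, 0.5)                   # trial-level random intercept
    for _ in range(40):
        base = rng.normal(20, 4)
        treat = rng.integers(0, 2)
        post = base + u - 2.0 * treat + rng.normal(0, 2)
        rows.append((trial, base, treat, post))
df = pd.DataFrame(rows, columns=["trial", "baseline", "treat", "post"])

# One-stage IPD ANCOVA: post outcome on baseline and treatment, random intercept per trial
fit = smf.mixedlm("post ~ baseline + treat", df, groups="trial").fit()
print(fit.params)
```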