Abstract: Missing data is a common problem in statistical analyses. To make use of information in data with incomplete observation, missing values can be imputed so that standard statistical methods can be used to analyze the data. Variables with missing values are often categorical and the missing pattern may not be monotone. Currently, commonly used imputation methods for data with a non-monotone missing pattern do not allow direct inclusion of categorical variables. Categorical variables are converted to numerical variables before imputation. For many applications, the imputed numerical values for those categorical variables must then be converted back to categorical values. However, this conversion introduces bias which can seriously affect subsequent analyses. In this paper, we propose two direct imputation methods for categorical variables with a non-monotone missing pattern: the direct imputation approach incorporated with the expectation maximization algorithm and the direct imputation approach incorporated with a new algorithm: the imputation-maximization algorithm. Simulation studies show that both methods perform better than the method using variable conversion. An application to real data is provided to compare the direct imputation method and the method using variable conversion.
Abstract: The association between bivariate binary responses has been studied using Pearson’s correlation coefficient, the odds ratio, and the tetrachoric correlation coefficient. This paper introduces a copula to model the association. Numerical comparisons between the proposed method and the existing methods are presented. Results show that these methods are comparable. However, the copula method has a clearer interpretation and is easier to extend to bivariate responses with three or more ordinal categories. In addition, a goodness-of-fit test for the selection of a model is performed. Applications of the method to two real data sets are also presented.
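As an illustration of two of the existing measures named above, the following minimal sketch computes Pearson's correlation (the phi coefficient) and the odds ratio from a 2x2 table of counts. The function name and the cell labels are hypothetical, and the copula model proposed in the paper is not implemented here.

```python
import math


def binary_association(n11, n10, n01, n00):
    """Return (phi, odds_ratio) for a 2x2 table of counts.

    n11: both responses equal 1; n10: first is 1, second is 0;
    n01: first is 0, second is 1; n00: both responses equal 0.
    Assumes every margin and both off-diagonal cells are non-zero.
    """
    row1, row0 = n11 + n10, n01 + n00
    col1, col0 = n11 + n01, n10 + n00
    # Pearson's correlation for two binary variables (phi coefficient).
    phi = (n11 * n00 - n10 * n01) / math.sqrt(row1 * row0 * col1 * col0)
    # Cross-product (odds) ratio.
    odds_ratio = (n11 * n00) / (n10 * n01)
    return phi, odds_ratio


phi, odds_ratio = binary_association(40, 10, 10, 40)
print(phi, odds_ratio)
```

Both measures describe the same table on different scales: phi is bounded between -1 and 1, while the odds ratio ranges over the positive reals.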
Abstract: Family background factors can be a very important part of a person’s life. One of the main interests of this paper is to investigate whether family background factors alter the mathematical achievement of stronger students in the same way that they affect weaker students. Using a large sample of 2000, 2001 and 2002 mathematics participation data from Alberta, Canada, these questions are investigated by means of a quantile regression approach. The findings suggest that there may be differential family-background-factor effects at different points in the conditional distribution of mathematical achievement.
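Quantile regression targets a chosen quantile of the response rather than its mean by minimizing the check (pinball) loss. The sketch below shows only that loss for a constant predictor, with no covariates; the function names and the toy data are hypothetical and are not taken from the paper.

```python
def pinball_loss(y, pred, tau):
    """Average check loss of a constant prediction `pred` at quantile level `tau`."""
    total = 0.0
    for value in y:
        residual = value - pred
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        weight = tau if residual >= 0 else tau - 1.0
        total += weight * residual
    return total / len(y)


def best_constant(y, tau):
    """Observed value that minimizes the check loss (a sample tau-quantile)."""
    return min(sorted(set(y)), key=lambda c: pinball_loss(y, c, tau))


scores = [1, 2, 3, 4, 100]  # toy data, not from the study
print(best_constant(scores, 0.5), best_constant(scores, 0.9))
```

Changing `tau` moves the fitted value through the distribution, which is how covariate effects can be estimated separately for weaker and stronger students.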
Abstract: When comparing two independent groups, the shift function compares all of the quantiles in a manner that controls the probability of at least one Type I error, assuming random sampling only. Moreover, it provides a much more detailed sense of how groups compare, versus using a single measure of location, and the associated plot of the data can yield valuable insights. This note examines the small-sample properties of an extension of the shift function where the goal is to compare the distributions of two specified linear sums of the random variables under study, with an emphasis on a two-by-two design. A very simple method controls the probability of a Type I error. Moreover, very little power is lost versus comparing means when sampling is from normal distributions with equal variances.
Abstract: This paper examines the performance of different kinds of GARCH models with Gaussian, Student-t and generalized error distributions for the Colombo Stock Exchange (CSE) in Sri Lanka, analyzing the daily closing price index of the CSE from January 02, 2007 to March 10, 2013. It was found that the asymmetric GARCH models give better results than the symmetric GARCH model. In terms of distributional assumptions, the models under the Student-t and generalized error distributions provided a better fit than under the normal distribution. The non-parametric specification test suggests that the GARCH, EGARCH, TARCH and APARCH models with the Student-t distributional assumption are the most successful models for the CSE.
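For readers unfamiliar with the symmetric baseline, the following minimal sketch runs the standard GARCH(1,1) conditional variance recursion, sigma2[t] = omega + alpha * e[t-1]^2 + beta * sigma2[t-1], on a series of demeaned returns. The function name, the parameter values and the toy residuals are assumptions for illustration; they are not estimates from the CSE data, and no likelihood fitting or asymmetric term is included.

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variances from the GARCH(1,1) recursion.

    `residuals` are demeaned returns. The recursion is initialized at
    the sample variance of the residuals, which is one common choice.
    """
    if not residuals:
        return []
    sigma2 = [sum(e * e for e in residuals) / len(residuals)]
    for e in residuals[:-1]:
        # Next variance depends on the last squared shock and the last variance.
        sigma2.append(omega + alpha * e * e + beta * sigma2[-1])
    return sigma2


toy_residuals = [0.01, -0.02, 0.015, -0.005]  # hypothetical values
print(garch11_variance(toy_residuals, omega=1e-5, alpha=0.1, beta=0.8))
```

Because the shock enters only through its square, positive and negative shocks of equal size have the same effect here; asymmetric variants such as EGARCH, TARCH and APARCH relax that restriction.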
Abstract: We propose a simple method for evaluating agreement between methods of measurement when the measured variable is continuous and the data consist of matched repeated observations made with the same method under different conditions. The conditions may represent different time points, raters, laboratories, treatments, etc. Our approach allows the values of the measured variable and the magnitude of disagreement to vary across the conditions. The coefficient of individual agreement (CIA), which is based on the comparison of the between- and within-methods mean squared deviation (MSD), is used to quantify the magnitude of agreement between measurement methods. The new approach is illustrated via two examples from studies designed to compare (a) methods of evaluating carotid stenosis and (b) methods of measuring percent body fat.
Pub. online: 4 Aug 2022
Type: Research Article
Open Access
Journal: Journal of Data Science
Volume 18, Issue 3 (2020): Special issue: Data Science in Action in Response to the Outbreak of COVID-19, pp. 455–472
Abstract
We propose a varying coefficient Susceptible-Infected-Removal (vSIR) model that allows changing infection and removal rates for the latest coronavirus (COVID-19) outbreak in China. The vSIR model, together with the proposed estimation procedures, allows one to track the reproductivity of COVID-19 through time and to assess the effectiveness of the control measures implemented since Jan 23, 2020, when the city of Wuhan was locked down, followed by an extremely high level of self-isolation in the population. Our study finds that the reproductivity of COVID-19 slowed significantly in the three weeks from January 27th to February 17th, with 96.3% and 95.1% reductions in the effective reproduction number R among the 30 provinces and 15 Hubei cities, respectively. Predictions of the ending times and the total numbers of infected are made under three scenarios for the removal rates. The paper provides a timely model and associated estimation and prediction methods which may be applied in other countries to track, assess and predict the COVID-19 epidemic or those of other infectious diseases.
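To make the idea of time-varying infection and removal rates concrete, here is a minimal discrete-time sketch of an SIR model whose rates are functions of the day index. It is an illustration under stated assumptions, not the paper's estimation procedure: the daily update rule, the function names, the rate schedules and the population figures are all hypothetical, and the ratio beta(t)/gamma(t) is used as a simple stand-in for the reproduction number.

```python
def simulate_vsir(beta, gamma, s0, i0, r0, days):
    """Simulate (S, I, R) paths with day-dependent rates.

    beta(t): infection rate on day t; gamma(t): removal rate on day t.
    Uses a simple one-day forward step (an assumption of this sketch).
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    path = [(s, i, r)]
    for t in range(days):
        new_infections = beta(t) * s * i / n
        new_removals = gamma(t) * i
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        path.append((s, i, r))
    return path


def reproduction_ratio(beta, gamma, t):
    """Ratio beta(t) / gamma(t), used here as an assumed reproduction measure."""
    return beta(t) / gamma(t)


# Hypothetical schedule: the infection rate drops sharply after day 10.
beta = lambda t: 0.5 if t < 10 else 0.05
gamma = lambda t: 0.1
path = simulate_vsir(beta, gamma, s0=9990, i0=10, r0=0, days=30)
print(path[-1], reproduction_ratio(beta, gamma, 0), reproduction_ratio(beta, gamma, 20))
```

In this toy schedule the ratio falls from 5 to 0.5 once the infection rate drops, which is the kind of change over time that a varying coefficient model is designed to track.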
Abstract: This paper introduces a new four-parameter model called the Weibull Generalized Flexible Weibull extension (WGFWE) distribution, which exhibits a bathtub-shaped hazard rate. Some of its statistical properties are obtained, including ordinary and incomplete moments, quantile and generating functions, reliability and order statistics. The method of maximum likelihood is used for estimating the model parameters and the observed Fisher’s information matrix is derived. We illustrate the usefulness of the proposed model by applications to real data.
Abstract: Mosaic plots are state-of-the-art graphics for multivariate categorical data in statistical visualization. Knowledge structures are mathematical models that belong to the theory of knowledge spaces in psychometrics. This paper presents an application of mosaic plots to psychometric data arising from underlying knowledge structure models. In simulation trials and with empirical data, the scope of this graphing method in knowledge space theory is investigated.