Abstract: Two-part random effects models have been used to fit semi-continuous longitudinal data, where the response variable has a point mass at 0 and a continuous, right-skewed distribution for positive values. We review methods proposed in the literature for analyzing data with excess zeros. A two-part logit-lognormal random effects model, a two-part logit-truncated normal random effects model, a two-part logit-gamma random effects model, and a two-part logit-skew-normal random effects model were used to examine the effects of a bottle-weaning intervention on reducing bottle use and daily milk intake from bottles in toddlers aged 11 to 13 months in a randomized controlled trial. We show in all four two-part models that the intervention promoted bottle-weaning and reduced daily milk intake from bottles in toddlers drinking from a bottle. We also show that, in all four models, there is no difference in model fit between the logit link function and the probit link function for modeling the probability of bottle-weaning. Furthermore, the prediction accuracy of the logit or probit link function is not sensitive to the distributional assumption on daily milk intake from bottles in toddlers still drinking from a bottle.
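For reference, the first of these models (the two-part logit-lognormal random effects model) can be sketched as follows; the notation is ours, and the authors' exact covariate and random-effects specification may differ:

```latex
\[
\operatorname{logit} \Pr(Y_{ij} > 0 \mid b_{1i}) = \mathbf{x}_{ij}^{\top}\boldsymbol{\alpha} + b_{1i},
\qquad
\log Y_{ij} \mid Y_{ij} > 0,\, b_{2i} \sim N\!\left(\mathbf{z}_{ij}^{\top}\boldsymbol{\beta} + b_{2i},\, \sigma^{2}\right),
\]
```

with $(b_{1i}, b_{2i})$ jointly normal random effects linking the two parts. The other three variants replace the lognormal part with a truncated normal, gamma, or skew-normal distribution, and the probit comparison replaces the logit link in the first equation.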
Abstract: In square contingency tables, analysis of agreement between the row and column classifications is of interest. For nominal categories, the kappa coefficient is used to summarize the degree of agreement between two raters. Numerous extensions and generalizations of kappa statistics have been proposed in the literature. In addition to the kappa coefficient, several authors have studied agreement in terms of log-linear models. This paper focuses on approaches to the study of interrater agreement for contingency tables with nominal or ordinal categories and multiple raters. We present a detailed overview of agreement studies and illustrate the use of the approaches in evaluating agreement over three numerical examples.
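For reference, Cohen's kappa for a two-rater square table with cell proportions $p_{ij}$ is

```latex
\[
\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad
p_o = \sum_{i} p_{ii}, \qquad
p_e = \sum_{i} p_{i+}\, p_{+i},
\]
```

where $p_o$ is the observed agreement and $p_e$ the agreement expected by chance from the row and column marginals; $\kappa = 1$ indicates perfect agreement and $\kappa = 0$ agreement no better than chance. The multirater and log-linear approaches reviewed in the paper generalize this quantity.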
Abstract: The paper considers the problem of comparing measures of location associated with two dependent groups when values are missing at random, with an emphasis on robust measures of location. It is known that simply imputing missing values can be unsatisfactory when testing hypotheses about means, so the goal here is to compare several alternative strategies that use all of the available data. Included are results on comparing means and 20% trimmed means. Yet another method is based on the usual median but differs from the other methods in a manner that is made obvious. (It is somewhat related to the formulation of the Wilcoxon-Mann-Whitney test for independent groups.) The strategies are compared in terms of Type I error probabilities and power.
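As a rough sketch of the kind of estimator involved (our simplification, not the paper's test statistics, which must also handle the dependence between the two groups), marginal 20% trimmed means can be computed from all available observations rather than from complete pairs only:

```python
import numpy as np
from scipy.stats import trim_mean  # SciPy's trimmed mean

def marginal_trimmed_means(x, y, cut=0.20):
    """20% trimmed mean of each of two dependent measures, using every
    non-missing value instead of only the complete pairs. Illustrative
    sketch only; the hypothesis tests compared in the paper also account
    for the dependence between the two measures."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    tx = trim_mean(x[~np.isnan(x)], cut)
    ty = trim_mean(y[~np.isnan(y)], cut)
    return tx, ty, tx - ty
```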
Abstract: High-resolution NMR spectroscopic data from biosamples are a rich source of information on the metabolic response to physiological variation or pathological events. NMR techniques have many advantages; for example, sample preparation is fast, simple, and non-invasive. Statistical analysis of NMR spectra usually focuses on differential expression of large resonance intensities corresponding to abundant metabolites and involves several data-preprocessing steps. In this paper we estimate functional components of spectra and test their significance using multiscale techniques. We also explore scaling in NMR spectra and use the systematic variability of scaling descriptors to predict the level of cysteine, an important precursor of glutathione, a key antioxidant in the human body. This is motivated by the high cost (in time and resources) of traditional methods for assessing cysteine levels by high-performance liquid chromatography (HPLC).
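A common way to extract a scaling descriptor of the kind used for prediction here is the slope of the wavelet log-energy spectrum; the sketch below (assuming the PyWavelets package, a Daubechies filter, and our own choice of summary) is illustrative, not the paper's exact procedure:

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_scaling_slope(spectrum, wavelet="db4", levels=8):
    """Scaling descriptor of a spectrum: slope of log2(detail energy)
    against decomposition level in a discrete wavelet transform."""
    spectrum = np.asarray(spectrum, float)
    coeffs = pywt.wavedec(spectrum, wavelet, level=levels)
    details = coeffs[1:]                      # [cD_levels, ..., cD_1]
    lvls = np.arange(levels, 0, -1)           # coarse -> fine
    log_energy = [np.log2(np.mean(d ** 2)) for d in details]
    slope, _ = np.polyfit(lvls, log_energy, 1)
    return slope
```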
Abstract: Here we develop methods for applications where random change points are known to be present a priori and the interest lies in estimating them and investigating risk factors that influence them. A simple least-squares method estimating each individual's change point based on that individual's own observations is proposed first. An easy-to-compute empirical Bayes-type shrinkage is then proposed to pool information from the separately estimated change points. A method to improve the empirical Bayes estimates is also developed. Simulations are conducted to compare the least-squares estimates and the Bayes shrinkage estimates. The proposed methods are applied to the Berkeley Growth Study data to estimate the transition age of pubertal height growth.
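A minimal sketch of the two stages (our simplification; the paper's estimators, including the refinement step, are more elaborate):

```python
import numpy as np

def ls_change_point(t, y):
    """Least-squares change point for one subject: grid-search the knot
    of a broken-stick (piecewise linear) fit over the interior observation
    times and keep the knot with the smallest SSE."""
    t = np.asarray(t, float)
    y = np.asarray(y, float)
    best_tau, best_sse = None, np.inf
    for tau in t[1:-1]:
        X = np.column_stack([np.ones_like(t), t, np.maximum(t - tau, 0.0)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = float(np.sum((y - X @ beta) ** 2))
        if sse < best_sse:
            best_tau, best_sse = tau, sse
    return best_tau

def eb_shrink(tau_hat, se2):
    """Empirical-Bayes-type shrinkage of the individual estimates toward
    their pooled mean, with weights from a method-of-moments split of the
    total variance (an assumed form, not the paper's exact estimator)."""
    tau_hat = np.asarray(tau_hat, float)
    se2 = np.asarray(se2, float)
    mu = tau_hat.mean()
    var_between = max(tau_hat.var(ddof=1) - se2.mean(), 0.0)
    w = var_between / (var_between + se2)
    return w * tau_hat + (1.0 - w) * mu
```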
Abstract: Receiver operating characteristic (ROC) methodology is widely used to evaluate diagnostic tests. It is not uncommon in medical practice for multiple diagnostic tests to be applied to the same study sample. A variety of methods have been proposed to combine such potentially correlated tests to increase diagnostic accuracy. Usually the optimum combination is sought based on the area under the ROC curve (AUC), an overall summary statistic that measures the distance between the distributions of the diseased and non-diseased populations. For many clinical practitioners, however, a more relevant question may be "what would the sensitivity be for a given specificity (say, 90%), or the specificity for a given sensitivity?" Generally there is no unique linear combination superior to all others over the entire range of specificities or sensitivities. Within the ROC framework, we present in this paper a method to estimate the optimum linear combination maximizing sensitivity at a fixed specificity, assuming a multivariate normal distribution for the diagnostic tests. The method is applied to a real-world study in which the accuracy of two biomarkers was evaluated in the diagnosis of pancreatic cancer. The performance of the method is also evaluated by simulation studies.
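Under the stated multivariate normality, the objective has a closed form (our notation, sketching the idea rather than the paper's exact derivation). With $X \sim N(\boldsymbol{\mu}_D, \boldsymbol{\Sigma}_D)$ in the diseased and $X \sim N(\boldsymbol{\mu}_N, \boldsymbol{\Sigma}_N)$ in the non-diseased population, fixing the specificity of the linear score $\mathbf{a}^{\top}X$ at $q$ gives the sensitivity

```latex
\[
\mathrm{Se}(\mathbf{a}) = \Phi\!\left(
\frac{\mathbf{a}^{\top}(\boldsymbol{\mu}_D - \boldsymbol{\mu}_N)
      - z_{q}\sqrt{\mathbf{a}^{\top}\boldsymbol{\Sigma}_N\mathbf{a}}}
     {\sqrt{\mathbf{a}^{\top}\boldsymbol{\Sigma}_D\mathbf{a}}}
\right), \qquad z_{q} = \Phi^{-1}(q),
\]
```

where the threshold $\mathbf{a}^{\top}\boldsymbol{\mu}_N + z_{q}\sqrt{\mathbf{a}^{\top}\boldsymbol{\Sigma}_N\mathbf{a}}$ enforces the specificity constraint; the optimum combination maximizes $\mathrm{Se}(\mathbf{a})$ over $\mathbf{a}$.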
Abstract: For longitudinal binary data with non-monotone, non-ignorably missing outcomes over time, a full likelihood approach is algebraically complicated, and maximum likelihood estimation can be computationally prohibitive with many follow-up times. We propose pseudo-likelihoods to estimate the covariate effects on the marginal probabilities of the outcomes, in addition to the association parameters and missingness parameters. The pseudo-likelihood requires specification of the distribution of the data at all pairs of times on the same subject, but makes no assumptions about the joint distribution of the data at three or more times on the same subject, so the method can be considered semi-parametric. With maximum likelihood, by contrast, the full likelihood must be correctly specified in order to obtain consistent estimates. We show in simulations that our proposed pseudo-likelihood produces a more efficient estimate of the regression parameters than the pseudo-likelihood for non-ignorable missingness proposed by Troxel et al. (1998). Application to data from the Six Cities study (Ware et al., 1984), a longitudinal study of the health effects of air pollution, is discussed.
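In the spirit of the proposal (our notation), the pairwise pseudo-log-likelihood sums bivariate contributions over all pairs of follow-up times,

```latex
\[
p\ell(\boldsymbol{\theta}) = \sum_{i=1}^{n} \sum_{j < k}
\log f\!\left(y_{ij}, y_{ik}, r_{ij}, r_{ik}; \boldsymbol{\theta}\right),
\]
```

where $y_{ij}$ is the binary outcome and $r_{ij}$ the missingness indicator for subject $i$ at time $j$, and $\boldsymbol{\theta}$ collects the marginal regression, association, and missingness parameters; no model is placed on the joint distribution at three or more times.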
Abstract: The aim of this paper is to determine the effectiveness of cross-association in detecting the similarity between correlated geological columnar sections. For this purpose, cross-association is used to compare several geological columnar sections arbitrarily selected from different localities in central and northern Jordan. It turns out, for most of the cases studied, that sections consisting of the same rock units (formations) are statistically classified as similar (p-value < .05), while sections of different rock units (formations) are statistically classified as dissimilar (p-value > .05).
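A minimal sketch of the classical cross-association matching procedure (our implementation, not the authors' code): at each alignment of two coded sections, the matches in the overlap are counted and compared with the number expected if the sequences were independent.

```python
import numpy as np
from scipy.stats import binom

def cross_association(a, b):
    """Cross-association of two coded columnar sections: at every
    alignment, count matches in the overlap and test them against the
    number expected under independence of the two state sequences
    (one-sided; an excess of matches indicates similarity)."""
    a, b = list(a), list(b)
    states = set(a) | set(b)
    # chance probability that two randomly drawn positions match
    p = sum((a.count(s) / len(a)) * (b.count(s) / len(b)) for s in states)
    results = []
    for lag in range(-(len(b) - 1), len(a)):
        idx = range(max(lag, 0), min(len(a), lag + len(b)))
        matches = sum(a[i] == b[i - lag] for i in idx)
        n_overlap = len(idx)
        p_value = binom.sf(matches - 1, n_overlap, p)  # P(X >= matches)
        results.append((lag, matches, n_overlap, p_value))
    return results
```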
Abstract: Applications of multivariate statistical techniques, including graphical models, are seldom found in e-commerce studies. However, as this paper demonstrates, probabilistic graphical models are useful in this area, both because of their ability to handle large numbers of potentially interrelated variables and because of their ability to communicate statistical relationships clearly to both the researcher and the ultimate business audience. We show an application of this methodology to intranets, internal corporate information systems employing Internet technology. In particular, we study both the interrelationships among intranet benefits and the interrelationships among intranet applications. This approach confirms some hypothesized relationships and uncovers heretofore unanticipated relationships among intranet variables, providing guidance for business professionals seeking to develop effective intranet systems. The techniques described here also have potential applicability in other e-commerce arenas, including business-to-consumer and business-to-business applications.
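One standard way to estimate such a dependence graph from survey-scale data is a sparse Gaussian graphical model; the sketch below uses scikit-learn's graphical lasso, which is not necessarily the estimator used in the paper, and reports pairs of variables that remain directly linked after conditioning on all others.

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

def dependence_graph(X, names, tol=1e-8):
    """Sparse Gaussian graphical model of survey items: edges are pairs
    of variables with nonzero partial correlation, i.e. variables that
    remain directly associated after conditioning on all the others."""
    X = np.asarray(X, float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize items
    precision = GraphicalLassoCV().fit(X).precision_
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)         # partial correlations
    return [(names[i], names[j], pcor[i, j])
            for i in range(len(names))
            for j in range(i + 1, len(names))
            if abs(pcor[i, j]) > tol]
```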
Abstract: Examining the daily Dow Jones Industrial Average (DJI), we find evidence of both higher-order anomalies and predictability. While most researchers are aware only of the relatively harmless anomalies that occur in the mean, the first part of this article provides empirical evidence of more dangerous kinds of anomalies occurring in higher-order moments. This evidence casts some doubt on the common practice of fitting standard time series models (e.g., ARMA, GARCH, or stochastic volatility models) to financial time series and carrying out tests based upon autocorrelation coefficients without making proper provision for these anomalies. The second part of this article provides evidence in favor of the predictability of the returns on the DJI and, more interestingly, against the efficient market hypothesis. The special value of this evidence lies in the simplicity of the methods involved.
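As a toy diagnostic in the spirit of the first part of the article (a sketch assuming the statsmodels package, not the article's methodology), one can run autocorrelation tests on powers of the returns; clean raw returns combined with strongly autocorrelated squares or cubes signal exactly the higher-order structure at issue:

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

def moment_anomaly_pvalues(returns, lags=20):
    """Ljung-Box p-values for the returns and for their squares and
    cubes (rough proxies for second- and third-moment structure)."""
    returns = np.asarray(returns, float)
    out = {}
    for name, x in [("r", returns), ("r^2", returns ** 2), ("r^3", returns ** 3)]:
        res = acorr_ljungbox(x - x.mean(), lags=[lags])
        out[name] = float(res["lb_pvalue"].iloc[0])
    return out
```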