Abstract: Controlled experiments give researchers a statistical tool for determining the yield obtained when an experimental unit is subjected to various treatments. We discuss a replicated block design in which the experimental unit is yeast, subjected to six treatments. The purpose of the experiment is to extract a compound for use in the manufacturing industry. We considered both an ANOVA and a MANOVA model to analyze the data, and we discuss the rationale for selecting one model over the other. Results and recommendations on which treatments to use when processing the yeast are also presented.
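A minimal sketch of the kind of replicated block ANOVA described above, using statsmodels. The simulated data, the column names (yield_, treatment, block), and the choice of four replicate blocks are illustrative assumptions, not the paper's actual experiment.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical yield data: 6 treatments replicated across 4 blocks.
rng = np.random.default_rng(0)
treatments = np.repeat(np.arange(6), 4)
blocks = np.tile(np.arange(4), 6)
yield_ = 10 + 0.5 * treatments + 0.3 * blocks + rng.normal(0, 1, 24)
df = pd.DataFrame({"yield_": yield_, "treatment": treatments, "block": blocks})

# Treatment and block enter as categorical factors, as in a block design.
model = smf.ols("yield_ ~ C(treatment) + C(block)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # ANOVA table with F tests for treatment and block
```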
Abstract: In this paper, we obtain several estimators of a scale parameter of the Morgenstern type bivariate uniform distribution (MTBUD) based on observations made on the units of a ranked set sample with respect to the study variable Y, which is correlated with the auxiliary variable X, when (X, Y) follows a MTBUD. Efficiency comparisons among these estimators are also made. Finally, we illustrate the methods developed using a real data set.
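A hedged sketch of the sampling scheme: ranked set sampling on Y by ranking on the auxiliary variable X when (X, Y) follows an MTBUD. The parameter values, the conditional inverse-CDF simulator, and the simple moment-type estimator 2 * mean(Y concomitants) are illustrative choices, not necessarily the estimators derived in the paper.

```python
import numpy as np

def mtbud_sample(size, theta1=1.0, theta2=3.0, alpha=0.5, rng=None):
    """Draw (X, Y) pairs from the MTBUD via the conditional inverse CDF of Y given X."""
    rng = rng or np.random.default_rng()
    x = theta1 * rng.uniform(size=size)
    c = alpha * (1.0 - 2.0 * x / theta1)          # coefficient in F(y | x)
    p = rng.uniform(size=size)
    denom = np.where(np.abs(c) < 1e-12, 1.0, 2 * c)
    u = np.where(np.abs(c) < 1e-12, p,
                 ((1 + c) - np.sqrt((1 + c) ** 2 - 4 * c * p)) / denom)
    return x, theta2 * u

def rss_concomitants(set_size, cycles, rng=None, **kwargs):
    """For each rank r, draw set_size pairs, rank on X, keep the r-th Y concomitant."""
    rng = rng or np.random.default_rng()
    ys = []
    for _ in range(cycles):
        for r in range(set_size):
            x, y = mtbud_sample(set_size, rng=rng, **kwargs)
            ys.append(y[np.argsort(x)[r]])
    return np.asarray(ys)

rng = np.random.default_rng(1)
y_rss = rss_concomitants(set_size=5, cycles=200, rng=rng, theta2=3.0, alpha=0.7)
print("moment-type estimate of theta2:", 2 * y_rss.mean())   # should be near 3.0
```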
Abstract: Traditional loss reserving models focus on the mean of the conditional loss distribution. If the factors driving high claims differ systematically from those driving medium to low claims, alternative models that can distinguish such differences are required. We propose a quantile regression model for loss reserving, as it offers potentially different solutions at distinct quantiles, so that the effects of risk factors are differentiated at different points of the conditional loss distribution. Owing to its nonparametric nature, quantile regression is free of the assumptions underlying traditional mean regression models, such as homogeneous variance across risk factors and symmetric, light tails. These assumptions have posed a major barrier in applications, as they are often not met in claim data. Using two sets of run-off triangle claim data, from Israel and from Queensland, Australia, we present the quantile regression approach and illustrate the sensitivity of claim size to risk factors, namely the trend pattern and the initial claim level, at different quantiles. The trained models are applied to predict future claims in the lower run-off triangle. The findings suggest that reliance on standard loss reserving techniques can give rise to misleading inferences and that claim size is not homogeneously driven by the same risk factors across quantiles.
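A minimal sketch of the quantile regression idea for run-off triangle data, using statsmodels' quantreg. The toy triangle, the log-scale incremental claims, and the simple "origin + development" linear predictor are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# A small simulated incremental run-off triangle (origin year x development year).
rng = np.random.default_rng(0)
rows = []
for origin in range(6):
    for dev in range(6 - origin):
        log_claim = 7 + 0.10 * origin - 0.40 * dev + rng.normal(0, 0.3)
        rows.append({"origin": origin, "dev": dev, "log_claim": log_claim})
df = pd.DataFrame(rows)

# Fit the same linear predictor at several quantiles; materially different
# coefficients across q indicate that the risk factors (here a development
# trend and an origin-level effect) act differently across the claim distribution.
for q in (0.25, 0.50, 0.75):
    fit = smf.quantreg("log_claim ~ origin + dev", df).fit(q=q)
    print(f"q = {q}:", fit.params.round(3).to_dict())
```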
Abstract: It is known that “standard methods for estimating the causal effect of a time-varying treatment on the mean of a repeated measures outcome (for example, GEE regression) may be biased when there are time-dependent variables that are simultaneously confounders of the effect of interest and are predicted by previous treatment” (Hernán et al. 2002). Inverse-probability-of-treatment weighted (IPTW) methods have been developed in the causal inference literature. In genetic studies, however, the main interest is in estimating or testing the genetic effect rather than the treatment effect. In this work, we describe an IPTW method that provides an unbiased estimate of the genetic effect, and we discuss how to develop a family-based association test using IPTW for family-based studies. We apply the developed methods to systolic blood pressure data in the Framingham Heart Study, where some subjects took antihypertensive treatment during the course of the study.
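A hedged sketch of the inverse-probability-of-treatment weighting step: estimate the probability of treatment from measured confounders, then fit a weighted outcome regression that includes the genetic covariate. The variable names (sbp, treated, genotype, confounder), the single-time-point simplification, and the unstabilized weights are illustrative; the paper's setting involves time-varying treatment and repeated measures.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
genotype = rng.binomial(2, 0.3, n)                      # additive genetic score
confounder = rng.normal(size=n)                         # e.g. prior blood pressure
p_treat = 1 / (1 + np.exp(-(-1 + 1.2 * confounder)))
treated = rng.binomial(1, p_treat)
sbp = 120 + 2 * genotype + 5 * confounder - 8 * treated + rng.normal(0, 5, n)

# Step 1: propensity model for treatment given the confounder.
ps_model = sm.Logit(treated, sm.add_constant(confounder)).fit(disp=0)
ps = ps_model.predict(sm.add_constant(confounder))
weights = np.where(treated == 1, 1 / ps, 1 / (1 - ps))  # stabilization omitted for brevity

# Step 2: weighted regression of the outcome on genotype and treatment.
X = sm.add_constant(np.column_stack([genotype, treated]))
fit = sm.WLS(sbp, X, weights=weights).fit()
print(fit.params)   # the genotype coefficient is the quantity of interest here
```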
Abstract: Clustered binary samples arise often in biomedical investigations. An important feature of such samples is that the binary responses within clusters tend to be correlated. The Beta-Binomial model is commonly applied to account for the intra-cluster correlation – the correlation between responses within clusters – among dichotomous outcomes in cluster sampling. The intracluster correlation coefficient (ICC) quantifies this correlation or level of similarity. In this paper, we propose Bayesian point and interval estimators for the ICC under the Beta-Binomial model. Using Laplace’s method, the asymptotic posterior distribution of the ICC is approximated by a normal distribution. The posterior mean of this normal density is used as a point estimator for the ICC, and 95% credible sets are calculated. A Monte Carlo simulation is used to evaluate the coverage probability and average length of the credible set of the proposed interval estimator. The simulations indicate that when the number of clusters is above 40, the underlying mean response probability falls in the range [0.3, 0.7], and the underlying ICC values are ≤ 0.4, the proposed interval estimator performs quite well and attains the correct coverage level. Even when the number of clusters is as small as 20, the proposed interval estimator may still be useful in the case of small ICC (≤ 0.2).
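A hedged sketch of a Laplace-type normal approximation to the posterior of the ICC (rho) under a Beta-Binomial model. Flat priors on (pi, rho), the simulated clusters, and the symmetric 95% interval construction are illustrative assumptions, not the paper's exact derivation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln, comb
from statsmodels.tools.numdiff import approx_hess

rng = np.random.default_rng(0)
k, n = 40, 20                                  # 40 clusters of size 20
true_pi, true_rho = 0.4, 0.2
a0 = true_pi * (1 - true_rho) / true_rho
b0 = (1 - true_pi) * (1 - true_rho) / true_rho
p_cluster = rng.beta(a0, b0, k)
y = rng.binomial(n, p_cluster)                 # successes per cluster

def neg_log_post(theta):
    """Negative Beta-Binomial log-posterior in (pi, rho); flat prior on the unit square."""
    pi, rho = theta
    if not (0 < pi < 1 and 0 < rho < 1):
        return np.inf
    a = pi * (1 - rho) / rho
    b = (1 - pi) * (1 - rho) / rho
    ll = np.sum(np.log(comb(n, y)) + betaln(y + a, n - y + b) - betaln(a, b))
    return -ll

opt = minimize(neg_log_post, x0=[0.5, 0.1], method="Nelder-Mead")
mode = opt.x
cov = np.linalg.inv(approx_hess(mode, neg_log_post))   # Laplace: inverse Hessian at the mode
rho_hat, rho_se = mode[1], np.sqrt(cov[1, 1])
print(f"approximate posterior mean for ICC: {rho_hat:.3f}")
print(f"95% credible interval: ({rho_hat - 1.96 * rho_se:.3f}, {rho_hat + 1.96 * rho_se:.3f})")
```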
Abstract: A limited number of studies have utilized multiple causes of death to investigate infant mortality patterns. The purpose of the present study was to examine the risk distribution of underlying and multiple causes of infant death for congenital anomalies, short gestation/low birth weight (LBW), respiratory conditions, infections, sudden infant death syndrome, and external causes across four gestational age groups, namely ≤ 23, 24–30, 31–36, and ≥ 37 weeks, and to determine the extent to which mortality from each condition is underestimated when only the underlying cause of death is used. The data were obtained from the North Carolina linked birth/infant death files (1999 to 2003) and included 4908 death records. The findings indicate that infants born at less than 30 weeks of gestation are more likely (odds ratios ranging from 1.99 to 6.03) to have multiple causes recorded when the underlying cause is congenital anomalies, respiratory conditions, or infections, in comparison to infants whose gestational age is at least 37 weeks. The underlying cause of death underestimated mortality for a number of cause-specific deaths, including short gestation/LBW, respiratory conditions, infections, and external causes. This was particularly evident among infants born preterm. Based on these findings, it is recommended that multiple causes, whenever available, be studied in conjunction with the underlying cause of death data.
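A toy illustration of the two quantities discussed above: an odds ratio comparing the chance of a multiple-cause record between a preterm and a term gestational-age group, and the factor by which a cause-specific count grows when any-mention (multiple-cause) data are used instead of the underlying cause alone. All counts are hypothetical, not the North Carolina figures.

```python
import numpy as np

# (i) 2x2 table: rows = gestational age group, columns = multiple causes (yes, no)
table = np.array([[120, 40],     # <= 23 weeks: 120 with multiple causes, 40 without
                  [200, 300]])   # >= 37 weeks
odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])
print("odds ratio:", odds_ratio)                        # 4.5 with these toy counts

# (ii) underestimation when only the underlying cause is counted
underlying_only = 80      # deaths with the condition as the underlying cause
any_mention = 130         # deaths with the condition recorded anywhere on the record
print("mortality underestimated by factor:", any_mention / underlying_only)
```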
Abstract: A new approach for analyzing state duration data in brand-choice studies is explored. This approach not only incorporates the correlation among repeated purchases for a subject, but also models the purchase timing and the brand decision jointly. The former is accomplished by applying transition model approaches from longitudinal studies, while the latter is done by conditioning on the brand choice variable. Mixed multinomial logit models and Cox proportional hazards models are then employed to model the marginal densities of the brand choice and the conditional densities of the interpurchase time given the brand choice. We illustrate the approach using a Nielsen household scanner panel data set.
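An illustrative sketch of the two-part modelling idea: a multinomial logit for the marginal brand choice and a proportional hazards model for the interpurchase time conditional on the chosen brand, here fitted with statsmodels. The simulated panel, the covariate names (price, loyalty), and the omission of the transition/serial-dependence structure and of the mixed (random-effects) part are simplifying assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1500
price = rng.normal(size=n)
loyalty = rng.normal(size=n)
brand = rng.choice(3, size=n, p=[0.5, 0.3, 0.2])        # 3 competing brands
gap = rng.exponential(scale=np.exp(2 - 0.3 * price + 0.2 * (brand == 0)))  # days to next purchase

# Part 1: marginal brand-choice model (multinomial logit).
X_choice = sm.add_constant(np.column_stack([price, loyalty]))
choice_fit = sm.MNLogit(brand, X_choice).fit(disp=0)

# Part 2: interpurchase time given brand (Cox proportional hazards, brand as dummies).
X_time = np.column_stack([price, loyalty, (brand == 0).astype(float), (brand == 1).astype(float)])
time_fit = sm.PHReg(gap, X_time, status=np.ones(n)).fit()   # all gaps observed (no censoring)
print(choice_fit.params)
print(time_fit.params)
```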
Abstract: Exploratory data analysis has become more important as large, rich data sets become available, with many explanatory variables representing competing theoretical constructs. The restrictive assumptions of linearity and additivity of effects, as in regression, are no longer necessary to save degrees of freedom. Where there is a clear criterion (dependent) variable or classification, sequential binary segmentation (tree) programs are being used. We explain why, using the current enhanced version (SEARCH) of the original Automatic Interaction Detector program as an illustration. We then suggest some promising uses and provide one simple example; even this simple example uncovers an interaction that might well have been missed with the usual multivariate regression.
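The original AID/SEARCH programs are not widely distributed today; as a stand-in, this sketch uses scikit-learn's DecisionTreeRegressor, which performs the same kind of sequential binary segmentation on a criterion variable. The simulated interaction (an education effect present only in one group) is a hypothetical example of the sort of structure a tree can surface but an additive regression would average away.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
n = 1000
education = rng.integers(8, 21, n)                 # years of schooling
group = rng.integers(0, 2, n)                      # a binary explanatory factor
income = 20 + np.where(group == 1, 2.5 * (education - 8), 0) + rng.normal(0, 5, n)

# Sequential binary splits on the criterion variable (income).
X = np.column_stack([education, group])
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=50).fit(X, income)
print(export_text(tree, feature_names=["education", "group"]))
# With this simulated pattern, the first split is on `group` and the education
# splits that matter appear under group == 1: the interaction shows up
# directly in the segmentation rather than being averaged into main effects.
```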