Abstract: Central composite design (CCD) is widely applied in many fields to construct a second-order response surface model with quantitative factors, in order to increase the precision of the estimated model. When an experiment also includes qualitative factors, the interactions between the quantitative and qualitative factors should be taken into consideration. In the present paper, D-optimal designs are investigated for models where the qualitative factors interact with, respectively, the linear effects, or the linear effects and two-factor interactions, or the quadratic effects of the quantitative factors. It is shown that, at each qualitative level, the corresponding D-optimal design also consists of three portions, as in a CCD, i.e. the cube design, the axial design, and center points, but with different weights. An example from a chemical study demonstrates how the D-optimal designs obtained here may help practitioners design an experiment with both quantitative and qualitative factors more efficiently.
Abstract: Sample size and power calculations are often based on a two-group comparison. However, in some instances the group membership cannot be ascertained until after the sample has been collected. In this situation, the respective sizes of the two groups may differ from those prespecified because of binomial variability, which in turn alters the power from that expected. Here we suggest that investigators calculate an "expected power" taking into account the binomial variability of the group membership, and adjust the sample size accordingly when planning such studies. We explore different scenarios where such an adjustment may or may not be necessary, for both continuous and binary responses. In general, the number of additional subjects required depends only slightly on the values of the (standardized) difference in the two group means or proportions, but more importantly on the respective sizes of the group membership. We present tables with adjusted sample sizes for a variety of scenarios that can be readily used by investigators at the study design stage. The proposed approach is motivated by a genetic study of cerebral malaria and a sleep apnea study.
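The "expected power" idea in this abstract can be sketched directly. The sketch below is a minimal illustration, not the authors' implementation: it assumes a two-sided two-sample z-test for a standardized mean difference with known unit variance, and averages the power over the Binomial(N, p) distribution of the size of group 1. The function names and the z-test setting are assumptions.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power_two_sample(n1, n2, delta, alpha=0.05):
    """Power of a two-sided two-sample z-test for a standardized
    mean difference `delta`, assuming known unit variance.
    The minor tail term Phi(-z - z_crit) is ignored."""
    if n1 == 0 or n2 == 0:
        return 0.0
    z = abs(delta) * math.sqrt(n1 * n2 / (n1 + n2))
    z_crit = 1.959963984540054  # z_{1 - alpha/2} for alpha = 0.05
    return normal_cdf(z - z_crit)

def expected_power(N, p, delta, alpha=0.05):
    """Average the power over the Binomial(N, p) distribution of
    the size of group 1, as the abstract proposes."""
    total = 0.0
    for n1 in range(N + 1):
        pmf = math.comb(N, n1) * p**n1 * (1 - p)**(N - n1)
        total += pmf * power_two_sample(n1, N - n1, delta, alpha)
    return total
```

Because power is a concave function of the group-1 size near balanced allocation, the expected power is typically a little below the power computed at the fixed sizes n1 = Np, n2 = N(1 - p), which is why a modest sample-size increase can be needed.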
Abstract: Missing values are not uncommon in longitudinal data studies. Missingness may be due to withdrawal from the study (dropout) or may be intermittent. The missing data mechanism is termed non-ignorable if the probability of missingness depends on the unobserved (missing) observations. This paper presents a model for continuous longitudinal data with non-ignorable non-monotone missing values. Two separate models, for the response and for the missingness, are assumed. The response is modeled as multivariate normal, whereas a binomial model is adopted for the missingness process. Parameters in the adopted model are estimated using the stochastic EM algorithm. The proposed approach is then applied to an example from the International Breast Cancer Study Group.
Abstract: Accurately understanding the distribution of sediment measurements within large water bodies such as Lake Michigan is critical for modeling and understanding carbon, nitrogen, silica, and phosphorus dynamics. Several water quality models have been formulated and applied to the Great Lakes to investigate the fate and transport of nutrients and other constituents, as well as plankton dynamics. This paper summarizes the development of spatial statistical tools to study and assess the spatial trends of sediment data sets collected from Lake Michigan as part of the Lake Michigan Mass Balance Study. Several new spatial measurements were developed to quantify the spatial variation and continuity of the sediment data sets under consideration. The application of the newly designed spatial measurements to the sediment data, in conjunction with descriptive statistics, clearly reveals the existence of an intrinsic structure of strata, which is hypothesized based on linear wave theory. Furthermore, a new concept of strata consisting of two components defined by depth is proposed and justified. The findings presented in this paper may inform future studies of sediment in Lake Michigan and in the other Great Lakes as well.
Abstract: The actions of the anonymous banker in the high-stakes television gambling programme Deal or No Deal are examined. If a model can successfully predict his behaviour, it might suggest that an automatic process is employed to reach his decisions. Potential strategies associated with a number of games are investigated, and a model is developed for the offers the anonymous banker makes to buy out the player. This approach is developed into a strategy for selecting the optimum stage at which a player should accept the money offered. This is reduced to a simple table: knowing their current position, players can rapidly arrive at an appropriate decision strategy with associated probabilities. These probabilities give a guide to the confidence to be placed in the choice adopted.
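The flavour of such an offer model can be illustrated with a toy sketch. The abstract does not give the fitted model, so everything below is an assumption for illustration: offers are taken to be a round-dependent fraction of the expected value of the remaining boxes, with illustrative fractions, and the acceptance rule shown is the naive risk-neutral one rather than the paper's table-based strategy.

```python
import statistics

def expected_value(remaining):
    """Mean of the prize amounts still in play."""
    return statistics.mean(remaining)

def banker_offer(remaining, round_no, fractions=None):
    """Hypothetical offer model: a round-dependent fraction of the
    expected value of the remaining boxes. The fractions below are
    illustrative assumptions, not the paper's fitted values."""
    if fractions is None:
        fractions = {1: 0.15, 2: 0.25, 3: 0.40, 4: 0.55, 5: 0.75, 6: 0.90}
    return fractions.get(round_no, 1.0) * expected_value(remaining)

def should_deal(offer, remaining):
    """Accept when the offer is at least the expected value of playing
    on (risk-neutral rule; a risk-averse player would accept earlier)."""
    return offer >= expected_value(remaining)
```

With offer fractions below 1, the risk-neutral rule never accepts early, which is precisely why a strategy incorporating the player's risk attitude, as the abstract's table does, is needed.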
Abstract: A core task in analyzing randomized clinical trials based on longitudinal data is to find the best way to describe the change over time for each treatment arm. We review the implementation and estimation of a flexible piecewise Hierarchical Linear Model (HLM) to model change over time. The flexible piecewise HLM consists of two phases with differing rates of change. The breakpoints between these two phases, as well as the rates of change per phase, are allowed to vary across treatment groups as well as individuals. While this approach may provide better model fit, how to quantify treatment differences over the longitudinal period is not clear. In this paper, we develop a procedure for summarizing the longitudinal data for the flexible piecewise HLM along the lines of Cook et al. (2004). We focus on quantifying the overall treatment efficacy using the area under the curve (AUC) of the individual flexible piecewise HLM models. Methods are illustrated through data from a placebo-controlled trial in the treatment of depression comparing psychotherapy and pharmacotherapy.
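For a two-phase piecewise linear trajectory, the AUC has a simple closed form: the area of two trapezoids joined at the breakpoint. The sketch below is a minimal illustration under assumed parameter names (intercept, one slope per phase, a breakpoint, and follow-up length T); it computes the AUC for a single individual's fitted trajectory, not the full HLM estimation.

```python
def piecewise_trajectory(t, b0, slope1, slope2, breakpoint):
    """Two-phase piecewise linear mean trajectory: slope1 before the
    breakpoint, slope2 after (parameter names are assumptions)."""
    if t <= breakpoint:
        return b0 + slope1 * t
    return b0 + slope1 * breakpoint + slope2 * (t - breakpoint)

def auc_piecewise(b0, slope1, slope2, breakpoint, T):
    """Closed-form area under the trajectory on [0, T], assuming
    0 <= breakpoint <= T: the sum of two trapezoid areas."""
    y0 = b0
    yb = piecewise_trajectory(breakpoint, b0, slope1, slope2, breakpoint)
    yT = piecewise_trajectory(T, b0, slope1, slope2, breakpoint)
    return 0.5 * (y0 + yb) * breakpoint + 0.5 * (yb + yT) * (T - breakpoint)
```

In a trial context, one would compute this AUC from each individual's random-effect estimates and then compare the arm-level means, which is the kind of overall-efficacy summary the abstract describes.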
Abstract: In maximum likelihood exploratory factor analysis, the estimates of unique variances can often turn out to be zero or negative, which makes no sense from a statistical point of view. In order to overcome this difficulty, we employ a Bayesian approach by specifying a prior distribution for the variances of the unique factors. The factor analysis model is estimated by the EM algorithm, for which we provide the expectation and maximization steps within a general framework of EM algorithms. Crucial issues in the Bayesian factor analysis model are the choice of tuning parameters, including the number of factors, and also the hyper-parameters of the prior distribution. The choice of these parameters can be viewed as a model selection and evaluation problem. We derive a model selection criterion for evaluating a Bayesian factor analysis model. Monte Carlo simulations are conducted to investigate the effectiveness of the proposed procedure. A real data example is also given to illustrate our procedure. We observe that our modeling procedure prevents the occurrence of improper solutions and also chooses the appropriate number of factors objectively.
Abstract: To identify the stand attributes that best explain the variability in wood density, Pinus radiata plantations located in the Chilean coastal sector were studied and modeled. The study area corresponded to stands located in sedimentary soil between the zones of Constitución and Cobquecura. Within each sampling sector, individual tree variables were recorded and the most relevant stand parameters were estimated. Fifty trees were sampled in each sector, obtaining from each one six wood discs from different stem heights. Each disc was weighed green and then dried to anhydrous weight, and its basic density was calculated. The profile identification to classify basic density according to stand characteristics was performed through regression trees, a technique based on the use of predictor variables to recursively partition the database into regions with similar responses. The objective of the regression tree method is to obtain highly homogeneous groups (branches), which are identified using pruning techniques that successively eliminate the branches that least contribute to the classification of the variable of interest. The results showed that the stand attributes that contributed significantly to basic density classification were the basal area, the number of trees per hectare, and the mean height.
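The core step of the regression-tree technique described here, splitting on a predictor so that the resulting groups are as homogeneous as possible in the response, can be sketched for a single split. This is a minimal illustration of the splitting criterion only (least-squares, one numeric predictor); a full tree applies it recursively and then prunes, as the abstract describes.

```python
def sse(values):
    """Sum of squared deviations from the mean: the node impurity
    used by least-squares regression trees."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y):
    """Find the single threshold on predictor x that minimizes the
    total within-group SSE of response y. Returns (threshold, sse)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = (None, float("inf"))
    for k in range(1, len(order)):
        left = [y[order[i]] for i in range(k)]
        right = [y[order[i]] for i in range(k, len(order))]
        threshold = 0.5 * (x[order[k - 1]] + x[order[k]])
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best
```

In the study, the candidate predictors were stand attributes such as basal area, trees per hectare, and mean height, with basic wood density as the response.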
Abstract: This paper describes and compares three clustering techniques: traditional clustering methods, Kohonen maps and latent class models. The paper also proposes some novel measures of the quality of a clustering. To the best of our knowledge, this is the first contribution in the literature to compare these three techniques in a context where the classes are not known in advance.
Abstract: This investigation utilized a robust logistic regression method (BYLOGREG) to investigate CEO bonuses prior to the 2007-2009 financial crisis. The robust logistic regression analysis determined that the year and CEO tenure affected the probability that a CEO received a bonus in the 2004-2006 study period. The analysis refuted the notion that "management entrenchment" widely influenced CEO bonus compensation, because the probability of receiving a bonus was negatively related to CEO tenure. The probability of receiving a bonus declined during the 2004-2006 study period because the percentage of CEOs that received a bonus was lowest in 2006. The robust logistic regression analysis found that the current-year stock return was positively and statistically significantly related to the probability that a CEO received a bonus. The analysis also showed that managerial (financial) performance in the areas of sales growth, ROE, and growth in earnings per share increased the probability that a CEO received a bonus. In this investigation, the size of the firm and the growth rate of equity were not statistically significant. Overall, the robust logistic regression correctly classified 77% of the observations on the basis of the model variables, indicating that most CEO bonuses could be explained by firm, CEO, and financial variables. The BY robust logistic regression proved robust to outliers in the CEO bonus sample studied. Interestingly, the relationship between stock return and the probability of a bonus was completely missed by a maximum likelihood (ML) logistic regression with the full CEO bonus sample, which contained outliers. After trimming the CEO bonus data set to remove outliers, the ML logistic regression coefficients changed dramatically, whereas the BY robust logistic regression coefficients changed very little.
Use of the residuals from the BY robust logistic equation should facilitate further inquiry into CEOs who received a bonus but were predicted to have a low probability of one.