Abstract: Exploratory data analysis has become more important as large, rich data sets become available, with many explanatory variables representing competing theoretical constructs. The restrictive assumptions of linearity and additivity of effects, as in regression, are no longer necessary to save degrees of freedom. Where there is a clear criterion (dependent) variable or classification, sequential binary segmentation (tree) programs are being used. We explain why, using the current enhanced version (SEARCH) of the original Automatic Interaction Detector program as an illustration. Even a simple example uncovers an interaction that might well have been missed with the usual multivariate regression. We then suggest some promising uses.
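To illustrate the idea of sequential binary segmentation mentioned in the abstract above, the following is a minimal sketch of a single split step on one numeric predictor. It is not the SEARCH or Automatic Interaction Detector code; the function name `best_binary_split` and the criterion (reduction in the within-group sum of squares) are assumptions chosen for illustration.

```python
def sum_sq_dev(values):
    """Sum of squared deviations of values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_binary_split(x, y):
    """Hypothetical helper: find the threshold on x that most reduces
    the within-group sum of squares of y when the cases are split in two.

    Returns (threshold, reduction); threshold is None if no split helps.
    """
    pairs = sorted(zip(x, y))
    total = sum_sq_dev([p[1] for p in pairs])
    best_threshold, best_reduction = None, 0.0
    for i in range(1, len(pairs)):
        # Only split between distinct predictor values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        reduction = total - (sum_sq_dev(left) + sum_sq_dev(right))
        if reduction > best_reduction:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_reduction = reduction
    return best_threshold, best_reduction


threshold, reduction = best_binary_split([1, 2, 3, 4], [0, 0, 10, 10])
```

A tree program would repeat this search over every candidate predictor, apply the best split, and then recurse on each resulting group, which is how interactions can surface without being specified in advance.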
Abstract: A diagnostic defined in terms of the Kullback-Leibler directed divergence is developed for identifying cases which impact the prediction of the random effects in a mixed model. The diagnostic compares two conditional densities governing the prediction of the random effects: one based on parameter estimates computed using the full data set, the other based on parameter estimates computed using a case-deleted data set. We present the definition of the diagnostic and derive a formula for its evaluation. Its performance is investigated in an application where exam scores are modeled using a mixed model containing a fixed exam effect and a random subject effect.
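As a rough illustration of the kind of quantity the abstract above describes, the sketch below computes the Kullback-Leibler directed divergence between two univariate normal densities, one standing in for the full-data conditional density and one for the case-deleted density. The normality assumption, the univariate setting, and the function name `kl_normal` are assumptions made here for illustration; the paper's own formula is not reproduced.

```python
import math


def kl_normal(mean_full, sd_full, mean_del, sd_del):
    """KL divergence of N(mean_full, sd_full^2) from N(mean_del, sd_del^2).

    Closed form for two univariate normal densities:
    log(sd_del / sd_full)
      + (sd_full^2 + (mean_full - mean_del)^2) / (2 * sd_del^2) - 1/2
    """
    return (
        math.log(sd_del / sd_full)
        + (sd_full ** 2 + (mean_full - mean_del) ** 2) / (2 * sd_del ** 2)
        - 0.5
    )


# Identical densities give zero divergence; a shifted mean gives a positive value.
no_change = kl_normal(0.0, 1.0, 0.0, 1.0)
shifted = kl_normal(0.0, 1.0, 1.0, 1.0)
```

Under these assumptions, a case whose deletion leaves the density unchanged scores zero, and larger values flag cases with more impact on the prediction.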
Abstract: Linear mixed models are extremely sensitive to outlying responses and extreme points in the fixed and random effect design spaces. Few diagnostics are available in standard computing packages. We provide routine diagnostic tools, which are computationally inexpensive. The diagnostics are functions of basic building blocks: studentized residuals, error contrast matrix, and the inverse of the response variable covariance matrix. The basic building blocks are computed only once from the complete data analysis and provide information on the influence of the data on different aspects of the model fit. Numerical examples provide analysts with a complete picture of the diagnostics.
Abstract: This article presents and illustrates several important subset design approaches for Gaussian nonlinear regression models and for linear models where interest lies in a nonlinear function of the model parameters. These design strategies are particularly useful in situations where currently used subset design procedures fail to provide designs which can be used to fit the model function. Our original design technique is illustrated in conjunction with D-optimality, Bayesian D-optimality and Kiefer’s Φk-optimality, and is extended to yield subset designs which take account of curvature.
Abstract: In many clinical trials, information is collected on both the frequency of event occurrence and the severity of each event. For example, in evaluating a new anti-epileptic medication, both the total number of seizures a patient has during the study period and the severity (e.g., mild, severe) of each seizure could be measured. In order to arrive at a full picture of drug or treatment performance, one needs to jointly model the number of events and their correlated ordinal severity measures. A separate analysis is not recommended as it is inefficient and can lead to what we define as “zero length bias” in estimates of treatment effect on severity. This paper proposes a general, likelihood-based, marginal regression model for jointly modeling the number of events and their correlated ordinal severity measures. We describe parameter estimation issues and derive the Fisher information matrix for the joint model in order to obtain the asymptotic covariance matrix of the parameter estimates. A limited simulation study is conducted to examine the asymptotic properties of the maximum likelihood estimators. Using this joint model, we propose tests that incorporate information from both the number of events and their correlated ordinal severity measures. The methodology is illustrated with two examples from clinical trials: the first concerning a new drug treatment for epilepsy; the second evaluating the effect of a cholesterol-lowering medication on coronary artery disease.
Abstract: The aim of this paper is to identify the effects of socioeconomic factors and family planning program effort on total fertility rate with national-level data from forty-three developing countries. The data used have mainly been taken from the secondary source “Family Planning and Child Survival: 100 Developing Countries” compiled by the Center for Population and Family Health, Columbia University. Because the independent variables were found to be highly correlated among themselves, a component regression technique has been used to analyze the data. The analysis shows that the family planning program effort makes the largest contribution to lowering the total fertility rate, followed by percent of urban population, female literacy rate, and infant mortality rate, in that order. Policy implications are discussed.
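The abstract above does not spell out its component regression procedure. As one possible reading, the sketch below regresses a response on the first principal component of two standardized, highly correlated predictors. The two-predictor restriction, the function names, and the toy numbers are all assumptions for illustration and are not the paper's data or method.

```python
import math


def standardize(values):
    """Center to mean zero and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def component_regression_two_predictors(x1, x2, y):
    """Hypothetical helper: regress y on the first principal component
    of two standardized predictors.

    For a 2x2 correlation matrix [[1, r], [r, 1]] the leading eigenvector
    is (1, 1)/sqrt(2) when r >= 0 and (1, -1)/sqrt(2) when r < 0.
    Returns (r, intercept, slope).
    """
    z1, z2 = standardize(x1), standardize(x2)
    n = len(y)
    r = sum(a * b for a, b in zip(z1, z2)) / (n - 1)
    sign = 1.0 if r >= 0 else -1.0
    pc1 = [(a + sign * b) / math.sqrt(2) for a, b in zip(z1, z2)]
    # pc1 has mean zero, so the least-squares intercept is the mean of y.
    intercept = sum(y) / n
    slope = sum(p * (v - intercept) for p, v in zip(pc1, y)) / sum(p * p for p in pc1)
    return r, intercept, slope


# Toy, made-up numbers with perfectly collinear predictors.
r, intercept, slope = component_regression_two_predictors(
    [1, 2, 3], [2, 4, 6], [1, 2, 3]
)
```

Replacing the correlated predictors with a single component sidesteps the unstable coefficients that ordinary least squares would give when predictors are nearly collinear.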
Abstract: Historians have shown great interest in the Southern Illinois mine war. One explanation is that this war was caused by miners who held radical political beliefs. We examine this view by applying four methods of ecological inference to estimate the proportion of coal miners who were socialist voters in this time period. Based on these results, and taking into account the assumptions of the methods, we conclude that miners were less politically radical than previously thought.