Abstract: Longitudinal studies represent one of the principal research strategies employed in medical and social research. These studies are the most appropriate for studying individual change over time. The prematurely withdrawal of some subjects from the study (dropout) is termed nonrandom when the probability of missingness depends on the missing value. Nonrandom dropout is common phenomenon associated with longitudinal data and it complicates statistical inference. Linear mixed effects model is used to fit longitudinal data in the presence of nonrandom dropout. The stochastic EM algorithm is developed to obtain the model parameter estimates. Also, parameter estimates of the dropout model have been obtained. Standard errors of estimates have been calculated using the developed Monte Carlo method. All these methods are applied to two data sets.
Longitudinal data analysis had been widely developed in the past three decades. Longitudinal data are common in many fields such as public health, medicine, biological and social sciences. Longitudinal data have special nature as the individual may be observed during a long period of time. Hence, missing values are common in longitudinal data. The presence of missing values leads to biased results and complicates the analysis. The missing values have two patterns: intermittent and dropout. The missing data mechanisms are missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). The appropriate analysis relies heavily on the assumed mechanism and pattern. The parametric fractional imputation is developed to handle longitudinal data with intermittent missing pattern. The maximum likelihood estimates are obtained and the Jackkife method is used to obtain the standard errors of the parameters estimates. Finally a simulation study is conducted to validate the proposed approach. Also, the proposed approach is applied to a real data.
Summary: Longitudinal binary data often arise in clinical trials when repeated measurements, positive or negative to certain tests, are made on the same subject over time. To account for the serial corre lation within subjects, we propose a marginal logistic model which is implemented using the Generalized Estimating Equation (GEE) ap proach with working correlation matrices adopting some widely used forms. The aim of this paper is to seek some robust working correla tion matrices that give consistently good fit to the data. Model-fit is assessed using the modified expected utility of Walker & Guti´errez Pe˜na (1999). To evaluate the effect of the length of time series and the strength of serial correlation on the robustness of various working correlation matrices, the models are demonstrated using three data sets containing respectively all short time series, all long time series and time series of varying length. We identify factors that affect the choice of robust working correlation matrices and give suggestions under different situations.
Abstract: Mixed effects models are often used for estimating fixed effects and variance components in continuous longitudinal outcomes. An EM based estimation approach for mixed effects models when the outcomes are truncated was proposed by Hughes (1999). We consider the situation when the longitudinal outcomes are also subject to non-ignorable missing in addition to truncation. A shared random effect parameter model is presented where the missing data mechanism depends on the random effects used to model the longitudinal outcomes. Data from the Indianapolis-Ibadan dementia project is used to illustrate the proposed approach
Abstract: Here we develop methods for applications where random change points are known to be present a priori and the interest lies in their estimation and investigating risk factors that influence them. A simple least square method estimating each individual’s change point based on one’s own observations is first proposed. An easy-to-compute empirical Bayes type shrinkage is then proposed to pool information from separately estimated change points. A method to improve the empirical Bayes estimates is developed. Simulations are conducted to compare least-square estimates and Bayes shrinkage estimates. The proposed methods are applied to the Berkeley Growth Study data to estimate the transition age of the puberty height growth.
As data acquisition technologies advance, longitudinal analysis is facing challenges of exploring complex feature patterns from high-dimensional data and modeling potential temporally lagged effects of features on a response. We propose a tensor-based model to analyze multidimensional data. It simultaneously discovers patterns in features and reveals whether features observed at past time points have impact on current outcomes. The model coefficient, a k-mode tensor, is decomposed into a summation of k tensors of the same dimension. We introduce a so-called latent F-1 norm that can be applied to the coefficient tensor to performed structured selection of features. Specifically, features will be selected along each mode of the tensor. The proposed model takes into account within-subject correlations by employing a tensor-based quadratic inference function. An asymptotic analysis shows that our model can identify true support when the sample size approaches to infinity. To solve the corresponding optimization problem, we develop a linearized block coordinate descent algorithm and prove its convergence for a fixed sample size. Computational results on synthetic datasets and real-life fMRI and EEG datasets demonstrate the superior performance of the proposed approach over existing techniques.
Climate change is widely recognized as one of the most challenging, urgent and complex problem facing humanity. There are rising interests in understanding and quantifying climate changing. We analyze the climate trend in Canada using Canadian monthly surface air temperature, which is longitudinal data in nature with long time span. Analysis of such data is challenging due to the complexity of modeling and associated computation burdens. In this paper, we divide this type of longitudinal data into time blocks, conduct multivariate regression and utilize a vine copula model to account for the dependence among the multivariate error terms. This vine copula model allows separate specification of within-block and between-block dependence structure and has great flexibility of modeling complex association structures. To release the computational burden and concentrate on the structure of interest, we construct composite likelihood functions, which leave the connecting structure between time blocks unspecified. We discuss different estimation procedures and issues regarding model selection and prediction. We explore the prediction performance of our vine copula model by extensive simulation studies. An analysis of the Canada climate dataset is provided.