Longitudinal data analysis has been widely developed over the past three decades. Longitudinal data are common in many fields, such as public health, medicine, and the biological and social sciences. Because individuals may be observed over a long period of time, missing values are common in such data. The presence of missing values can lead to biased results and complicates the analysis. Missing values follow two patterns: intermittent and dropout. The missing data mechanisms are missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). The appropriate analysis relies heavily on the assumed mechanism and pattern. Parametric fractional imputation is developed to handle longitudinal data with an intermittent missing pattern. Maximum likelihood estimates are obtained, and the jackknife method is used to obtain the standard errors of the parameter estimates. Finally, a simulation study is conducted to validate the proposed approach, and the approach is applied to a real data set.
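The jackknife step mentioned above can be illustrated with a minimal sketch. It assumes a delete-one-subject jackknife and a user-supplied estimator function; the names `jackknife_se` and `estimator` are hypothetical, and the abstract does not say which jackknife variant the authors use or how it is combined with fractional imputation.

```python
import math


def jackknife_se(subjects, estimator):
    """Delete-one jackknife standard error of estimator(subjects).

    subjects  : sequence with one entry per subject (assumed layout)
    estimator : function mapping a list of subjects to a single number
    """
    subjects = list(subjects)
    n = len(subjects)
    # Re-estimate the parameter with each subject left out in turn.
    leave_one_out = [estimator(subjects[:i] + subjects[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    # Jackknife variance: (n - 1) / n times the sum of squared deviations.
    variance = (n - 1) / n * sum((t - mean_loo) ** 2 for t in leave_one_out)
    return math.sqrt(variance)


def sample_mean(values):
    return sum(values) / len(values)


# For the sample mean, the jackknife reproduces the usual s / sqrt(n).
se = jackknife_se([1.0, 2.0, 3.0, 4.0, 5.0], sample_mean)
```

In practice the estimator would be the full fitting procedure (imputation plus maximum likelihood), refitted on each reduced sample; the sample mean is used here only to keep the sketch short.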
Summary: Longitudinal binary data often arise in clinical trials when repeated measurements, positive or negative to certain tests, are made on the same subject over time. To account for the serial correlation within subjects, we propose a marginal logistic model that is implemented using the Generalized Estimating Equation (GEE) approach with working correlation matrices adopting some widely used forms. The aim of this paper is to seek some robust working correlation matrices that give consistently good fit to the data. Model fit is assessed using the modified expected utility of Walker & Gutiérrez Peña (1999). To evaluate the effect of the length of time series and the strength of serial correlation on the robustness of various working correlation matrices, the models are demonstrated using three data sets containing, respectively, all short time series, all long time series, and time series of varying length. We identify factors that affect the choice of robust working correlation matrices and give suggestions under different situations.
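To make the idea of a working correlation matrix concrete, the following sketch builds three structures that are commonly used with GEE: independence, exchangeable, and first-order autoregressive (AR(1)). The summary does not name the forms it compares, so these three are an assumption, as are the function name `working_correlation` and the parameter `alpha`.

```python
def working_correlation(n_times, structure, alpha=0.0):
    """Return an n_times x n_times working correlation matrix as nested lists.

    structure : "independence", "exchangeable", or "ar1" (assumed choices)
    alpha     : correlation parameter, ignored for "independence"
    """
    if structure == "independence":
        def corr(j, k):
            return 0.0
    elif structure == "exchangeable":
        # Every pair of time points shares the same correlation.
        def corr(j, k):
            return alpha
    elif structure == "ar1":
        # Correlation decays geometrically with the time lag.
        def corr(j, k):
            return alpha ** abs(j - k)
    else:
        raise ValueError(f"unknown structure: {structure}")
    return [
        [1.0 if j == k else corr(j, k) for k in range(n_times)]
        for j in range(n_times)
    ]


ar1 = working_correlation(3, "ar1", alpha=0.5)
exchangeable = working_correlation(3, "exchangeable", alpha=0.5)
```

With `alpha = 0.5`, the AR(1) matrix has 0.5 at lag one and 0.25 at lag two, while the exchangeable matrix has 0.5 everywhere off the diagonal. That difference is why the length of the series and the strength of serial correlation can matter for the choice.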
Abstract: Mixed effects models are often used for estimating fixed effects and variance components in continuous longitudinal outcomes. An EM-based estimation approach for mixed effects models when the outcomes are truncated was proposed by Hughes (1999). We consider the situation in which the longitudinal outcomes are subject to non-ignorable missingness in addition to truncation. A shared random effect parameter model is presented in which the missing data mechanism depends on the random effects used to model the longitudinal outcomes. Data from the Indianapolis-Ibadan dementia project are used to illustrate the proposed approach.
Abstract: Here we develop methods for applications where random change points are known a priori to be present and the interest lies in estimating them and investigating the risk factors that influence them. We first propose a simple least-squares method that estimates each individual’s change point from that individual’s own observations. An easy-to-compute empirical Bayes type shrinkage is then proposed to pool information from the separately estimated change points. A method to improve the empirical Bayes estimates is also developed. Simulations are conducted to compare the least-squares estimates and the Bayes shrinkage estimates. The proposed methods are applied to the Berkeley Growth Study data to estimate the transition age of pubertal height growth.
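The pooling step can be sketched with a generic normal-normal shrinkage rule, in which each individually estimated change point is pulled toward the overall mean in proportion to its sampling variance. This is an illustrative assumption, not the authors' estimator: the abstract does not give the form of their shrinkage, and the function name `shrink_change_points`, its inputs, and the method-of-moments variance estimate are all hypothetical.

```python
def shrink_change_points(estimates, sampling_vars):
    """Shrink per-individual change-point estimates toward their mean.

    estimates     : individual change-point estimates (e.g. from least squares)
    sampling_vars : assumed-known sampling variance of each estimate
    """
    n = len(estimates)
    grand_mean = sum(estimates) / n
    # Observed spread of the estimates = between-subject + sampling variance.
    total_var = sum((e - grand_mean) ** 2 for e in estimates) / (n - 1)
    # Method-of-moments estimate of the between-subject variance, floored at 0.
    between_var = max(0.0, total_var - sum(sampling_vars) / n)
    shrunk = []
    for e, v in zip(estimates, sampling_vars):
        denom = between_var + v
        # Weight near 1 keeps the individual estimate; near 0 uses the mean.
        weight = between_var / denom if denom > 0 else 0.0
        shrunk.append(grand_mean + weight * (e - grand_mean))
    return shrunk


pooled = shrink_change_points([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
```

Here the estimates have variance 4 and sampling variance 1, so the between-subject variance is estimated as 3 and each estimate keeps three quarters of its distance from the mean of 12.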
Clustering is an essential technique for discovering patterns in data. Many clustering algorithms have been developed to tackle the ever-increasing quantity and complexity of data, yet algorithms that can cluster data with mixed variables (continuous and categorical) remain limited despite the abundance of mixed-type data. Of the existing clustering methods for mixed data types, some posit unverifiable distributional assumptions, and others rest on unbalanced contributions from different variable types. To address these issues, we propose a two-step hybrid density- and partition-based (HyDaP) algorithm to detect clusters after variable selection. The first step uses both density-based and partition-based algorithms to identify the data structure formed by the continuous variables and to determine the important variables (both continuous and categorical) for clustering. The second step uses a partition-based algorithm together with our proposed novel dissimilarity measure to obtain the clustering results. Simulations across various scenarios were conducted to compare the HyDaP algorithm with other commonly used methods. Our HyDaP algorithm was applied to identify sepsis phenotypes and yielded important results.