Abstract: Objectives: Exploratory Factor Analysis (EFA) is a widely used statistical technique for identifying potential latent structure underlying a set of observed indicator variables. EFA is applied throughout the social sciences, business and finance, machine learning, and the health sciences, among other fields. Research has found that standard methods of estimating EFA model parameters do not work well when the sample size is relatively small (e.g., less than 50) and/or when the number of observed variables approaches the sample size. The purpose of the current study was to investigate and compare alternative approaches to fitting EFA with small samples and high dimensional data. Results of both a small simulation study and an application of the methods to an intelligence test revealed that several alternative approaches designed to reduce the dimensionality of the observed variable covariance matrix recovered the population factor structure very well. Implications of these results for practice are discussed.
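For orientation, the sketch below shows a generic EFA fit to a small, high-dimensional sample in Python with scikit-learn. It does not reproduce the dimension-reduction estimators compared in the study, which the abstract does not name; the sample sizes, factor structure, and variable names are purely illustrative assumptions.

```python
# Minimal sketch of fitting an EFA model to a small, high-dimensional sample.
# The specific dimension-reduction estimators compared in the study are not
# shown here; this only illustrates a standard EFA fit for reference.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(42)

# Hypothetical small-sample data: n = 40 respondents, p = 30 indicators
# generated from a 2-factor population model.
n, p, k = 40, 30, 2
loadings_true = rng.uniform(0.4, 0.8, size=(p, k))
scores = rng.normal(size=(n, k))
X = scores @ loadings_true.T + rng.normal(scale=0.5, size=(n, p))

# Fit EFA with a varimax rotation and inspect the estimated loadings.
fa = FactorAnalysis(n_components=k, rotation="varimax", random_state=0)
fa.fit(X)
loadings_est = fa.components_.T   # p x k estimated loading matrix
print(loadings_est.round(2))
```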
Abstract: Missing data are a common problem for researchers working with surveys and other types of questionnaires. Often, respondents do not respond to one or more items, making statistical analyses, as well as the calculation of scores, difficult. A number of methods have been developed for dealing with missing data, though most of these have focused on continuous variables, and it is not clear that such imputation techniques are appropriate for the categorical items that make up surveys. However, methods of imputation designed specifically for categorical data are either limited in the number of variables they can accommodate or have not been fully compared with the continuous data approaches commonly applied to categorical variables. The goal of the current study was to compare the performance of these explicitly categorical imputation approaches with the better established continuous method used with categorical item responses. Results of a simulation study based on real data demonstrate that the continuous based imputation approach and a categorical method based on stochastic regression perform well, producing imputed datasets whose logistic regression results closely match those obtained from the complete data.
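The following is a minimal sketch of the general idea behind stochastic regression imputation for a single binary survey item: fit a regression model on the complete cases, then impute by sampling from the predicted probabilities rather than plugging in the most likely value. The item names and missingness rate are hypothetical, and the study's actual implementation may differ.

```python
# Minimal sketch of stochastic regression imputation for one binary item.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical survey data: two fully observed items predict a third item
# ("item3") that has missing responses.
df = pd.DataFrame({
    "item1": rng.integers(0, 2, 500),
    "item2": rng.integers(0, 2, 500),
})
logit = -0.5 + 1.2 * df["item1"] + 0.8 * df["item2"]
df["item3"] = rng.binomial(1, 1 / (1 + np.exp(-logit))).astype(float)
df.loc[rng.random(500) < 0.2, "item3"] = np.nan   # roughly 20% missing

# Fit the imputation model on the complete cases only.
observed = df["item3"].notna()
model = LogisticRegression().fit(df.loc[observed, ["item1", "item2"]],
                                 df.loc[observed, "item3"])

# Stochastic step: draw each imputed value from a Bernoulli distribution
# with the model-predicted probability, preserving response variability.
p_missing = model.predict_proba(df.loc[~observed, ["item1", "item2"]])[:, 1]
df.loc[~observed, "item3"] = rng.binomial(1, p_missing)
```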
Abstract: The current study examines the performance of cluster analysis with dichotomous data using distance measures based on response pattern similarity. In many contexts, such as educational and psychological testing, cluster analysis is a useful means of exploring datasets and identifying underlying groups among individuals. However, standard approaches to cluster analysis assume that the variables used to group observations are continuous in nature. This paper focuses on four methods for calculating distance between individuals from dichotomous data, and the subsequent use of these distances with a clustering algorithm such as Ward's. The four methods in question are potentially useful for practitioners because they are relatively easy to carry out using standard statistical software such as SAS and SPSS, and have been shown to have potential for correctly grouping observations based on dichotomous data. Results of both a simulation study and an application to a set of binary survey responses show that three of the four measures behave similarly and can yield correct cluster recovery rates between 60% and 90%. Furthermore, in nearly all cases these methods worked better than applying Ward's clustering algorithm directly to the raw data.
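As an illustration of the general workflow, the sketch below computes pairwise distances between dichotomous response patterns and passes them to Ward's algorithm in SciPy. The abstract does not identify the four distance measures studied, so Jaccard and Hamming (simple matching) distances are used here purely as stand-in examples, and the data are simulated.

```python
# Minimal sketch: cluster dichotomous responses from a precomputed distance
# matrix using Ward's algorithm. Jaccard and Hamming distances stand in for
# the (unnamed) measures examined in the study.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(100, 20)).astype(bool)   # hypothetical 0/1 item responses

for metric in ("jaccard", "hamming"):
    d = pdist(X, metric=metric)          # condensed pairwise distance vector
    Z = linkage(d, method="ward")        # Ward's algorithm on the binary distances
    clusters = fcluster(Z, t=3, criterion="maxclust")
    print(metric, np.bincount(clusters)[1:])   # cluster sizes for a 3-cluster solution
```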