Comparison of Distance Measures in Cluster Analysis with Dichotomous Data
Volume 3, Issue 1 (2005), pp. 85–100
Pub. online: 4 August 2022
Type: Research Article
Open Access
Abstract
The current study examines the performance of cluster analysis with dichotomous data using distance measures based on response pattern similarity. In many contexts, such as educational and psychological testing, cluster analysis is a useful means of exploring datasets and identifying underlying groups among individuals. However, standard approaches to cluster analysis assume that the variables used to group observations are continuous. This paper focuses on four methods for calculating the distance between individuals from dichotomous data, and on the subsequent use of these distances as input to a clustering algorithm such as Ward’s. The four methods are potentially useful for practitioners because they are relatively easy to carry out in standard statistical software such as SAS and SPSS, and because they have shown potential for correctly grouping observations based on dichotomous data. Results of both a simulation study and an application to a set of binary survey responses show that three of the four measures behave similarly and can yield correct cluster recovery rates of between 60% and 90%. Furthermore, these methods were found to work better, in nearly all cases, than applying Ward’s clustering algorithm to the raw data.
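The general workflow described above (compute a pairwise distance from binary response patterns, then pass those distances to Ward's algorithm) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's procedure: the abstract does not name the four distance measures, so the Jaccard distance is used here purely as a stand-in; SciPy replaces SAS and SPSS; and the toy response matrix and the helper name `cluster_binary` are hypothetical. Note also that SciPy documents Ward linkage as strictly correct only for Euclidean distances, so results with other measures should be read as exploratory.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cluster_binary(responses, n_clusters, metric="jaccard"):
    """Cluster rows of a 0/1 response matrix (hypothetical helper).

    `metric` is any binary distance supported by scipy's pdist;
    "jaccard" is an assumed stand-in, not a measure named in the abstract.
    """
    X = np.asarray(responses, dtype=bool)
    # Condensed vector of pairwise distances between individuals (rows).
    distances = pdist(X, metric=metric)
    # Ward linkage applied to the precomputed distances.
    tree = linkage(distances, method="ward")
    # Cut the tree into the requested number of clusters.
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Hypothetical toy data: six respondents, eight dichotomous items.
# The first three mostly endorse items 1-4, the last three items 5-8.
responses = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 1],
]

labels = cluster_binary(responses, n_clusters=2)
print(labels)
```

Swapping the `metric` argument for another binary measure that `pdist` supports (for example `"hamming"` or `"dice"`) gives a simple way to compare how different distances affect the recovered groups on the same data.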