Clusters, Trends, and Choices: Feature Selection in Interactive Statistical Graphics
Pub. online: 26 March 2026
Type: Statistical Data Science
Open Access
Received
15 August 2025
Accepted
21 February 2026
Published
26 March 2026
Abstract
This study investigates how users' ability to manipulate plot features affects graphical perception by extending a previous graphical study (Vanderplas and Hofmann, 2017) with an interactive framework. As in the original study, statistical lineups included two target plots (one showing a linear trend and one showing a clustering pattern), as well as eighteen null plots generated from three different mixture proportions of the combined cluster and trend models. Participants were asked to select the two plots that they perceived as ‘most different’, and were able to interact with the graphics by toggling aesthetic features such as cluster coloring, cluster ellipses, linear trendlines, and regression error bands.
We found that toggle workflows varied across participants, revealing a divide between “maximalists,” who enabled all features, and “minimalists,” who used few or none, with most toggling occurring before the first selection. The aesthetic features enabled at the start did not have a significant effect on target choice. A generalized linear mixed model identified mixture proportion as the strongest predictor of target selection, with additional interactions involving the features enabled at the end. These findings contribute to understanding how users engage with interactive graphical tools and how such tools support data interpretation in exploratory data analysis.
Supplementary material
• Shiny App Code: The code used to replicate the study Shiny app can be accessed at https://github.com/earobinson95/interactive-lineup-study-applet.
• accuracy_clean.csv: De-identified participant accuracy data collected in the study and used for the accuracy analyses.
• toggles_clean.csv: De-identified participant toggle data collected in the study and used for the toggle workflow analyses.
• analysis.qmd: The code used to replicate the analyses presented in this paper.
References
Bates D, Mächler M, Bolker B, Walker S (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1): 1–48. https://doi.org/10.18637/jss.v067.i01
Buja A, Cook D, Hofmann H, Lawrence M, Lee EK, …, Wickham H (2009). Statistical inference for exploratory data analysis and model diagnostics. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 367(1906): 4361–4383. https://doi.org/10.1098/rsta.2009.0120
Chowdhury NR, Cook D, Hofmann H, Majumder M (2018). Measuring lineup difficulty by matching distance metrics with subject choices in crowd-sourced data. Journal of Computational and Graphical Statistics, 27(1): 132–145. https://doi.org/10.1080/10618600.2017.1356323
Cleveland WS, McGill R (1987). Graphical perception: The visual decoding of quantitative information on graphical displays of data. Journal of the Royal Statistical Society: Series A (General), 150(3): 192–210. https://doi.org/10.2307/2981473
Correll M, Li M, Kindlmann G, Scheidegger C (2019). Looks good to me: Visualizations as sanity checks. IEEE Transactions on Visualization and Computer Graphics, 25(1): 830–839. https://doi.org/10.1109/TVCG.2018.2864907
Glicksohn A, Cohen A (2011). The role of gestalt grouping principles in visual statistical learning. Attention, Perception, & Psychophysics, 73(3): 708–713. https://doi.org/10.3758/s13414-010-0084-4
Hofmann H, Follett L, Majumder M, Cook D (2012). Graphical tests for power comparison of competing designs. IEEE Transactions on Visualization and Computer Graphics, 18(12): 2441–2448. https://doi.org/10.1109/TVCG.2012.230
Hullman J, Gelman A (2021). Designing for interactive exploratory data analysis requires theories of graphical inference. Harvard Data Science Review, 3(3). https://doi.org/10.1162/99608f92.3ab8a587
Komorowski M, Marshall D, Salciccioli J, Crutain Y (2016). Exploratory Data Analysis. Chapter 15 in Secondary Analysis of Electronic Health Records. Springer, Cham. https://doi.org/10.1007/978-3-319-43742-2_15
Lewandowsky S, Spence I (1989). The perception of statistical graphs. Sociological Methods & Research, 18(2–3): 200–242. https://doi.org/10.1177/0049124189018002002
Liu S, Maljovec D, Wang B, Bremer PT, Pascucci V (2016). Visualizing high-dimensional data: Advances in the past decade. IEEE Transactions on Visualization and Computer Graphics, 23(3): 1249–1268. https://doi.org/10.1109/TVCG.2016.2640960
Loy A, Hofmann H, Cook D (2017). Model choice and diagnostics for linear mixed-effects models using statistics on street corners. Journal of Computational and Graphical Statistics, 26(3): 478–492. https://doi.org/10.1080/10618600.2017.1330207
Majumder M, Hofmann H, Cook D (2014). Human factors influencing visual statistical inference. arXiv preprint: https://arxiv.org/abs/1408.1974.
Reda K, Szafir DA (2021). Rainbows revisited: Modeling effective colormap design for graphical inference. IEEE Transactions on Visualization and Computer Graphics, 27(2): 1032–1042. https://doi.org/10.1109/TVCG.2020.3030439
Rutter H, Parker S, Stahl-Timmins W, Noakes C, Smyth A, …, Freeman AL (2021). Visualising SARS-CoV-2 transmission routes and mitigations. BMJ, 375: e065312. https://doi.org/10.1136/bmj-2021-065312
Vanderplas S, Cook D, Hofmann H (2020). Testing statistical charts: What makes a good graph? Annual Review of Statistics and Its Application, 7(1): 61–88. https://doi.org/10.1146/annurev-statistics-031219-041252
VanderPlas S, Hofmann H (2017). Clusters beat trend!? Testing feature hierarchy in statistical graphics. Journal of Computational and Graphical Statistics, 26(2): 231–242. https://doi.org/10.1080/10618600.2016.1209116
Weissgerber TL, Garovic VD, Savic M, Winham SJ, Milic NM (2016). From static to interactive: Transforming data visualization to improve transparency. PLoS Biology, 14(6): e1002484. https://doi.org/10.1371/journal.pbio.1002484
Zeileis A, Hornik K, Murrell P (2009). Escaping RGBland: Selecting colors for statistical graphics. Computational Statistics & Data Analysis, 53(9): 3259–3270. https://doi.org/10.1016/j.csda.2008.11.033