Journal of Data Science

Clusters, Trends, and Choices: Feature Selection in Interactive Statistical Graphics
Dylan Le, Rachel Rogers, Emily Robinson

https://doi.org/10.6339/26-JDS1221
Pub. online: 26 March 2026; Type: Statistical Data Science; Open Access

Received: 15 August 2025
Accepted: 21 February 2026
Published: 26 March 2026

Abstract

This study investigates how users' ability to manipulate plot features affects graphical perception by extending a previous graphical study (Vanderplas and Hofmann, 2017) with an interactive framework. As in the original study, statistical lineups included two target patterns (a linear trend and a clustering pattern), as well as eighteen null plots generated from three different mixture proportions of the combined cluster and trend models. Participants were asked to select the two plots that they perceived as ‘most different’, and could interact with the graphics by toggling aesthetic features such as cluster coloring, cluster ellipses, linear trendlines, and regression error bands.
We found that toggle workflow varied across participants, revealing a divide between “maximalists,” who enabled all features, and “minimalists,” who used few or none; most toggling occurred before the first selection. The starting feature aesthetics did not have a significant effect on target choice. A generalized linear mixed model identified mixture proportion as the strongest predictor of target selection, with additional interactions involving the features that were enabled at the end. These findings contribute to understanding how users engage with interactive graphical tools and how such tools support data interpretation in exploratory data analysis.
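
The lineup structure described above (two target panels plus eighteen null panels) can be illustrated with a minimal sketch. This is not the authors' code, which is available in the supplementary Shiny app repository. The cluster centres, noise levels, panel size, the mixture values 0.25, 0.5, and 0.75, and the reading that a single mixture proportion is held fixed within each lineup are all assumptions made for illustration, as are the names `panel` and `build_lineup`.

```python
import random

# Hypothetical cluster centres; the study's actual cluster model is not given here.
CENTERS = ((-0.6, 0.5), (0.0, -0.6), (0.6, 0.4))


def trend_point(rng, noise=0.25):
    """One point scattered around the line y = x (assumed linear-trend model)."""
    x = rng.uniform(-1.0, 1.0)
    return (x, x + rng.gauss(0.0, noise))


def cluster_point(rng, spread=0.15):
    """One point scattered around a randomly chosen cluster centre."""
    cx, cy = rng.choice(CENTERS)
    return (cx + rng.gauss(0.0, spread), cy + rng.gauss(0.0, spread))


def panel(kind, lam, rng, n=50):
    """Simulate one panel: each point is a cluster point with probability lam,
    otherwise a trend point. lam = 0 is pure trend, lam = 1 is pure cluster."""
    points = [
        cluster_point(rng) if rng.random() < lam else trend_point(rng)
        for _ in range(n)
    ]
    return {"kind": kind, "lam": lam, "points": points}


def build_lineup(lam, rng, n_null=18):
    """Assemble a 20-panel lineup: one trend target, one cluster target,
    and n_null null panels drawn from the mixture with proportion lam."""
    panels = [
        panel("trend_target", 0.0, rng),
        panel("cluster_target", 1.0, rng),
    ]
    panels += [panel("null", lam, rng) for _ in range(n_null)]
    rng.shuffle(panels)  # hide the targets at random positions
    return panels


# Three illustrative mixture proportions (assumed values, not from the paper).
rng = random.Random(2026)
lineups = {lam: build_lineup(lam, rng) for lam in (0.25, 0.5, 0.75)}
```

In this sketch, a mixture proportion near 0 makes the null panels resemble the trend target, so the cluster target should stand out, and a proportion near 1 does the reverse.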

Supplementary material

• Shiny App Code: The code used to replicate the study Shiny app can be accessed at https://github.com/earobinson95/interactive-lineup-study-applet.
• accuracy_clean.csv: De-identified participant accuracy data collected in the study and used for accuracy analyses.
• toggles_clean.csv: De-identified participant toggle move data collected in the study and used for toggle workflow analyses.
• analysis.qmd: The code used to replicate the analyses presented in this paper.

References

 
Bates D, Mächler M, Bolker B, Walker S (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1): 1–48. https://doi.org/10.18637/jss.v067.i01
 
Buja A, Cook D, Hofmann H, Lawrence M, Lee EK, …, Wickham H (2009). Statistical inference for exploratory data analysis and model diagnostics. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 367(1906): 4361–4383. https://doi.org/10.1098/rsta.2009.0120
 
Chang W (2024). Shiny - Shiny Assistant.
 
Chang W, Cheng J, Allaire J, Sievert C, Schloerke B, …, Borges B (2024). shiny: Web Application Framework for R. R package version 1.9.1.
 
Chowdhury NR, Cook D, Hofmann H, Majumder M (2018). Measuring lineup difficulty by matching distance metrics with subject choices in crowd-sourced data. Journal of Computational and Graphical Statistics, 27(1): 132–145. https://doi.org/10.1080/10618600.2017.1356323
 
Cleveland WS, McGill R (1987). Graphical perception: The visual decoding of quantitative information on graphical displays of data. Journal of the Royal Statistical Society: Series A (General), 150(3): 192–210. https://doi.org/10.2307/2981473
 
Correll M, Li M, Kindlmann G, Scheidegger C (2019). Looks good to me: Visualizations as sanity checks. IEEE Transactions on Visualization and Computer Graphics, 25(1): 830–839. https://doi.org/10.1109/TVCG.2018.2864907
 
Glicksohn A, Cohen A (2011). The role of gestalt grouping principles in visual statistical learning. Attention, Perception, & Psychophysics, 73(3): 708–713. https://doi.org/10.3758/s13414-010-0084-4
 
Hartig F (2024). DHARMa: Residual Diagnostics for Hierarchical (Multi-Level / Mixed) Regression Models. R package version 0.4.7.
 
Hofmann H, Follett L, Majumder M, Cook D (2012). Graphical tests for power comparison of competing designs. IEEE Transactions on Visualization and Computer Graphics, 18(12): 2441–2448. https://doi.org/10.1109/TVCG.2012.230
 
Hullman J, Gelman A (2021). Designing for interactive exploratory data analysis requires theories of graphical inference. Harvard Data Science Review, 3(3). https://doi.org/10.1162/99608f92.3ab8a587
 
Komorowski M, Marshall D, Salciccioli J, Crutain Y (2016). Exploratory Data Analysis. Chapter 15 in Secondary Analysis of Electronic Health Records. Springer, Cham. https://doi.org/10.1007/978-3-319-43742-2_15
 
Lewandowsky S, Spence I (1989). The perception of statistical graphs. Sociological Methods & Research, 18(2–3): 200–242. https://doi.org/10.1177/0049124189018002002
 
Li NT, Brossard D, Scheufele DA, Wilson PH, Rose KM (2018). Communicating data: Interactive infographics, scientific data and credibility. Journal of Science Communication, 17. A06.
 
Liu S, Maljovec D, Wang B, Bremer PT, Pascucci V (2016). Visualizing high-dimensional data: Advances in the past decade. IEEE Transactions on Visualization and Computer Graphics, 23(3): 1249–1268. https://doi.org/10.1109/TVCG.2016.2640960
 
Loy A, Hofmann H, Cook D (2017). Model choice and diagnostics for linear mixed-effects models using statistics on street corners. Journal of Computational and Graphical Statistics, 26(3): 478–492. https://doi.org/10.1080/10618600.2017.1330207
 
Majumder M, Hofmann H, Cook D (2014). Human factors influencing visual statistical inference. arXiv preprint: https://arxiv.org/abs/1408.1974.
 
R Core Team (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
 
Reda K, Szafir DA (2021). Rainbows revisited: Modeling effective colormap design for graphical inference. IEEE Transactions on Visualization and Computer Graphics, 27(2): 1032–1042. https://doi.org/10.1109/TVCG.2020.3030439
 
Rutter H, Parker S, Stahl-Timmins W, Noakes C, Smyth A, …, Freeman AL (2021). Visualising SARS-CoV-2 transmission routes and mitigations. BMJ, 375. e065312. https://doi.org/10.1136/bmj-2021-065312
 
SAS Institute Inc (2023). JMP®, Version 18.0. SAS Institute Inc., Cary, NC. Computer software.
 
Shah P, Miyake A (2005). The Cambridge Handbook of Visuospatial Thinking. Cambridge University Press.
 
Spence I (1990). Visual psychophysics of simple graphical elements. Journal of Experimental Psychology: Human Perception and Performance, 16(4): 683–692.
 
Swayne DF, Buja A (2004). Exploratory visual analysis of graphs in GGOBI. In: COMPSTAT 2004—Proceedings in Computational Statistics: 16th Symposium Held in Prague, Czech Republic. 2004, 477–488. Springer.
 
Vanderplas S, Cook D, Hofmann H (2020). Testing statistical charts: What makes a good graph? Annual Review of Statistics and Its Application, 7(1): 61–88. https://doi.org/10.1146/annurev-statistics-031219-041252
 
VanderPlas S, Hofmann H (2017). Clusters beat trend!? Testing feature hierarchy in statistical graphics. Journal of Computational and Graphical Statistics, 26(2): 231–242. https://doi.org/10.1080/10618600.2016.1209116
 
Ward M, Grinstein GG, Keim D (2021). Interactive Data Visualization: Foundations, Techniques, and Applications. CRC Press.
 
Weissgerber TL, Garovic VD, Savic M, Winham SJ, Milic NM (2016). From static to interactive: Transforming data visualization to improve transparency. PLoS Biology, 14(6): e1002484. https://doi.org/10.1371/journal.pbio.1002484
 
Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag, New York.
 
Zeileis A, Hornik K, Murrell P (2009). Escaping RGBland: Selecting colors for statistical graphics. Computational Statistics & Data Analysis, 53(9): 3259–3270. https://doi.org/10.1016/j.csda.2008.11.033


Copyright
2026 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
Exploratory Data Analysis; Statistical Lineups; Visual Perception

Journal of data science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
