Journal of Data Science


Quantifying the Alignment of a Data Analysis Between Analyst and Audience
Lucy D’Agostino McGowan   Roger D. Peng   Stephanie C. Hicks  

https://doi.org/10.6339/25-JDS1189
Pub. online: 12 June 2025      Type: Education In Data Science      Open Access

Received: 26 November 2024
Accepted: 22 May 2025
Published: 12 June 2025

Abstract

A challenge that data scientists face is building an analytic product that is useful and trustworthy for a given audience. Previously, a set of principles for describing data analyses was defined; these principles can be used to create a data analysis and to characterize the variation between analyses. Here, we introduce the concept of the alignment of a data analysis between the data analyst and an audience. We define an aligned data analysis as one in which the principles of the analyst match those of the audience for whom the analysis is developed. In this paper, we propose a model for evaluating the alignment of a data analysis and describe some of its properties. More generally, we argue that this framework provides a language for characterizing alignment and can guide practicing data scientists in building better data products.

Supplementary material

In the supplementary materials we provide the lecture slides used for the case study and the code and data used for the analysis in Section 4.



Copyright
2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
analytic design theory; data science; evaluation

Funding
The authors do not have any funding to acknowledge.


Journal of Data Science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
