A challenge that data scientists face is building an analytic product that is useful and trustworthy for a given audience. Previously, a set of principles for describing data analyses was defined that can be used to create a data analysis and to characterize the variation between analyses. Here, we introduce the concept of the alignment of a data analysis between the data analyst and an audience. We define an aligned data analysis as the matching of principles between the analyst and the audience for whom the analysis is developed. In this paper, we propose a model for evaluating the alignment of a data analysis and describe some of its properties. We argue that, more generally, this framework provides a language for characterizing alignment and can be used as a guide for practicing data scientists to build better data products.
Dr. David S. Salsburg’s career has been an exceptional one. He was the first statistician to work at Pfizer, Inc., and later became the first statistician from the pharmaceutical industry to be elected as an ASA fellow. He played a vital role as a statistician at Pfizer, Inc. at a time when the drug approval process was being developed. For his contributions, Dr. Salsburg was awarded the Career Achievement Award of the Biostatistics Section of the Pharmaceutical Research and Manufacturers of America in 1994, for “significant contributions to the advancement of biostatistics in the pharmaceutical industry”. Dr. Salsburg also managed to achieve something rare among scientists: popularizing his field of research and making it accessible and enjoyable to laypeople. Dr. Salsburg is possibly best known for his book “The Lady Tasting Tea – How Statistics Revolutionized the 20th Century Science”, in which he combines simple and engaging explanations of statistical methods, and of why they are needed, with personal stories about the people who developed them, told with a great deal of generosity, fondness, and humor. Dr. Salsburg’s admiration for those statisticians shines through. In this interview, Dr. Salsburg shares his own stories and perspectives, from his childhood, through his service in the Navy and his long and productive career at Pfizer, Inc., to his equally productive retirement, in which he authored “The Lady Tasting Tea” and other books.
Pub. online: 23 Jul 2024
Type: Data Science In Action
Open Access
Journal: Journal of Data Science
Volume 22, Issue 3 (2024): Special issue: The Government Advances in Statistical Programming (GASP) 2023 conference, pp. 393–408
Abstract
The coronavirus disease 2019 (COVID-19) pandemic presented unique challenges to the U.S. healthcare system, particularly for nonprofit U.S. hospitals that are obligated to provide community benefits in exchange for federal tax exemptions. We sought to examine how hospitals initiated, modified, or disbanded community benefits programming in response to the COVID-19 pandemic. We used the free-response text in Part IV of Internal Revenue Service (IRS) Form 990 Schedule H (F990H) to assess health equity and disparities. We combined traditional key term frequency and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) clustering approaches with a novel Generative Pre-trained Transformer (GPT) 3.5 summarization approach. Our research reveals shifts in community benefits programming. We observed an increase in COVID-related terms starting in the 2019 tax year, indicating a pivot in community focus and efforts toward pandemic-related activities such as telehealth services and COVID-19 testing and prevention. The clustering analysis identified themes related to COVID-19 and community benefits. Generative Artificial Intelligence (GenAI) summarization with GPT 3.5 contextualized these changes, revealing examples of healthcare system adaptations and program cancellations. However, GPT 3.5 also encountered some accuracy and validation challenges. This multifaceted text analysis underscores the adaptability of hospitals in maintaining community health support during crises and suggests the potential of advanced AI tools in evaluating large-scale qualitative data for policy and public health research.
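As a rough illustration of the key term frequency step described in this abstract, the sketch below computes, for each tax year, the share of filings whose free text mentions each key term. The term list, the function name `term_rate_by_year`, and the sample records are all hypothetical; the abstract does not give the study's actual terms, data layout, or matching rules.

```python
import re
from collections import Counter, defaultdict

# Hypothetical key terms; the study's actual term list is not given in the abstract.
KEY_TERMS = ("covid", "telehealth", "testing")


def term_rate_by_year(records, terms=KEY_TERMS):
    """Return {tax_year: {term: share of filings mentioning the term}}.

    `records` is an iterable of (tax_year, free_text) pairs.
    """
    filings = Counter()            # number of filings seen per tax year
    hits = defaultdict(Counter)    # filings per year that mention each term
    for year, text in records:
        filings[year] += 1
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        for term in terms:
            # Prefix match, so "covid" also matches a token such as "covid19".
            if any(tok.startswith(term) for tok in tokens):
                hits[year][term] += 1
    return {
        year: {term: hits[year][term] / filings[year] for term in terms}
        for year in sorted(filings)
    }


# Made-up snippets standing in for Schedule H Part IV free-response text.
records = [
    (2018, "Community health fairs and free screening events."),
    (2019, "Health fairs were cancelled; COVID-19 testing sites opened."),
    (2019, "Expanded telehealth visits for rural patients."),
    (2020, "COVID-19 vaccination clinics and telehealth services continued."),
]

rates = term_rate_by_year(records)
for year, by_term in rates.items():
    print(year, by_term)
```

Counting filings rather than raw token occurrences keeps one long narrative from dominating a year's rate; whether the authors normalized this way is not stated.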
In 2022, the American Statistical Association established the Riffenburgh Award, which recognizes exceptional innovation in extending statistical methods across diverse fields. In the same period, the Department of Statistics at the University of Connecticut commemorated six decades of excellence, having evolved into a preeminent hub for statistical training for academia, industry, and government. To honor this legacy, a virtual dialogue was conducted with the department’s founder, Dr. Robert H. Riffenburgh, covering his extraordinary career trajectory, his insights into the statistical vocation, and heartfelt accounts from the faculty and students he personally nurtured. This narrative documents the conversation and provides more detailed background information on each topic covered by the interview than is presented in the video recording on YouTube.
Law and legal studies have become an exciting new field for data science applications, while technological advancement in turn has profound implications for legal practice. On the one hand, the legal industry has accumulated a rich body of high-quality texts, images, and other digitised formats, which are ready to be further processed and analysed by data scientists. On the other hand, the increasing popularity of data science has posed a genuine challenge to legal practitioners, regulators, and even the general public, and has motivated a long-lasting debate in academia focusing on issues such as privacy protection and algorithmic discrimination. This paper collects 1236 journal articles involving both law and data science from the Web of Science platform to understand the patterns and trends of this interdisciplinary research field in terms of English-language journal publications. We find a clear trend of increasing publication volume over time and a strong presence of high-impact law and political science journals. We then use Latent Dirichlet Allocation (LDA) as a topic modelling method to classify the abstracts into four topics, with the number of topics chosen based on the coherence measure. The four topics identified confirm that both challenges and opportunities have been investigated in this interdisciplinary field and help offer directions for future research.
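To make the idea of a topic coherence measure concrete, here is a minimal sketch of one common variant, UMass coherence, computed for a single topic's top words. The abstract does not say which coherence measure the authors used, so UMass is an assumption, and the function name `umass_coherence` and the toy documents are hypothetical.

```python
import math
from itertools import combinations


def umass_coherence(top_words, docs):
    """Average UMass coherence of one topic's top words.

    `top_words` is ordered from most to least probable in the topic;
    `docs` is a list of tokenized documents (lists of words).
    """
    doc_sets = [set(d) for d in docs]

    def doc_freq(*words):
        # Number of documents that contain every one of the given words.
        return sum(all(w in s for w in words) for s in doc_sets)

    scores = []
    for higher, lower in combinations(top_words, 2):
        denom = doc_freq(higher)
        if denom == 0:
            continue  # skip words that never occur in the corpus
        # log of the smoothed co-document frequency over the frequency
        # of the higher-ranked word
        scores.append(math.log((doc_freq(higher, lower) + 1) / denom))
    return sum(scores) / len(scores) if scores else 0.0


# Toy tokenized "abstracts", made up for illustration only.
docs = [
    ["privacy", "data", "protection"],
    ["privacy", "data", "regulation"],
    ["algorithm", "bias", "discrimination"],
    ["algorithm", "discrimination", "court"],
]

coherent = umass_coherence(["privacy", "data"], docs)
mixed = umass_coherence(["privacy", "algorithm"], docs)
print(coherent > mixed)
```

In practice one would fit LDA for several candidate topic counts, average a score like this over the topics of each model, and keep the count with the best average; that selection loop is omitted here.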