Pub. online: 28 Oct 2025
Type: Data Science Conversation
Open Access
Journal: Journal of Data Science
Volume 23, Issue 4 (2025): Special Issue: Statistical Frontiers of Data Science, pp. 695–715
Abstract
Over the past three decades, the discipline of statistics has undergone profound transformation, driven by the rapid emergence of data science and artificial intelligence. These developments have reshaped methodological paradigms and introduced new challenges and opportunities for statistical education, particularly in China. In this context, Professor Xizhi Wu from the School of Statistics at Renmin University of China has remained closely engaged with the evolving landscape, demonstrating keen insight and a forward-looking perspective. Through sustained contributions to teaching, research, and educational reform, Professor Wu has deeply influenced generations of students and educators, playing a pivotal role in the advancement of statistical education. To document and reflect on this legacy, the Capital of Statistics conducted an in-depth interview with Professor Wu, focusing on his academic trajectory, professional contributions, and perspectives on the future of the discipline. The conversation also recounts meaningful interactions with his students, offering a multidimensional portrait of a life devoted to statistics.
Pub. online: 12 Jun 2025
Type: Data Science In Action
Open Access
Journal: Journal of Data Science
Volume 24, Issue 1 (2026): Special Issue: Statistical aspects of Trustworthy Machine Learning, pp. 239–253
Abstract
A challenge that data scientists face is building an analytic product that is useful and trustworthy for a given audience. A previously defined set of principles for describing data analyses can be used to create a data analysis and to characterize the variation between analyses. Here, we introduce the concept of the alignment of a data analysis between the data analyst and an audience. We define an aligned data analysis as the matching of principles between the analyst and the audience for whom the analysis is developed. In this paper, we propose a model for evaluating the alignment of a data analysis and describe some of its properties. We argue that, more generally, this framework provides a language for characterizing alignment and can guide practicing data scientists in building better data products.
Ultrasonic testing is considered a promising method for diagnosing and characterizing masonry walls. Because ultrasonic waves tend to travel faster in denser materials, they are commonly used to evaluate the condition of various materials. The presence of internal voids, for example, would alter the wave path, and this distinct behavior can be used to identify unknown conditions within the material. We therefore applied mixed models and Gaussian processes to analyze the behavior of ultrasonic waves in masonry walls and to identify the factors that affect their propagation. Under both models, the average propagation time differs by material, and the condition of the wall also influences the propagation time. We compare the performance of the Gaussian process and the mixed model and conclude that these models can be useful in a classification model that automatically identifies anomalies within masonry walls.
In 2022, the American Statistical Association established the Riffenburgh Award, which recognizes exceptional innovation in extending statistical methods across diverse fields. Simultaneously, the Department of Statistics at the University of Connecticut proudly commemorated six decades of excellence, having evolved into a preeminent hub for academic, industrial, and governmental statistical grooming. To honor this legacy, a captivating virtual dialogue was conducted with the department’s visionary founder, Dr. Robert H. Riffenburgh, delving into his extraordinary career trajectory, profound insights into the statistical vocation, and heartfelt accounts from the faculty and students he personally nurtured. This multifaceted narrative documents the conversation and provides more detailed background on each topic covered by the interview than the video recording on YouTube does.
Pub. online: 22 Feb 2021
Type: Data Science In Action
Open Access
Journal: Journal of Data Science
Volume 19, Issue 2 (2021): Special issue: Continued Data Science Contributions to COVID-19 Pandemic, pp. 334–347
Abstract
Coronavirus and the COVID-19 pandemic have substantially altered the ways in which people learn, interact, and discover information. In the absence of everyday in-person interaction, how do people self-educate while living in isolation during such times? More specifically, do communities emerge in Google search trends related to coronavirus? Using a suite of network and community detection algorithms, we scrape and mine all Google search trends in America related to an initial search for “coronavirus,” from the first Google search on the term (January 16, 2020) through August 11, 2020. Results indicate a near-constant shift in the structure of how people educate themselves on coronavirus. Queries in the earliest days focus on “Wuhan” and “China”, then shift to “stimulus checks” at the height of the virus in the U.S., and finally shift to queries related to local surges of new cases in later days. A few communities emerge surrounding terms more overtly related to coronavirus (e.g., “cases”, “symptoms”). Yet, given the shift in related Google queries and the broader information environment, no clear community structure emerges for the full search space.