Dr. David S. Salsburg’s career has been an exceptional one. He was the first statistician to work at Pfizer, Inc., and later became the first statistician from the pharmaceutical industry to be elected an ASA fellow. He played a vital role as a statistician at Pfizer, Inc. at a time when the drug approval process was being developed. In 1994, Dr. Salsburg received the Career Achievement Award of the Biostatistics Section of the Pharmaceutical Research and Manufacturers of America for “significant contributions to the advancement of biostatistics in the pharmaceutical industry”. Dr. Salsburg also achieved something rare among scientists: he popularized his field of research and made it accessible and enjoyable to laypeople. He is possibly best known for his book “The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century”, in which he combines simple and engaging explanations of statistical methods, and of why they are needed, with personal stories about the people who developed them, told with a great deal of generosity, fondness, and humor. Dr. Salsburg’s admiration for those statisticians shines through. In this interview, Dr. Salsburg shares his own stories and perspectives, from his childhood, through his service in the Navy and his long and productive career at Pfizer, Inc., to his equally productive retirement, in which he authored “The Lady Tasting Tea” and other books.
The last decade has seen a vast increase in the abundance of data, fuelling the need for data analytic tools that can keep up with the size and complexity of the data. This has changed the way we analyze data: moving away from single data analysts working on their individual computers, to large clusters and distributed systems leveraged by dozens of data scientists. Technological advances have been addressing the scalability aspects; however, the resulting complexity means that more people are involved in a data analysis than before. Collaboration and leveraging of others’ work become crucial in the modern, interconnected world of data science. In this article we propose and describe RCloud, an open-source, web-based, collaborative visualization and data analysis platform. It decouples the user from the location of the data analysis while preserving security, interactivity, and visualization capabilities. Its collaborative features enable data scientists to explore, work together, and share analyses in a seamless fashion. We describe the concepts and design decisions that enabled it to support large data science teams in industry and academia.
Ultrasonic testing has been considered a promising method for diagnosing and characterizing masonry walls. As ultrasonic waves tend to travel faster in denser materials, they are commonly used to evaluate the condition of various materials. The presence of internal voids, for example, would alter the wave path, and this distinct behavior could be used to identify unknown conditions within the material. We therefore applied mixed models and Gaussian processes to analyze the behavior of ultrasonic waves in masonry walls and to identify relevant factors affecting their propagation. Under both models, we observed that the average propagation time differs depending on the material. Additionally, the condition of the wall influences the propagation time. We compare the performance of the Gaussian process and the mixed model, and conclude that these models can be useful in a classification model to automatically identify anomalies within masonry walls.
Pub. online: 22 Feb 2021
Type: Data Science In Action
Journal: Journal of Data Science
Volume 19, Issue 2 (2021): Special issue: Continued Data Science Contributions to COVID-19 Pandemic, pp. 334–347
Abstract
Coronavirus and the COVID-19 pandemic have substantially altered the ways in which people learn, interact, and discover information. In the absence of everyday in-person interaction, how do people self-educate while living in isolation during such times? More specifically, do communities emerge in Google search trends related to coronavirus? Using a suite of network and community detection algorithms, we scrape and mine all Google search trends in America related to an initial search for “coronavirus,” from the first Google search on the term (January 16, 2020) through August 11, 2020. Results indicate a near-constant shift in the structure of how people educate themselves on coronavirus. Queries in the earliest days focus on “Wuhan” and “China”, then shift to “stimulus checks” at the height of the virus in the U.S., and finally shift to queries related to local surges of new cases in later days. A few communities emerge surrounding terms more overtly related to coronavirus (e.g., “cases” and “symptoms”). Yet, given the shift in related Google queries and the broader information environment, clear community structure for the full search space does not emerge.