Pub. online: 29 Apr 2024 | Type: Education In Data Science | Open Access
Journal:Journal of Data Science
Volume 22, Issue 2 (2024): Special Issue: 2023 Symposium on Data Science and Statistics (SDSS): “Inquire, Investigate, Implement, Innovate”, pp. 333–351
Abstract
By its nature, data science uses ideas and methodologies from computer science and statistics, along with field-specific knowledge, to describe, learn and predict. Recently, storytelling has been highlighted as an important extension of more traditional data science skills such as coding and modeling. Three courses in our new Master in Data Science and Analytic Storytelling program were designed to include interdisciplinary modules, mainly taught by faculty in storytelling-related disciplines, such as Communication and Art & Design. These courses were PDAT 622: Narrative, Argument, and Persuasion in Data Science; PDAT 624: Principles of Design in Data Visualization; and PDAT 625: Big Data Ethics and Security.
Our first cohort serves as a natural case study, allowing us to reflectively analyze our materials and an informal student survey to explore the effects of interdisciplinarity in these novel courses. Results of the student survey show that students generally found value in these interdisciplinary course components, especially in course “signature assignments,” which allow students to actively engage with course content while reinforcing technical skills from previous courses. Examples of these signature assignments are presented in this paper’s supplementary materials.
Pub. online: 4 Aug 2022 | Type: Research Article | Open Access
Journal:Journal of Data Science
Volume 18, Issue 5 (2020): Special Issue S1 in Chinese (with abstract in English), pp. 875–888
Abstract
In the wake of the COVID-19 outbreak, the public turned to Sina Weibo as a major platform for following the course of the pandemic. Research on public sentiment and topic mining of major public-opinion events based on Sina Weibo comment data is important for understanding how public opinion evolves during major epidemic outbreaks. Drawing on psychological emotion categories for Chinese-language text, we use open-source tools to build naive Bayes models that classify Weibo comments. Comment topics are visualized with word co-occurrence networks and mined with a latent Dirichlet allocation (LDA) topic model. The results show that psychological sentiment classification combined with naive Bayes models can reflect how public sentiment evolved during the epidemic, and that LDA and word co-occurrence networks can effectively mine the topics of public concern.
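The abstract does not specify the model details. As a minimal sketch, assuming a multinomial naive Bayes classifier with add-one smoothing applied to comments that have already been word-segmented (Chinese text would first need a segmenter, which is not shown), the classification step could look like the following. The class name `NaiveBayesText`, the emotion labels `fear` and `hope`, and the toy comments are all hypothetical, not the paper's categories, tools, or data.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Hypothetical multinomial naive Bayes for pre-tokenized text, with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant; alpha=1 is add-one smoothing

    def fit(self, docs, labels):
        # docs: list of token lists (already word-segmented); labels: one class name per doc
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def log_posterior(self, tokens, label):
        # log P(label) + sum of log P(word | label), up to a constant shared by all labels
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + self.alpha * len(self.vocab)
        for w in tokens:
            if w in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[label][w] + self.alpha) / denom)
        return score

    def predict(self, tokens):
        # pick the label with the highest (unnormalized) log posterior
        return max(self.class_counts, key=lambda c: self.log_posterior(tokens, c))


# Hypothetical toy data: segmented comments with made-up emotion labels.
train_docs = [["担心", "疫情"], ["加油", "希望"], ["害怕", "担心"], ["感谢", "加油"]]
train_labels = ["fear", "hope", "fear", "hope"]
model = NaiveBayesText().fit(train_docs, train_labels)
print(model.predict(["担心", "害怕"]))  # fear
print(model.predict(["加油"]))          # hope
```

Working in log space avoids underflow when a comment contains many tokens, and smoothing keeps a single unseen word-class pair from zeroing out a class.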
Abstract
Many nations’ defence departments use capability-based planning to guide their investment and divestment decisions. This planning process involves a variety of data that in its raw form is difficult for decision-makers to use. In this paper we describe how dimensionality reduction and partition clustering are used in the Canadian Armed Forces to create visualizations that convey how important military capabilities are in planning scenarios and how much capacity the planned force structure has to provide the capabilities. Together, these visualizations give decision-makers an overview of which capabilities may require investment or may be candidates for divestment.
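The abstract names only "dimensionality reduction and partition clustering." The sketch below assumes principal component analysis (via SVD) for the reduction and k-means (Lloyd's algorithm, seeded deterministically with farthest-first initialization) for the clustering; these are common choices swapped in for illustration, not necessarily the methods used by the Canadian Armed Forces. The capability-by-scenario score matrix and all function names are hypothetical.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto their leading principal components (computed via SVD)."""
    Xc = X - X.mean(axis=0)                  # center each column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T          # principal-component scores


def farthest_first_init(Z, k):
    """Deterministic seeding: start at row 0, then repeatedly add the row farthest from those chosen."""
    chosen = [0]
    for _ in range(1, k):
        d = np.linalg.norm(Z[:, None, :] - Z[chosen][None, :, :], axis=2).min(axis=1)
        chosen.append(int(d.argmax()))
    return Z[chosen].copy()


def kmeans(Z, k, n_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = farthest_first_init(Z, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical data: rows are capabilities, columns are planning scenarios,
# entries are made-up importance scores (0 = not needed, 3 = critical).
X = np.array([
    [3, 3, 2, 3],
    [3, 2, 3, 3],
    [2, 3, 3, 3],
    [0, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
Z = pca_project(X)            # 2-D coordinates suitable for a scatter plot
labels, _ = kmeans(Z, k=2)
print(labels)
```

Clustering in the reduced space means the groups a decision-maker sees in the 2-D plot are the same groups the algorithm found.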
Pub. online: 19 Apr 2022 | Type: Statistical Data Science | Open Access
Journal:Journal of Data Science
Volume 21, Issue 3 (2023): Special Issue: Advances in Network Data Science, pp. 470–489
Abstract
Networks are ubiquitous in today’s world. Community structure is a well-known feature of many empirical networks, and many statistical methods have been developed for community detection. In this paper, we consider the problem of community extraction in text networks, which is highly relevant to databases of medical errors and patient safety reports. We adapt a well-known community extraction method to develop a scalable algorithm for extracting groups of similar documents from large text databases. Applying our method to a real-world patient safety reporting system shows that the groups produced by community extraction are much more accurate than manual tagging by frontline workers.
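The abstract does not say which extraction method is adapted. One well-known option is the single-community extraction criterion of Zhao, Levina and Zhu (2011), $W(S) = |S||S^c|\left(O(S)/|S|^2 - B(S)/(|S||S^c|)\right)$, where $O(S)$ sums edge weights inside $S$ and $B(S)$ sums those between $S$ and its complement. The sketch below assumes that criterion, builds the text network from plain bag-of-words cosine similarity, and optimizes $W$ with a simple greedy node-deletion search. This is an illustrative stand-in, not the paper's scalable algorithm. The reports, the 0.3 threshold, and all function names are hypothetical.

```python
import numpy as np


def similarity_graph(docs, threshold=0.3):
    """Adjacency matrix linking documents whose bag-of-words cosine similarity >= threshold.
    Assumes every document has at least one token."""
    vocab = sorted({w for d in docs for w in d})
    index = {w: j for j, w in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w in d:
            X[i, index[w]] += 1
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-length rows
    A = (X @ X.T >= threshold).astype(float)
    np.fill_diagonal(A, 0)                          # no self-loops
    return A


def extraction_score(A, in_s):
    """W(S) = (|S^c| / |S|) * O(S) - B(S); O(S) sums A over pairs inside S
    (each edge counted twice), B(S) sums A over pairs between S and its complement."""
    n_s = int(in_s.sum())
    n_c = len(in_s) - n_s
    O = A[np.ix_(in_s, in_s)].sum()
    B = A[np.ix_(in_s, ~in_s)].sum()
    return (n_c / n_s) * O - B


def greedy_extract(A, min_size=2):
    """Greedy node deletion: starting from S = all nodes, repeatedly drop the node whose
    removal gives the highest W(S), and return the best set seen along the way."""
    in_s = np.ones(A.shape[0], dtype=bool)
    best_set, best_score = in_s.copy(), extraction_score(A, in_s)
    while in_s.sum() > min_size:
        candidates = []
        for i in np.flatnonzero(in_s):
            trial = in_s.copy()
            trial[i] = False
            candidates.append((extraction_score(A, trial), i))
        score, i = max(candidates)
        in_s[i] = False
        if score > best_score:
            best_set, best_score = in_s.copy(), score
    return np.flatnonzero(best_set)


# Hypothetical, already-tokenized incident reports (not real patient safety data).
reports = [
    ["wrong", "dose", "insulin"],
    ["insulin", "dose", "missed"],
    ["wrong", "dose", "heparin"],
    ["patient", "fall", "bed"],
    ["fall", "bathroom", "night"],
    ["label", "printer", "jam"],
]
A = similarity_graph(reports)
community = greedy_extract(A)
print(community)
```

Unlike partitioning methods, extraction returns one tightly connected group and leaves the remaining documents unassigned, which suits report collections where many items belong to no coherent theme.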