Pub. online: 4 Aug 2022 | Type: Research Article | Open Access
Journal: Journal of Data Science
Volume 18, Issue 5 (2020): Special Issue S1 in Chinese (with abstract in English), pp. 849–859
Abstract
Millions of people traveled from Wuhan to other cities between January 1 and January 23, 2020. Using masked software development kit (SDK) data from Aurora Mobile Ltd and open epidemic data released by health authorities, we analyze the relationship between the number of confirmed COVID-19 cases in a region and the number of people who traveled from Wuhan to that region during this period. We further identify high-risk carriers of COVID-19 to improve control of the disease. The key findings are threefold: (1) in each region, the number of high-risk carriers is highly positively correlated with the severity of illness; (2) a history of visits to the 62 designated hospitals is the foremost index of risk; (3) the second most important index is the traveler's duration of stay in Wuhan. Based on our analysis, we estimate that, as of February 4, 2020, (a) among the 8.5 million people held up in Wuhan, there are 425 thousand high-risk carriers; and (b) among the 3.5 million migrant workers held up in Hubei, there are 175 thousand high-risk carriers. The disease control authorities should closely monitor these groups.
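The two estimates above imply the same proportion of high-risk carriers in both groups. The short check below only reproduces that arithmetic from the figures quoted in the abstract; the function name `high_risk_share` is an illustrative assumption, and the abstract itself does not state a rate.

```python
def high_risk_share(high_risk: int, population: int) -> float:
    """Share of a held-up population estimated to be high-risk carriers."""
    return high_risk / population

# Figures quoted in the abstract (as of February 4, 2020).
wuhan_share = high_risk_share(425_000, 8_500_000)   # people held up in Wuhan
hubei_share = high_risk_share(175_000, 3_500_000)   # migrant workers held up in Hubei

print(f"Wuhan: {wuhan_share:.1%}, Hubei migrant workers: {hubei_share:.1%}")
```

Both ratios come out to 5%, which suggests (but does not confirm) that a single rate was applied to both populations.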
Pub. online: 4 Aug 2022 | Type: Research Article | Open Access
Journal: Journal of Data Science
Volume 18, Issue 5 (2020): Special Issue S1 in Chinese (with abstract in English), pp. 875–888
Abstract
In the wake of the COVID-19 outbreak, the public turned to Sina Weibo as a major platform for following the trend of the pandemic. Research on public sentiment and topic mining for major public opinion events, based on Sina Weibo comment data, is important for understanding the trend of public opinion during major epidemic outbreaks. Based on a classification of Chinese text into emotion categories drawn from psychology, we use open source tools to build naive Bayes models that classify Weibo comments. Comment topics are visualized with word co-occurrence network methods and mined with the latent Dirichlet allocation (LDA) model. The results show that psychological sentiment classification combined with the naive Bayes model can reflect the evolution of public sentiment during the epidemic, and that the LDA model and the word co-occurrence network can effectively mine the topics of public concern.
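As a rough illustration of the classification step described above, the following is a minimal multinomial naive Bayes sketch with add-one smoothing, written with the Python standard library only. It is not the authors' implementation: the emotion labels, the toy English tokens, and the function names `train_naive_bayes` and `classify` are all assumptions made for the example, and real Weibo comments would first need Chinese word segmentation.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """Count documents per label and tokens per label.

    docs is a list of (tokens, label) pairs.
    """
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def classify(model, tokens):
    """Return the label with the highest log-posterior for the tokens."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore tokens never seen in training
            # Add-one (Laplace) smoothed log likelihood.
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data, not taken from the paper.
training = [
    (["worried", "scared", "outbreak"], "fear"),
    (["scared", "virus", "spreading"], "fear"),
    (["thank", "doctors", "heroes"], "gratitude"),
    (["thank", "nurses", "brave"], "gratitude"),
]
model = train_naive_bayes(training)
print(classify(model, ["scared", "virus"]))
print(classify(model, ["thank", "doctors"]))
```

Working in log space avoids numerical underflow when comments are long, and the smoothing keeps a single unseen token-label pair from zeroing out a class.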
Pub. online: 4 Aug 2022 | Type: Research Article | Open Access
Journal: Journal of Data Science
Volume 18, Issue 5 (2020): Special Issue S1 in Chinese (with abstract in English), pp. 875–888
Abstract
Surveilling the development of COVID-19 is a complex and challenging task. The foundation of such surveillance is timely and accurate epidemic data. Quality control for the release of COVID-19 data is therefore very important, and must account for the releasing agent, the content released, and the impact of the released data. We suggest that the quality requirements for releasing COVID-19 data be based on the global perspective that the goal of open epidemic data is to create a valuable ecological chain in which all stakeholders are involved. As such, the collection, aggregation, and release of COVID-19 data should meet not only the data quality standards of official statistics and health statistics, but also the characteristics of epidemic statistics and the needs of pandemic prevention. The quality requirements should follow the unique characteristics of the epidemic and be open to public scrutiny. We integrate the perspectives of official statistics, health statistics, and open government data, and propose five quality dimensions for releasing COVID-19 data: accuracy, timeliness, systematicness, user-friendliness, and security. Through case studies of the official websites of Chinese provincial health commissions, we report the quality problems in the current data release process and suggest improvements.
Pub. online: 4 Aug 2022 | Type: Research Article | Open Access
Journal: Journal of Data Science
Volume 18, Issue 5 (2020): Special Issue S1 in Chinese (with abstract in English), pp. 889–906
Abstract
The new coronavirus disease (COVID-19), as a new infectious disease, has a relatively strong ability to spread from person to person. This paper studies several meteorological factors and air quality indicators in Shenzhen and Wenzhou, China, and conducts a modelling analysis of whether the transmission of COVID-19 is affected by atmospheric conditions. A comparative assessment is made of the characteristics of meteorological factors and air quality in these two typical Chinese cities and their impacts on the spread of COVID-19. The article uses meteorological and air quality data covering 7 variables: daily average temperature, daily average relative humidity, daily average wind speed, nitrogen dioxide (NO2), atmospheric fine particulate matter (PM2.5), carbon monoxide (CO), and ozone (O3). A distributed lag non-linear model (DLNM) is constructed to explore the correlation between atmospheric conditions and non-imported confirmed cases of COVID-19, and the relative risk is introduced to measure the lagged effects of meteorological factors and air pollution on the number of non-imported confirmed cases. Our findings indicate that there are significant differences between Shenzhen and Wenzhou in the relationship between the 7 predictors and the transmission of COVID-19. However, in both cities all predictors have a non-linear relationship with the number of non-imported confirmed cases. Lower daily average temperature increased the risk of epidemic transmission in the two cities; as the temperature rises, the risk of transmission in both cities decreases significantly. Daily average relative humidity has no significant effect on the epidemic situation in Shenzhen, but lower relative humidity reduces the risk of epidemic spread in Wenzhou. In contrast, meteorological data have significant impacts on the spread of COVID-19 in Wenzhou.
The four air quality predictors (NO2, PM2.5, CO, and O3) have significant effects on the number of non-imported confirmed cases. Among them, PM2.5 has a significant positive correlation with the number of non-imported confirmed cases. Daily average wind speed, NO2, and O3 have different effects on the number of non-imported confirmed cases in the two cities.
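To make the idea of lagged exposure and relative risk more concrete, here is a deliberately simplified sketch. A full DLNM builds a cross-basis, typically with splines over both the exposure and the lag dimension, which is not attempted here. The sketch only shows how a lagged exposure matrix can be assembled and how a relative risk follows from a coefficient in a log-linear count model. The function names, the temperature values, and the coefficient `beta` are hypothetical and are not taken from the paper.

```python
import math

def lag_matrix(series, max_lag):
    """Build rows [x_t, x_{t-1}, ..., x_{t-max_lag}] for each usable day t."""
    rows = []
    for t in range(max_lag, len(series)):
        rows.append([series[t - lag] for lag in range(max_lag + 1)])
    return rows

def relative_risk(beta, x, x_ref):
    """Relative risk at exposure x versus reference x_ref under a log link.

    Assumes a linear effect on the log scale: log(mu) = ... + beta * x.
    """
    return math.exp(beta * (x - x_ref))

# Hypothetical daily average temperatures (degrees Celsius).
temperature = [8.0, 9.5, 11.0, 12.5, 14.0]
lagged = lag_matrix(temperature, max_lag=2)
for row in lagged:
    print(row)

# Hypothetical negative coefficient: warmer days map to lower risk.
beta = -0.05
print(relative_risk(beta, x=15.0, x_ref=10.0))
```

With a negative coefficient the relative risk falls below 1 as the exposure rises above the reference value, which matches the direction the abstract reports for temperature.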
Pub. online: 4 Aug 2022 | Type: Research Article | Open Access
Journal: Journal of Data Science
Volume 18, Issue 5 (2020): Special Issue S1 in Chinese (with abstract in English), pp. 907–921
Abstract
Coronavirus disease 2019 (COVID-19) emerged in Wuhan, China in December 2019. To control the epidemic, the Chinese government adopted several public health measures. To study the influence of these measures on the transmissibility of COVID-19 in the city of Wuhan and other cities in Hubei province, China, we establish generalized semi-varying coefficient models for the number of newly diagnosed cases and estimate the varying coefficients of the covariates by the spline method. Since the epidemic was most severe in Wuhan, we fit separate models for Wuhan and the remaining 16 cities in Hubei. Estimators for the incubation periods, the real-time transmission rates, and the real-time reproduction numbers were obtained. The results demonstrate that the changes in the real-time transmission rate in Wuhan and other cities in Hubei are almost simultaneous. Further, public health interventions such as restriction of traffic, adjustment of the diagnostic criteria, deployment of medical resources, and improvement of nucleic acid testing capacity had positive effects on reducing the transmission of COVID-19.