The United States has the highest number of confirmed COVID-19 cases in the world. The early hot spot states were New York, New Jersey, and Connecticut. The workforce in these states was required to work from home, except for essential services. It was necessary to determine an appropriate date for the resumption of business, since reopening the economy prematurely would lead to a broader spread of COVID-19, while reopening too late would cause greater economic loss. Because COVID-19 can be transmitted pre-symptomatically and asymptomatically, reflecting the real-time risk of its spread required estimating the number of infected individuals who had not yet been confirmed or never would be. To this end, we proposed an epidemic model and applied it to evaluate the real-time epidemic risk for New York, New Jersey, and Connecticut. We used California as the benchmark state because it began a phased reopening on May 8, 2020. The dates on which the estimated numbers of unidentified infectious individuals per 100,000 in New York, New Jersey, and Connecticut were close to that in California on May 8, 2020, were June 1, June 22, and June 22, 2020, respectively. Following California's practice, New York, New Jersey, and Connecticut might consider reopening their businesses on those dates. Meanwhile, according to our simulation models, preventing a resurgence of infections after reopening the economy would require maintaining sufficient measures to limit social contact after businesses resumed. This precaution turned out to be critical: after our analysis was completed, the situation in California quickly deteriorated, and its interventions after reopening were not as effective as those in New York, New Jersey, and Connecticut.
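The benchmark comparison described above can be illustrated with a minimal sketch. It is not the epidemic model itself, which the abstract does not specify. It assumes that "close to" the benchmark means the first day a state's estimated rate per 100,000 falls to or below the benchmark rate. The function names and all numbers are hypothetical placeholders, not estimates from the study.

```python
from datetime import date, timedelta


def per_100k(count, population):
    """Convert an absolute count into a rate per 100,000 residents."""
    return 100_000 * count / population


def first_date_at_or_below(series, population, benchmark_rate):
    """Return the first date whose rate per 100,000 is at or below the benchmark.

    `series` is a list of (date, estimated unidentified infectious count) pairs
    in chronological order. Returns None if the benchmark is never reached.
    Assumption: "close to the benchmark" is read as "at or below the benchmark".
    """
    for day, count in series:
        if per_100k(count, population) <= benchmark_rate:
            return day
    return None


# Placeholder inputs for illustration only; these are NOT data from the study.
start = date(2020, 5, 1)
toy_counts = [5000, 4200, 3500, 2900, 2400]
toy_series = [(start + timedelta(days=i), c) for i, c in enumerate(toy_counts)]
toy_population = 2_000_000
toy_benchmark_rate = 150.0  # hypothetical benchmark rate per 100,000

matched = first_date_at_or_below(toy_series, toy_population, toy_benchmark_rate)
print(matched)
```

Normalizing by population is what makes states of different sizes comparable to the benchmark state. A stricter notion of closeness, such as a tolerance band around the benchmark, would be an equally plausible reading.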
Previous abstractive methods apply sequence-to-sequence structures to generate summaries without a module that helps the system detect vital mentions and relationships within a document. To address this problem, we use a semantic graph to boost generation performance. First, we extract important entities from each document and then build a graph, inspired by the idea of distant supervision (Mintz et al., 2009). Next, we combine a Bi-LSTM with a graph encoder to obtain the representation of each graph node. We present a novel neural decoder that leverages the information in such entity graphs. Automatic and human evaluations show the effectiveness of our technique.
The coronavirus disease 2019 (COVID-19) pandemic caused by the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has placed epidemic modeling at the center of attention of public policymaking. Predicting the severity and speed of transmission of COVID-19 is crucial to resource management and developing strategies to deal with this epidemic. Based on the available data from current and previous outbreaks, many efforts have been made to develop epidemiological models, including statistical models, computer simulations, mathematical representations of the virus and its impacts, and many more. Despite the usefulness of these models, modeling and forecasting the spread of COVID-19 remain a challenge. In this article, we give an overview of the unique features and issues of COVID-19 data and how they impact epidemic modeling and projection. In addition, we illustrate how various models could be connected to each other. Moreover, we provide new data science perspectives on the challenges of COVID-19 forecasting, from data collection, curation, and validation to the limitations of models, as well as the uncertainty of the forecast. Finally, we discuss some data science practices that are crucial to more robust and accurate epidemic forecasting.
Researchers and public officials tend to agree that until a vaccine is readily available, stopping SARS-CoV-2 transmission is the name of the game. Testing is the key to preventing the spread, especially by asymptomatic individuals. With testing capacity restricted, group testing is an appealing alternative for comprehensive screening and has recently received FDA emergency authorization. This technique tests pools of individual samples, thereby often requiring fewer testing resources while potentially providing a multifold speedup. We approach group testing from a data science perspective and offer two contributions. First, we provide an extensive empirical comparison of modern group testing techniques based on simulated data. Second, we propose a simple one-round method based on $\ell_1$-norm sparse recovery, which outperforms current state-of-the-art approaches at certain disease prevalence rates.
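To illustrate why pooling can require fewer tests, here is a minimal sketch of the classic two-stage (Dorfman) scheme: each pool is tested once, and only members of positive pools are retested individually. This is a textbook baseline, not the one-round $\ell_1$-norm method proposed in the abstract. It assumes a perfectly accurate test and independent infections, and the function names are illustrative.

```python
def expected_tests_per_person(prevalence, pool_size):
    """Expected tests per individual under two-stage Dorfman pooling.

    Assumptions: the test is perfectly accurate and infections are independent.
    A pool of size 1 is ordinary individual testing (exactly one test each).
    """
    if pool_size == 1:
        return 1.0
    # Probability that every sample in the pool is negative.
    p_pool_negative = (1.0 - prevalence) ** pool_size
    # One pooled test shared by pool_size people, plus one individual retest
    # per person whenever the pool comes back positive.
    return 1.0 / pool_size + (1.0 - p_pool_negative)


def best_pool_size(prevalence, max_pool_size=32):
    """Pool size in 1..max_pool_size that minimizes expected tests per person."""
    return min(
        range(1, max_pool_size + 1),
        key=lambda k: expected_tests_per_person(prevalence, k),
    )


for p in (0.01, 0.05, 0.5):
    k = best_pool_size(p)
    print(p, k, round(expected_tests_per_person(p, k), 3))
```

Under these assumptions, at 1% prevalence a pool size of 11 is optimal and needs roughly 0.2 tests per person. At 50% prevalence no pool size beats individual testing, which is consistent with the abstract's point that the best approach depends on the disease prevalence rate.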
It is hypothesized that short-term exposure to air pollution may influence the transmission of aerosolized pathogens such as COVID-19. We used data from 23 provinces in Italy to build a generalized additive model to investigate the association between the effective reproductive number of the disease and air quality while controlling for ambient environmental variables and changes in human mobility. The model finds that there is a positive, nonlinear relationship between the density of particulate matter in the air and COVID-19 transmission, which is in alignment with similar studies on other respiratory illnesses.