Evaluate the Risk of Resumption of Business for the States of New York, New Jersey and Connecticut via a Pre-Symptomatic and Asymptomatic Transmission Model of COVID-19

The United States has the highest number of confirmed cases of COVID-19 in the world. The early hot spot states were New York, New Jersey, and Connecticut, where the workforce was required to work from home except for essential services. It was necessary to evaluate an appropriate date for the resumption of business, because a premature reopening of the economy would lead to a broader spread of COVID-19, while a delayed reopening would cause greater economic loss. To reflect the real-time risk of the spread of COVID-19, it was crucial to estimate the population of infected individuals who had not yet been, or would never be, confirmed, owing to pre-symptomatic and asymptomatic transmission of COVID-19. To this end, we proposed an epidemic model and applied it to evaluate the real-time epidemic risk for the states of New York, New Jersey, and Connecticut. We used California as the benchmark state because California began a phased reopening on May 8, 2020. The dates on which the estimated numbers of unidentified infectious individuals per 100,000 in New York, New Jersey, and Connecticut were close to California's level on May 8, 2020 were June 1, 22, and 22, 2020, respectively. Following the practice in California, New York, New Jersey, and Connecticut might consider reopening their businesses on these dates. Meanwhile, our simulation models indicated that, to prevent a resurgence of infections after reopening the economy, it would be crucial to maintain sufficient social distancing measures after the resumption of business. This precaution turned out to be critical: the situation in California quickly deteriorated after our analysis was completed, and its interventions after reopening were not as effective as those in New York, New Jersey, and Connecticut.


Introduction
The outbreak of novel coronavirus disease (COVID-19) has spread to over 200 countries since December 2019 (National Health Commission of the People's Republic of China, 2020). By the beginning of June 2020, an unprecedented total of over 7 million cumulative confirmed cases of COVID-19 had been reported worldwide (World Health Organization, 2020c). The "battle" against COVID-19 in China has offered experience with certain interventions, and with their likely outcomes, to the areas that are now hard hit. Because COVID-19 was a novel and acute infectious disease whose transmission mechanisms were unknown at the early stage of the epidemic, the Chinese government implemented relatively strict non-pharmaceutical interventions in the hot spot areas; for example, public transportation within and out of the cities in Hubei province was suspended starting January 23, 2020 (Chinese Center for Disease Control and Prevention, 2020). Residents nationwide were advised to stay at home except for essential needs. The Chinese Spring Festival holiday was extended until late February, when essential services gradually resumed operation outside Hubei province (The State Council, The People's Republic of China, 2020b). In April 2020, a comprehensive resumption of business began in China (The State Council, The People's Republic of China, 2020a).
In late January 2020, the United States began reporting confirmed cases of COVID-19 (Holshue et al., 2020). There were over 1,000 cumulative confirmed cases on March 13, 2020 (World Health Organization, 2020a), when the White House declared a national emergency concerning the COVID-19 outbreak (The White House, 2020b); it then issued its "call to action" coronavirus guidelines on March 16, 2020 (The White House, 2020a). By May 24, 2020, the United States had become the most severely affected country, with 346,154,154,40,468, and 94,743 cumulative confirmed cases in the states of New York, New Jersey, Connecticut, and California, respectively (Dong et al., 2020). Compounding the problem, New York and California are the top two states contributing to the real gross domestic product (GDP) of the United States (Figure 1) (The United States Census Bureau, 2020).
The state of New York reported its first confirmed case of COVID-19 on March 1, 2020, and issued an executive order on March 16, 2020 that reduced the local government workforce by half, allowed the statewide nonessential workforce to work from home starting March 17, 2020, and closed all schools starting March 18, 2020 (State of New York Governor Andrew Cuomo, 2020g). Because the number of new cases in the state was rising rapidly, the governor announced the aggressive "New York State on PAUSE" policy on March 20, 2020 (State of New York Governor Andrew Cuomo, 2020c), and required everyone in the state to wear a mask or face covering in public starting April 15, 2020 (State of New York Governor Andrew Cuomo, 2020f). On May 1, 2020, all K-12 schools and college facilities statewide were ordered to remain closed for the rest of the academic year (State of New York Governor Andrew Cuomo, 2020a). The guide to the "NY Forward Reopening" Plan was released on May 11, 2020, and three regions began phase one of reopening businesses on May 15, 2020 (State of New York Governor Andrew Cuomo,

Data Collection
This work began in April and was initially completed in May 2020, at a time when there were no widespread protests against social injustice in the United States (Minnesota Daily, 2020). To emphasize the effectiveness of our model for evaluating the real-time risk of the epidemic, we chose to focus on data from before June 2020. This strategy also gave us the opportunity to compare our predictions and recommendations with what took place after June 2020.
We collected epidemic data for New York, New Jersey, Connecticut, and California from March 13, 2020, when the national emergency concerning COVID-19 was declared, to May 24, 2020. The data were made available by The New York Times (The New York Times, 2020).

Bayesian Modelling of the Epidemic
According to a WHO report, COVID-19 can be transmitted by individuals infected with the virus before significant symptoms develop (World Health Organization, 2020b), or even by carriers who never develop symptoms. Pre-symptomatic and asymptomatic transmission interfere with our ability to understand the real magnitude of COVID-19 because of the lag between catching the virus and being confirmed by testing. To overcome this issue, we divided the population of interest into four compartments: susceptible (S), unidentified infectious (I), self-healing without being confirmed (H), and confirmed cases (C):
• Susceptible (S) individuals have no immunity to the disease and make up the majority of the population at an early stage of the epidemic.
• Unidentified infectious (I) individuals are infectious but not yet confirmed. They can be divided into two groups: those who will eventually develop symptoms and those who will not, called asymptomatic carriers. We assumed that every individual in compartment I eventually moves into either H or C.
• Self-healing individuals without being confirmed (H) are assumed to be no longer infectious and to be resistant to COVID-19.
• Confirmed cases (C) include two groups of individuals: patients in the hospital and asymptomatic carriers who are supposed to stay at home; both groups are assumed unable to transmit the disease.
We introduced an SIHC model to accommodate these four compartments. Figure 4 presents a flowchart of the transitions among them.
We assumed a dynamic system (1) for the numbers of individuals in compartments S, I, H, and C at time t, denoted by S(t), I(t), H(t), and C(t), respectively, where ρ is the transmissibility and θ(t) is the average contact number per person at time t (Blackwood and Childs, 2018), which was assumed to be time-varying; $D_H$ is the average duration from catching the virus to self-healing without being confirmed; $D_C$ is the average duration from catching the virus to being confirmed by testing; and N is the total population size. Let α(t) = ρθ(t). The contact number θ(t) can be controlled by policy interventions: typically, it is constant at an early stage of the epidemic and decreases as interventions are implemented, until the interventions take full effect and it reaches or nears its lowest level. Accordingly, we further assumed that α(t) was a monotonically decreasing curve (Tan et al., 2020) with four parameters $\alpha_0$, d, m, and η, where $\alpha_0$ denotes the maximum of α(t) at an early stage of the outbreak; d is the time it takes for the control measures to take effect and for α(t) to start declining; and the minimum of α(t) is $\alpha_0(1-\eta)$, so that $(1-\eta)$ is the ratio of the minimum of α(t) to $\alpha_0$. The rate $v_m$ was chosen as $v_m = 2\log(99)/m$, so that α(t) first reaches its minimum at d + m. Figure 5 illustrates the shape of α(t).
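For concreteness, the following is one form of system (1) and of α(t) that is consistent with the definitions above. It is a sketch under our own assumptions (mass-action incidence α(t)S(t)I(t)/N, departures from I to H and to C at rates $1/D_H$ and $1/D_C$, and a logistic decline of α(t) between d and d + m), not a verbatim copy of the authors' equations.

```latex
\begin{aligned}
\frac{dS(t)}{dt} &= -\alpha(t)\,\frac{S(t)\,I(t)}{N},\\
\frac{dI(t)}{dt} &= \alpha(t)\,\frac{S(t)\,I(t)}{N} - \frac{I(t)}{D_H} - \frac{I(t)}{D_C},\\
\frac{dH(t)}{dt} &= \frac{I(t)}{D_H}, \qquad \frac{dC(t)}{dt} = \frac{I(t)}{D_C},\\
\alpha(t) &= \alpha_0\left[1 - \frac{\eta}{1 + \exp\{-v_m\,(t - d - m/2)\}}\right],
\qquad v_m = \frac{2\log(99)}{m},\\
R_t &= \alpha(t)\,\frac{S(t)}{N}\,\frac{D_H D_C}{D_H + D_C}.
\end{aligned}
```

Under this assumed form, the logistic factor equals 0.01 at t = d and 0.99 at t = d + m, so α(d) = α_0(1 − 0.01η) and α(d + m) = α_0(1 − 0.99η), which is where the constant 2 log(99) comes from. The reproduction number multiplies the transmission rate by the susceptible fraction and by the mean time $D_H D_C/(D_H + D_C)$ that an individual spends in I when leaving at total rate $1/D_H + 1/D_C$.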
Next, we derived the time-varying reproduction number $R_t$ (Anderson et al., 1992; Jones, 2007) from our SIHC model. Let $\Theta = (\alpha_0, \eta, m, D_C)$ and $\Phi = (\Theta, d, D_H)$, where d and $D_H$ were given and the other parameters needed to be estimated. We assumed that $Z_t = (I_t, H_t, C_t)$ was a latent Markov process, where $Z_{t+1}(Z_t, \Phi)$ is the evolution operator that determines the values of I, H, and C at time t + 1 from the deterministic dynamic system (1) with initial value $(N - I_t - H_t - C_t, I_t, H_t, C_t)$ and parameters Φ. Let $\mathrm{Pois}(\cdot \mid \lambda)$ denote the mass function of a multi-dimensional independent Poisson distribution with mean vector λ. Given $I_t$, $H_t$, and $C_t$, the conditionally independent Poisson likelihood of $(I_{t+1}, H_{t+1}, C_{t+1})$, with mean $Z_{t+1}(Z_t, \Phi)$, is the Poisson approximation to the multinomial likelihood of $(S_{t+1}, I_{t+1}, H_{t+1}, C_{t+1})$ with total number N (Tan et al., 2020). Because $Z_{t+1}(Z_t, \Phi)$ has no closed form, we used the ode() function in the R package deSolve (Soetaert et al., 2010) to solve the system of ordinary differential equations numerically.
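To make the evolution operator and the Poisson likelihood concrete, here is a minimal Python sketch under our own assumptions: mass-action incidence α(t)SI/N, departures from I to H and to C at rates 1/D_H and 1/D_C, and a logistic α(t) consistent with v_m = 2 log(99)/m. A fixed-step fourth-order Runge-Kutta integrator stands in for deSolve's ode(); all function names and parameter values are hypothetical and are not the authors' code.

```python
import math
import numpy as np


def alpha(t, alpha0, eta, d, m):
    """Assumed logistic decline of alpha(t) from about alpha0 to alpha0 * (1 - eta)."""
    v_m = 2.0 * math.log(99.0) / m
    return alpha0 * (1.0 - eta / (1.0 + math.exp(-v_m * (t - d - m / 2.0))))


def sihc_rhs(t, y, alpha0, eta, d, m, D_H, D_C, N):
    """Right-hand side of the assumed SIHC system with mass-action incidence."""
    S, I, H, C = y
    new_infections = alpha(t, alpha0, eta, d, m) * S * I / N
    return np.array([-new_infections,
                     new_infections - I / D_H - I / D_C,
                     I / D_H,
                     I / D_C])


def evolve_one_day(t, z, theta, d, D_H, N, substeps=20):
    """Hypothetical one-day evolution operator: fixed-step RK4 from t to t + 1,
    started at (N - I - H - C, I, H, C); returns the expected (I, H, C) at t + 1."""
    alpha0, eta, m, D_C = theta
    I, H, C = z
    y = np.array([N - I - H - C, I, H, C], dtype=float)
    h = 1.0 / substeps
    args = (alpha0, eta, d, m, D_H, D_C, N)
    for step in range(substeps):
        s = t + step * h
        k1 = sihc_rhs(s, y, *args)
        k2 = sihc_rhs(s + h / 2, y + h / 2 * k1, *args)
        k3 = sihc_rhs(s + h / 2, y + h / 2 * k2, *args)
        k4 = sihc_rhs(s + h, y + h * k3, *args)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return y[1:]


def transition_loglik(z_next, t, z, theta, d, D_H, N):
    """Independent-Poisson log-likelihood of integer counts (I, H, C) at t + 1."""
    mean = evolve_one_day(t, z, theta, d, D_H, N)
    return sum(k * math.log(mu) - mu - math.lgamma(k + 1) for k, mu in zip(z_next, mean))


theta = (0.5, 0.8, 30.0, 5.1)  # illustrative (alpha0, eta, m, D_C), not estimates
z_now = (100.0, 0.0, 10.0)     # (I_t, H_t, C_t)
mean_next = evolve_one_day(0.0, z_now, theta, d=8.0, D_H=9.5, N=1e6)
z_next = [int(round(v)) for v in mean_next]
print(mean_next, transition_loglik(z_next, 0.0, z_now, theta, 8.0, 9.5, 1e6))
```

Summing such transition terms over t, with the latent I and H imputed, gives the likelihood part of the posterior; a production analysis would use an adaptive solver such as deSolve's ode() rather than this fixed-step loop.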
Given the observed data $C_{1:T}$ and N, where T is the length of the observation period, we set $H_1 = 0$, which was reasonable for the early stage of the outbreak. The numbers of individuals in compartments I and H, namely $I_{1:T}$ and $H_{2:T}$, were treated as latent variables. For simplicity, we assumed that $H_{t+1}$ equals its deterministic update from system (1) almost surely; namely, $H_{t+1}$ depended completely on $I_t$, $H_t$, and $D_H$. Our empirical observation suggested that this simplification did not substantially alter the results, whereas assuming that $H_{t+1}$ is Poisson distributed with that update as its mean would have substantially increased the computational burden of the sampling procedure. We employed a Bayesian procedure for parameter estimation and future prediction. The posterior distribution of $\Theta = (\alpha_0, \eta, m, D_C)$ and the latent variables, $\pi(\Theta, I_{1:T}, H_{2:T} \mid C_{1:T}, N, d, D_H, H_1)$, follows from Bayes' rule (Bolstad and Curran, 2016), where π(·) represents the prior distribution of the corresponding parameter and π(· | ∗) represents its posterior distribution given the observed data "∗". Similarly, we can use the posterior distribution of $Z_s$, $\pi(Z_s \mid C_{1:T}, N, d, D_H, H_1)$ for s > T, to predict the spread of the disease. For the prior distributions, note that $D_C$ is governed by the mean of the incubation period. Based on the literature on the incubation period of COVID-19 (Lauer et al., 2020), we chose an informative prior for $D_C$: a log-normal distribution with log-mean log(5.1). Similarly, we chose a log-normal distribution with log-mean log(30) as the prior distribution of m; in other words, the interventions were assumed to take full effect after one month. To strengthen the influence of the prior information, we set the log-standard deviation of both $D_C$ and m to log(1.05). The priors of the remaining parameters were chosen to be non-informative flat priors, i.e., π($\alpha_0$), π(η) ∝ 1.
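A log-standard deviation of log(1.05) ≈ 0.049 makes these priors tight. The short calculation below (ours, not the authors') computes the central 95% interval implied by a log-normal prior with log-mean log(median) and that log-standard deviation; each interval spans roughly 10% on either side of its median. The helper name lognormal_prior_summary is hypothetical.

```python
import numpy as np


def lognormal_prior_summary(median, log_sd, z=1.959964):
    """Median and central 95% interval of a log-normal prior with log-mean log(median)."""
    log_mean = np.log(median)
    return median, np.exp(log_mean - z * log_sd), np.exp(log_mean + z * log_sd)


for name, med in [("D_C", 5.1), ("m", 30.0)]:
    center, lo, hi = lognormal_prior_summary(med, np.log(1.05))
    print(f"{name}: median {center:.2f}, 95% interval ({lo:.2f}, {hi:.2f})")
# D_C: median 5.10, 95% interval (4.63, 5.61)
# m: median 30.00, 95% interval (27.26, 33.01)

# Monte Carlo check; NumPy's `mean` and `sigma` are on the log scale.
rng = np.random.default_rng(1)
draws = rng.lognormal(mean=np.log(5.1), sigma=np.log(1.05), size=100_000)
print(np.quantile(draws, [0.025, 0.5, 0.975]).round(2))
```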
For the fixed parameters d and $D_H$, recall that d is the waiting time before the interventions begin to take effect. We set d to the start of certain interventions, i.e., d = 8 for New York (see "New York State on PAUSE" in Figure 3), d = 9 for New Jersey (see "statewide stay at home" in Figure 3), d = 8 for Connecticut (see "stay safe, stay home" in Figure 3), and d = 7 for California (see "stay home except for essential needs" in Figure 3). $D_H$ was set to 9.5 days according to a clinical study of asymptomatic cases (Hu et al., 2020).
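The values of d are day indices. Under an indexing convention we assume here, day 1 is March 13, 2020, the first day of the data window, and the indices convert to calendar dates as below. With this convention, d = 8 for New York falls on March 20, 2020, the date the "New York State on PAUSE" policy was announced; the other values can be checked against Figure 3 in the same way. The names DAY_ONE and day_to_date are hypothetical.

```python
from datetime import date, timedelta

DAY_ONE = date(2020, 3, 13)  # assumed: day 1 is the first day of the data window


def day_to_date(day_index):
    """Convert a 1-based day index into a calendar date."""
    return DAY_ONE + timedelta(days=day_index - 1)


d_values = {"New York": 8, "New Jersey": 9, "Connecticut": 8, "California": 7}
for state, d in d_values.items():
    print(f"{state}: d = {d} -> {day_to_date(d).isoformat()}")
# New York: d = 8 -> 2020-03-20
# New Jersey: d = 9 -> 2020-03-21
# Connecticut: d = 8 -> 2020-03-20
# California: d = 7 -> 2020-03-19
```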
Since the joint posterior distribution of the parameters and the latent $I_{1:T}$ and $H_{2:T}$, given $C_{1:T}$, N, d, $D_H$, and $H_1$, has no closed form, we used Markov chain Monte Carlo (MCMC) (Ghosh et al., 2006; Andrieu et al., 2003; Chib and Greenberg, 1995; Soubeyrand, 2016) to approximate it, sampling the parameters and the latent numbers of I and H at each iteration. The Appendix provides the details of the sampling algorithm. The point estimates of the parameters, $I_{1:T}$, $H_{2:T}$, and the predicted spread were the medians of the corresponding posterior distributions, and 95% credible intervals were constructed from the 2.5% and 97.5% quantiles.
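Summarizing MCMC output in this way (posterior median as the point estimate, 2.5% and 97.5% quantiles as the 95% credible interval) can be sketched as follows. The burn_in argument is a hypothetical addition, since no burn-in length is specified here, and the toy "chain" is not real sampler output.

```python
import numpy as np


def posterior_summary(draws, burn_in=0):
    """Posterior median and equal-tailed 95% credible interval from MCMC draws.

    `draws` has shape (n_iterations, n_quantities); the first `burn_in` rows are discarded.
    """
    kept = np.asarray(draws, dtype=float)[burn_in:]
    median = np.median(kept, axis=0)
    lower, upper = np.quantile(kept, [0.025, 0.975], axis=0)
    return median, lower, upper


# Toy chain for a single quantity: the values 1, ..., 1000 (illustration only).
med, lo, hi = posterior_summary(np.arange(1, 1001).reshape(-1, 1))
print(med, lo, hi)  # [500.5] [25.975] [975.025]
```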

Model Estimation
We used publicly available data to simulate the possible outcomes of the COVID-19 outbreak under different dates for reopening business. Table 1 presents the parameter estimates, which were then used in our simulation models of potential second waves of COVID-19 to assess the risk of people returning to work. Data collected after our models were built were used to assess the validity of our simulation models.
Briefly, the average relative error (AER) between our predicted and the observed numbers of cumulative confirmed cases from May 25 to June 6, 2020 was computed as $\mathrm{AER} = \frac{1}{K}\sum_{k=1}^{K} |\hat{C}_{T+k} - C_{T+k}| / C_{T+k}$, where K is the length of the prediction period and $\hat{C}_{T+k}$ is the predicted value. The AERs were 0.34%, 0.45%, 0.59%, and 3.56% in New York, New Jersey, Connecticut, and California, respectively. These low errors suggested that our models predicted accurately. The estimated trends for the four states are given in Figure 6.
Figure 6: The state-specific trends of simulated confirmed cases (C) and unidentified infected (I) per 100,000 from March 13 to June 6, 2020. The points represent the observed cumulative confirmed cases.
The estimated rates of infected individuals without being confirmed on May 24, 2020 (i.e., the estimated $(I_t + H_t)/(N - S_t)$ with t = May 24, 2020) were 22.82% (95% CI: 21.86%, 24.68%), 32.38% (95% CI: 31.36%, 33.09%), 34.95% (95% CI: 33.04%, 36.32%), and 37.74% (95% CI: 35.72%, 38.53%) in New York, New Jersey, Connecticut, and California, respectively. The numbers of tests per 100,000 in New York, New Jersey, Connecticut, and California on May 24, 2020 had been reported as 9312, 7434, 6321, and 4396, respectively (Jin, 2020). The rate of testing thus appeared to be inversely related to the rate of unidentified infections: the more a state tested, the lower its estimated proportion of unidentified infected individuals.
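A minimal sketch of the AER, assuming the standard definition: the mean over the K prediction days of |predicted − observed| / observed. For the window from May 25 to June 6, 2020, counted inclusively, K = 13. The inputs below are illustrative, not the paper's data, and the function name is hypothetical.

```python
import numpy as np


def average_relative_error(predicted, observed):
    """Mean over the prediction days of |predicted - observed| / observed."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean(np.abs(predicted - observed) / observed))


# Illustrative numbers only: three prediction days, each off by 1%.
aer = average_relative_error([101, 198, 303], [100, 200, 300])
print(f"AER = {aer:.2%}")  # AER = 1.00%
```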
To show why the unobserved number of infected individuals ($N - S_t$), rather than the observed number of confirmed cases ($C_t$), matters for assessing epidemic risk, Figure 7 compares the estimated number of new daily infected individuals with the observed number of new daily confirmed cases. The gaps between the two in Figure 7 result from the pre-symptomatic and asymptomatic transmission of COVID-19. This phenomenon is a lagging effect in new daily confirmed cases, reflecting the time from catching the virus to being confirmed by testing. Figure 7 indicates that the lagging effect cannot be ignored at an early stage of the epidemic, although it seemed to disappear over time.
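A toy illustration of this lagging effect, using made-up numbers rather than the estimates behind Figure 7: hypothetical daily infections form a symmetric wave, and each infection is assumed to be confirmed after a discretized exponential delay with scale D_C = 5.1 days (matching the prior log-mean log(5.1) for D_C; the exponential shape is our assumption). For simplicity every infection is eventually confirmed, which ignores the H compartment.

```python
import numpy as np

days = np.arange(120)
# Hypothetical daily new infections: a symmetric wave that peaks on day 40.
new_infected = 1000.0 * np.exp(-0.5 * ((days - 40) / 10.0) ** 2)

# Assumed infection-to-confirmation delay: kernel proportional to exp(-k / D_C), k = 0..59.
D_C = 5.1
delays = np.arange(60)
kernel = np.exp(-delays / D_C)
kernel /= kernel.sum()

# Daily confirmations = infections spread forward by the random delay (discrete convolution).
new_confirmed = np.convolve(new_infected, kernel)[: days.size]

peak_infected = int(days[np.argmax(new_infected)])
peak_confirmed = int(days[np.argmax(new_confirmed)])
print(peak_infected, peak_confirmed)  # the confirmation peak comes later
```

During the rising phase, confirmations trail infections (for example on day 30), and the confirmation peak arrives after the infection peak; once the wave has passed, both series decay toward zero and the gap narrows.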

Evaluate the Risk for Reopening the Economy
To assess the potential risk of COVID-19, we considered the estimated number of individuals per 100,000 in $I_t$, the only transmissible compartment, for New York, New Jersey, Connecticut, and California (Figure 8). Figure 8 indicated that the peak of unidentified infectious individuals in New York, New Jersey, and Connecticut had passed, and this has remained the case so far.
California reopened lower-risk workplaces under Stage 2 on May 8, 2020. We used this decision as a reference in considering the resumption of business in New York, New Jersey, and Connecticut. Compared with California, New York, New Jersey, and Connecticut had higher numbers of unidentified infectious individuals ($I_t$) before May 8, 2020, but these numbers were on clearly descending trajectories. In California, by contrast, although the number was low, its steady upward trajectory was concerning. This upward trajectory, coupled with insufficient interventions, might be the major cause of the resurgence of COVID-19 in California after the reopening.
To balance the risk of the epidemic against the benefit of resuming business, we considered several Mondays in June 2020 as possible reopening dates. We used our simulation models to predict the numbers of unidentified infectious individuals ($I_t$) per 100,000 for New York, New Jersey, Connecticut, and California on June 1, June 8, June 15, June 22, and June 29, 2020. The results are given in Table 2. A common rule of thumb regards the risk as reasonably low for a population when the number of unidentified infectious persons per 100,000 is close to 20. This guideline has been used to form cross-state travel guidelines in the United States, and we used it as a rationale for a safe resumption of business.
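A simplified version of the date-matching idea can be sketched as follows: find the first day on which a state's projected number of unidentified infectious individuals per 100,000 falls to a reference level, then take the first candidate Monday on or after that day. The trajectory below is hypothetical (not the estimates in Table 2), the threshold of 20 per 100,000 is the rule of thumb quoted above, and California's estimated level on May 8, 2020 could be passed as the threshold instead. The helper name first_date_at_or_below is ours.

```python
import math
from datetime import date, timedelta


def first_date_at_or_below(start, values, threshold):
    """Return the first date whose daily value is <= threshold, or None if never reached."""
    for offset, value in enumerate(values):
        if value <= threshold:
            return start + timedelta(days=offset)
    return None


# Hypothetical trajectory: I_t per 100,000 equal to 200 on May 24, 2020,
# shrinking by a factor exp(-0.1) each day.
start = date(2020, 5, 24)
trajectory = [200.0 * math.exp(-0.1 * k) for k in range(60)]
crossing = first_date_at_or_below(start, trajectory, threshold=20.0)

# Candidate reopening dates considered in the text: the Mondays of June 2020.
mondays = [date(2020, 6, 1) + timedelta(weeks=w) for w in range(5)]
reopen = None
if crossing is not None:
    reopen = next((day for day in mondays if day >= crossing), None)
print(crossing, reopen)  # 2020-06-17 2020-06-22
```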
To assess the potential risk of resuming business on different Mondays, we simulated a possible second wave of infection after people returned to work. Unlike in the first wave, we assumed that stringent interventions would be re-enforced one week after the business was reopened. We used the estimated α(t) from Table 1, modified by the "reopening" operation (6) in the Supplement, as the parameters underlying COVID-19 transmission.
Figure 8: The trend of the numbers of unidentified infectious individuals per 100,000 for the states of New York, New Jersey, Connecticut, and California from March 13 to July 20, 2020. The corresponding 95% credible intervals are shown for each state.
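One hypothetical way to realize such a reopening operation, stated as our assumption rather than the Supplement's definition: α(t) is left unchanged before the reopening time h, returns to about α_0 at h, and then declines again along the same logistic curve as if stringent interventions started 7 days after h. The logistic form of α(t) is itself assumed, and the function names are hypothetical.

```python
import math


def alpha(t, alpha0, eta, d, m):
    """Assumed logistic decline of alpha(t) that starts around time d and lasts about m days."""
    v_m = 2.0 * math.log(99.0) / m
    return alpha0 * (1.0 - eta / (1.0 + math.exp(-v_m * (t - d - m / 2.0))))


def reopened_alpha(t, h, alpha0, eta, d, m, delay=7.0):
    """Hypothetical reopening operator R_h applied to alpha(t).

    Before the reopening time h the original curve is kept. From h on, transmission
    returns to (about) alpha0, and the same logistic decline restarts as if stringent
    interventions were re-enforced `delay` days after h.
    """
    if t < h:
        return alpha(t, alpha0, eta, d, m)
    return alpha(t, alpha0, eta, h + delay, m)


# Illustrative values only: alpha0 = 0.5, eta = 0.8, d = 8, m = 30, reopening on day 80.
for t in (70.0, 80.0, 90.0, 120.0):
    print(t, round(reopened_alpha(t, 80.0, 0.5, 0.8, 8.0, 30.0), 3))
```

Feeding reopened_alpha into the same ODE solver used for fitting would then produce a simulated second wave for each candidate reopening day h.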
The simulated results are given in Figure 9. Briefly, for New York, the simulated numbers of cumulative confirmed cases per 100,000 on July 20 with business resumed on June 1, June 8, June 15, June 22, and June 29 were, respectively, 20.26%, 14.76%, 10.37%, 6.98%, and 4.37% higher than those without resumption. For New Jersey, the corresponding increases were 43.91%, 33.07%, 24.15%, 16.72%, and 10.70%, and for Connecticut they were 37.18%, 28.08%, 20.45%, 14.29%, and 9.12%.
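Differencing these reported percentages shows how much each additional week of delay reduces the simulated excess of cumulative confirmed cases. The numbers are copied from the text; only the arithmetic is added, and the helper name weekly_gains is hypothetical.

```python
# Percentage increases in simulated cumulative confirmed cases per 100,000 on July 20,
# relative to no reopening, for reopening on June 1, 8, 15, 22, and 29 (from the text).
increases = {
    "New York": [20.26, 14.76, 10.37, 6.98, 4.37],
    "New Jersey": [43.91, 33.07, 24.15, 16.72, 10.70],
    "Connecticut": [37.18, 28.08, 20.45, 14.29, 9.12],
}


def weekly_gains(values):
    """Reduction, in percentage points, gained by each additional week of delay."""
    return [round(a - b, 2) for a, b in zip(values, values[1:])]


for state, values in increases.items():
    print(state, weekly_gains(values))
# New York [5.5, 4.39, 3.39, 2.61]
# New Jersey [10.84, 8.92, 7.43, 6.02]
# Connecticut [9.1, 7.63, 6.16, 5.17]
```

In every state the gain shrinks week by week, which is the diminishing benefit of further delay.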

Discussion
The decision to resume business is not only a public health issue but also an economic one. We focused on the epidemiological feasibility of returning to work at an early date; there are obviously many other factors to consider (Centers for Disease Control and Prevention, 2020). To analyze the COVID-19 epidemic data in New York, New Jersey, and Connecticut, we proposed an epidemic model that accounts for pre-symptomatic and asymptomatic transmission of COVID-19. This model provided estimates of the numbers of new daily infected individuals and new confirmed cases. The higher the number of unidentified infectious individuals, the higher the expected risk of resuming business. From Figure 9, we concluded that resuming business on June 1, 2020 carried certain risks for New York, New Jersey, and Connecticut. If the governors of those states delayed the resumption of business by one week or more, the simulated magnitude of the second wave of infection was much lower. However, the additional benefit of delaying the reopening diminished with each further week of delay.
Because California began reopening its economy on May 8, 2020, we used California's data at that time as a reference for reopening New York, New Jersey, and Connecticut. The estimated numbers of unidentified infectious individuals per 100,000 in New York, New Jersey, and Connecticut came close to California's May 8, 2020 level on June 1, June 22, and June 22, 2020, respectively; following California's practice, these states might consider reopening their businesses on those dates. Moreover, the trajectory in California was clearly heading in the wrong direction, in contrast to the descending trajectories in New York, New Jersey, and Connecticut. While the three East Coast states have become the lowest-risk states, California has been suffering from a resurgence, underscoring the importance of maintaining public health practices after reopening business.

Supplementary Material
The data and the R code used in the analysis in this study are available at https://github.com/tingT0929/Resumption-of-business.

Algorithm 1 Update
Group sampling is high dimensional if T is large. Hence, moving from one iteration of the Metropolis-Hastings algorithm to the next is computationally intensive, while sequential sampling requires more time for the Markov chain to converge. We used a mixture of MCMC kernels (Andrieu et al., 2003) to combine these two sampling strategies and balance the trade-off between the acceptance rate and the convergence time of MCMC.
For the simulation of the second wave of infection, we defined a "reopening" operation $R_h$ on α(t), which implies that stringent interventions would be re-enforced one week after the business was reopened at time h.
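A minimal sketch of a mixture of MCMC kernels in the spirit of Andrieu et al. (2003), not the authors' sampler: a toy five-dimensional standard normal target stands in for the posterior of the latent counts, a joint random-walk move plays the role of group sampling, and a random-scan single-coordinate move stands in for sequential sampling (the authors' sequential scheme may sweep coordinates in order instead). Because the kernel is chosen independently of the current state and each kernel leaves the target invariant, the mixture does as well. All names and tuning values are hypothetical.

```python
import numpy as np


def log_target(x):
    """Toy log-density (independent standard normals) standing in for the real posterior."""
    return -0.5 * float(np.dot(x, x))


def mixture_mh(n_iter, dim, p_block=0.3, block_step=1.0, site_step=2.0, seed=0):
    """Metropolis-Hastings with a mixture of two symmetric random-walk kernels:
    with probability p_block all coordinates move jointly (group sampling),
    otherwise one randomly chosen coordinate moves (single-site update)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    log_p = log_target(x)
    draws = np.empty((n_iter, dim))
    for it in range(n_iter):
        proposal = x.copy()
        if rng.random() < p_block:
            proposal += block_step * rng.standard_normal(dim)
        else:
            j = rng.integers(dim)
            proposal[j] += site_step * rng.standard_normal()
        log_p_new = log_target(proposal)
        # Both proposals are symmetric, so the acceptance ratio is the target ratio.
        if np.log(rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        draws[it] = x
    return draws


draws = mixture_mh(n_iter=40_000, dim=5)
kept = draws[5_000:]
print(kept.mean(axis=0).round(2), kept.var(axis=0).round(2))
```

Tuning p_block shifts effort between the two kinds of moves, which is the trade-off between acceptance rate and convergence time described above.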