Pub. online: 4 Aug 2022
Type: Research Article
Open Access
Journal: Journal of Data Science
Volume 18, Issue 3 (2020): Special issue: Data Science in Action in Response to the Outbreak of COVID-19, pp. 536–549
Abstract
As the COVID-19 pandemic has strongly disrupted people’s daily work and life, a great amount of scientific research has been conducted to understand the key characteristics of this new epidemic. In this manuscript, we focus on four crucial epidemic metrics of COVID-19, namely the basic reproduction number, the incubation period, the serial interval and the epidemic doubling time. We collect relevant studies based on COVID-19 data from China and conduct a meta-analysis to obtain pooled estimates of the four metrics. From the summary results, we conclude that COVID-19 has stronger transmissibility than SARS, implying that stringent public health strategies are necessary.
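To illustrate what a pooled estimate is, the following is a minimal sketch of inverse-variance (fixed-effect) pooling in Python. The abstract does not state which pooling model the authors used, so the method here is an assumption, and the study-level estimates and standard errors are hypothetical values, not figures from the paper.

```python
import math

def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]  # more precise studies weigh more
    total = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)

# Hypothetical study-level estimates of one metric (not taken from the paper)
study_estimates = [2.2, 3.1, 2.7]
study_std_errors = [0.30, 0.50, 0.40]

pooled, pooled_se = pool_fixed_effect(study_estimates, study_std_errors)
ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
print(f"pooled = {pooled:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

A random-effects model would add a between-study variance term to each weight; the fixed-effect version is shown only because it is the simplest case.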
Compound distributions gained their importance from the fact that natural factors have compound effects, as in medical, social and logical experiments. Dubey (1968) introduced the compound Weibull distribution by compounding the Weibull distribution with the gamma distribution. The main aim of this paper is to define a bivariate generalized Burr (compound Weibull) distribution whose marginals have univariate generalized Burr distributions. Several properties of this distribution, such as its marginals, conditional distributions and product moments, are discussed. The maximum likelihood estimates of the unknown parameters of this distribution and their approximate variance-covariance matrix are obtained. Some simulations are performed to assess the performance of the MLEs. One data analysis is performed for illustrative purposes.
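The compounding step can be checked numerically in the univariate case. The sketch below assumes one common parametrization (not necessarily the paper's): conditional on a rate theta, the Weibull survival function is exp(-theta * x**c), and theta follows a gamma distribution with the given shape and rate. Integrating theta out should give the Burr-type survival function (1 + x**c / rate) ** (-shape). All function names and parameter values are illustrative.

```python
import math

def gamma_pdf(theta, shape, rate):
    """Density of a gamma distribution with the given shape and rate."""
    return rate ** shape * theta ** (shape - 1) * math.exp(-rate * theta) / math.gamma(shape)

def mixed_survival(x, c, shape, rate, upper=60.0, steps=20000):
    """Midpoint-rule integral of exp(-theta * x**c) against the gamma density."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        total += math.exp(-theta * x ** c) * gamma_pdf(theta, shape, rate)
    return total * h

def burr_survival(x, c, shape, rate):
    """Closed-form survival function of the gamma-compounded Weibull."""
    return (1.0 + x ** c / rate) ** (-shape)

# Illustrative parameter values (assumptions, not from the paper)
x, c, shape, rate = 0.8, 1.3, 2.0, 1.5
numeric = mixed_survival(x, c, shape, rate)
closed_form = burr_survival(x, c, shape, rate)
print(numeric, closed_form)
```

The two printed values should agree to several decimal places, which is the sense in which the Burr family arises as a compound Weibull.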
In this paper, we introduce a new lifetime model, called the Generalized Weibull-Burr XII distribution. We discuss some of its mathematical properties, such as the density, hazard rate function, quantile function and moments. The maximum likelihood method is used to estimate the model parameters. A simulation study is performed to assess the performance of the maximum likelihood estimators by means of biases and mean squared errors. Finally, we show that the proposed distribution is very competitive with other classical models through an application to a real data set.
Abstract: In this paper, we propose power weighted quantile regression (PWQR), which can effectively reduce the effect of heterogeneity of the conditional densities of the response and improve the efficiency of quantile regression. In addition, this article proves that the total weight of the observations whose actual value is less than the PWQR estimate, as a proportion of the total weight, is very close to the corresponding quantile. Finally, this article establishes the relationship between Geomagnetic Indices and GIC. To address the problems of secure power system operation, we construct a GIC risk value table. This table is of practical operational use and can provide important guidance for secure power system operation.
Abstract: An individual in a finite population is represented by a random variable whose expectation is linearly composed of explanatory variables and a personal effect. This expectation locates her (his) random variable on a scale when s(he) responds to a questionnaire item or physical instrument. This formulation reinterprets design-based sampling, which represents an individual as a constant waiting to be observed. Retaining constant expectations, however, along with fixed realizations of random variables, preserves and strengthens design-based theory through the Horvitz-Thompson (1952) theorem. This interpretation reaffirms the usual design-based regression estimates, whose normality is seen to be free of any assumptions about the distribution of the outcome variable. It also formulates response error in a way that renders a superpopulation, postulated by model-based sampling, unnecessary. The value of distribution-free regression is illustrated with an analysis of American presidential approval.
The statistical modeling of natural disasters is an indispensable tool for extracting information for the prevention and risk reduction of casualties. The Poisson distribution can reveal the characteristics of a natural disaster. However, this distribution is insufficient for the clustering of natural events and related casualties. The best approach is to use a Neyman type A (NTA) distribution, which accommodates two or more events occurring within a short time. We obtain some properties of the NTA distribution and suggest that it provides a suitable description of the distribution of natural disasters and casualties. We support this argument using disaster events, including earthquakes, floods, landslides, forest fires, avalanches, and rock falls, in Turkey between 1900 and 2013. The data strongly support the NTA distribution as the main tool for handling disaster data. The findings indicate that approximately three earthquakes, fifteen landslides, five floods, six rock falls, six avalanches, and twenty-nine forest fires are expected in a year. The results from this model suggest that the probability of the total number of casualties is highest for earthquakes and lowest for rock falls. This study also finds that the expected number of natural disasters is approximately 64 per year and that the inter-event time between two successive earthquakes is approximately four months. The inter-event time for natural disasters overall is approximately six days in Turkey.
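As a rough illustration of why the NTA distribution handles clustering, the sketch below evaluates its probability mass function as a Poisson mixture of Poisson counts: the number of clusters is Poisson with mean lam, and each cluster contributes a Poisson number of events with mean phi. The parameter values, the function names and the truncation at max_clusters are assumptions made for the example, not values from the paper. The last lines reproduce the inter-event arithmetic implied by the abstract's yearly rates.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability of k events, handling a zero mean explicitly."""
    if mean == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def nta_pmf(k, lam, phi, max_clusters=200):
    """Neyman type A pmf: sum over j clusters of P(j clusters) * P(k events | j)."""
    return sum(poisson_pmf(j, lam) * poisson_pmf(k, j * phi)
               for j in range(max_clusters + 1))

# Illustrative parameters (assumptions, not estimates from the paper)
lam, phi = 2.0, 1.5
probs = [nta_pmf(k, lam, phi) for k in range(100)]
total = sum(probs)
mean = sum(k * p for k, p in enumerate(probs))
variance = sum(k * k * p for k, p in enumerate(probs)) - mean ** 2
print(total, mean, variance)  # mean is lam * phi, variance is lam * phi * (1 + phi)

# Inter-event times implied by the yearly rates quoted in the abstract
days_between_disasters = 365 / 64    # about 5.7 days, i.e. roughly six days
months_between_earthquakes = 12 / 3  # four months
```

The variance exceeds the mean by the factor (1 + phi), which is the overdispersion a plain Poisson model cannot represent.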
Abstract: The Asian financial crisis that struck most of the East Asian countries in 1997 has caught the attention of many researchers in finance and economics. This is due to the realization that during the crisis the affected countries saw their currencies depreciate by more than 50% and their stock markets fall sharply by about 30% to 50%. In this paper, we investigate the relationship among the stock market returns of three Southeast Asian (ASEAN) countries (Malaysia, Singapore and Thailand) using monthly data between 1990 and 2004. We find that the three stock markets are not cointegrated. Therefore, instead of modelling the returns data using linear vector autoregressive (VAR) models, we assume the returns data are regime-dependent and use the two-regime multivariate Markov switching vector autoregressive (MS-VAR) model, with regime shifts in both the mean and the variance, to extract common regime-shift behaviour from the return series. We find that the MS-VAR model with two regimes manages to detect common shifts in all the stock market return series, which shows evidence of comovement among the three return series. Furthermore, we find that the MS-VAR model captures the timing of the 1997 financial crisis in the three countries satisfactorily.
Abstract: This paper introduces a new generalized model, the Rayleigh Lomax distribution. A comprehensive description of the structural properties of the proposed model is given, including explicit expressions for the moments, quantile function, generating functions and Rényi entropy. The parameters of the new distribution are estimated by maximum likelihood. The generalized model is also compared with other models to illustrate its fit.
Abstract: We derive three likelihood-based confidence intervals for the risk ratio of two proportion parameters using a double sampling scheme for misclassified binomial data. The risk ratio is also known as the relative risk. We obtain closed-form maximum likelihood estimators of the model parameters by maximizing the full-likelihood function. Moreover, we develop three confidence intervals: a naive Wald interval, a modified Wald interval, and a Fieller-type interval. We apply the three confidence intervals to cervical cancer data. Finally, we perform two Monte Carlo simulation studies to assess and compare the coverage probabilities and average lengths of the three interval estimators. Unlike the other two interval estimators, the modified Wald interval always produces close-to-nominal confidence intervals for the various simulation scenarios examined here. Hence, the modified Wald confidence interval is preferred in practice.
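For orientation only, the sketch below computes the textbook log-scale Wald interval for a risk ratio from two fully observed binomial samples. It is not any of the paper's three intervals: it ignores misclassification and the double sampling scheme entirely, and the counts used are hypothetical.

```python
import math

def risk_ratio_wald_ci(x1, n1, x2, n2, z=1.96):
    """Risk ratio and log-scale Wald interval for two binomial samples.

    Assumes error-free classification and nonzero event counts in both groups.
    """
    p1, p2 = x1 / n1, x2 / n2
    rr = p1 / p2
    # Delta-method standard error of log(rr)
    se_log = math.sqrt(1.0 / x1 - 1.0 / n1 + 1.0 / x2 - 1.0 / n2)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Hypothetical counts: 30 of 100 events in group 1, 15 of 100 in group 2
rr, lower, upper = risk_ratio_wald_ci(30, 100, 15, 100)
print(f"RR = {rr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Working on the log scale keeps the lower limit positive, at the cost of an interval that is asymmetric around the point estimate.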
Pub. online: 4 Aug 2022
Type: Research Article
Open Access
Journal: Journal of Data Science
Volume 18, Issue 3 (2020): Special issue: Data Science in Action in Response to the Outbreak of COVID-19, pp. 526–535
Abstract
COVID-19 is a disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) that was reported to spread in people in December 2019. Understanding epidemiological features of COVID-19 is important for the ongoing global efforts to contain the virus. As a complement to the available work, in this article we analyze the Kaggle novel coronavirus dataset of 3397 patients dated from January 22, 2020 to March 29, 2020. We employ semiparametric and nonparametric survival models as well as text mining and data visualization techniques to examine the clinical manifestations and epidemiological features of COVID-19. Our analysis shows that: (i) the median incubation time is about 5 days and older people tend to have a longer incubation period; (ii) the median time for infected people to recover is about 20 days, and the recovery time is significantly associated with age but not gender; (iii) the fatality rate is higher for older infected patients than for younger patients