Abstract: This paper introduces a new generalized model, the Rayleigh Lomax distribution. We give a comprehensive description of the structural properties of the proposed model, including explicit expressions for its moments, quantile function, generating functions and Rényi entropy. The parameters of the new distribution are estimated by maximum likelihood. The generalized model is also compared with other models to illustrate its fit.
Abstract: We derive three likelihood-based confidence intervals for the risk ratio of two proportion parameters using a double sampling scheme for misclassified binomial data. The risk ratio is also known as the relative risk. We obtain closed-form maximum likelihood estimators of the model parameters by maximizing the full-likelihood function. Moreover, we develop three confidence intervals: a naive Wald interval, a modified Wald interval, and a Fieller-type interval. We apply the three confidence intervals to cervical cancer data. Finally, we perform two Monte Carlo simulation studies to assess and compare the coverage probabilities and average lengths of the three interval estimators. Unlike the other two interval estimators, the modified Wald interval always produces close-to-nominal confidence intervals for the various simulation scenarios examined here. Hence, the modified Wald confidence interval is preferred in practice.
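As a point of reference for the interval estimators named above, the following is a minimal sketch of the textbook log-scale Wald interval for a risk ratio. It assumes error-free binomial counts, so it ignores the misclassification and double sampling that the paper models, and it is not one of the paper's three intervals. The function name and arguments are illustrative and not taken from the paper.

```python
import math


def log_wald_risk_ratio_ci(x1, n1, x2, n2, z=1.959963984540054):
    """Textbook log-scale Wald interval for the risk ratio p1 / p2.

    Assumption: x1 and x2 are error-free binomial counts out of n1 and n2.
    The default z is the standard normal quantile for a 95% interval.
    """
    if not (0 < x1 <= n1 and 0 < x2 <= n2):
        raise ValueError("counts must satisfy 0 < x <= n in both groups")
    rr = (x1 / n1) / (x2 / n2)
    # Delta-method standard error of log(rr).
    se_log = math.sqrt(1.0 / x1 - 1.0 / n1 + 1.0 / x2 - 1.0 / n2)
    half_width = z * se_log
    return rr, rr * math.exp(-half_width), rr * math.exp(half_width)


if __name__ == "__main__":
    estimate, lower, upper = log_wald_risk_ratio_ci(30, 100, 15, 100)
    print(estimate, lower, upper)
```

Because the interval is built on the log scale, it is symmetric around the estimate in ratio terms: the product of the two limits equals the squared point estimate.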
COVID-19 is a disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) that was reported to spread in people in December 2019. Understanding epidemiological
features of COVID-19 is important for the ongoing global efforts to contain the virus. As a
complement to the available work, in this article we analyze the Kaggle novel coronavirus dataset
of 3397 patients dated from January 22, 2020 to March 29, 2020. We employ semiparametric
and nonparametric survival models as well as text mining and data visualization techniques to
examine the clinical manifestations and epidemiological features of COVID-19. Our analysis
shows that: (i) the median incubation time is about 5 days and older people tend to have a
longer incubation period; (ii) the median time for infected people to recover is about 20 days,
and the recovery time is significantly associated with age but not gender; (iii) the fatality rate
is higher for older infected patients than for younger patients
Abstract: Design-based regression regards the survey response as a constant waiting to be observed. Bechtel (2007) replaced this constant with the sum of a fixed true value and a random measurement error. The present paper relaxes the assumption that the expected error is zero within a survey respondent. It also allows measurement errors in predictor variables as well as in the response variable. Reasonable assumptions about these errors over respondents, along with coefficient alpha in psychological test theory, enable the regression of true responses on true predictors. This resolves two major issues in survey regression, i.e. errors in variables and item non-response. The usefulness of this resolution is demonstrated with three large datasets collected by the European Social Survey in 2002, 2004 and 2006. The paper concludes with implications of true-value regression for survey theory and practice and for surveying large world populations.
Abstract: Objectives: Exploratory Factor Analysis (EFA) is a very popular statistical technique for identifying potential latent structure underlying a set of observed indicator variables. EFA is used widely in the social sciences, business and finance, machine learning, and the health sciences, among others. Research has found that standard methods of estimating EFA model parameters do not work well when the sample size is relatively small (e.g. less than 50) and/or when the number of observed variables approaches the sample size in value. The purpose of the current study was to investigate and compare some alternative approaches to fitting EFA in the case of small samples and high dimensional data. Results of both a small simulation study and an application of the methods to an intelligence test revealed that several alternative approaches designed to reduce the dimensionality of the observed variable covariance matrix worked very well in terms of recovering population factor structure with EFA. Implications of these results for practice are discussed.
Abstract: In compositional data, an observation is a vector with non-negative components which sum to a constant, typically 1. Data of this type arise in many areas, such as geology, archaeology, biology, economics and political science among others. The goal of this paper is to extend the taxicab metric and a newly suggested metric for compositional data by employing a power transformation. Both metrics are to be used in the k-nearest neighbours algorithm regardless of the presence of zeros. Examples with real data are exhibited.
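The idea of a power-transformed taxicab metric inside k-nearest neighbours can be sketched as follows. This is an illustration under assumptions: the abstract does not state the exact form of the transformation, so the sketch raises each component to a positive power alpha and renormalises to sum to 1, then takes the taxicab (L1) distance with no scaling constant. All function names are hypothetical.

```python
from collections import Counter


def power_transform(x, alpha):
    """Raise each component to the power alpha and renormalise to sum to 1.

    Assumption: alpha > 0, so zero components stay zero and cause no error.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive in this sketch")
    powered = [v ** alpha for v in x]
    total = sum(powered)
    return [p / total for p in powered]


def taxicab_power_distance(x, y, alpha):
    """Taxicab (L1) distance between two power-transformed compositions."""
    u = power_transform(x, alpha)
    v = power_transform(y, alpha)
    return sum(abs(a - b) for a, b in zip(u, v))


def knn_predict(train, query, k, alpha):
    """Majority vote among the k training compositions nearest to query.

    train is a list of (composition, label) pairs.
    """
    nearest = sorted(
        train, key=lambda item: taxicab_power_distance(item[0], query, alpha)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [
        ([0.8, 0.2, 0.0], "a"),
        ([0.7, 0.3, 0.0], "a"),
        ([0.1, 0.1, 0.8], "b"),
    ]
    print(knn_predict(train, [0.75, 0.25, 0.0], k=2, alpha=0.5))
```

Unlike log-ratio approaches, nothing here takes a logarithm, which is why compositions containing zeros need no special handling.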
Abstract: We present power calculations for zero-inflated Poisson (ZIP) and zero-inflated negative-binomial (ZINB) models. We detail direct computations for a ZIP model based on a two-sample Wald test using the expected information matrix. We also demonstrate how Lyles, Lin, and Williamson’s method (2006) of power approximation for categorical and count outcomes can be extended to both zero-inflated models. This method can be used for power calculations based on the Wald test (via the observed information matrix) and the likelihood ratio test, and can accommodate both categorical and continuous covariates. All the power calculations can be conducted when covariates are used in the modeling of both the count data and the “excess zero” data, or in either part separately. We present simulations to detail the performance of the power calculations. Analysis of a malaria study is used for illustration.
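For readers unfamiliar with the zero-inflated Poisson model, its probability mass function can be sketched as below. This shows only the standard ZIP definition, a mixture of a point mass at zero (weight pi) and a Poisson count with mean lam; it does not implement the power calculations described above. The function and parameter names are illustrative.

```python
import math


def zip_pmf(y, lam, pi):
    """Probability mass of a zero-inflated Poisson variable at count y.

    With probability pi the outcome is an "excess zero"; otherwise it is
    drawn from a Poisson distribution with mean lam.
    """
    if y < 0 or lam <= 0 or not (0.0 <= pi <= 1.0):
        raise ValueError("need y >= 0, lam > 0 and 0 <= pi <= 1")
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    excess_zero = pi if y == 0 else 0.0
    return excess_zero + (1.0 - pi) * poisson


if __name__ == "__main__":
    # Probability of observing a zero when 30% of outcomes are excess zeros.
    print(zip_pmf(0, lam=2.0, pi=0.3))
```

The mean of this distribution is (1 - pi) * lam, so a covariate can shift the expected count through either the count part or the excess-zero part of the model.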
Abstract: In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27-38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. Ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither method solves the problem addressed by the other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth’s penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using current screening data from a community-based dementia study.
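One way to write a double penalized log-likelihood of the kind described above is sketched below. It adds a ridge term to Firth's penalized log-likelihood, where \ell(\beta) is the logistic log-likelihood, I(\beta) is the Fisher information matrix and \lambda \ge 0 is the ridge parameter. The abstract does not give the formula, so the exact form and the scaling of the ridge term are assumptions.

```latex
\ell^{*}(\beta) \;=\; \ell(\beta)
  \;+\; \tfrac{1}{2}\,\log\bigl\lvert I(\beta) \bigr\rvert
  \;-\; \lambda\,\lVert \beta \rVert^{2}
```

Setting \lambda = 0 recovers Firth's penalized log-likelihood, while dropping the \tfrac{1}{2}\log\lvert I(\beta)\rvert term leaves ordinary ridge logistic regression.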
The concept of ranked set sampling (RSS) is applicable whenever ranking on a set of sampling units can be done easily by a judgment method or based on an auxiliary variable. In this work, we consider a study variable 𝑌 correlated with an auxiliary variable 𝑋 which is used to rank the sampling units. Further, (𝑋, 𝑌) is assumed to have a Morgenstern type bivariate generalized uniform distribution. We obtain an unbiased estimator of a scale parameter associated with the study variable 𝑌 based on different RSS schemes and censored RSS. An efficiency comparison of these estimators is also performed and presented numerically.
Abstract: This paper provides an introduction to a multivariate non-parametric hazard model for the occurrence of earthquakes, since the hazard function defines the statistical distribution of inter-event times. The method, which presents several advantages over more traditional approaches, is applied to Turkish seismicity, since a significant portion of Turkey is subject to frequent earthquakes. Destructive earthquakes from 1903 to 2009 between latitudes 39-42°N and longitudes 26-45°E are used. The paper demonstrates how seismicity and tectonics/physics parameters can potentially influence the spatio-temporal variability of earthquakes.
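The statement that the hazard function defines the distribution of inter-event times follows from the standard survival-analysis identities below, given here for a single inter-event time T with density f, survival function S and hazard h. The notation is an assumption for illustration; the paper's multivariate model is not reproduced.

```latex
h(t) = \frac{f(t)}{S(t)}, \qquad
S(t) = \Pr(T > t) = \exp\!\left(-\int_{0}^{t} h(u)\,du\right), \qquad
f(t) = h(t)\,S(t)
```

Knowing h(t) for all t therefore determines both S(t) and f(t), which is why a model for the hazard is a model for the inter-event time distribution.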