Abstract: For longitudinal binary data with non-monotone, non-ignorable missing outcomes over time, a full likelihood approach is algebraically complicated, and maximum likelihood estimation can be computationally prohibitive when there are many follow-up times. We propose pseudo-likelihoods to estimate the covariate effects on the marginal probabilities of the outcomes, in addition to the association parameters and missingness parameters. The pseudo-likelihood requires specification of the distribution of the data at all pairs of times on the same subject, but makes no assumptions about the joint distribution of the data at three or more times on the same subject, so the method can be considered semi-parametric. With maximum likelihood, by contrast, the full likelihood must be correctly specified in order to obtain consistent estimates. We show in simulations that our proposed pseudo-likelihood produces a more efficient estimate of the regression parameters than the pseudo-likelihood for non-ignorable missingness proposed by Troxel et al. (1998). Application to data from the Six Cities study (Ware et al., 1984), a longitudinal study of the health effects of air pollution, is discussed.
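To make the pairwise construction above concrete, a generic pairwise pseudo-log-likelihood can be sketched as follows (the notation, with outcomes $y_{ij}$, missingness indicators $r_{ij}$, and parameter vector $\theta$, is illustrative rather than the authors' exact specification):

$$\ell_{p}(\theta) \;=\; \sum_{i=1}^{n}\sum_{j<k}\log f\!\left(y_{ij},\, y_{ik},\, r_{ij},\, r_{ik};\,\theta\right),$$

where $f$ denotes the joint bivariate distribution of the outcomes and missingness indicators for subject $i$ at times $j$ and $k$, with any missing outcomes summed out. Only these bivariate distributions need to be specified, which is the sense in which the approach avoids assumptions about the joint distribution at three or more times.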
Abstract: The aim of this paper is to determine the effectiveness of cross association in detecting the similarity between correlated geological columnar sections. For this purpose, cross association is used to compare several geological columnar sections arbitrarily selected from different localities in central and northern Jordan. For most of the cases studied, sections consisting of the same rock units (formations) are statistically classified as similar (p-value < .05), while sections of different rock units (formations) are statistically classified as dissimilar (p-value > .05).
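As a rough illustration of the cross-association idea used above (following a common textbook formulation rather than the authors' exact procedure), the following Python sketch counts matching rock-unit codes in the overlap of two coded sections and compares that count with its chance expectation via a one-degree-of-freedom chi-square test; the function name, the lag handling, and the toy sections are assumptions for illustration.

```python
import numpy as np
from scipy.stats import chi2


def cross_association(seq1, seq2, lag=0):
    """Chi-square match test for cross-association of two coded columnar
    sections at a given alignment (lag). Minimal sketch of one common
    formulation; sections are assumed coded as lists of rock-unit labels."""
    # Align the two sequences at the requested lag and keep only the overlap.
    if lag >= 0:
        pairs = list(zip(seq1[lag:], seq2))
    else:
        pairs = list(zip(seq1, seq2[-lag:]))
    m = len(pairs)
    observed = sum(a == b for a, b in pairs)  # observed matches in the overlap

    # Chance probability of a match, from state frequencies of the whole sequences.
    states = set(seq1) | set(seq2)
    p_match = sum(seq1.count(s) * seq2.count(s) for s in states) / (len(seq1) * len(seq2))

    # One-degree-of-freedom chi-square comparing matches/mismatches with expectation.
    expected = m * p_match
    stat = ((observed - expected) ** 2 / expected
            + ((m - observed) - m * (1 - p_match)) ** 2 / (m * (1 - p_match)))
    p_value = chi2.sf(stat, df=1)
    return observed, expected, stat, p_value


# Hypothetical example: two short sections coded by lithology labels.
a = ["limestone", "marl", "chalk", "chalk", "marl", "limestone"]
b = ["limestone", "marl", "chalk", "marl", "marl", "limestone"]
print(cross_association(a, b))
```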
Abstract: Applications of multivariate statistical techniques, including graphical models, are seldom found in e-commerce studies. As this paper demonstrates, however, probabilistic graphical models are useful in this area, both because they can handle large numbers of potentially interrelated variables and because they communicate statistical relationships clearly to both the researcher and the ultimate business audience. We show an application of this methodology to intranets, internal corporate information systems employing Internet technology. In particular, we study both the interrelationships among intranet benefits and the interrelationships among intranet applications. This approach confirms some hypothesized relationships and uncovers heretofore unanticipated relationships among intranet variables, providing guidance for business professionals seeking to develop effective intranet systems. The techniques described here also have potential applicability in other e-commerce arenas, including business-to-consumer and business-to-business applications.
Abstract: Examining the daily Dow Jones Industrial Average (DJI), we find evidence of both higher-order anomalies and predictability. While most researchers are only aware of the relatively harmless anomalies that occur just in the mean, the first part of this article provides empirical evidence of more dangerous kinds of anomalies occurring in higher-order moments. This evidence casts some doubt on the common practice of fitting standard time series models (e.g., ARMA models, GARCH models, or stochastic volatility models) to financial time series and carrying out tests based upon autocorrelation coefficients without making proper provision for these anomalies. The second part of this article provides evidence in favor of the predictability of the returns on the DJI and, more interestingly, against the efficient market hypothesis. The special value of this evidence is due to the simplicity of the methods involved.
Abstract: This paper describes a test of two alternative sets of ratio edit and imputation procedures, both using the U.S. Census Bureau’s generalized editing/imputation subsystem (“Plain Vanilla”) on 1997 Economic Census data. We compare the quality of edited and imputed data — at both the macro and micro levels — from both sets of procedures and discuss how our quantitative methods allowed us to recommend changes to current procedures.
Abstract: We propose a new method of adding two parameters to a continuous distribution that extends the idea first introduced by Lehmann (1953) and studied by Nadarajah and Kotz (2006). This method leads to a new class of exponentiated generalized distributions that can be interpreted as a double construction of Lehmann alternatives. Some special models are discussed. We derive some mathematical properties of this class including the ordinary moments, generating function, mean deviations and order statistics. Maximum likelihood estimation is investigated and four applications to real data are presented.
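As a sketch of the double Lehmann construction described above (with illustrative notation): starting from a baseline cdf $G(x)$, the Lehmann type II alternative $1-\{1-G(x)\}^{a}$ is exponentiated once more, giving

$$F(x) \;=\; \Big[\,1-\{1-G(x)\}^{a}\,\Big]^{b}, \qquad a>0,\ b>0,$$

which reduces to the Lehmann type I alternative $G(x)^{b}$ when $a=1$ and to the exponentiated type studied by Nadarajah and Kotz when $b=1$.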
Abstract: A new family of copulas generated by a univariate distribution function is introduced, and relations between this copula and other well-known ones are discussed. As illustrations, the new copula is applied to model the dependence in two real data sets.
Abstract: In this paper, we reconsider the two-factor stochastic mortality model introduced by Cairns, Blake and Dowd (2006) (CBD). The error terms in the CBD model are assumed to form a two-dimensional random walk. We first use the Doornik and Hansen (2008) multivariate normality test to show that the underlying normality assumption does not hold for the data set considered. Ainou (2011) proposed independent univariate normal inverse Gaussian Lévy processes to model the error terms in the CBD model. We generalize this idea by introducing a possible dependency between the two-dimensional random variables, using a bivariate generalized hyperbolic distribution. We propose four non-Gaussian, fat-tailed distributions: Student’s t, normal inverse Gaussian, hyperbolic and generalized hyperbolic distributions. Our empirical analysis shows some preference for the newly suggested models, based on Akaike’s information criterion, the Bayesian information criterion and the likelihood ratio test as our in-sample model selection criteria, as well as the mean absolute percentage error for our out-of-sample projection errors.
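For reference, the two-factor CBD structure being modified can be sketched as follows (notation illustrative): the one-year death probability $q(t,x)$ at age $x$ in year $t$ satisfies

$$\operatorname{logit} q(t,x) \;=\; \kappa_t^{(1)} + \kappa_t^{(2)}\,(x-\bar{x}),$$

where $\bar{x}$ is the average age in the data and the period factors $\big(\kappa_t^{(1)},\kappa_t^{(2)}\big)$ follow a two-dimensional random walk with drift. The proposal above replaces the Gaussian innovations of this random walk with bivariate generalized hyperbolic innovations, with Student’s t, normal inverse Gaussian and hyperbolic innovations as special cases.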
Fixed-point algorithms are popular in statistics and data science due to their simplicity, guaranteed convergence, and applicability to high-dimensional problems. Well-known examples include the expectation-maximization (EM) algorithm, majorization-minimization (MM), and gradient-based algorithms such as gradient descent (GD) and proximal gradient descent. A characteristic weakness of these algorithms is their slow convergence. We discuss several state-of-the-art techniques for accelerating their convergence. We demonstrate and evaluate these techniques in terms of their efficiency and robustness in six distinct applications. Among the acceleration schemes, SQUAREM shows robust acceleration with a mean 18-fold speedup. DAAREM and restarted-Nesterov schemes also demonstrate consistently impressive accelerations. Thus, it is possible to accelerate the original fixed-point algorithm by using one of the SQUAREM, DAAREM, or restarted-Nesterov acceleration schemes. We describe implementation details and software packages to facilitate the application of the acceleration schemes. We also discuss strategies for selecting a particular acceleration scheme for a given problem.
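To illustrate the flavor of one of the schemes mentioned above, here is a minimal Python sketch of a SQUAREM-style squared-extrapolation step applied to a generic fixed-point map. It is not the authors' implementation (nor the R SQUAREM package); the function name, tolerances, and toy linear contraction are assumptions for illustration, and production implementations add safeguards such as step-length control and monotonicity checks.

```python
import numpy as np


def squarem(fixed_point, x0, tol=1e-8, max_iter=1000):
    """Accelerate a fixed-point iteration x <- fixed_point(x) with a
    SQUAREM-style squared-extrapolation step (minimal sketch, no safeguards)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x1 = fixed_point(x)           # first fixed-point step
        x2 = fixed_point(x1)          # second fixed-point step
        r = x1 - x                    # change after one step
        v = (x2 - x1) - r             # curvature of the iterate path
        if np.linalg.norm(v) == 0.0:  # path is straight: already (numerically) converged
            return x2
        alpha = -np.linalg.norm(r) / np.linalg.norm(v)  # squared-extrapolation step length
        x_new = x - 2.0 * alpha * r + alpha**2 * v      # extrapolated iterate
        x_new = fixed_point(x_new)    # stabilizing fixed-point step
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x


# Toy usage: an EM or MM update could be passed in as `fixed_point`; here we
# simply accelerate a linear contraction x <- A x + b toward (I - A)^{-1} b.
A = np.array([[0.98, 0.01], [0.01, 0.97]])
b = np.array([1.0, 2.0])
sol = squarem(lambda x: A @ x + b, np.zeros(2))
print(sol)
```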