Abstract: For longitudinal binary data with non-monotone, non-ignorable missing outcomes over time, a full likelihood approach is algebraically complicated, and maximum likelihood estimation can be computationally prohibitive when there are many follow-up times. We propose pseudo-likelihoods to estimate the covariate effects on the marginal probabilities of the outcomes, in addition to the association parameters and missingness parameters. The pseudo-likelihood requires specification of the distribution of the data at all pairs of times on the same subject, but makes no assumptions about the joint distribution of the data at three or more times on the same subject, so the method can be considered semi-parametric. With maximum likelihood, by contrast, the full likelihood must be correctly specified in order to obtain consistent estimates. We show in simulations that our proposed pseudo-likelihood produces a more efficient estimate of the regression parameters than the pseudo-likelihood for non-ignorable missingness proposed by Troxel et al. (1998). Application to data from the Six Cities study (Ware et al., 1984), a longitudinal study of the health effects of air pollution, is discussed.
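To make the pairwise construction above concrete, a generic pairwise pseudo-log-likelihood can be sketched as follows (the notation, with outcomes $y_{ij}$, missingness indicators $r_{ij}$, and parameter vector $\theta$, is illustrative rather than the authors' exact specification):

$$\ell_{p}(\theta) \;=\; \sum_{i=1}^{n}\sum_{j<k}\log f\!\left(y_{ij},\, y_{ik},\, r_{ij},\, r_{ik};\,\theta\right),$$

where $f$ denotes the joint bivariate distribution of the outcomes and missingness indicators for subject $i$ at times $j$ and $k$, with any missing outcomes summed out. Only these bivariate distributions need to be specified, which is the sense in which the approach avoids assumptions about the joint distribution at three or more times.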
Abstract: The aim of this paper is to determine the effectiveness of cross association in detecting the similarity between correlated geological columnar sections. For this purpose, cross association is used to compare several geological columnar sections arbitrarily selected from different localities in central and northern Jordan. For most of the cases studied, sections consisting of the same rock units (formations) are statistically classified as similar (p-value < .05), while sections of different rock units (formations) are statistically classified as dissimilar (p-value > .05).
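As a rough illustration of the cross-association idea used above (following a common textbook formulation rather than the authors' exact procedure), the following Python sketch counts matching rock-unit codes in the overlap of two coded sections and compares that count with its chance expectation via a one-degree-of-freedom chi-square test; the function name, the lag handling, and the toy sections are assumptions for illustration.

```python
import numpy as np
from scipy.stats import chi2


def cross_association(seq1, seq2, lag=0):
    """Chi-square match test for cross-association of two coded columnar
    sections at a given alignment (lag). Minimal sketch of one common
    formulation; sections are assumed coded as lists of rock-unit labels."""
    # Align the two sequences at the requested lag and keep only the overlap.
    if lag >= 0:
        pairs = list(zip(seq1[lag:], seq2))
    else:
        pairs = list(zip(seq1, seq2[-lag:]))
    m = len(pairs)
    observed = sum(a == b for a, b in pairs)  # observed matches in the overlap

    # Chance probability of a match, from state frequencies of the whole sequences.
    states = set(seq1) | set(seq2)
    p_match = sum(seq1.count(s) * seq2.count(s) for s in states) / (len(seq1) * len(seq2))

    # One-degree-of-freedom chi-square comparing matches/mismatches with expectation.
    expected = m * p_match
    stat = ((observed - expected) ** 2 / expected
            + ((m - observed) - m * (1 - p_match)) ** 2 / (m * (1 - p_match)))
    p_value = chi2.sf(stat, df=1)
    return observed, expected, stat, p_value


# Hypothetical example: two short sections coded by lithology labels.
a = ["limestone", "marl", "chalk", "chalk", "marl", "limestone"]
b = ["limestone", "marl", "chalk", "marl", "marl", "limestone"]
print(cross_association(a, b))
```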
Abstract: Applications of multivariate statistical techniques, including graphical models, are seldom found in e-commerce studies. As this paper demonstrates, however, probabilistic graphical models are useful in this area, both because they can handle large numbers of potentially interrelated variables and because they communicate statistical relationships clearly to both the researcher and the ultimate business audience. We show an application of this methodology to intranets, internal corporate information systems employing Internet technology. In particular, we study both the interrelationships among intranet benefits and the interrelationships among intranet applications. This approach confirms some hypothesized relationships and uncovers heretofore unanticipated relationships among intranet variables, providing guidance for business professionals seeking to develop effective intranet systems. The techniques described here also have potential applicability in other e-commerce arenas, including business-to-consumer and business-to-business applications.
Abstract: Examining the daily Dow Jones Industrial Average (DJI), we find evidence of both higher-order anomalies and predictability. While most researchers are only aware of the relatively harmless anomalies that occur just in the mean, the first part of this article provides empirical evidence of more dangerous kinds of anomalies occurring in higher-order moments. This evidence casts some doubt on the common practice of fitting standard time series models (e.g., ARMA models, GARCH models, or stochastic volatility models) to financial time series and carrying out tests based upon autocorrelation coefficients without making proper provision for these anomalies. The second part of this article provides evidence in favor of the predictability of the returns on the DJI and, more interestingly, against the efficient market hypothesis. The special value of this evidence is due to the simplicity of the methods involved.
Abstract: This paper describes a test of two alternative sets of ratio edit and imputation procedures, both using the U.S. Census Bureau’s generalized editing/imputation subsystem (“Plain Vanilla”) on 1997 Economic Census data. We compare the quality of edited and imputed data — at both the macro and micro levels — from both sets of procedures and discuss how our quantitative methods allowed us to recommend changes to current procedures.
Abstract: We propose a new method of adding two parameters to a continuous distribution that extends the idea first introduced by Lehmann (1953) and studied by Nadarajah and Kotz (2006). This method leads to a new class of exponentiated generalized distributions that can be interpreted as a double construction of Lehmann alternatives. Some special models are discussed. We derive some mathematical properties of this class including the ordinary moments, generating function, mean deviations and order statistics. Maximum likelihood estimation is investigated and four applications to real data are presented.
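As a sketch of the double Lehmann construction described above (with illustrative notation): starting from a baseline cdf $G(x)$, the Lehmann type II alternative $1-\{1-G(x)\}^{a}$ is exponentiated once more, giving

$$F(x) \;=\; \Big[\,1-\{1-G(x)\}^{a}\,\Big]^{b}, \qquad a>0,\ b>0,$$

which reduces to the Lehmann type I alternative $G(x)^{b}$ when $a=1$ and to the exponentiated type studied by Nadarajah and Kotz when $b=1$.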
Abstract: A new family of copulas generated by a univariate distribution function is introduced, and relations between this copula and other well-known ones are discussed. As illustrations, the new copula is applied to model the dependence in two real data sets.
Abstract: In this paper, we reconsider the two-factor stochastic mortality model introduced by Cairns, Blake and Dowd (2006) (CBD). The error terms in the CBD model are assumed to form a two-dimensional random walk. We first use the Doornik and Hansen (2008) multivariate normality test to show that the underlying normality assumption does not hold for the data set considered. Ainou (2011) proposed independent univariate normal inverse Gaussian Lévy processes to model the error terms in the CBD model. We generalize this idea by introducing a possible dependency between the two-dimensional random variables, using a bivariate generalized hyperbolic distribution. We propose four non-Gaussian, fat-tailed distributions: Student’s t, normal inverse Gaussian, hyperbolic and generalized hyperbolic distributions. Our empirical analysis shows some preference for the newly suggested models, based on Akaike’s information criterion, the Bayesian information criterion and the likelihood ratio test as our in-sample model selection criteria, as well as the mean absolute percentage error for our out-of-sample projection errors.
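For reference, the two-factor CBD structure being modified can be sketched as follows (notation illustrative): the one-year death probability $q(t,x)$ at age $x$ in year $t$ satisfies

$$\operatorname{logit} q(t,x) \;=\; \kappa_t^{(1)} + \kappa_t^{(2)}\,(x-\bar{x}),$$

where $\bar{x}$ is the average age in the data and the period factors $\big(\kappa_t^{(1)},\kappa_t^{(2)}\big)$ follow a two-dimensional random walk with drift. The proposal above replaces the Gaussian innovations of this random walk with bivariate generalized hyperbolic innovations, with Student’s t, normal inverse Gaussian and hyperbolic innovations as special cases.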
Fixed-point algorithms are popular in statistics and data science due to their simplicity, guaranteed convergence, and applicability to high-dimensional problems. Well-known examples include the expectation-maximization (EM) algorithm, majorization-minimization (MM), and gradient-based algorithms such as gradient descent (GD) and proximal gradient descent. A characteristic weakness of these algorithms is their slow convergence. We discuss several state-of-the-art techniques for accelerating their convergence. We demonstrate and evaluate these techniques in terms of their efficiency and robustness in six distinct applications. Among the acceleration schemes, SQUAREM shows robust acceleration with a mean 18-fold speedup. DAAREM and restarted-Nesterov schemes also demonstrate consistently impressive accelerations. Thus, it is possible to accelerate the original fixed-point algorithm by using one of the SQUAREM, DAAREM, or restarted-Nesterov acceleration schemes. We describe implementation details and software packages to facilitate the application of the acceleration schemes. We also discuss strategies for selecting a particular acceleration scheme for a given problem.
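To illustrate the flavor of one of the schemes mentioned above, here is a minimal Python sketch of a SQUAREM-style squared-extrapolation step applied to a generic fixed-point map. It is not the authors' implementation (nor the R SQUAREM package); the function name, tolerances, and toy linear contraction are assumptions for illustration, and production implementations add safeguards such as step-length control and monotonicity checks.

```python
import numpy as np


def squarem(fixed_point, x0, tol=1e-8, max_iter=1000):
    """Accelerate a fixed-point iteration x <- fixed_point(x) with a
    SQUAREM-style squared-extrapolation step (minimal sketch, no safeguards)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x1 = fixed_point(x)           # first fixed-point step
        x2 = fixed_point(x1)          # second fixed-point step
        r = x1 - x                    # change after one step
        v = (x2 - x1) - r             # curvature of the iterate path
        if np.linalg.norm(v) == 0.0:  # path is straight: already (numerically) converged
            return x2
        alpha = -np.linalg.norm(r) / np.linalg.norm(v)  # squared-extrapolation step length
        x_new = x - 2.0 * alpha * r + alpha**2 * v      # extrapolated iterate
        x_new = fixed_point(x_new)    # stabilizing fixed-point step
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x


# Toy usage: an EM or MM update could be passed in as `fixed_point`; here we
# simply accelerate a linear contraction x <- A x + b toward (I - A)^{-1} b.
A = np.array([[0.98, 0.01], [0.01, 0.97]])
b = np.array([1.0, 2.0])
sol = squarem(lambda x: A @ x + b, np.zeros(2))
print(sol)
```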