Journal of Data Science

Estimation of Linear Regression Models with Serially Correlated Errors

Chiao-Yi Yang

https://doi.org/10.6339/JDS.2012.10(4).1106

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 10, Issue 4 (2012), pp. 723–755

Abstract

This paper develops a generalized least squares (GLS) estimator for a linear regression model with serially correlated errors and establishes its asymptotic optimality. To obtain this result, we use the modified Cholesky decomposition to estimate the inverse of the error covariance matrix from the ordinary least squares (OLS) residuals. The resulting matrix estimator remains positive definite and converges to the corresponding population matrix at a suitable rate. The strong finite-sample performance of the proposed GLS estimator is illustrated with simulation studies and two real datasets.
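The two-step idea (OLS residuals, then a modified Cholesky estimate of the inverse covariance, then GLS) can be sketched under a strong simplifying assumption: AR(1) errors. The paper's estimator is more general.

For a stationary AR(1) process the inverse covariance factors exactly as Σ⁻¹ = TᵀD⁻¹T. Here T is unit lower triangular with −φ on the sub-diagonal, and D = diag(σ²/(1−φ²), σ², …, σ²). The code estimates φ and σ² from the OLS residuals and then runs OLS on the whitened data D^(−1/2)Ty, which is equivalent to GLS. The helper names (`lstsq2`, `ar1_cholesky_factors`, `whiten`, `gls_ar1`) and the simulated data are hypothetical.

```python
import math
import random


def lstsq2(z1, z2, w):
    """Least squares for w ~ c1*z1 + c2*z2 via the 2x2 normal equations."""
    a11 = sum(u * u for u in z1)
    a12 = sum(u * v for u, v in zip(z1, z2))
    a22 = sum(v * v for v in z2)
    r1 = sum(u * t for u, t in zip(z1, w))
    r2 = sum(v * t for v, t in zip(z2, w))
    det = a11 * a22 - a12 * a12
    return ((a22 * r1 - a12 * r2) / det, (a11 * r2 - a12 * r1) / det)


def ar1_cholesky_factors(resid):
    """Hypothetical helper: fit e_t = phi*e_{t-1} + eps_t to OLS residuals.

    Assuming stationary AR(1) errors, Sigma^{-1} = T' D^{-1} T with T unit lower
    triangular (-phi on the sub-diagonal) and D = diag(s2/(1-phi^2), s2, ..., s2).
    """
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(resid[t - 1] ** 2 for t in range(1, len(resid)))
    phi = max(-0.99, min(0.99, num / den))  # keep |phi| < 1 so D stays positive
    innov = [resid[t] - phi * resid[t - 1] for t in range(1, len(resid))]
    s2 = sum(v * v for v in innov) / len(innov)
    d = [s2 / (1 - phi ** 2)] + [s2] * (len(resid) - 1)
    return phi, d


def whiten(v, phi, d):
    """Compute D^{-1/2} T v; OLS on whitened data is GLS with Sigma^{-1} = T' D^{-1} T."""
    out = [v[0] / math.sqrt(d[0])]
    out += [(v[t] - phi * v[t - 1]) / math.sqrt(d[t]) for t in range(1, len(v))]
    return out


def gls_ar1(x, y):
    ones = [1.0] * len(x)
    b_ols = lstsq2(ones, x, y)                      # step 1: OLS fit
    resid = [yi - b_ols[0] - b_ols[1] * xi for xi, yi in zip(x, y)]
    phi, d = ar1_cholesky_factors(resid)            # step 2: estimate T and D
    b_gls = lstsq2(whiten(ones, phi, d), whiten(x, phi, d), whiten(y, phi, d))
    return b_ols, b_gls, phi                        # step 3: GLS via whitening


# Simulated example: y = 1 + 2x + e, with AR(1) errors (phi = 0.7)
random.seed(1)
n, true_phi = 500, 0.7
x = [random.gauss(0, 1) for _ in range(n)]
e = [random.gauss(0, 1) / math.sqrt(1 - true_phi ** 2)]  # stationary start
for _ in range(n - 1):
    e.append(true_phi * e[-1] + random.gauss(0, 1))
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, e)]

b_ols, b_gls, phi_hat = gls_ar1(x, y)
print("OLS:", b_ols, "GLS:", b_gls, "phi_hat:", phi_hat)
```

In this construction D has positive entries and T is unit triangular, so the estimated Σ⁻¹ is positive definite by construction.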

Parameter Estimation and Stress-Strength Model of Power Lomax Distribution: Classical Methods and Bayesian Estimation

Ehab M. Almetwally, Hisham M. Almongy

https://doi.org/10.6339/JDS.202010_18(4).0008

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 18, Issue 4 (2020), pp. 718–738

Abstract

In this paper, parameter estimation for the power Lomax distribution is studied using several methods: maximum likelihood, maximum product of spacings, ordinary least squares, weighted least squares, Cramér–von Mises, and Bayesian estimation via Markov chain Monte Carlo (MCMC). Robust estimation of the stress-strength model for the power Lomax distribution is also discussed. We propose the maximum product of spacings method for reliable estimation of the stress-strength model, as an alternative to maximum likelihood and Bayesian estimation. A numerical study using real data and Monte Carlo simulation is performed to compare the different methods.
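The sketch below illustrates maximum product of spacings (MPS) estimation and the stress-strength reliability R = P(Y < X). It rests on three assumptions:

- The power Lomax CDF is parameterized as F(x) = 1 − (1 + x^β/λ)^(−α).
- β and λ are known and shared by the stress Y ~ PL(α₂, β, λ) and the strength X ~ PL(α₁, β, λ), so only α is estimated.
- Under that sharing, log(1 + X^β/λ) is exponential with rate α₁. This gives the closed form R = α₂/(α₁ + α₂).

The paper estimates all parameters. The helper names and parameter values here are hypothetical.

```python
import math
import random


def pl_cdf(x, alpha, beta, lam):
    # Assumed power Lomax CDF: F(x) = 1 - (1 + x**beta / lam) ** (-alpha), x > 0
    return 1.0 - (1.0 + x ** beta / lam) ** (-alpha)


def pl_sample(n, alpha, beta, lam, rng):
    # Inverse-CDF sampling from the assumed power Lomax distribution
    return [(lam * ((1.0 - rng.random()) ** (-1.0 / alpha) - 1.0)) ** (1.0 / beta)
            for _ in range(n)]


def mps_objective(alpha, xs_sorted, beta, lam):
    # Mean log-spacing over the n + 1 gaps, with F(x_(0)) = 0 and F(x_(n+1)) = 1
    cdf = [0.0] + [pl_cdf(x, alpha, beta, lam) for x in xs_sorted] + [1.0]
    gaps = [max(b - a, 1e-300) for a, b in zip(cdf, cdf[1:])]  # floor guards log(0)
    return sum(math.log(g) for g in gaps) / len(gaps)


def golden_max(f, lo, hi, tol=1e-6):
    # Golden-section search; with beta and lam fixed the MPS objective is concave in alpha
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2.0


def mps_alpha(xs, beta, lam):
    xs_sorted = sorted(xs)
    return golden_max(lambda a: mps_objective(a, xs_sorted, beta, lam), 0.01, 50.0)


rng = random.Random(42)
beta, lam = 2.0, 1.0
alpha_x, alpha_y = 3.0, 1.5                 # strength X, stress Y
x = pl_sample(300, alpha_x, beta, lam, rng)
y = pl_sample(300, alpha_y, beta, lam, rng)

a1, a2 = mps_alpha(x, beta, lam), mps_alpha(y, beta, lam)
r_hat = a2 / (a1 + a2)                       # R = P(Y < X) under shared beta, lam

# Monte Carlo check of R = alpha_y / (alpha_x + alpha_y) = 1/3
xm = pl_sample(20000, alpha_x, beta, lam, rng)
ym = pl_sample(20000, alpha_y, beta, lam, rng)
r_mc = sum(yi < xi for xi, yi in zip(xm, ym)) / 20000
print(f"alpha_x={a1:.3f} alpha_y={a2:.3f} R_hat={r_hat:.3f} R_mc={r_mc:.3f}")
```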

Local Influence in Constrained General Linear Models

Hadi Emami, Mostafa Emami

https://doi.org/10.6339/JDS.201410_12(4).0008

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 12, Issue 4 (2014), pp. 717–726

Abstract

Constrained general linear models (CGLMs) have wide applications in practice. As in other data analyses, identifying influential observations that may be potential outliers is an important step in working with CGLMs. We develop a local influence approach for detecting influential observations in CGLMs. The procedure uses the normal curvature and the direction of maximum curvature to assess the local influence of minor perturbations of CGLMs. An illustrative example with a real dataset is also reported.

Trent L. Lalonde, Anh Q. Nguyen, Jianqiong Yin, et al. (5 authors)

https://doi.org/10.6339/JDS.2013.11(4).1195

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 11, Issue 4 (2013), pp. 715–738

Abstract

We group approaches to modeling correlated binary data along three distinctions:

- data recorded cross-sectionally versus longitudinally;
- population-averaged versus subject-specific models;
- time-dependent versus time-independent covariates.

Standard logistic regression models are appropriate for cross-sectional data. For longitudinal data, however, methods such as generalized estimating equations (GEE) and the generalized method of moments (GMM) are commonly used to fit population-averaged models. Random-effects models such as generalized linear mixed models (GLMM) are used to fit subject-specific models. Some of these methods account for time dependence in covariates while others do not.

This paper addresses these approaches with an illustration using a Medicare dataset on rehospitalization. In particular, we compare results from standard logistic models, GEE models, GMM models, and random-effects models by analyzing a binary outcome over four successive hospitalizations. We find that these procedures differ in how they handle the correlation among responses and the feedback from response to covariate. When covariates are classified as time-dependent, we find marginal GMM logistic regression models to be more appropriate than GEE models. When time-dependent covariates are present, we also find conditional random-intercept models, with those covariates decomposed into components, to be more appropriate than ordinary random-effects models. We used the SAS procedures GLIMMIX, NLMIXED, IML, GENMOD, and LOGISTIC to analyze the illustrative dataset, as well as custom programs written in R.

Equivalent Models in Association Rules Analysis

Pannapa Changpetch, Dennis K. J. Lin

https://doi.org/10.6339/JDS.201610_14(4).0008

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 14, Issue 4 (2016), pp. 713–738

Abstract

Association rules analysis, a powerful methodology for exploring relationships among items, can be used to capture a set of rules from any given dataset. It is little known, however, that a single dataset can be represented by more than one set of rules, i.e., by equivalent models. In fact, most studies of model goodness can be misleading because they assume the model is unique. This phenomenon has yet to be explored in the literature. In our study, we demonstrate that equivalent models exist for any dataset. We also propose a method for converting any given model into its dominant model, which we recommend as the benchmark model. Further, we explain how the phenomenon of equivalent models affects decision tree analysis and statistical model selection. We show that the decision rules from decision tree analysis can always be simplified by reducing them to the dominant model. Simulated and real datasets are used for illustration.

Comparing the Exponentiated and Generalized Modified Weibull Distributions

Saad J. Almalki, Saralees Nadarajah

https://doi.org/10.6339/JDS.201510_13(4).0005

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 13, Issue 4 (2015), pp. 713–732

Abstract

In recent years, many modifications of the Weibull distribution have been proposed. Some of these modifications have a large number of parameters, so their real benefits over simpler modifications are questionable. Here, we use two datasets with a modified unimodal (unimodal followed by increasing) hazard function to compare the exponentiated Weibull and generalized modified Weibull distributions. For datasets exhibiting a modified unimodal hazard function, we find no evidence that the generalized modified Weibull distribution provides a better fit than the exponentiated Weibull distribution.

In a related issue, we consider Carrasco et al. (2008), a widely cited paper that proposes the generalized modified Weibull distribution and illustrates two real data applications. We point out that some of the results in both real data applications in Carrasco et al. (2008) are incorrect.

Exponentiated Weibull-Geometric Distribution and Its Application to Count Data

Felix Famoye

https://doi.org/10.6339/JDS.201910_17(4).0005

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 17, Issue 4 (2019), pp. 712–725

Abstract

An exponentiated Weibull-geometric distribution is defined and studied. A new count data regression model based on the exponentiated Weibull-geometric distribution is also defined. The regression model can be applied to underdispersed or overdispersed count data. The exponentiated Weibull-geometric regression model is fitted to two numerical datasets, where it provides a better fit than its competitors.

The Chi-plot and Its Asymptotic Confidence Interval for Analyzing Bivariate Dependence: An Application to the Average Intelligence and Atheism Rates across Nations Data

Vitor A. A. Marchi, Francisco A. R. Rojas, Francisco Louzada

https://doi.org/10.6339/JDS.2012.10(4).1094

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 10, Issue 4 (2012), pp. 711–722

Abstract

Bivariate data analysis plays a key role in several areas where the variables of interest are obtained in paired form, which leads to the consideration of possible measures of association between them. In most cases, well-known measures such as Pearson's correlation and Kendall's and Spearman's coefficients are used. However, these measures may not represent the real correlation or dependence structure between the variables.

Fisher and Switzer (1985) proposed a rank-based graphical tool, the so-called chi-plot. Together with its Monte Carlo based confidence interval, the chi-plot can help detect the presence of association in a random sample from a continuous bivariate distribution. In this article we construct the asymptotic confidence interval for the chi-plot. Via a Monte Carlo simulation study, we find that the coverage probabilities of the asymptotic and the Monte Carlo based confidence intervals are similar. An immediate advantage of the asymptotic confidence interval over the Monte Carlo based one is that it is computationally less expensive and allows any confidence level to be chosen. Moreover, it can be implemented straightforwardly in existing statistical software. The chi-plot approach is illustrated on the average intelligence and atheism rates across nations data.
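For readers unfamiliar with the chi-plot, the sketch below computes its points (λᵢ, χᵢ) from the definitions as they are usually stated for Fisher and Switzer (1985); check these against the original.

- Hᵢ, Fᵢ and Gᵢ are the proportions of the other n − 1 points whose (x, y), x, or y values, respectively, are at most those of point i.
- χᵢ = (Hᵢ − FᵢGᵢ)/√(Fᵢ(1−Fᵢ)Gᵢ(1−Gᵢ)).
- λᵢ = 4Sᵢ max((Fᵢ − ½)², (Gᵢ − ½)²), where Sᵢ is the sign of (Fᵢ − ½)(Gᵢ − ½).
- Only points with |λᵢ| < 4(1/(n−1) − ½)² are kept.

The asymptotic confidence interval proposed in the paper is not reproduced, and the function name is hypothetical.

```python
import math
import random


def chi_plot_points(x, y):
    """Hypothetical helper: return the (lambda_i, chi_i) pairs of a chi-plot."""
    n = len(x)
    limit = 4.0 * (1.0 / (n - 1) - 0.5) ** 2   # display rule for |lambda_i|
    points = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        h = sum(1 for j in others if x[j] <= x[i] and y[j] <= y[i]) / (n - 1)
        f = sum(1 for j in others if x[j] <= x[i]) / (n - 1)
        g = sum(1 for j in others if y[j] <= y[i]) / (n - 1)
        prod = (f - 0.5) * (g - 0.5)
        s = (prod > 0) - (prod < 0)            # sign, with sign(0) = 0
        lam = 4.0 * s * max((f - 0.5) ** 2, (g - 0.5) ** 2)
        # Keep points inside the display rule; also skip F or G equal to 0 or 1
        if 0.0 < f < 1.0 and 0.0 < g < 1.0 and abs(lam) < limit:
            chi = (h - f * g) / math.sqrt(f * (1 - f) * g * (1 - g))
            points.append((lam, chi))
    return points


rng = random.Random(7)
xs = [rng.gauss(0, 1) for _ in range(100)]
ys_indep = [rng.gauss(0, 1) for _ in range(100)]
comonotone = chi_plot_points(xs, xs)          # perfect positive dependence
independent = chi_plot_points(xs, ys_indep)   # no dependence
print(len(comonotone), max(abs(c - 1.0) for _, c in comonotone))
```

For a perfectly increasing relationship, Hᵢ = Fᵢ = Gᵢ and every plotted χᵢ equals 1. Under independence, the χᵢ values scatter around 0.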

Forecasting Foreign Tourist Arrivals to India Using Time Series Models

Shalini Chandra, Kriti Kumari

https://doi.org/10.6339/JDS.201810_16(4).00003

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 16, Issue 4 (2018), pp. 702–722

Abstract

This study aims to compare various quantitative models for forecasting monthly foreign tourist arrivals (FTAs) to India. The models considered include the vector error correction (VEC) model, the Naive I and Naive II models, the seasonal autoregressive integrated moving average (SARIMA) model, and Grey models. A model that combines the single-model forecasts using the simple average (SA) method is also applied. The forecasting performance of these models is compared using the mean absolute percentage error (MAPE) and U-statistic (Ustat) criteria. Empirical findings suggest that the combination model gives better forecasts of FTAs to India than the individual time series models considered here.
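The simple-average (SA) combination and the MAPE criterion can be shown with a toy example. The numbers are made up for illustration and are not the paper's data. The MAPE formula used is the standard 100/n · Σ|(Aₜ − Fₜ)/Aₜ|, and the function names are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def simple_average(*forecasts):
    """Combine several forecast series by averaging them point by point."""
    return [sum(vals) / len(vals) for vals in zip(*forecasts)]


actual = [100.0, 120.0, 110.0]     # toy observed values
model_a = [90.0, 130.0, 105.0]     # toy forecasts from one model
model_b = [110.0, 115.0, 120.0]    # toy forecasts from another model
combo = simple_average(model_a, model_b)   # [100.0, 122.5, 112.5]

for name, fc in [("A", model_a), ("B", model_b), ("SA", combo)]:
    print(name, round(mape(actual, fc), 3))
```

Here the averaged forecast has a lower MAPE than either input only because the two toy models err in opposite directions. Combining is not guaranteed to help in general.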

A Brief Note on the Simulation of Survival Data with a Desired Percentage of Right-Censored Data

Edson Zangiacomi Martinez, Jorge Alberto Achcar, Marcos Vinicius de Oliveira Peres, et al. (4 authors)

https://doi.org/10.6339/JDS.201610_14(4).0007

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 14, Issue 4 (2016), pp. 701–712

Abstract

Simulation studies are important statistical tools used to investigate the performance, properties, and adequacy of statistical models. Simulating right-censored time-to-event data involves generating from two independent survival distributions. The first distribution represents the uncensored survival times and the second represents the censoring mechanism. In this brief report we discuss how to ensure that the percentage of censored data is fixed in advance. The described method was used to generate data from a Weibull distribution, but it can be adapted to any other lifetime distribution. We also present an R function for generating random samples using the proposed approach.
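One simple way to fix the censoring percentage p in advance assumes Weibull event times T and Weibull censoring times C with a common shape k. Then T^k and C^k are exponential, so P(C < T) = λ_T^k/(λ_T^k + λ_C^k), where λ denotes the scale. Choosing λ_C = λ_T((1 − p)/p)^(1/k) therefore makes the censoring probability exactly p.

This closed-form device may differ from the authors' method and their R function. The helper names are hypothetical, and the sketch uses Python's `random.weibullvariate(scale, shape)`.

```python
import random


def censoring_scale(p, shape, scale_t):
    """Scale of a Weibull censoring time C (same shape as T) giving P(C < T) = p."""
    return scale_t * ((1.0 - p) / p) ** (1.0 / shape)


def simulate_right_censored(n, p, shape, scale_t, rng):
    """Return (observed_time, event_indicator) pairs; about a fraction p is censored."""
    scale_c = censoring_scale(p, shape, scale_t)
    data = []
    for _ in range(n):
        t = rng.weibullvariate(scale_t, shape)   # event time
        c = rng.weibullvariate(scale_c, shape)   # independent censoring time
        data.append((min(t, c), int(t <= c)))    # 1 = event observed, 0 = censored
    return data


rng = random.Random(2016)
sample = simulate_right_censored(10_000, 0.3, 1.5, 2.0, rng)
censored_fraction = 1 - sum(d for _, d in sample) / len(sample)
print(round(censored_fraction, 3))
```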

Journal of Data Science

Online ISSN: 1683-8602
Print ISSN: 1680-743X

Contact us

JDS@ruc.edu.cn
No. 59 Zhongguancun Street, Haidian District Beijing, 100872, P.R. China