
Journal of Data Science


Search results 892


Topp-Leone Gompertz Distribution: Properties and Applications

Lawrence Chukwudumebi Nzei Joseph Thomas Eghwerido Nosakhare Ekhosuehi

https://doi.org/10.6339/JDS.202010_18(4).0012

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 18, Issue 4 (2020), pp. 782–794

Abstract

This paper proposes the Topp-Leone Gompertz distribution, an extension of the Gompertz distribution for modeling real lifetime data. The new model is obtained by transforming the cumulative distribution function of the Gompertz random variable, taking the Topp-Leone distribution as the generator. Some statistical properties of the new distribution are derived, along with maximum likelihood estimates of the model parameters. A Monte Carlo simulation study is carried out to examine the accuracy of the maximum likelihood estimates of the distribution parameters. Two real data sets are used to illustrate the applicability of the new distribution, and the results show that it outperforms some related lifetime distributions.
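The construction described in this abstract can be sketched numerically. The abstract gives no formulas, so the block below rests on assumptions: it uses a commonly cited form of the Topp-Leone generator, F(x) = [1 - (1 - G(x))^2]^alpha, and a common Gompertz parameterization, G(x) = 1 - exp(-(theta/gamma)(exp(gamma x) - 1)). The function names and the parameter names `alpha`, `theta`, `gamma` are illustrative and may not match the paper's notation.

```python
import math
import random

def gompertz_cdf(x, theta, gamma):
    """Assumed Gompertz CDF: 1 - exp(-(theta/gamma) * (exp(gamma*x) - 1)) for x > 0."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-(theta / gamma) * math.expm1(gamma * x))

def tlg_cdf(x, alpha, theta, gamma):
    """Assumed Topp-Leone-generated CDF: [1 - (1 - G(x))^2]^alpha."""
    g = gompertz_cdf(x, theta, gamma)
    return (1.0 - (1.0 - g) ** 2) ** alpha

def tlg_quantile(u, alpha, theta, gamma):
    """Closed-form inverse of tlg_cdf for 0 <= u < 1."""
    # Solve u = [1 - s^2]^alpha for s = 1 - G(x), the Gompertz survival value.
    s = math.sqrt(1.0 - u ** (1.0 / alpha))
    # Invert s = exp(-(theta/gamma) * (exp(gamma*x) - 1)) for x.
    return math.log1p(-(gamma / theta) * math.log(s)) / gamma

def tlg_sample(n, alpha, theta, gamma, seed=0):
    """Inverse-transform sampling, as one might use in a Monte Carlo study."""
    rng = random.Random(seed)
    return [tlg_quantile(rng.random(), alpha, theta, gamma) for _ in range(n)]
```

Because the inverse is available in closed form under these assumptions, simulated samples can be drawn without numerical root finding; the paper's own simulation design may differ.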

The Exponentiated Generalized Extended Pareto Distribution

Thiago A. N. De Andrade Luz M. Zea

https://doi.org/10.6339/JDS.201810_16(4).00007

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 16, Issue 4 (2018), pp. 781–800

Abstract

We define and study a three-parameter model with positive real support called the exponentiated generalized extended Pareto distribution. We provide a comprehensive mathematical treatment and prove that the formulas related to the new model are simple and manageable. We study the behaviour of the maximum likelihood estimates for the model parameters using Monte Carlo simulation. We take advantage of applied studies and offer two applications to real data sets that prove empirically the power of adjustment of the new model when compared to twelve other lifetime distributions.

The Log-Kumaraswamy Generalized Gamma Regression Model with Application to Chemical Dependency Data

Marcelino A. R. Pascoa Claudia M. M. de Paiva Gauss M. Cordeiro All authors (4)

https://doi.org/10.6339/JDS.2013.11(4).1131

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 11, Issue 4 (2013), pp. 781–818

Abstract

The five-parameter Kumaraswamy generalized gamma model (Pascoa et al., 2011) includes some important distributions as special cases and is very useful for modeling lifetime data. We propose an extended version of this distribution by assuming that a shape parameter can take negative values. The new distribution can accommodate increasing, decreasing, bathtub and unimodal shaped hazard functions. A second advantage is that it also includes as special models reciprocal distributions such as the reciprocal gamma and reciprocal Weibull distributions. A third advantage is that it can represent the error distribution for the log-Kumaraswamy generalized gamma regression model. We provide a mathematical treatment of the new distribution including explicit expressions for moments, generating function, mean deviations and order statistics. We obtain the moments of the log-transformed distribution. The new regression model can be used more effectively in the analysis of survival data since it includes as submodels several widely known regression models. The method of maximum likelihood and a Bayesian procedure are used for estimating the model parameters for censored data. Overall, the new regression model is very useful for the analysis of real data.

Forward Regression in R: From The Extreme Slow to the Extreme Fast

Michail Tsagris Manos Papadakis

https://doi.org/10.6339/JDS.201810_16(4).00006

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 16, Issue 4 (2018), pp. 771–780

Abstract

Forward regression has been heavily criticised, among many other reasons for its speed and its stopping criteria. The main focus of this paper is on demonstrating how to make it efficient, using R. Our method works for continuous predictor variables only, as the partial correlation plays the most important role.
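The role of the partial correlation in forward selection can be illustrated with a minimal sketch. This is not the authors' R implementation: it is a plain-Python illustration in which, at each step, the candidate with the largest absolute partial correlation with the response (given the variables already selected) is added, and every remaining vector is then residualized on the chosen one. The function names and the fixed `threshold` stopping rule are hypothetical; the paper's actual stopping criterion is not stated in the abstract.

```python
import math
import random

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def residualize(v, z):
    """Remove from v its least-squares projection onto z."""
    zz = dot(z, z)
    if zz == 0.0:
        return v[:]
    b = dot(v, z) / zz
    return [a - b * c for a, c in zip(v, z)]

def corr(u, v):
    den = math.sqrt(dot(u, u) * dot(v, v))
    return dot(u, v) / den if den > 0 else 0.0

def forward_select(columns, y, threshold=0.3):
    """Return indices of selected predictors, in order of entry."""
    r_y = center(y)
    r_x = {j: center(col) for j, col in enumerate(columns)}
    selected = []
    while r_x:
        # Correlation of residuals equals the partial correlation given `selected`.
        j_best = max(r_x, key=lambda j: abs(corr(r_x[j], r_y)))
        if abs(corr(r_x[j_best], r_y)) < threshold:
            break  # hypothetical stopping rule
        z = r_x.pop(j_best)
        selected.append(j_best)
        r_y = residualize(r_y, z)
        r_x = {j: residualize(v, z) for j, v in r_x.items()}
    return selected

# Simulated example: y depends on x0 and x1 but not on x2.
rng = random.Random(1)
n = 200
x0 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
y = [2 * a - b + rng.gauss(0, 0.1) for a, b in zip(x0, x1)]
selected = forward_select([x0, x1, x2], y)
```

Each step costs one pass over the remaining predictors, with no model refit, which is the kind of saving that makes a partial-correlation formulation attractive for continuous predictors.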

On the classical estimation of bivariate copula-based Seemingly unrelated tobit models through the proposed inference function for augmented margins method

Francisco Louzada Paulo H. Ferreira

https://doi.org/10.6339/JDS.201510_13(4).0008

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 13, Issue 4 (2015), pp. 771–794

Abstract

This paper extends the analysis of the bivariate Seemingly Unrelated (SUR) Tobit by modeling its nonlinear dependence structure through copula and assuming non-normal marginal error distributions. For model estimation, the use of copula methods enables the use of the (classical) Inference Function for Margins (IFM) method by Joe and Xu (1996), which is more computationally attractive (feasible) than the full maximum likelihood approach. However, our simulation study shows that the IFM method provides a biased estimate of the copula parameter in the presence of censored observations in both margins. In order to obtain an unbiased estimate of the copula association parameter, we propose a modified version of the IFM method, which we refer to as Inference Function for Augmented Margins (IFAM). Since the usual asymptotic approach, that is, the computation of the asymptotic covariance matrix of the parameter estimates, is troublesome, we propose the use of resampling procedures (bootstrap methods) to obtain confidence intervals for the copula-based SUR Tobit model parameters. The satisfactory results from the simulation and empirical studies indicate the adequate performance of our proposed model and methods. We illustrate our procedure using bivariate data on consumption of salad dressings and lettuce by U.S. individuals.

A Copula-Based Supervised Learning Classification for Continuous and Discrete Data

Yuhui Chen

https://doi.org/10.6339/JDS.201610_14(4).0010

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 14, Issue 4 (2016), pp. 769–790

Abstract

Despite its unreasonable feature independence assumption, the naive Bayes classifier is simple yet competes well with more sophisticated classifiers under the zero-one loss function for assigning an observation to a class given the features observed. However, it has been proved that the naive Bayes works poorly in estimation and in classification in some cases when the features are correlated. Researchers have therefore developed many approaches that free the naive Bayes from this primary assumption, which is rarely satisfied in the real world. In this paper, we propose a new classifier which is also free of the independence assumption by evaluating the dependence of features through pair copulas constructed via a graphical model called D-Vine tree. This tree structure helps to decompose the multivariate dependence into many bivariate dependencies and thus makes it possible to easily and efficiently evaluate the dependence of features even for data with high dimension and large sample size. We further extend the proposed method to features with discrete-valued entries. Experimental studies show that the proposed method performs well for both continuous and discrete cases.

Comparison of Estimation Methods for Unit-Gamma Distribution

Sanku Dey F. B. Menezes Josmar Mazucheli

https://doi.org/10.6339/JDS.201910_17(4).0009

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 17, Issue 4 (2019), pp. 768–801

Abstract

In this study we consider different methods of estimating the unknown parameters of a two-parameter unit-Gamma (UG) distribution from the frequentist point of view. First, we briefly describe different frequentist approaches: maximum likelihood estimators, moments estimators, least squares estimators, maximum product of spacings estimators, the method of Cramer-von-Mises, the method of Anderson-Darling and four variants of the Anderson-Darling test, and compare them using extensive numerical simulations. Monte Carlo simulations are performed to compare the performances of the proposed methods of estimation for both small and large samples. The performances of the estimators are compared in terms of their bias and root mean squared error using simulated samples. Also, for each method of estimation, we consider interval estimation using the bootstrap method and calculate the coverage probability and the average width of the bootstrap confidence intervals. The study reveals that the maximum product of spacings estimators and Anderson-Darling 2 (AD2) estimators are highly competitive with the maximum likelihood estimators in small and large samples. Finally, two real data sets are analyzed for illustrative purposes.

Combining Paired and Two-Sample Data Using a Permutation Test

Richard L. Einsporn Desale Habtzghi

https://doi.org/10.6339/JDS.2013.11(4).1164

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 11, Issue 4 (2013), pp. 767–779

Abstract

This paper presents a permutation test for the incomplete pairs setting. This situation arises in both observational and experimental studies when some of the data are in the form of a paired sample and the rest of the data comprise two independent samples. The proposed method uses the data from the two types of samples to test the difference between the mean responses. Our test statistic combines the observed mean difference for the complete pairs with the difference between the two means of the independent samples. The randomizations are carried out as is typically done with standard permutation tests for paired and independent samples. We show by a simulation study that our statistic performs well in comparison to other methods.
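A minimal sketch of the procedure this abstract outlines, under stated assumptions: the abstract says the statistic combines the paired mean difference with the difference of the two independent-sample means, but it does not say how. The weight `w` below is a hypothetical choice, as are the function names and the two-sided p-value. Randomization follows the standard schemes the abstract mentions: random sign flips for the pairs and label shuffling for the independent samples.

```python
import random

def mean(v):
    return sum(v) / len(v)

def combined_stat(pair_diffs, xs, ys, w):
    """Weighted combination of the two mean differences (w is hypothetical)."""
    return w * mean(pair_diffs) + (1.0 - w) * (mean(xs) - mean(ys))

def permutation_pvalue(pairs, xs, ys, w=0.5, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for the combined statistic."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in pairs]
    observed = combined_stat(diffs, xs, ys, w)
    pooled = list(xs) + list(ys)
    n_x = len(xs)
    extreme = 0
    for _ in range(n_perm):
        # Paired part: flip the sign of each within-pair difference at random.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        # Independent part: reassign group labels by shuffling the pooled data.
        rng.shuffle(pooled)
        stat = combined_stat(flipped, pooled[:n_x], pooled[n_x:], w)
        if abs(stat) >= abs(observed):
            extreme += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (extreme + 1) / (n_perm + 1)
```

A call such as `permutation_pvalue(pairs, xs, ys)` takes `pairs` as a list of `(a, b)` tuples and `xs`, `ys` as the two unpaired samples. Any weighting that accounts for the two sample sizes or variances would need to be taken from the paper itself.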

A Class of Bivariate Semiparametric Families of Distributions

Hiba Zeyada Muhammed

https://doi.org/10.6339/JDS.202010_18(4).0011

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 18, Issue 4 (2020), pp. 761–781

Abstract

The study of semiparametric families is useful because it provides methods of extending families to add flexibility in fitting data. The main aim of this paper is to introduce a class of bivariate semiparametric families of distributions. One special bivariate family of the introduced semiparametric families is discussed in detail with its sub-models and different properties. In most cases the joint probability distribution, joint distribution and joint hazard functions can be expressed in compact forms. Maximum likelihood and Bayesian estimation are considered for the vector of unknown parameters. For illustrative purposes a data set has been re-analyzed and the performances are quite satisfactory. A simulation study is performed to assess the performance of the estimators.

Tree-Structured Assessment of Causal Odds Ratio with Large Observational Study Data Sets

Joseph Kang Xiaogang Su Kiang Liu

https://doi.org/10.6339/JDS.2012.10(4).1087

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 10, Issue 4 (2012), pp. 757–776

Abstract

Observational studies of relatively large data sets can have potentially hidden heterogeneity with respect to causal effects and propensity scores (patterns of a putative cause being exposed to study subjects). This underlying heterogeneity can be crucial in causal inference for any observational study because it is systematically generated and structured by covariates which influence the cause and/or its related outcomes. When the causal inference problem is addressed in view of the data structure, machine learning techniques such as tree analysis are a natural choice. Kang, Su, Hitsman, Liu and Lloyd-Jones (2012) proposed the Marginal Tree (MT) procedure to explore both the confounding and interacting effects of the covariates on causal inference. In this paper, we extend the MT method to the case of binary responses along with a clear exposition of its relationship with the established causal odds ratio. We assess the causal effect of dieting on emotional distress using both a real data set from Lalonde’s National Supported Work Demonstration Analysis (NSW) and a simulated data set from the National Longitudinal Study of Adolescent Health (Add Health).


Journal of data science

Online ISSN: 1683-8602
Print ISSN: 1680-743X


Contact us

JDS@ruc.edu.cn
No. 59 Zhongguancun Street, Haidian District Beijing, 100872, P.R. China