Journal of Data Science

More Powerful Test for Homogeneity of Means Under an Order Restriction in Time Series with Stationary Process

Abouzar Bazyari

https://doi.org/10.6339/JDS.201610_14(4).0006

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 14, Issue 4 (2016), pp. 681–700

Abstract

Suppose that an order restriction is imposed among several means in a time series. We are interested in testing the homogeneity of these unknown means under this restriction. In the present paper, a test based on isotonic regression is developed for monotonically ordered means in time series with a stationary process and short-range dependent error sequences. A test statistic is proposed using the penalized likelihood ratio (PLR) approach. Since the asymptotic null distribution of the test statistic is complicated, its critical values are computed by Monte Carlo simulation for several sample sizes at different significance levels. A power study shows that the proposed test is more powerful than the test proposed by Brillinger (1989). Finally, to show the application of the proposed test, it is applied to a real dataset containing monthly rainfall records for Iran.
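The core computation behind an order-restricted homogeneity test is the isotonic fit of the group means. The sketch below is a minimal illustration only, assuming independent normal errors with known variance 1 rather than the stationary, short-range dependent errors and penalized likelihood ratio used in the paper: it fits nondecreasing means with the pool-adjacent-violators algorithm, forms a likelihood-ratio-type statistic, and estimates a critical value by Monte Carlo simulation. The function names `pava`, `ordered_lr_stat` and `mc_critical_value` are hypothetical.

```python
import random
import statistics


def pava(values, weights):
    """Weighted nondecreasing isotonic regression (pool-adjacent-violators)."""
    # Each block stores [weighted mean, total weight, number of pooled values]
    blocks = []
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Pool backwards while the nondecreasing order is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def ordered_lr_stat(groups, sigma2=1.0):
    """LR-type statistic for H0: equal means vs H1: mu_1 <= ... <= mu_k.

    ASSUMPTION: independent normal errors with known variance sigma2,
    not the paper's penalized LR for dependent time series errors."""
    means = [statistics.fmean(g) for g in groups]
    sizes = [len(g) for g in groups]
    iso = pava(means, sizes)
    grand = sum(n * m for n, m in zip(sizes, means)) / sum(sizes)
    return sum(n * (m - grand) ** 2 for n, m in zip(sizes, iso)) / sigma2


def mc_critical_value(sizes, alpha=0.05, reps=2000, seed=1):
    """Upper-alpha critical value of ordered_lr_stat, simulated under H0."""
    rng = random.Random(seed)
    sims = sorted(
        ordered_lr_stat([[rng.gauss(0.0, 1.0) for _ in range(n)] for n in sizes])
        for _ in range(reps)
    )
    return sims[round((1 - alpha) * reps) - 1]


print(pava([1.0, 3.0, 2.0], [1, 1, 1]))  # [1.0, 2.5, 2.5]
print(mc_critical_value([5, 5, 5]))
```

Because the null distribution of an order-restricted statistic is a mixture rather than a single chi-square law, simulating it is a pragmatic way to obtain critical values, in the same spirit as the Monte Carlo approach described in the abstract.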

A New Distribution for Extreme Values: Regression Model, Characterizations and Applications

H.M. Yousof, S.M.A. Jahanshahi, T.G. Ramires, et al. (5 authors)

https://doi.org/10.6339/JDS.201810_16(4).00002

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 16, Issue 4 (2018), pp. 677–706

Abstract

A new four-parameter extreme value distribution is defined and studied. Various structural properties of the proposed distribution, including ordinary and incomplete moments, generating functions, residual and reversed residual life functions, and order statistics, are investigated. Some useful characterizations based on two truncated moments, on the reverse hazard function, and on certain functions of the random variable are presented. The maximum likelihood method is used to estimate the model parameters. Further, we propose a new extended regression model based on the logarithm of the new distribution. The new distribution is applied to three real data sets to demonstrate its flexibility empirically.

Weighted Orthogonal Components Regression Analysis

Xiaogang Su, Yaa Wonkye, Pei Wang, et al. (4 authors)

https://doi.org/10.6339/JDS.201910_17(4).0003

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 17, Issue 4 (2019), pp. 674–695

Abstract

In the linear regression setting, we propose a general framework, termed weighted orthogonal components regression (WOCR), which encompasses many known methods as special cases, including ridge regression and principal components regression. WOCR makes use of the monotonicity inherent in orthogonal components to parameterize the weight function. The formulation allows for efficient determination of tuning parameters and hence is computationally advantageous. Moreover, WOCR offers insights for deriving new and better variants. Specifically, we advocate assigning weights to components based on their correlations with the response, which may lead to enhanced predictive performance. Both simulation studies and real data examples are provided to assess and illustrate the advantages of the proposed methods.
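A minimal sketch of the idea that ridge regression and principal components regression are both weighted sums of orthogonal components. Assuming the design has singular value decomposition X = U D V' with orthonormal columns u_j and singular values d_j, the fitted values take the form ŷ = Σ_j w_j (u_j' y) u_j; ridge uses w_j = d_j² / (d_j² + λ), and PCR uses w_j = 1 for the first k components and 0 otherwise. The helper names are hypothetical, and the correlation-based weights advocated by the authors are not reproduced here.

```python
def ridge_weights(d, lam):
    """Ridge shrinks component j by d_j^2 / (d_j^2 + lam)."""
    return [dj ** 2 / (dj ** 2 + lam) for dj in d]


def pcr_weights(d, k):
    """PCR keeps the first k components (weight 1) and drops the rest (weight 0)."""
    return [1.0 if j < k else 0.0 for j in range(len(d))]


def component_scores(components, y):
    """Projections u_j' y of the response onto each orthonormal component u_j."""
    return [sum(ui * yi for ui, yi in zip(u, y)) for u in components]


def weighted_component_fit(components, y, weights):
    """Fitted values y_hat = sum_j w_j * (u_j' y) * u_j."""
    fit = [0.0] * len(y)
    for u, z, w in zip(components, component_scores(components, y), weights):
        for i, ui in enumerate(u):
            fit[i] += w * z * ui
    return fit


# Toy orthonormal components in R^3 with singular values 2 and 1 (illustrative)
U = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
d = [2.0, 1.0]
y = [3.0, 4.0, 5.0]
print(weighted_component_fit(U, y, ridge_weights(d, 1.0)))
print(weighted_component_fit(U, y, pcr_weights(d, 1)))
```

Under this view, a method is fully described by its weight function over the ordered components, which is what makes alternative weightings easy to plug in.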

Sequence Mutations of Genes Pertaining to Malignancy in Cancer

Nardnisa Sintupisut, Chen-Hsiang Yeang

https://doi.org/10.6339/JDS.2013.11(4).1121

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 11, Issue 4 (2013), pp. 673–714

Abstract

Cancer is a complex disease in which various types of molecular aberrations drive the development and progression of malignancies. Among the diverse molecular aberrations, inherited and somatic mutations in DNA sequences are considered major drivers of oncogenesis. The complexity of somatic alterations is revealed by large-scale investigations of cancer genomes and by robust methods for inferring the function of genes. In this review, we describe sequence mutations of several cancer-related genes and discuss their functional implications in cancer. In addition, we introduce online resources for accessing and analyzing sequence mutations in cancer. We also provide an overview of statistical and computational approaches, and of future prospects, for conducting comprehensive analyses of somatic alterations in cancer genomes.

Nonparametric Inference for Inverse Probability Weighted Estimators with a Randomly Truncated Sample

Xu Zhang

https://doi.org/10.6339/JDS.2012.10(4).1096

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 10, Issue 4 (2012), pp. 673–691

Abstract

A randomly truncated sample arises when independent variables T and L are observable only if L < T. The truncated-data version of the Kaplan-Meier estimator is known to be the standard estimation method for the marginal distribution of T or L. The inverse probability weighted (IPW) estimator was suggested as an alternative, and its agreement with the truncated-data Kaplan-Meier estimator has been proved. This paper centers on the weak convergence of IPW estimators and on variance decomposition. The paper shows that the asymptotic variance of an IPW estimator can be decomposed into two sources. The variation of the IPW estimator using known weight functions is the primary source, and the variation due to estimated weights should be included as well. Variance decomposition establishes the connection between a truncated sample and a biased sample with known probabilities of selection. A simulation study was conducted to investigate the practical performance of the proposed variance estimators, as well as the relative magnitude of the two sources of variation for various truncation rates. A blood transfusion data set is analyzed to illustrate the nonparametric inference discussed in the paper.
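For reference, the truncated-data product-limit estimator of the survival function of T can be sketched as follows. This is a minimal illustration under the observation rule L < T: the risk set at time t counts the pairs with L_i ≤ t ≤ T_i, and the IPW estimators and variance decomposition studied in the paper are not implemented. The function name is hypothetical.

```python
def truncated_product_limit(T, L):
    """Product-limit estimate of P(T > t) from left-truncated pairs with L_i < T_i.

    At each distinct observed time t the risk set is R(t) = #{i : L_i <= t <= T_i},
    and the running estimate is multiplied by 1 - d(t) / R(t), where d(t) counts
    the observations with T_i == t."""
    surv = 1.0
    estimates = []
    for t in sorted(set(T)):
        d = sum(1 for ti in T if ti == t)
        r = sum(1 for li, ti in zip(L, T) if li <= t <= ti)  # r >= d because L_i < T_i
        surv *= 1.0 - d / r
        estimates.append((t, surv))
    return estimates


print(truncated_product_limit([2.0, 3.0, 4.0], [0.0, 2.5, 0.0]))
# [(2.0, 0.5), (3.0, 0.25), (4.0, 0.0)]
```

The second subject enters the risk set only after its truncation time 2.5, which is what distinguishes this estimator from the ordinary Kaplan-Meier estimator on untruncated data.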

Graphical Jump Method for Neural Networks

Jing Chang, Herbert K. H. Lee

https://doi.org/10.6339/JDS.201710_15(4).00006

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 15, Issue 4 (2017), pp. 669–690

Abstract

A graphical tool for choosing the number of nodes in a neural network is introduced. The idea is first to fit the neural network over a range of numbers of nodes and then to generate a jump plot using a transformation of the mean squared errors of the resulting residuals. A theorem is proven showing that the jump plot selects several candidate numbers of nodes, one of which is the true number of nodes. A theoretically justified single-node-only test is then used to rule out erroneous candidates. The method has a sound theoretical basis, yields good results on simulated datasets, and is widely applicable to real research datasets.
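The jump-plot idea can be sketched as follows. The abstract does not state which transformation of the mean squared errors is used, so this minimal sketch assumes a negative power, MSE^(-y), and ranks each node count k by the increase in the transformed value from k - 1 to k nodes, taking the transformed value for zero nodes as 0 (also an assumption). It only proposes candidates; the single-node-only test that screens them is not reproduced. The function name and the MSE values are hypothetical.

```python
def jump_candidates(mse, power=1.0, n_candidates=2):
    """Rank node counts k = 1..K (mse[k-1] is the MSE with k nodes) by jump size.

    ASSUMPTION: the transformation is mse ** (-power), with the value for
    zero nodes set to 0; the paper's exact transformation may differ."""
    transformed = [0.0] + [m ** (-power) for m in mse]
    # jumps[k - 1] is the increase in the transformed MSE from k - 1 to k nodes
    jumps = [transformed[k] - transformed[k - 1] for k in range(1, len(transformed))]
    ranked = sorted(range(1, len(mse) + 1), key=lambda k: jumps[k - 1], reverse=True)
    return ranked[:n_candidates], jumps


# Hypothetical MSEs for networks with 1, 2, 3 and 4 hidden nodes
candidates, jumps = jump_candidates([10.0, 2.0, 1.9, 1.85])
print(candidates)  # [2, 1]
```

Returning several candidates rather than a single answer matches the two-stage design described in the abstract, where a separate test rules out the wrong ones.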

Unit-Gamma Distortion of Bivariate Copulas

Jungsywan H. Sepanski

https://doi.org/10.6339/JDS.202010_18(4).0005

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 18, Issue 4 (2020), pp. 664–681

Abstract

In this paper, we introduce new families of bivariate copulas constructed by distributional distortions of existing bivariate copulas. The distortions under consideration are based on two forms of the unit gamma distribution. When the initial copula is Archimedean, the induced copula is also Archimedean under the admissible parameter space. Properties such as Kendall’s tau coefficient, tail dependence coefficients and tail orders for the new families of copulas are derived. An empirical application to economic indicator data is presented.
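Kendall's tau of an Archimedean copula with generator φ is given by the standard formula τ = 1 + 4 ∫₀¹ φ(t)/φ'(t) dt. The minimal sketch below evaluates it numerically with the midpoint rule and checks it against the Clayton generator φ(t) = (t^(-θ) - 1)/θ, whose tau is known to be θ/(θ + 2). The unit-gamma distorted generators of the paper are not reproduced; any generator and its derivative could be passed in their place. The function names are hypothetical.

```python
def kendall_tau_archimedean(phi, dphi, n=100_000):
    """tau = 1 + 4 * integral_0^1 phi(t) / phi'(t) dt, via the midpoint rule."""
    h = 1.0 / n
    total = sum(phi((i + 0.5) * h) / dphi((i + 0.5) * h) for i in range(n))
    return 1.0 + 4.0 * total * h


THETA = 2.0  # illustrative Clayton parameter


def clayton_phi(t):
    """Clayton generator phi(t) = (t^(-theta) - 1) / theta."""
    return (t ** -THETA - 1.0) / THETA


def clayton_dphi(t):
    """Derivative phi'(t) = -t^(-theta - 1)."""
    return -(t ** (-THETA - 1.0))


tau = kendall_tau_archimedean(clayton_phi, clayton_dphi)
print(round(tau, 6), THETA / (THETA + 2.0))  # 0.5 0.5
```

Because the formula depends only on the generator, the same routine can be reused for any family that stays Archimedean after distortion.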

The Comparison of Partial Least Squares Regression, Principal Component Regression and Ridge Regression with Multiple Linear Regression for Predicting PM10 Concentration Level Based on Meteorological Parameters

Esra Polat, Suleyman Gunay

https://doi.org/10.6339/JDS.201510_13(4).0003

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 13, Issue 4 (2015), pp. 663–692

Abstract

Air pollution is a serious problem in big cities in Turkey, especially in winter. Particulate atmospheric pollution in urban areas is considered to have a significant impact on human health. Therefore, the ability to make accurate predictions of ambient particulate concentrations is important for improving public awareness and air quality management. Ambient PM10 pollution (i.e., particles less than 10 µm in diameter) has negative impacts on human health and is influenced by meteorological conditions. In this study, partial least squares regression, principal component regression, ridge regression and multiple linear regression are compared in modeling and predicting daily mean PM10 concentrations on the basis of various meteorological parameters obtained for the city of Ankara, Turkey. The analysed period is February 2007. The results show that while multiple linear regression and ridge regression yield somewhat better fits to this dataset, principal component regression and partial least squares regression are better than both in terms of predicting PM10 values for future datasets. In addition, partial least squares regression stands out in terms of predictive ability, as it performs close to principal component regression even with fewer factors.
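Two of the four compared methods have a simple closed form: multiple linear regression solves the normal equations X'X β = X'y, and ridge regression adds λ to the diagonal of X'X. The minimal sketch below, assuming centered predictors and response (so no intercept column), solves (X'X + λI)β = X'y by Gaussian elimination; principal component regression, partial least squares, and the tuning used in the study are not shown. The function name and toy data are hypothetical.

```python
def ridge_fit(X, y, lam=0.0):
    """Solve (X'X + lam*I) beta = X'y; lam = 0 gives multiple linear regression.

    ASSUMPTION: X and y are already centered, so no intercept is fitted."""
    p = len(X[0])
    # Normal-equation system A beta = b, with lam added on the diagonal
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Forward elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return beta


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
print(ridge_fit(X, y))       # multiple linear regression (lam = 0)
print(ridge_fit(X, y, 1.0))  # ridge shrinks the coefficients toward zero
```

The ridge penalty trades a slightly worse in-sample fit for more stable coefficients, which is consistent with the abstract's observation that the best-fitting method is not necessarily the best predictor.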

The Kummer Beta Generalized Gamma Distribution

Gauss M. Cordeiro, Rodrigo R. Pescim, Clarice G.B. Demétrio, et al. (4 authors)

https://doi.org/10.6339/JDS.201410_12(4).0006

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 12, Issue 4 (2014), pp. 661–698

Abstract

A new six-parameter extension of the generalized gamma distribution, called the Kummer beta generalized gamma distribution, is introduced and studied. It contains at least 28 special models, such as the beta generalized gamma, beta Weibull, beta exponential, generalized gamma, Weibull and gamma distributions, and thus could be a better model for analyzing positively skewed data. The new density function can be expressed as a linear combination of generalized gamma densities. Various mathematical properties of the new distribution are derived, including explicit expressions for the ordinary and incomplete moments, generating function, mean deviations, entropy, and the density function of the order statistics and their moments. The elements of the observed information matrix are provided. We discuss the method of maximum likelihood and a Bayesian approach to fit the model parameters. The superiority of the new model is illustrated by means of three real data sets.
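The special cases listed in the abstract include the baseline generalized gamma distribution and its Weibull and gamma submodels. As a minimal check, assuming Stacy's parameterization f(x) = p x^(d-1) exp(-(x/a)^p) / (a^d Γ(d/p)) (the paper may parameterize it differently), setting d = p gives the Weibull density and p = 1 gives the gamma density. The Kummer beta generator that adds the three extra parameters is not implemented here, and the function names are hypothetical.

```python
import math


def gen_gamma_pdf(x, a, d, p):
    """Generalized gamma density, Stacy parameterization (an assumption here)."""
    return p / (a ** d * math.gamma(d / p)) * x ** (d - 1) * math.exp(-((x / a) ** p))


def weibull_pdf(x, k, a):
    """Weibull density with shape k and scale a."""
    return (k / a) * (x / a) ** (k - 1) * math.exp(-((x / a) ** k))


def gamma_pdf(x, shape, scale):
    """Gamma density with the given shape and scale."""
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)


for x in (0.5, 1.3, 2.7):
    # d = p reduces the generalized gamma to a Weibull; p = 1 reduces it to a gamma
    print(gen_gamma_pdf(x, 2.0, 1.5, 1.5), weibull_pdf(x, 1.5, 2.0))
    print(gen_gamma_pdf(x, 2.0, 3.0, 1.0), gamma_pdf(x, 3.0, 2.0))
```

Nesting of this kind is what allows likelihood-ratio comparisons between the full model and its submodels.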

Generalized Linear Distributed Lag Models

Hanh Nguyen, Qin Shao

https://doi.org/10.6339/JDS.201910_17(4).0002

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 17, Issue 4 (2019), pp. 660–673

Abstract

We propose generalized linear distributed lag models for the purpose of incorporating lagged effects. The model class provides a more accurate statistical measure of the relationship between the dependent variable and a series of covariates. The estimators from the proposed procedure are shown to be consistent. Simulation studies not only confirm the asymptotic properties of the estimators but also exhibit the adverse effects of model misspecification on the accuracy of model estimation and prediction. The application is illustrated by analyzing data from the 2016 presidential election.
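A distributed lag model relates the response at time t to current and lagged covariate values; in a generalized linear version this linear predictor passes through a link function. The minimal sketch below assumes a single covariate, a Poisson response with log link, and known coefficients (all illustrative choices, not the paper's specification), and builds the lagged design rows and the resulting means. The function names are hypothetical, and estimation is not shown.

```python
import math


def lag_design(x, max_lag):
    """Rows [x_t, x_{t-1}, ..., x_{t-max_lag}] for t = max_lag, ..., len(x) - 1."""
    return [[x[t - lag] for lag in range(max_lag + 1)] for t in range(max_lag, len(x))]


def poisson_dl_mean(x, beta0, betas):
    """Means under an assumed log link: mu_t = exp(beta0 + sum_l betas[l] * x_{t-l})."""
    rows = lag_design(x, len(betas) - 1)
    return [math.exp(beta0 + sum(b * v for b, v in zip(betas, row))) for row in rows]


print(lag_design([1, 2, 3, 4], 2))  # [[3, 2, 1], [4, 3, 2]]
print(poisson_dl_mean([1, 2, 3, 4], 0.1, [0.2, 0.1, 0.05]))
```

The first max_lag observations are dropped because their lagged covariates are unavailable, a practical detail any fitting procedure for such models has to handle.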

Journal of Data Science

Online ISSN: 1683-8602
Print ISSN: 1680-743X

Contact us

JDS@ruc.edu.cn
No. 59 Zhongguancun Street, Haidian District Beijing, 100872, P.R. China