Journal of Data Science

Search results 889

The Compound Class of Janardan–Power Series Distributions: Properties and Applications

Marzieh Shekari Hossein Zamani Mohammad Mehdi Saber

https://doi.org/10.6339/JDS.201904_17(2).0002

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 17, Issue 2 (2019), pp. 259–278

Abstract

In the present paper, we propose the new Janardan-Power Series (JPS) class of distributions, obtained by combining the Janardan distribution of Shanker et al. (2013) with the family of power series distributions. We examine the fundamental attributes of this class of distributions, including the survival, hazard and reverse hazard functions, the limiting behavior of the cdf and pdf, the quantile function, moments and the distribution of order statistics. Moreover, particular cases of the JPS class, namely the Janardan-Binomial (JB), Janardan-Geometric (JG), Janardan-Poisson (JP) and Janardan-Logarithmic (JL) distributions, are introduced. In addition, the JP distribution is analyzed in detail. Finally, the proposed class is applied to a real data set.

Automated Linking PUBMED Documents with GO Terms Using SVM

Su-Shing Chen Hyunki Kim

https://doi.org/10.6339/JDS.2007.05(2).331

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 5, Issue 2 (2007), pp. 259–267

Abstract

We have developed an automated linking scheme for PUBMED citations with GO terms using SVM (Support Vector Machine), a classification algorithm. The PUBMED database has been essential to life science researchers with over 12 million citations. More recently GO (Gene Ontology) has provided a graph structure for biological process, cellular component, and molecular function of genomic data. By text mining the textual content of PUBMED and associating it with GO terms, we have built up an ontological map for these databases so that users can search PUBMED via GO terms and conversely GO entries via PUBMED classification. Consequently, some interesting and unexpected knowledge may be captured from them for further data analysis and biological experimentation. This paper reports our results on SVM implementation and the need to parallelize for the training phase.

Using the Box-Cox Power Transformation to Predict Temporally Correlated Longitudinal Data

R. C. Hwang

https://doi.org/10.6339/JDS.2004.02(3).141

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 2, Issue 3 (2004), pp. 259–272

Image De-noising with a New Threshold Value Using Wavelets

B. Ismail Anjum Khan

https://doi.org/10.6339/JDS.2012.10(2).749

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 10, Issue 2 (2012), pp. 259–270

Abstract

Image de-noising is the process of removing noise from an image that has been corrupted by it. The wavelet method is one among the various methods for recovering infinite dimensional objects like curves, densities, images etc. Wavelet techniques are very effective at removing noise because of their ability to capture the energy of a signal in a few energy transform values. The wavelet methods are based on shrinking the wavelet coefficients in the wavelet domain. This paper concentrates on selecting a threshold for wavelet function estimation. A new threshold value is proposed to shrink the wavelet coefficients obtained by wavelet decomposition of a noisy image, assuming that the sub-band coefficients have a generalized Gaussian distribution. The proposed threshold value is based on the exponent J in the data size 2^J × 2^J and can be computed efficiently. Experiments have been conducted on various test images to compare the proposed value with established threshold parameters. The results show that the proposed threshold value removes the noise significantly.
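The shrinkage step described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the threshold proposed in the paper is not given in the abstract, so the sketch assumes the well-known universal threshold sigma * sqrt(2 ln n) instead, applies a single level of the Haar transform to a one-dimensional signal rather than an image, and uses hypothetical function names throughout.

```python
import math


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; values in [-t, t] become 0."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def haar_step(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    s = math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / s for a, b in pairs]
    detail = [(a - b) / s for a, b in pairs]
    return approx, detail


def inverse_haar_step(approx, detail):
    """Invert haar_step, interleaving the reconstructed samples."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def denoise(signal, sigma):
    """Soft-threshold the detail coefficients and reconstruct the signal.

    Assumption: the universal threshold sigma * sqrt(2 ln n) stands in
    for the paper's proposed threshold, which the abstract does not state.
    """
    t = sigma * math.sqrt(2.0 * math.log(len(signal)))
    approx, detail = haar_step(signal)
    return inverse_haar_step(approx, soft_threshold(detail, t))
```

Only the detail coefficients are shrunk; the approximation coefficients carry the coarse shape of the signal and are left untouched. Swapping in a different threshold rule only requires changing the line that computes `t`.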

A Log-weighted Power Function Distribution and Its Statistical Properties

Rasha Mohamed Mandouh Mahmoud Abdel-Ghaffar Mohamed

https://doi.org/10.6339/JDS.202004_18(2).0003

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 18, Issue 2 (2020), pp. 257–278

Abstract

The Power function distribution is a flexible lifetime distribution that has applications in finance and economics. It is also used to model reliability growth of complex systems or the reliability of repairable systems. A new weighted Power function distribution is proposed using a logarithmic weight function. Statistical properties of the weighted power function distribution are obtained and studied. Location measures such as the mode, median and mean, and reliability measures such as the reliability function, hazard and reversed hazard functions and the mean residual life, are derived. Shape indices such as skewness and kurtosis coefficients and order statistics are obtained. Parametric estimation is performed to obtain estimators for the parameters of the distribution using three different estimation methods, namely the maximum likelihood method, the L-moments method and the method of moments. Numerical simulation is carried out to validate the robustness of the proposed distribution.

An Modified PLSR Method in Prediction

Bo Cheng Xizhi Wu

https://doi.org/10.6339/JDS.2006.04(3).285

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 4, Issue 3 (2006), pp. 257–274

Abstract

Among many statistical methods for linear models with the multicollinearity problem, partial least squares regression (PLSR) has become, in recent years, increasingly popular and, very often, the best choice. However, while dealing with a prediction problem from the automobile market, we noticed that the results from PLSR appear unstable, though it is still the best among some standard statistical methods. This instability is likely due to the impact of information contained in the explanatory variables that is irrelevant to the response variable. Based on the algorithm of PLSR, this paper introduces a new method, modified partial least squares regression (MPLSR), to emphasize the impact of the relevant information of explanatory variables on the response variable. With the MPLSR method, satisfactory prediction results are obtained in the above practical problem. The performance of MPLSR, PLSR and some standard statistical methods is compared by a set of Monte Carlo experiments. This paper shows that MPLSR is the most stable and accurate method, especially when the ratio of the number of observations to the number of explanatory variables is low.

Testing Statistical Significance of the Area under a Receiving Operating Characteristics Curve for Repeated Measures Design with Bootstrapping

Honghu Liu Gang Li William G. Cumberland All authors (4)

https://doi.org/10.6339/JDS.2005.03(3).206

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 3, Issue 3 (2005), pp. 257–278

Abstract

The receiver operating characteristic (ROC) curve is an effective and widely used method for evaluating the discriminating power of a diagnostic test or statistical model. As a useful statistical method, a wealth of literature about its theories and computation methods has been established. The research on ROC curves, however, has focused mainly on cross-sectional design. Very little research on estimating ROC curves and their summary statistics, especially significance testing, has been conducted for repeated measures design. Due to the complexity of estimating the standard error of a ROC curve, there is currently no established statistical method for testing the significance of ROC curves under a repeated measures design. In this paper, we estimate the area under a ROC curve for a repeated measures design through a generalized linear mixed model (GLMM), using the predicted probability of a disease or positivity of a condition, and propose a bootstrap method to estimate the standard error of the area under a ROC curve for such designs. Statistical significance testing of the area under a ROC curve is then conducted using the bootstrapped standard error. The bootstrap approach and the statistical testing of the area under the ROC curve were validated through simulation analyses. A special-purpose program written in SAS/IML/MACRO v8 was also created for implementing the bootstrapping algorithm, conducting the calculations and statistical testing.
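The idea of a bootstrapped standard error for the area under a ROC curve with repeated measures can be sketched as follows. This is an illustration under stated assumptions, not the authors' SAS/IML implementation: it skips the GLMM step and takes the scores as given, it resamples whole subjects so that repeated measurements from one subject stay together, and all function names and the data layout are hypothetical.

```python
import random
import statistics


def auc(scores_pos, scores_neg):
    """Mann-Whitney estimate of the AUC; ties count one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def pooled_auc(subjects):
    """subjects: list of per-subject lists of (score, label), label 0 or 1."""
    pos = [s for subj in subjects for s, y in subj if y == 1]
    neg = [s for subj in subjects for s, y in subj if y == 0]
    return auc(pos, neg)


def bootstrap_se(subjects, n_boot=500, seed=0):
    """Standard deviation of the AUC over subject-level bootstrap resamples."""
    all_labels = {y for subj in subjects for _, y in subj}
    if all_labels != {0, 1}:
        raise ValueError("data must contain both positive and negative labels")
    rng = random.Random(seed)
    estimates = []
    while len(estimates) < n_boot:
        # Resample subjects with replacement, keeping each subject's
        # repeated measurements together.
        sample = [rng.choice(subjects) for _ in subjects]
        labels = {y for subj in sample for _, y in subj}
        if labels == {0, 1}:  # skip resamples that lost one class entirely
            estimates.append(pooled_auc(sample))
    return statistics.stdev(estimates)
```

A significance test in the spirit of the abstract would then compare `(pooled_auc(subjects) - 0.5) / bootstrap_se(subjects)` with a standard normal reference, where 0.5 is the area expected for a test with no discriminating power.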

On the Spatio-Temporal Relationship Between MODIS AOD and PM2.5 Particulate Matter Measurements

Aaron T. Porter Jacob J. Oleson Charles O. Stanier

https://doi.org/10.6339/JDS.201404_12(2).0003

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 12, Issue 2 (2014), pp. 255–275

Abstract

Particulate matter smaller than 2.5 microns (PM2.5) is a commonly measured parameter in ground-based sampling networks designed to assess short and long-term air quality. The measurement techniques for ground-based PM2.5 are relatively accurate and precise, but monitoring locations are spatially too sparse for many applications. Aerosol Optical Depth (AOD) is a satellite-based air quality measurement that can be computed for more spatial locations, but measures light attenuation by particulates throughout the entire air column, not just near the ground. The goal of this paper is to better characterize the spatio-temporal relationship between the two measurements. An informative relationship will aid in imputing PM2.5 values for health studies in a way that accounts for the variability in both sets of measurements, something physics-based models cannot do. We use a data set of Chicago air quality measurements taken during 2007 and 2008 to construct a weekly hierarchical model. We also demonstrate that AOD measurements and a latent spatio-temporal process aggregated weekly can be used to aid in the prediction of PM2.5 measurements.

Panel Regression of Arbitrarily Distributed Responses

Gordon G. Bechtel

https://doi.org/10.6339/JDS.2009.07(2).459

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 7, Issue 2 (2009), pp. 255–266

Abstract

The primary advantage of panel over cross-sectional regression stems from its control for the effects of omitted variables or "unobserved heterogeneity". However, panel regression is based on the strong assumptions that measurement errors are independently and identically distributed (i.i.d.) and normal. These assumptions are evaded by design-based regression, which dispenses with measurement errors altogether by regarding the response as a fixed real number. The present paper establishes a middle ground between these extreme interpretations of longitudinal data. The individual is now represented as a panel of responses containing dependently non-identically distributed (d.n.d.) measurement errors. Modeling the expectations of these responses preserves the Neyman randomization theory, rendering panel regression slopes approximately unbiased and normal in the presence of arbitrarily distributed measurement error. The generality of this reinterpretation is illustrated with German Socio-Economic Panel (GSOEP) responses that are discretely distributed on a 3-point scale.

Estimating and Testing Quantile-based Process Capability Indices for Processes with Skewed Distributions

Cheng Peng

https://doi.org/10.6339/JDS.2010.08(2).582

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 8, Issue 2 (2010), pp. 253–268

Abstract

This article extends the recent work of Vännman and Albing (2007) regarding the new family of quantile-based process capability indices (qPCI) C_MA(τ, v). We develop both asymptotic parametric and nonparametric confidence limits and testing procedures for C_MA(τ, v). A kernel density estimator of the process distribution is proposed to obtain a consistent estimator of the variance of the nonparametric consistent estimator of C_MA(τ, v). The proposed procedure is therefore ready for practical implementation for any process. Illustrative examples are also provided to show the steps of implementing the proposed methods directly on real-life problems. We also present a simulation study on the sample size required for using the asymptotic results.

Journal of data science

Online ISSN: 1683-8602
Print ISSN: 1680-743X

Contact us

JDS@ruc.edu.cn
No. 59 Zhongguancun Street, Haidian District Beijing, 100872, P.R. China