Pub. online: 4 Aug 2022 | Type: Research Article | Open Access
Journal: Journal of Data Science
Volume 18, Issue 3 (2020): Special issue: Data Science in Action in Response to the Outbreak of COVID-19, pp. 550–580
Abstract
The COVID-19 pandemic has triggered explosive activities in the search for cures, including vaccines against SARS-CoV-2 infection. As of April 30, 2020, there are at least 102 COVID-19 vaccine development programs worldwide, the majority of which are in preclinical development phases, five are in phase I trials, and three are in phase I/II trials. Experts caution against rushing COVID-19 vaccine development, not only because the knowledge about SARS-CoV-2 is lacking (albeit rapidly accumulating), but also because vaccine development is a complex, lengthy process with its own rules and timelines. Clinical trials are critically important in vaccine development, usually starting from small-scale phase I trials and gradually moving to the next phases (II and III) after the primary objectives are met. This paper is intended to provide an overview of design considerations for vaccine clinical trials, with a special focus on COVID-19 vaccine development. Given the current pandemic paradigm and the unique features of vaccine development, our recommendations from a statistical design perspective for COVID-19 vaccine trials include: (1) novel trial designs (e.g., master protocols) to expedite the simultaneous evaluation of multiple candidate vaccines or vaccine doses, (2) human challenge studies to accelerate clinical development, (3) adaptive design strategies (e.g., group sequential designs) for early termination due to futility, efficacy, and/or safety, (4) extensive modeling and simulation to characterize and establish long-term efficacy based on early-phase or short-term follow-up data, (5) safety evaluation as one of the primary focuses throughout all phases of clinical trials, (6) leveraging real-world data and evidence in vaccine trial design and analysis to establish vaccine effectiveness, and (7) global collaboration to form a joint development effort for more efficient use of resources and expertise and for data sharing.
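To make recommendations (3) and (4) concrete, the following is a minimal Monte Carlo sketch of a two-arm vaccine trial with a single group-sequential interim look for futility and efficacy. All inputs (attack rate, assumed vaccine efficacy, sample size, and the approximate O'Brien-Fleming-type boundaries) are hypothetical illustration values, not design parameters from any actual COVID-19 trial.

```python
# Illustrative Monte Carlo sketch of a two-arm vaccine trial with one interim
# group-sequential look (futility and efficacy stopping). All design inputs
# below are hypothetical.
import numpy as np

rng = np.random.default_rng(2020)

N_PER_ARM = 5000          # final enrollment per arm (hypothetical)
P_CONTROL = 0.02          # assumed attack rate in the placebo arm
TRUE_VE = 0.60            # assumed true vaccine efficacy
P_VACCINE = P_CONTROL * (1 - TRUE_VE)
Z_EFFICACY = 2.80         # approximate O'Brien-Fleming-type interim boundary
Z_FINAL = 1.98            # approximate final boundary (one-sided)
Z_FUTILITY = 0.0          # stop early if no sign of benefit at the interim look

def z_two_proportions(x_c, n_c, x_v, n_v):
    """One-sided z-statistic for a lower infection rate in the vaccine arm."""
    p_pool = (x_c + x_v) / (n_c + n_v)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_v))
    return (x_c / n_c - x_v / n_v) / se if se > 0 else 0.0

def simulate_trial():
    n_half = N_PER_ARM // 2
    ctrl = rng.binomial(1, P_CONTROL, N_PER_ARM)
    vacc = rng.binomial(1, P_VACCINE, N_PER_ARM)
    # Interim look after half of the participants are observed.
    z_int = z_two_proportions(ctrl[:n_half].sum(), n_half,
                              vacc[:n_half].sum(), n_half)
    if z_int >= Z_EFFICACY:
        return "efficacy_interim", n_half * 2
    if z_int <= Z_FUTILITY:
        return "futility_interim", n_half * 2
    z_fin = z_two_proportions(ctrl.sum(), N_PER_ARM, vacc.sum(), N_PER_ARM)
    return ("efficacy_final" if z_fin >= Z_FINAL else "fail_final"), N_PER_ARM * 2

results = [simulate_trial() for _ in range(2000)]
outcomes = [r[0] for r in results]
print("P(success):", np.mean([o.startswith("efficacy") for o in outcomes]))
print("expected total sample size:", np.mean([r[1] for r in results]))
```

Repeating such simulations under a grid of assumed efficacies and attack rates is how operating characteristics (power, expected sample size, early-stopping probabilities) would typically be characterized before enrollment.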
Abstract: A new generalized two-parameter Lindley distribution that offers more flexibility in modeling lifetime data is proposed, and some of its mathematical properties, such as the density function, cumulative distribution function, survival function, hazard rate function, mean residual life function, moment generating function, quantile function, moments, Rényi entropy, and stochastic ordering, are obtained. The maximum likelihood method is used to estimate the parameters of the proposed distribution, and a simulation study is carried out to examine the performance and accuracy of the maximum likelihood estimators of the parameters. Finally, an application of the proposed distribution to a real lifetime data set is presented, and its fit is compared with the fit attained by some existing lifetime distributions.
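The generalized density proposed in the paper is not reproduced in the abstract. As a hedged illustration of the estimation step only, the sketch below fits the classical two-parameter Lindley form f(x; θ, α) = θ²/(θ+α)(1+αx)e^{-θx}, used purely as a stand-in, by maximum likelihood on simulated lifetimes.

```python
# A minimal MLE sketch for a two-parameter Lindley-type density. The specific
# generalized density proposed in the paper is not reproduced here; the
# classical two-parameter form
#   f(x; theta, alpha) = theta^2 / (theta + alpha) * (1 + alpha * x) * exp(-theta * x)
# is used purely as an illustrative stand-in.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

def rlindley2(n, theta, alpha):
    """Sample via the exponential/gamma mixture representation of the density."""
    w = theta / (theta + alpha)                  # weight of the Exp(theta) component
    comp = rng.random(n) < w
    exp_part = rng.exponential(1 / theta, n)     # Exp(theta) draws
    gam_part = rng.gamma(2.0, 1 / theta, n)      # Gamma(2, 1/theta) draws
    return np.where(comp, exp_part, gam_part)

def negloglik(params, x):
    theta, alpha = params
    if theta <= 0 or alpha <= 0:
        return np.inf
    return -np.sum(2 * np.log(theta) - np.log(theta + alpha)
                   + np.log1p(alpha * x) - theta * x)

x = rlindley2(500, theta=1.5, alpha=0.8)         # simulated lifetime data
fit = minimize(negloglik, x0=[1.0, 1.0], args=(x,), method="Nelder-Mead")
print("MLEs (theta, alpha):", fit.x)
```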
Abstract: This paper discusses the selection of the smoothing parameter necessary to implement a penalized regression using a nonconcave penalty function. The proposed method can be derived from a Bayesian viewpoint, and the resultant smoothing parameter is guaranteed to satisfy the sufficient conditions for the oracle properties of a one-step estimator. The results of simulation and application to some real data sets reveal that our proposal works efficiently, especially for discrete outputs.
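The Bayesian smoothing-parameter selector itself is not given in the abstract. As background, the sketch below implements the SCAD nonconcave penalty of Fan and Li (2001), whose tuning parameter λ is the kind of smoothing parameter discussed, together with its closed-form thresholding rule for an orthonormal design; the values of λ and a below are illustrative inputs, not values chosen by the paper's method.

```python
# A hedged illustration of the SCAD nonconcave penalty and its closed-form
# thresholding rule under an orthonormal design. The Bayesian smoothing-parameter
# selector proposed in the paper is not reproduced here.
import numpy as np

def scad_penalty_derivative(t, lam, a=3.7):
    """Derivative p'_lam(t) of the SCAD penalty for t >= 0."""
    t = np.asarray(t, dtype=float)
    return lam * ((t <= lam)
                  + np.clip(a * lam - t, 0, None) / ((a - 1) * lam) * (t > lam))

def scad_threshold(z, lam, a=3.7):
    """One-step SCAD estimate of a coefficient whose least-squares value is z."""
    z = np.asarray(z, dtype=float)
    soft = np.sign(z) * np.clip(np.abs(z) - lam, 0, None)        # |z| <= 2*lam
    mid = ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)         # 2*lam < |z| <= a*lam
    return np.where(np.abs(z) <= 2 * lam, soft,
                    np.where(np.abs(z) <= a * lam, mid, z))      # |z| > a*lam: unchanged

z = np.array([-3.0, -1.2, -0.3, 0.1, 0.8, 2.5, 4.0])
print("penalty derivative:", scad_penalty_derivative(np.array([0.2, 1.0, 1.5]), lam=0.5))
print("thresholded coefficients:", scad_threshold(z, lam=0.5))
```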
Abstract: We analyze the cross-correlation between logarithmic returns of 1108 stocks listed on the Shanghai and Shenzhen Stock Exchanges of China over the period 2005 to 2010. The results suggest that the estimated distribution of correlation coefficients is shifted to the right during tumble periods of the Chinese stock market. Because the maximum eigenvalue accounts for a large share of the spectrum, the principal correlation component in the Chinese stock market is dominant, and the other components have only trivial effects on the market condition. The same-signed elements of the corresponding eigenvector lead us to propose the maximum eigenvalue series as an indicator of collective behavior in the equity market. We provide evidence that the largest eigenvalue series can be used as an effective indicator of the collective behavior of stock returns, which is found to be positively correlated with market volatility. Using time-varying windows, we find that this positive correlation diminishes when market volatility reaches either its highest or its lowest level. By defining a stability rate, we show that the collective behavior of stocks tends to be more homogeneous during crises than in regular times. This study has implications for the ongoing discussion of correlation risk.
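A minimal sketch of the largest-eigenvalue indicator: within a moving window, compute the correlation matrix of log returns and track its maximum eigenvalue alongside realized volatility. The one-factor simulated returns below are a stand-in for the 1108 Shanghai and Shenzhen stock return series analyzed in the paper.

```python
# Rolling-window largest eigenvalue of a return correlation matrix, tracked
# alongside realized market volatility. Returns are simulated from a simple
# one-factor model; in the paper, real Chinese stock returns are used.
import numpy as np

rng = np.random.default_rng(0)
n_days, n_stocks, window = 750, 100, 60

# Hypothetical one-factor returns: common market factor plus idiosyncratic noise.
market = 0.01 * rng.standard_normal(n_days)
returns = 0.8 * market[:, None] + 0.01 * rng.standard_normal((n_days, n_stocks))

max_eig, volatility = [], []
for start in range(0, n_days - window + 1, 5):            # slide in 5-day steps
    block = returns[start:start + window]
    corr = np.corrcoef(block, rowvar=False)               # stocks in columns
    eigvals = np.linalg.eigvalsh(corr)                     # symmetric -> real spectrum
    max_eig.append(eigvals[-1])                            # largest eigenvalue
    volatility.append(market[start:start + window].std()) # realized factor volatility

print("corr(max eigenvalue, market volatility):",
      np.corrcoef(max_eig, volatility)[0, 1])
```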
Abstract: Recently, He and Zhu (2003) derived an omnibus goodness-of-fit test for linear or nonlinear quantile regression models based on a CUSUM process of the gradient vector, and they suggested using a particular simulation method for determining critical values for their test statistic. But despite the speed of modern computers, execution time can be high. One goal in this note is to suggest a slight modification of their method that eliminates the need for simulations among a collection of important and commonly occurring situations. For a broader range of situations, the modification can be used to determine a critical value as a function of the sample size (n), the number of predictors (q), and the quantile of interest (γ). This is in contrast to the He and Zhu approach where the critical value is also a function of the observed values of the q predictors. As a partial check on the suggested modification in terms of controlling the Type I error probability, simulations were performed for the same situations considered by He and Zhu, and some additional simulations are reported for a much wider range of situations.
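As a rough, hedged illustration of the ingredients involved (a quantile regression fit, the gradient signs ψγ(e) = γ − 1{e < 0}, a cumulative-sum process, and a simulation-based critical value), the sketch below builds a simplified CUSUM-type statistic. It is not the exact He and Zhu (2003) statistic nor the modification proposed in the note.

```python
# A simplified CUSUM-type diagnostic built from quantile-regression gradient
# signs, with a naive Monte Carlo critical value. Illustrative only; this is not
# the He and Zhu (2003) test statistic or its critical-value method.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n, gamma = 300, 0.5
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.standard_normal(n)             # data from a linear model

X = sm.add_constant(x)
res = sm.QuantReg(y, X).fit(q=gamma)
psi = gamma - (y - res.fittedvalues < 0)                # gradient signs at the fit

order = np.argsort(res.fittedvalues)                    # accumulate along fitted values
stat = np.max(np.abs(np.cumsum(psi[order]))) / np.sqrt(n)

# Naive calibration: psi* = gamma with prob 1-gamma, gamma-1 with prob gamma,
# drawn independently of the covariates.
sims = []
for _ in range(2000):
    psi_star = np.where(rng.random(n) < gamma, gamma - 1.0, gamma)
    sims.append(np.max(np.abs(np.cumsum(psi_star))) / np.sqrt(n))
print("statistic:", stat, "approx. 95% critical value:", np.quantile(sims, 0.95))
```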
Abstract: The present article discusses and compares multiple testing procedures (MTPs) for controlling the familywise error rate. Machekano and Hubbard (2006) proposed an empirical Bayes approach, a resampling-based multiple testing procedure that asymptotically controls the familywise error rate. In this paper we provide some additional work on their procedure, and we develop a resampling-based step-down procedure that asymptotically controls the familywise error rate for testing families of one-sided hypotheses. We apply these procedures to make successive comparisons between treatment effects under a simple-order assumption. For example, the treatment means may correspond to a sequence of increasing dose levels of a drug. Using simulations, we demonstrate that the proposed step-down procedure is less conservative than Machekano and Hubbard's procedure. The application of the procedure is illustrated with an example.
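The sketch below illustrates the general flavor of a resampling-based step-down maxT procedure for one-sided comparisons of successive dose-group means. It is a generic Westfall-Young-style construction on simulated dose-group data, not the empirical Bayes procedure of Machekano and Hubbard (2006) or the authors' exact proposal.

```python
# Generic resampling-based step-down maxT procedure for one-sided successive
# comparisons of ordered group means. Illustrative construction on simulated data.
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: four increasing dose groups, 20 subjects each.
groups = [rng.normal(mu, 1.0, 20) for mu in (0.0, 0.1, 0.6, 1.1)]

def successive_t(groups):
    """One-sided t-type statistics for H_j: mu_{j+1} <= mu_j, j = 1..k-1."""
    stats = []
    for a, b in zip(groups[:-1], groups[1:]):
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        stats.append((b.mean() - a.mean()) / se)
    return np.array(stats)

t_obs = successive_t(groups)

# Bootstrap the joint null distribution from group-wise centered data.
B = 2000
t_boot = np.empty((B, len(t_obs)))
centered = [g - g.mean() for g in groups]
for b in range(B):
    resampled = [rng.choice(g, size=len(g), replace=True) for g in centered]
    t_boot[b] = successive_t(resampled)

# Step-down maxT adjusted p-values.
order = np.argsort(-t_obs)                  # hypotheses sorted by decreasing evidence
adj = np.empty_like(t_obs)
running_max = 0.0
for rank, j in enumerate(order):
    remaining = order[rank:]                # hypotheses not yet stepped through
    p = np.mean(t_boot[:, remaining].max(axis=1) >= t_obs[j])
    running_max = max(running_max, p)       # enforce monotonicity of adjusted p-values
    adj[j] = running_max
print("successive one-sided statistics:", np.round(t_obs, 2))
print("step-down adjusted p-values:   ", np.round(adj, 3))
```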
Abstract: This paper aims to propose a suitable statistical model for the age distribution of prostate cancer detection. Descriptive studies suggest that the onset of prostate cancer occurs after 37 years of age, with the maximum diagnosis age at around 70 years. The major deficiency of descriptive studies is that their results cannot be generalized to all types of populations, which usually face non-identical environmental conditions. The suitability of the proposed model is checked using statistical tools such as the Akaike Information Criterion, the Bayesian Information Criterion, the Kolmogorov-Smirnov distance, and the χ² statistic. The maximum likelihood estimates of the parameters of the proposed model, along with their asymptotic confidence intervals, are obtained for the real data set considered.
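The proposed model is not specified in the abstract. The sketch below only illustrates the model-checking toolkit mentioned (maximum likelihood fitting, AIC, BIC, and the Kolmogorov-Smirnov distance), using a log-normal candidate and simulated detection ages, both of which are assumptions for illustration.

```python
# Fitting a candidate age-at-detection model by maximum likelihood and scoring
# it with AIC, BIC, and the Kolmogorov-Smirnov distance. The log-normal
# candidate and the simulated ages are illustrative stand-ins.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
ages = rng.normal(70, 8, 200).clip(40, 95)            # hypothetical detection ages

# Fit a candidate model by maximum likelihood and score it.
shape, loc, scale = stats.lognorm.fit(ages, floc=0)
loglik = np.sum(stats.lognorm.logpdf(ages, shape, loc, scale))
k, n = 2, len(ages)                                    # free parameters (loc fixed at 0)
aic = 2 * k - 2 * loglik
bic = k * np.log(n) - 2 * loglik
ks = stats.kstest(ages, lambda a: stats.lognorm.cdf(a, shape, loc, scale))
print(f"AIC={aic:.1f}  BIC={bic:.1f}  KS distance={ks.statistic:.3f}")
```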
Abstract: The development and application of computational data mining techniques in financial fraud detection and business failure prediction has become a popular cross-disciplinary research area in recent times, involving financial economists, forensic accountants and computational modellers. Some of the computational techniques popularly used in the context of financial fraud detection and business failure prediction can also be effectively applied to the detection of fraudulent insurance claims and can therefore be of immense practical value to the insurance industry. We provide a comparative analysis of the prediction performance of a battery of data mining techniques using real-life automotive insurance fraud data. While the data used in our paper are US-based, the computational techniques we have tested can be adapted and applied to detect similar insurance fraud in other countries with an organized automotive insurance industry.
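A minimal sketch of the kind of comparative exercise described: a few off-the-shelf classifiers scored by cross-validated AUC on a synthetic, imbalanced stand-in for claims data. The real US automotive insurance fraud data and the specific techniques benchmarked in the paper are not reproduced here.

```python
# Comparing several generic classifiers on a synthetic, imbalanced
# fraud-detection task using cross-validated AUC. Illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: roughly 5% "fraudulent" claims, 20 claim-level features.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95],
                           random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name:20s} mean AUC = {auc.mean():.3f}")
```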
Abstract: Through a series of carefully chosen illustrations from biometry and biomedicine, this note underscores the importance of using appropriate analytical techniques to increase power in statistical modeling and testing. These examples also serve to highlight some of the important recent developments in applied statistics of use to practitioners.
Abstract: Supervised classification of biological samples based on genetic information (e.g., gene expression profiles) is an important problem in biostatistics. In order to find classification rules that are both accurate and interpretable, variable selection is indispensable. This article explores how an assessment of the individual importance of variables (effect size estimation) can be used to perform variable selection. I review recent effect size estimation approaches in the context of linear discriminant analysis (LDA) and propose a new effect size estimation method that is conceptually simple and at the same time computationally efficient. I then show how to use effect sizes to perform variable selection based on the misclassification rate, which is the data-independent expectation of the prediction error. Simulation studies and real data analyses illustrate that the proposed effect size estimation and variable selection methods are competitive. In particular, they lead to both compact and interpretable feature sets. Program files to be used with the statistical software R implementing the variable selection approaches presented in this article are available from my homepage: http://b-klaus.de.
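Although the author's implementation is distributed in R, the Python sketch below conveys the general idea under stated assumptions: rank features by a simple standardized mean difference (a stand-in for the effect size estimators reviewed and proposed in the article), keep the top-ranked features, and estimate the misclassification rate of LDA by cross-validation on simulated expression-like data.

```python
# Effect-size-based variable selection for LDA on simulated expression-like
# data. The simple standardized mean difference below is an illustrative
# stand-in for the article's effect size estimators.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(11)
n_per_class, n_features, n_informative = 50, 500, 10
X0 = rng.standard_normal((n_per_class, n_features))
X1 = rng.standard_normal((n_per_class, n_features))
X1[:, :n_informative] += 1.0                    # shifted means in informative features
X = np.vstack([X0, X1])
y = np.repeat([0, 1], n_per_class)

# Simple standardized mean-difference effect sizes (illustrative only).
pooled_sd = np.sqrt((X0.var(axis=0, ddof=1) + X1.var(axis=0, ddof=1)) / 2)
effect = np.abs(X1.mean(axis=0) - X0.mean(axis=0)) / pooled_sd

# Keep the 20 largest effects; for honest error estimates this selection step
# should be nested inside the cross-validation rather than done on all data.
top = np.argsort(-effect)[:20]
acc = cross_val_score(LinearDiscriminantAnalysis(), X[:, top], y, cv=5)
print("estimated misclassification rate:", 1 - acc.mean())
```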