Abstract: Traditional loss reserving models focus on the mean of the conditional loss distribution. If the factors driving high claims differ systematically from those driving medium to low claims, alternative models that can accommodate such differences are required. We propose quantile regression for loss reserving, as the model offers potentially different solutions at distinct quantiles, so that the effects of risk factors are differentiated at different points of the conditional loss distribution. Owing to its nonparametric nature, quantile regression is free of the assumptions imposed by traditional mean regression models, including homogeneous variance across risk factors and symmetric, light tails. These assumptions have posed a great barrier in applications, as they are often not met in claim data. Using two sets of run-off triangle claim data, from Israel and Queensland, Australia, we present the quantile regression approach and illustrate the sensitivity of claim size to risk factors, namely the trend pattern and initial claim level, at different quantiles. The trained models are applied to predict future claims in the lower run-off triangle. Findings suggest that reliance on standard loss reserving techniques can give rise to misleading inferences and that claim size is not homogeneously driven by the same risk factors across quantiles.
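As a concrete illustration (not the paper's implementation), the following sketch fits quantile regressions at several quantiles to simulated incremental claims from an upper run-off triangle; the column names, covariates and the simulated trend are hypothetical stand-ins for the paper's trend pattern and initial claim level:

```python
# Minimal quantile-regression sketch on a simulated run-off triangle.
# Column names ("acc_year", "dev_year", "claim") are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
# Simulate the upper triangle: accident year i, development year j.
rows = [(i, j, np.exp(8 - 0.5 * j + rng.gumbel(0, 0.6)))
        for i in range(10) for j in range(10 - i)]
triangle = pd.DataFrame(rows, columns=["acc_year", "dev_year", "claim"])
triangle["log_claim"] = np.log(triangle["claim"])

# Fit a separate model at each quantile; coefficients that diverge across
# quantiles indicate risk factors act differently on large vs. small claims.
for tau in (0.25, 0.5, 0.75, 0.9):
    fit = smf.quantreg("log_claim ~ dev_year + acc_year", triangle).fit(q=tau)
    print(f"tau={tau}: dev_year coef = {fit.params['dev_year']:.3f}")
```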
Abstract: It is known that “standard methods for estimating the causal effect of a time-varying treatment on the mean of a repeated measures outcome (for example, GEE regression) may be biased when there are time-dependent variables that are simultaneously confounders of the effect of interest and are predicted by previous treatment” (Hernán et al. 2002). Inverse-probability-of-treatment weighted (IPTW) methods have been developed in the causal inference literature. In genetic studies, however, the main interest is to estimate or test the genetic effect rather than the treatment effect. In this work, we describe an IPTW method that provides an unbiased estimate of the genetic effect, and we discuss how to develop a family-based association test using IPTW for family-based studies. We apply the developed methods to systolic blood pressure data from the Framingham Heart Study, in which some subjects took antihypertensive treatment during the course of the study.
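A minimal sketch of the IPTW mechanics under simplified, cross-sectional assumptions; the simulated data and variable names (`sbp`, `conf`) are hypothetical, not Framingham variables, and the sketch illustrates the weighting steps rather than the full longitudinal confounding structure:

```python
# IPTW sketch: weight subjects by the inverse probability of their observed
# treatment, then fit a weighted regression for the genetic effect.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
g = rng.binomial(2, 0.3, n)             # genotype (0/1/2 risk alleles)
conf = rng.normal(size=n)               # confounder of treatment and outcome
p_treat = 1 / (1 + np.exp(-(-1 + 1.2 * conf)))
treat = rng.binomial(1, p_treat)        # antihypertensive treatment
sbp = 120 + 2 * g + 3 * conf - 8 * treat + rng.normal(0, 5, n)

# Step 1: model the probability of treatment given the confounder.
ps_fit = sm.Logit(treat, sm.add_constant(conf)).fit(disp=0)
ps = ps_fit.predict(sm.add_constant(conf))
w = np.where(treat == 1, 1 / ps, 1 / (1 - ps))  # IPT weights

# Step 2: weighted regression of the outcome on genotype and treatment;
# the genotype coefficient is the weighted estimate of the genetic effect.
X = sm.add_constant(np.column_stack([g, treat]))
fit = sm.WLS(sbp, X, weights=w).fit()
print(fit.params)  # [intercept, genetic effect, treatment effect]
```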
Abstract: Clustered binary samples arise often in biomedical investigations. An important feature of such samples is that the binary responses within clusters tend to be correlated. The Beta-Binomial model is commonly applied to account for the intra-cluster correlation – the correlation between responses within a cluster – among dichotomous outcomes in cluster sampling. The intra-cluster correlation coefficient (ICC) quantifies this correlation or level of similarity. In this paper, we propose Bayesian point and interval estimators for the ICC under the Beta-Binomial model. Using Laplace’s method, the asymptotic posterior distribution of the ICC is approximated by a normal distribution. The posterior mean of this normal density is used as a point estimator for the ICC, and 95% credible sets are calculated. A Monte Carlo simulation is used to evaluate the coverage probability and average length of the credible set of the proposed interval estimator. The simulations indicate that when the number of clusters is above 40, the underlying mean response probability falls in the range [0.3, 0.7], and the underlying ICC is ≤ 0.4, the proposed interval estimator performs quite well and attains the correct coverage level. Even for as few as 20 clusters, the proposed interval estimator may still be useful when the ICC is small (≤ 0.2).
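The Laplace step can be sketched as follows; the data are simulated, a flat prior on the ICC is assumed, and fixing the mean response probability at its pooled estimate is a simplification relative to the paper's full treatment:

```python
# Laplace-approximation sketch for the posterior of the ICC (rho) under a
# Beta-Binomial model with mean pi: alpha = pi(1-rho)/rho, beta = (1-pi)(1-rho)/rho.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import betabinom, norm

rng = np.random.default_rng(2)
k, n = 40, 20                         # number of clusters, cluster size
rho_true, pi_true = 0.2, 0.4
a = pi_true * (1 - rho_true) / rho_true
b = (1 - pi_true) * (1 - rho_true) / rho_true
y = betabinom.rvs(n, a, b, size=k, random_state=rng)

pi_hat = y.sum() / (k * n)            # pooled mean response probability

def neg_log_post(rho):
    # Flat prior on rho in (0, 1), so this is just the negative log-likelihood.
    alpha = pi_hat * (1 - rho) / rho
    beta = (1 - pi_hat) * (1 - rho) / rho
    return -betabinom.logpmf(y, n, alpha, beta).sum()

opt = minimize_scalar(neg_log_post, bounds=(1e-4, 0.999), method="bounded")
rho_mode = opt.x

# Numerical second derivative at the mode gives the Laplace (normal) variance.
h = 1e-4
d2 = (neg_log_post(rho_mode + h) - 2 * opt.fun + neg_log_post(rho_mode - h)) / h**2
sd = np.sqrt(1 / d2)
lo, hi = norm.interval(0.95, loc=rho_mode, scale=sd)
print(f"posterior mode {rho_mode:.3f}, 95% credible set ({lo:.3f}, {hi:.3f})")
```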
Abstract: A limited number of studies have utilized multiple causes of death to investigate infant mortality patterns. The purpose of the present study was to examine the risk distribution of underlying and multiple causes of infant death for congenital anomalies, short gestation/low birth weight (LBW), respiratory conditions, infections, sudden infant death syndrome and external causes across four gestational age groups, namely ≤ 23, 24–30, 31–36 and ≥ 37 weeks, and to determine the extent to which mortality from each condition is underestimated when only the underlying cause of death is used. The data were obtained from the North Carolina linked birth/infant death files (1999 to 2003) and included 4908 death records. The findings of this study indicate that infants born at less than 30 weeks of gestation are more likely (odds ratios ranging from 1.99 to 6.03) to have multiple causes recorded when the underlying cause is congenital anomalies, respiratory conditions or infections, in comparison with infants whose gestational age is at least 37 weeks. The underlying cause of death underestimated mortality for a number of cause-specific deaths, including short gestation/LBW, respiratory conditions, infections and external causes. This was particularly evident among infants born preterm. Based on these findings, it is recommended that multiple causes, whenever available, be studied in conjunction with the underlying cause of death data.
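The reported comparison reduces to odds-ratio arithmetic on a two-by-two table; a worked example with hypothetical counts (not the North Carolina data):

```python
# Odds-ratio worked example with made-up counts: rows are gestational-age
# groups, columns count deaths with multiple vs. a single recorded cause.
import numpy as np
from scipy.stats import fisher_exact

#                 multiple  single
table = np.array([[180,      60],    # preterm (< 30 weeks), hypothetical
                  [200,     200]])   # term (>= 37 weeks), hypothetical
odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])
_, p_value = fisher_exact(table)
print(f"OR = {odds_ratio:.2f}, Fisher exact p = {p_value:.3g}")
# OR = 3.00 here: preterm deaths have three times the odds of carrying
# multiple recorded causes, analogous to the 1.99-6.03 range reported.
```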
Abstract: A new approach for analyzing state duration data in brand-choice studies is explored. This approach not only incorporates the correlation among repeated purchases for a subject, it also models the purchase timing and the brand decision jointly. The former is accomplished by applying transition model approaches from longitudinal studies, while the latter is done by conditioning on the brand choice variable. Mixed multinomial logit models and Cox proportional hazards models are then employed to model the marginal densities of the brand choice and the conditional densities of the interpurchase time given the brand choice. We illustrate the approach using a Nielsen household scanner panel data set.
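A rough sketch of the two-part structure on simulated data; the covariates (price, loyalty) are hypothetical, statsmodels' PHReg stands in for a full Cox analysis with censoring, and a plain multinomial logit stands in for the mixed version:

```python
# Two-part sketch: a multinomial logit for brand choice, then a proportional
# hazards model for interpurchase time conditional on the chosen brand.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 1500
price = rng.normal(size=n)
loyalty = rng.normal(size=n)
# Simulate a 3-brand choice and an interpurchase time that depends on it.
util = np.column_stack([np.zeros(n), 0.8 * loyalty - price, 0.4 * loyalty])
brand = np.array([rng.choice(3, p=np.exp(u) / np.exp(u).sum()) for u in util])
time = rng.exponential(1 / np.exp(0.3 * (brand == 1) - 0.2 * price))

# Part 1: marginal model for the brand decision.
X = sm.add_constant(np.column_stack([price, loyalty]))
choice_fit = sm.MNLogit(brand, X).fit(disp=0)

# Part 2: hazard of the next purchase, conditioning on the brand chosen
# (brand enters the hazard model as dummy covariates).
exog = np.column_stack([price, brand == 1, brand == 2]).astype(float)
haz_fit = sm.PHReg(time, exog).fit()
print(choice_fit.params.shape, haz_fit.params)
```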
Abstract: Exploratory data analysis has become more important as large, rich data sets become available, with many explanatory variables representing competing theoretical constructs. The restrictive assumptions of linearity and additivity of effects, as in regression, are no longer necessary for saving degrees of freedom. Where there is a clear criterion (dependent) variable or classification, sequential binary segmentation (tree) programs are being used. We explain why, using the current enhanced version (SEARCH) of the original Automatic Interaction Detector program as an illustration. Even the simple example uncovers an interaction that might well have been missed with the usual multivariate regression. We then suggest some promising uses and provide one simple example.
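A small sketch of sequential binary segmentation using a modern CART implementation (scikit-learn) rather than SEARCH itself; the data are simulated so that a pure interaction drives the outcome:

```python
# Binary segmentation sketch: a shallow regression tree recovers an
# interaction that a main-effects regression would average away.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(4)
n = 2000
x1 = rng.binomial(1, 0.5, n)           # e.g., homeowner yes/no (hypothetical)
x2 = rng.binomial(1, 0.5, n)           # e.g., urban yes/no (hypothetical)
# Outcome depends on x1 only within the x2 = 1 group: a pure interaction.
y = 10 + 5 * x1 * x2 + rng.normal(0, 1, n)

tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=100)
tree.fit(np.column_stack([x1, x2]), y)
print(export_text(tree, feature_names=["x1", "x2"]))
# The tree splits on one indicator and then on the other within a single
# branch, exposing the interaction structure directly in the segmentation.
```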
Abstract: Contraception is not commonly used by Omani women because of socio-cultural traditions, religious beliefs and poor knowledge, but among users, modern contraceptive methods are more popular than traditional methods. A multilevel analysis is conducted to investigate associations between individual- and religion-level characteristics and the type of contraceptive method used, and to obtain a better understanding of the factors associated with the contraceptive method choices of women aged 15-49 years in Oman, using data from the Oman National Reproductive Health Survey. The results confirm that an individual's own characteristics have enduring effects on contraceptive method choice: for a given individual, the choice of method varies with the woman's age, education level and number of living children. We also find considerable differences between the estimates from the single-level and multilevel approaches.
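A sketch of a random-intercept logistic model of the kind used in such multilevel analyses, fit with statsmodels' variational Bayes routine on simulated data; the grouping variable `region` and the covariates are hypothetical stand-ins for the survey's upper level and individual characteristics:

```python
# Random-intercept logistic sketch: modern (1) vs. traditional (0) method,
# with a group-level random intercept capturing between-group variation.
import numpy as np
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

rng = np.random.default_rng(5)
n_groups, n_per = 30, 60
region = np.repeat(np.arange(n_groups), n_per)
u = rng.normal(0, 0.8, n_groups)[region]       # group-level random intercepts
age = rng.uniform(15, 49, n_groups * n_per)
children = rng.poisson(3, n_groups * n_per)
eta = -2 + 0.04 * age + 0.3 * children + u
modern = rng.binomial(1, 1 / (1 + np.exp(-eta)))
df = pd.DataFrame({"modern": modern, "age": age,
                   "children": children, "region": region})

model = BinomialBayesMixedGLM.from_formula(
    "modern ~ age + children", {"region": "0 + C(region)"}, df)
fit = model.fit_vb()                            # variational Bayes fit
print(fit.summary())
# Contrasting these estimates with a single-level logit illustrates the
# kind of single- vs. multilevel differences the abstract refers to.
```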
Abstract: The generalized Poisson regression model has been used to model dispersed count data. It is a good competitor to the negative binomial regression model when the count data are over-dispersed. Zero-inflated Poisson and zero-inflated negative binomial regression models have been proposed for situations where the data generating process results in too many zeros. In this paper, we propose a zero-inflated generalized Poisson (ZIGP) regression model to model domestic violence data with too many zeros. Estimation of the model parameters by the method of maximum likelihood is provided. A score test is presented to test whether the number of zeros is too large for the generalized Poisson model to adequately fit the domestic violence data.
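A sketch of ZIGP likelihood maximization with an intercept-only mean; the generalized Poisson pmf is coded directly, and the data are simulated (a zero-inflated Poisson, the ZIGP special case with dispersion delta = 0), not the domestic violence data:

```python
# ZIGP maximum likelihood sketch: mixture of a point mass at zero (weight
# omega) and a generalized Poisson with parameters theta > 0, 0 <= delta < 1.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, expit

def gp_logpmf(y, theta, delta):
    # Generalized Poisson: P(y) = theta*(theta+delta*y)^(y-1)*exp(-(theta+delta*y))/y!
    lam = theta + delta * y
    return np.log(theta) + (y - 1) * np.log(lam) - lam - gammaln(y + 1)

def zigp_negll(params, y):
    theta, delta, omega = np.exp(params[0]), expit(params[1]), expit(params[2])
    ll0 = np.log(omega + (1 - omega) * np.exp(gp_logpmf(0, theta, delta)))
    llpos = np.log(1 - omega) + gp_logpmf(y, theta, delta)
    return -np.where(y == 0, ll0, llpos).sum()

rng = np.random.default_rng(6)
y = np.where(rng.random(500) < 0.3, 0, rng.poisson(2.5, 500))  # ZIP data

res = minimize(zigp_negll, x0=np.array([0.5, -2.0, -1.0]), args=(y,),
               method="Nelder-Mead")
theta, delta, omega = np.exp(res.x[0]), expit(res.x[1]), expit(res.x[2])
print(f"theta={theta:.2f}, delta={delta:.2f}, omega={omega:.2f}")
# (The paper's score test compares observed zeros with those implied by the
# fitted generalized Poisson model; it is omitted in this sketch.)
```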
Abstract: Bayesian hierarchical regression (BHR) is often used in small area estimation (SAE). Because BHR conditions on the samples, neither the survey sampling design nor the survey weights are used when data come from a complex sample survey. This can introduce bias and/or inflate variance. Further, if non-informative priors are used, BHR often requires combining multiple years of data to achieve sample sizes that yield adequate precision; this can result in poor timeliness and can obscure trends. To address bias and variance, we propose a design-assisted model-based approach to SAE that integrates adjusted sample weights. To address timeliness, we use historical data to define informative priors (power priors); this allows estimates to be derived from a single year of data. Using American Community Survey (ACS) data for validation, we applied the proposed method to Behavioral Risk Factor Surveillance System data and estimated the prevalence of disability for all U.S. counties. We show that our method can produce estimates that are both more timely than those arising from widely used alternatives and closer to the ACS direct estimates, particularly for low-data counties. Our method can be generalized to estimate the county-level prevalence of other health-related measurements.
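The power-prior idea can be illustrated in a conjugate normal model, where raising the historical likelihood to a power a0 in [0, 1] simply scales the precision that history contributes; all values below are illustrative, not BRFSS or ACS figures:

```python
# Power-prior sketch in a conjugate normal model with known variance.
import numpy as np

rng = np.random.default_rng(7)
hist = rng.normal(0.15, 0.02, 50)       # historical prevalence data (made up)
curr = rng.normal(0.17, 0.02, 8)        # sparse current-year data (made up)
sigma2 = 0.02 ** 2                       # assumed known sampling variance
a0 = 0.5                                 # discount factor for historical data

# Power prior: the precision contributed by history is scaled by a0, so the
# posterior mean is a precision-weighted blend of historical and new data.
prec_hist = a0 * len(hist) / sigma2
prec_curr = len(curr) / sigma2
post_mean = (prec_hist * hist.mean() + prec_curr * curr.mean()) / (prec_hist + prec_curr)
post_sd = np.sqrt(1 / (prec_hist + prec_curr))
print(f"posterior mean {post_mean:.4f} (sd {post_sd:.4f})")
# With a0 = 0 the prior is ignored (single-year estimate); with a0 = 1 the
# historical data count fully, as if pooled with the current year.
```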