Journal of Data Science

Volume 9, Issue 1 (2011), January 2011

Comparing Two Dependent Groups: Dealing with Missing Values

Rand R. Wilcox

https://doi.org/10.6339/JDS.201101_09(1).0001

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 9, Issue 1 (2011), pp. 1–13

Abstract: The paper considers the problem of comparing measures of location associated with two dependent groups when values are missing at random, with an emphasis on robust measures of location. It is known that simply imputing missing values can be unsatisfactory when testing hypotheses about means, so the goal here is to compare several alternative strategies that use all of the available data. Included are results on comparing means and a 20% trimmed mean. Yet another method is based on the usual median but differs from the other methods in a manner that is made obvious. (It is somewhat related to the formulation of the Wilcoxon-Mann-Whitney test for independent groups.) The strategies are compared in terms of Type I error probabilities and power.
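For readers unfamiliar with the 20% trimmed mean mentioned in this abstract, the following is a minimal sketch of how such an estimate is commonly computed from whatever values are observed. It is not the paper's test procedure. The function name `trimmed_mean` and the use of `None` to mark a missing value are assumptions made for this illustration only.

```python
import math


def trimmed_mean(values, proportion=0.2):
    """Mean after dropping floor(proportion * n) values from each tail.

    Missing values are represented as None (an assumption of this sketch)
    and are simply skipped, so all observed values are used.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    observed = sorted(v for v in values if v is not None)
    if not observed:
        raise ValueError("no observed values")
    # Number of observations trimmed from each end of the sorted sample.
    g = math.floor(proportion * len(observed))
    kept = observed[g:len(observed) - g]
    return sum(kept) / len(kept)


# Illustrative, made-up sample with one extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
print(trimmed_mean(sample))
```

With ten observed values, two are trimmed from each tail, so the extreme value has no influence on the estimate.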

Asymptotic Equivalence between Cross-Validations and Akaike Information Criteria in Mixed-Effects Models

Yixin Fang

https://doi.org/10.6339/JDS.201101_09(1).0002

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 9, Issue 1 (2011), pp. 15–21

Abstract: For model selection in mixed effects models, Vaida and Blanchard (2005) demonstrated that the marginal Akaike information criterion is appropriate as to the questions regarding the population and the conditional Akaike information criterion is appropriate as to the questions regarding the particular clusters in the data. This article shows that the marginal Akaike information criterion is asymptotically equivalent to the leave-one-cluster-out cross-validation and the conditional Akaike information criterion is asymptotically equivalent to the leave-one-observation-out cross-validation.
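The two cross-validation schemes named in this abstract differ only in what is held out at each step. The sketch below illustrates that difference on index splits; it does not reproduce the article's derivation or fit any mixed-effects model. The function names and the toy `cluster_labels` list are hypothetical.

```python
def leave_one_observation_out(clusters):
    """Yield (train, test) index lists, holding out one observation at a time."""
    n = len(clusters)
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


def leave_one_cluster_out(clusters):
    """Yield (train, test) index lists, holding out one whole cluster at a time."""
    for held_out in sorted(set(clusters)):
        train = [j for j, c in enumerate(clusters) if c != held_out]
        test = [j for j, c in enumerate(clusters) if c == held_out]
        yield train, test


# Toy cluster membership for six observations (made up for illustration).
cluster_labels = ["a", "a", "b", "b", "b", "c"]

for train, test in leave_one_cluster_out(cluster_labels):
    print("held-out cluster indices:", test, "training indices:", train)
```

Leave-one-observation-out produces one split per observation (six here), while leave-one-cluster-out produces one split per cluster (three here).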

Maximum Likelihood Estimation for Ascertainment Bias in Sampling Siblings

Balgobin Nandram Jai-Won Choi Hongyan Xu

https://doi.org/10.6339/JDS.201101_09(1).0003

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 9, Issue 1 (2011), pp. 23–41

Abstract: When there is a rare disease in a population, it is inefficient to take a random sample to estimate a parameter. Instead one takes a random sample of all nuclear families with the disease by ascertaining at least one affected sibling (proband) of each family. In these studies, an estimate of the proportion of siblings with the disease will be inflated. For example, studies of the issue of whether a rare disease shows an autosomal recessive pattern of inheritance, where the Mendelian segregation ratios are of interest, have been investigated for several decades. How do we correct for this ascertainment bias? Methods, primarily based on maximum likelihood estimation, are available to correct for the ascertainment bias. We show that for ascertainment bias, although maximum likelihood estimation is optimal under asymptotic theory, it can perform badly. The problem is exacerbated in the situation where the proband probabilities are allowed to vary with the number of affected siblings. We use two data sets to illustrate the difficulties of the maximum likelihood estimation procedure, and we use a simulation study to assess the quality of the maximum likelihood estimators.

Latent Class Analysis for Models with Error of Measurement Using Log-Linear Models and An Application to Women’s Liberation Data

Haydar Demirhan

https://doi.org/10.6339/JDS.201101_09(1).0004

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 9, Issue 1 (2011), pp. 43–54

Abstract: This article deals with the latent class analysis of models with error of measurement. If the latent variable is ordinal and manifest variables are nominal, an approach to handle the restrictions is given for latent class analysis of the models with error of measurement using log-linear models. In this way, we include the ordinal nature of the latent variable in the analysis. Therefore, overall uncertainty is decreased, and our inferences become more precise. The new approach is applied to a women’s liberation data set.

Estimating Transmissibility of Seasonal Influenza Virus by Surveillance Data

Shenghai Zhang

https://doi.org/10.6339/JDS.201101_09(1).0005

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 9, Issue 1 (2011), pp. 55–64

Abstract: It is important to estimate transmissibility of influenza virus during its growing phase for understanding the propagation of the virus. The estimation procedures of the transmissibility are usually based on the data generated in flu seasons. The data-generating process of the outbreak of influenza has many features. The data is generated by not only a biological process but also control measures such as flu vaccination. The estimation is discussed by considering the aspects of the data-generating process and using the model to capture the essential characteristics of flu transmission during the growing phase of a flu season.

An Assessment of the Use of an Advanced Neural Network Model with Five Different Training Strategies for the Preparation of Landslide Susceptibility Maps

Biswajeet Pradhan

https://doi.org/10.6339/JDS.201101_09(1).0006

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 9, Issue 1 (2011), pp. 65–81

Abstract: Data collection for landslide susceptibility modelling is often an almost inhibitive activity. This is why, for quite some time, landslides were described and modelled on the basis of spatially distributed values of landslide-related attributes. This paper presents a landslide susceptibility analysis for the Selangor area, Malaysia, using an artificial neural network model with the aid of remote sensing data and geographic information system (GIS) tools. To meet the objectives, landslide locations were identified in the study area from interpretation of aerial photographs and supported with extensive field surveys. Then, the landslide inventory was grouped into two categories: (1) training data and (2) testing data. Further, topographical and geological data and satellite images were collected, processed, and constructed into a spatial database using GIS tools and image processing techniques. Nine landslide occurrence attributes were selected and analyzed using an artificial neural network model to generate the landslide susceptibility maps. Landslide location data (training data) were used for training the neural network, and five training sites were selected randomly in this case. The ensemble of five training sites is used to investigate the model reliability, including the role of the thematic variables used to construct the model, and the model sensitivity to changes in the selection of the training sites. By studying the variation of the neural network’s susceptibility estimate, the error associated with the model is determined. The results of the neural network analysis are shown on five sets of landslide susceptibility maps. Then the susceptibility maps were validated using the “receiver operating characteristics (ROC)” method as a measure for the model verification. Landslide data that were not used during the training of the neural network were used for the verification of the maps. The results of the analysis were verified using the landslide location data and compared between five different cases. Qualitatively, the model seems to give reasonable results, with observed accuracies of 87%, 83%, 85%, 86% and 82% for the five different training sites, respectively.

Association between Use of Internet Services and Quality of Life in Taiwan

Te-Hsin Liang

https://doi.org/10.6339/JDS.201101_09(1).0007

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 9, Issue 1 (2011), pp. 83–92

Abstract: The study explored the association between the use of Internet services and quality of life in Taiwan. The use of broadband, wireless, and mobile Internet is found to be positively correlated with people’s overall quality of life. The more the Internet services of e-Government are used, the higher the satisfaction with social-economic status and social competence. People using more Internet services in their daily activities also have higher self-esteem and less psychological pressure. However, people who rely heavily on Internet services for e-Business such as online shopping or ticket booking have lower satisfaction with community support.

Multilevel Logistic Regression Analysis Applied to Binary Contraceptive Prevalence Data

Md. Hasinur Rahaman Khan J. Ewart H. Shaw

https://doi.org/10.6339/JDS.201101_09(1).0008

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 9, Issue 1 (2011), pp. 93–110

Abstract: In public health, demography and sociology, large-scale surveys often follow a hierarchical data structure as the surveys are based on multistage stratified cluster sampling. The appropriate approach to analyzing such survey data is therefore based on nested sources of variability which come from different levels of the hierarchy. When the variance of the residual errors is correlated between individual observations as a result of these nested structures, traditional logistic regression is inappropriate. We use the 2004 Bangladesh Demographic and Health Survey (BDHS) contraceptive binary data which is a multistage stratified cluster dataset. This dataset is used to exemplify all aspects of working with multilevel logistic regression models, including model conceptualization, model description, understanding of the structure of required multilevel data, estimation of the model via the statistical package MLwiN, comparison between different estimations, and investigation of the selected determinants of contraceptive use.

Test Procedures for Change Point in a General Class of Distributions

S. M. Sadooghi-Alvandi A. R. Nematollahi R. Habibi

https://doi.org/10.6339/JDS.201101_09(1).0009

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 9, Issue 1 (2011), pp. 111–126

Abstract: This paper is concerned with the change point analysis in a general class of distributions. The quasi-Bayes and likelihood ratio test procedures are considered to test the null hypothesis of no change point. Exact and asymptotic behaviors of the two test statistics are derived. To compare the performances of the two test procedures, numerical significance levels and powers of the tests are tabulated for certain selected values of the parameters. Estimation of the change point based on these two test procedures is also considered. Moreover, the epidemic change point problem is studied as an alternative model for the single change point model. A real data set with an epidemic change model is analyzed by the two test procedures.

Adjusting for Treatment Effect when Estimating or Testing Genetic Effect is of Main Interest

Yuanjia Wang Yixin Fang

https://doi.org/10.6339/JDS.201101_09(1).0010

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 9, Issue 1 (2011), pp. 127–138

Abstract: It is known that “standard methods for estimating the causal effect of a time-varying treatment on the mean of a repeated measures outcome (for example, GEE regression) may be biased when there are time-dependent variables that are simultaneously confounders of the effect of interest and are predicted by previous treatment” (Hernán et al. 2002). Inverse-probability of treatment weighted (IPTW) methods are developed in the literature of causal inference. In genetic studies, however, the main interest is to estimate or test the genetic effect rather than the treatment effect. In this work, we describe an IPTW method that provides an unbiased estimate for the genetic effect, and discuss how to develop a family-based association test using IPTW for family-based studies. We apply the developed methods to systolic blood pressure data in the Framingham Heart Study, where some subjects took antihypertensive treatment during the course of study.

Journal of data science

Online ISSN: 1683-8602
Print ISSN: 1680-743X
