Journal of Data Science

Search results 892

Specification Tests for Families of Discrete Distributions with Applications to Insurance Claims Data

Yue Fang

https://doi.org/10.6339/JDS.201801_16(1).0008

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 16, Issue 1 (2018), pp. 129–146

Abstract

Families of distributions are commonly used to model insurance claims data that require flexible distributional forms, but assessing the goodness-of-fit of the hypothesized model can be challenging because of the complexity of the likelihood function of the family of distributions involved. Previous work shows that these specification problems can be attacked by means of semi-parametric tests based on generalized method of moments (GMM) estimators. While the approach can be applied directly to both discrete and continuous families of distributions, the paper focuses on developing a testing strategy within a framework of discrete families of distributions. Both the local power analysis and the approximate slope method demonstrate the excellent performance of these tests. The finite-sample performance of the tests, based on both asymptotic and bootstrap critical values, is also discussed and compared with established methods that require the complete specification of likelihood functions.

Comparison of Estimation Methods of the Joint Density of A Circular and Linear Variable

Sahana Bhattacharjee, Kishore Kumar Das

https://doi.org/10.6339/JDS.201701_15(1).0008

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 15, Issue 1 (2017), pp. 129–154

A Replicated Experiment Used in Manufacturing

Roger L. Goodwin

https://doi.org/10.6339/JDS.2009.07(1).421

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 7, Issue 1 (2009), pp. 129–138

Abstract

Controlled experiments give researchers a statistical tool for determining the yield from subjecting an experimental unit to various treatments. We discuss a replicated block design applied to the experimental unit, yeast. We subjected the yeast to six treatments. The purpose of the experiment is to extract a compound to be used in the manufacturing industry. We considered an ANOVA and a MANOVA model to analyze the data. The rationale for selecting one model over the other will be discussed. Results and recommendations on which treatments to use when processing the yeast will also be presented.

Estimation of a Scale Parameter of Morgenstern Type Bivariate Uniform Distribution by Ranked Set Sampling

Saeid Tahmasebi, Ali Akbar Jafari

https://doi.org/10.6339/JDS.2012.10(1).1007

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 10, Issue 1 (2012), pp. 129–141

Abstract

In this paper, we obtain several estimators of a scale parameter of the Morgenstern type bivariate uniform distribution (MTBUD) based on observations made on the units of a ranked set sample for the study variable Y, which is correlated with the auxiliary variable X, when (X, Y) follows an MTBUD. Efficiency comparisons among these estimators are also made. Finally, we illustrate the methods developed using a real data set.

Predicting Loss Reserves Using Quantile Regression

Chan J. S. K.

https://doi.org/10.6339/JDS.201501_13(1).0008

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 13, Issue 1 (2015), pp. 127–156

Abstract

Traditional loss reserve models focus on the mean of the conditional loss distribution. If the factors driving high claims differ systematically from those driving medium to low claims, alternative models that capture such differences are required. We propose a quantile regression model for loss reserving, as the model offers potentially different solutions at distinct quantiles, so that the effects of risk factors are differentiated at different points of the conditional loss distribution. Due to its nonparametric nature, quantile regression is free of the assumptions of traditional mean regression models, including homogeneous variance across risk factors and symmetric, light tails. These model assumptions have posed a great barrier in applications, as they are often not met in claim data. Using two sets of run-off triangle claim data from Israel and Queensland, Australia, we present a quantile regression approach that illustrates the sensitivity of claim size to risk factors, namely the trend pattern and initial claim level, at different quantiles. Trained models are applied to predict future claims in the lower run-off triangle. Findings suggest that reliance on standard loss reserving techniques gives rise to misleading inferences and that claim size is not homogeneously driven by the same risk factors across quantiles.

Adjusting for Treatment Effect when Estimating or Testing Genetic Effect is of Main Interest

Yuanjia Wang, Yixin Fang

https://doi.org/10.6339/JDS.201101_09(1).0010

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 9, Issue 1 (2011), pp. 127–138

Abstract

It is known that “standard methods for estimating the causal effect of a time-varying treatment on the mean of a repeated measures outcome (for example, GEE regression) may be biased when there are time-dependent variables that are simultaneously confounders of the effect of interest and are predicted by previous treatment” (Hernán et al. 2002). Inverse probability of treatment weighted (IPTW) methods have been developed in the causal inference literature. In genetic studies, however, the main interest is to estimate or test the genetic effect rather than the treatment effect. In this work, we describe an IPTW method that provides an unbiased estimate of the genetic effect, and discuss how to develop a family-based association test using IPTW for family-based studies. We apply the developed methods to systolic blood pressure data from the Framingham Heart Study, where some subjects took antihypertensive treatment during the course of the study.

A Bayesian Estimator of the Intracluster Correlation Coefficient from Correlated Binary Responses

Marwa Ahmed, Mohamed Shoukri

https://doi.org/10.6339/JDS.2010.08(1).585

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 8, Issue 1 (2010), pp. 127–137

Abstract

Clustered binary samples arise often in biomedical investigations. An important feature of such samples is that the binary responses within clusters tend to be correlated. The Beta-Binomial model is commonly applied to account for the intracluster correlation (the correlation between responses within the clusters) among dichotomous outcomes in cluster sampling. The intracluster correlation coefficient (ICC) quantifies this correlation or level of similarity. In this paper, we propose Bayesian point and interval estimators for the ICC under the Beta-Binomial model. Using Laplace’s method, the asymptotic posterior distribution of the ICC is approximated by a normal distribution. The posterior mean of this normal density is used as a central point estimator for the ICC, and 95% credible sets are calculated. A Monte Carlo simulation is used to evaluate the coverage probability and average length of the credible set of the proposed interval estimator. The simulations indicate that when the number of clusters is above 40, the underlying mean response probability falls in the range [0.3, 0.7], and the underlying ICC values are ≤ 0.4, the proposed interval estimator performs quite well and attains the correct coverage level. Even for a number of clusters as small as 20, the proposed interval estimator may still be useful in the case of small ICC (≤ 0.2).

Underlying and Multiple Causes of Death in Preterm Infants

Panagiota Kitsantas

https://doi.org/10.6339/JDS.2008.06(1).392

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 6, Issue 1 (2008), pp. 125–134

Abstract

A limited number of studies have utilized multiple causes of death to investigate infant mortality patterns. The purpose of the present study was to examine the risk distribution of underlying and multiple causes of infant death for congenital anomalies, short gestation/low birth weight (LBW), respiratory conditions, infections, sudden infant death syndrome and external causes across four gestational age groups, namely ≤ 23, 24–30, 31–36, and ≥ 37 weeks, and to determine the extent to which mortality from each condition is underestimated when only the underlying cause of death is used. The data were obtained from the North Carolina linked birth/infant death files (1999 to 2003) and included 4908 death records. The findings of this study indicate that infants born at less than 30 weeks of gestation are more likely (odds ratio ranging from 1.99 to 6.03) to have multiple causes recorded when the underlying cause is congenital anomalies, respiratory conditions or infections, in comparison to infants whose gestational age is at least 37 weeks. The underlying cause of death underestimated mortality for a number of cause-specific deaths, including short gestation/LBW, respiratory conditions, infections and external causes. This was particularly evident among infants born preterm. Based on these findings, it is recommended that multiple causes, whenever available, be studied in conjunction with the underlying cause of death data.

A State Duration Model for Brand Choice and Inter-Purchase Time

Lynn Kuo, Zhen Chen

https://doi.org/10.6339/JDS.2004.02(2).138

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 2, Issue 2 (2004), pp. 125–147

Abstract

A new approach for analyzing state duration data in brand-choice studies is explored. This approach not only incorporates the correlation among repeated purchases for a subject, but also models the purchase timing and the brand decision jointly. The former is accomplished by applying transition model approaches from longitudinal studies, while the latter is done by conditioning on the brand choice variable. Mixed multinomial logit models and Cox proportional hazards models are then employed to model the marginal densities of the brand choice and the conditional densities of the interpurchase time given the brand choice. We illustrate the approach using a Nielsen household scanner panel data set.

History and Potential of Binary Segmentation for Exploratory Data Analysis

James N. Morgan

https://doi.org/10.6339/JDS.2005.03(2).198

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 3, Issue 2 (2005), pp. 123–136

Abstract

Exploratory data analysis has become more important as large, rich data sets become available, with many explanatory variables representing competing theoretical constructs. The restrictive assumptions of linearity and additivity of effects, as in regression, are no longer necessary to save degrees of freedom. Where there is a clear criterion (dependent) variable or classification, sequential binary segmentation (tree) programs are being used. We explain why, using the current enhanced version (SEARCH) of the original Automatic Interaction Detector program as an illustration. Even the simple example uncovers an interaction that might well have been missed with the usual multivariate regression. We then suggest some promising uses and provide one simple example.

Journal of data science

Online ISSN: 1683-8602
Print ISSN: 1680-743X

Contact us

JDS@ruc.edu.cn
No. 59 Zhongguancun Street, Haidian District Beijing, 100872, P.R. China