Abstract: Traditional loss reserving models focus on the mean of the conditional loss distribution. If the factors driving high claims differ systematically from those driving medium to low claims, alternative models that can accommodate such differences are required. We propose quantile regression for loss reserving, as the model offers potentially different solutions at distinct quantiles, so that the effects of risk factors are differentiated at different points of the conditional loss distribution. Owing to its nonparametric nature, quantile regression is free of the assumptions imposed by traditional mean regression models, including homogeneous variance across risk factors and symmetric, light tails. These assumptions have posed a great barrier in applications, as they are often not met in claim data. Using two sets of run-off triangle claim data, from Israel and Queensland, Australia, we present the quantile regression approach and illustrate the sensitivity of claim size to risk factors, namely the trend pattern and initial claim level, at different quantiles. The trained models are applied to predict future claims in the lower run-off triangle. Findings suggest that reliance on standard loss reserving techniques can give rise to misleading inferences and that claim size is not homogeneously driven by the same risk factors across quantiles.
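As a concrete illustration (not the paper's implementation), the following sketch fits quantile regressions at several quantiles to simulated incremental claims from an upper run-off triangle; the column names, covariates and the simulated trend are hypothetical stand-ins for the paper's trend pattern and initial claim level:

```python
# Minimal quantile-regression sketch on a simulated run-off triangle.
# Column names ("acc_year", "dev_year", "claim") are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
# Simulate the upper triangle: accident year i, development year j.
rows = [(i, j, np.exp(8 - 0.5 * j + rng.gumbel(0, 0.6)))
        for i in range(10) for j in range(10 - i)]
triangle = pd.DataFrame(rows, columns=["acc_year", "dev_year", "claim"])
triangle["log_claim"] = np.log(triangle["claim"])

# Fit a separate model at each quantile; coefficients that diverge across
# quantiles indicate risk factors act differently on large vs. small claims.
for tau in (0.25, 0.5, 0.75, 0.9):
    fit = smf.quantreg("log_claim ~ dev_year + acc_year", triangle).fit(q=tau)
    print(f"tau={tau}: dev_year coef = {fit.params['dev_year']:.3f}")
```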
Abstract: It is known that “standard methods for estimating the causal effect of a time-varying treatment on the mean of a repeated measures outcome (for example, GEE regression) may be biased when there are time-dependent variables that are simultaneously confounders of the effect of interest and are predicted by previous treatment” (Hernán et al. 2002). Inverse-probability-of-treatment weighted (IPTW) methods have been developed in the causal inference literature. In genetic studies, however, the main interest is to estimate or test the genetic effect rather than the treatment effect. In this work, we describe an IPTW method that provides an unbiased estimate of the genetic effect, and we discuss how to develop a family-based association test using IPTW for family-based studies. We apply the developed methods to systolic blood pressure data from the Framingham Heart Study, in which some subjects took antihypertensive treatment during the course of the study.
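A minimal sketch of the IPTW mechanics under simplified, cross-sectional assumptions; the simulated data and variable names (`sbp`, `conf`) are hypothetical, not Framingham variables, and the sketch illustrates the weighting steps rather than the full longitudinal confounding structure:

```python
# IPTW sketch: weight subjects by the inverse probability of their observed
# treatment, then fit a weighted regression for the genetic effect.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
g = rng.binomial(2, 0.3, n)             # genotype (0/1/2 risk alleles)
conf = rng.normal(size=n)               # confounder of treatment and outcome
p_treat = 1 / (1 + np.exp(-(-1 + 1.2 * conf)))
treat = rng.binomial(1, p_treat)        # antihypertensive treatment
sbp = 120 + 2 * g + 3 * conf - 8 * treat + rng.normal(0, 5, n)

# Step 1: model the probability of treatment given the confounder.
ps_fit = sm.Logit(treat, sm.add_constant(conf)).fit(disp=0)
ps = ps_fit.predict(sm.add_constant(conf))
w = np.where(treat == 1, 1 / ps, 1 / (1 - ps))  # IPT weights

# Step 2: weighted regression of the outcome on genotype and treatment;
# the genotype coefficient is the weighted estimate of the genetic effect.
X = sm.add_constant(np.column_stack([g, treat]))
fit = sm.WLS(sbp, X, weights=w).fit()
print(fit.params)  # [intercept, genetic effect, treatment effect]
```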
Abstract: Clustered binary samples arise often in biomedical investigations. An important feature of such samples is that the binary responses within clusters tend to be correlated. The Beta-Binomial model is commonly applied to account for the intra-cluster correlation – the correlation between responses within a cluster – among dichotomous outcomes in cluster sampling. The intra-cluster correlation coefficient (ICC) quantifies this correlation or level of similarity. In this paper, we propose Bayesian point and interval estimators for the ICC under the Beta-Binomial model. Using Laplace’s method, the asymptotic posterior distribution of the ICC is approximated by a normal distribution. The posterior mean of this normal density is used as a point estimator for the ICC, and 95% credible sets are calculated. A Monte Carlo simulation is used to evaluate the coverage probability and average length of the credible set of the proposed interval estimator. The simulations indicate that when the number of clusters is above 40, the underlying mean response probability falls in the range [0.3, 0.7], and the underlying ICC is ≤ 0.4, the proposed interval estimator performs quite well and attains the correct coverage level. Even for as few as 20 clusters, the proposed interval estimator may still be useful when the ICC is small (≤ 0.2).
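The Laplace step can be sketched as follows; the data are simulated, a flat prior on the ICC is assumed, and fixing the mean response probability at its pooled estimate is a simplification relative to the paper's full treatment:

```python
# Laplace-approximation sketch for the posterior of the ICC (rho) under a
# Beta-Binomial model with mean pi: alpha = pi(1-rho)/rho, beta = (1-pi)(1-rho)/rho.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import betabinom, norm

rng = np.random.default_rng(2)
k, n = 40, 20                         # number of clusters, cluster size
rho_true, pi_true = 0.2, 0.4
a = pi_true * (1 - rho_true) / rho_true
b = (1 - pi_true) * (1 - rho_true) / rho_true
y = betabinom.rvs(n, a, b, size=k, random_state=rng)

pi_hat = y.sum() / (k * n)            # pooled mean response probability

def neg_log_post(rho):
    # Flat prior on rho in (0, 1), so this is just the negative log-likelihood.
    alpha = pi_hat * (1 - rho) / rho
    beta = (1 - pi_hat) * (1 - rho) / rho
    return -betabinom.logpmf(y, n, alpha, beta).sum()

opt = minimize_scalar(neg_log_post, bounds=(1e-4, 0.999), method="bounded")
rho_mode = opt.x

# Numerical second derivative at the mode gives the Laplace (normal) variance.
h = 1e-4
d2 = (neg_log_post(rho_mode + h) - 2 * opt.fun + neg_log_post(rho_mode - h)) / h**2
sd = np.sqrt(1 / d2)
lo, hi = norm.interval(0.95, loc=rho_mode, scale=sd)
print(f"posterior mode {rho_mode:.3f}, 95% credible set ({lo:.3f}, {hi:.3f})")
```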
Abstract: A limited number of studies have utilized multiple causes of death to investigate infant mortality patterns. The purpose of the present study was to examine the risk distribution of underlying and multiple causes of infant death for congenital anomalies, short gestation/low birth weight (LBW), respiratory conditions, infections, sudden infant death syndrome and external causes across four gestational age groups, namely ≤ 23, 24–30, 31–36 and ≥ 37 weeks, and to determine the extent to which mortality from each condition is underestimated when only the underlying cause of death is used. The data were obtained from the North Carolina linked birth/infant death files (1999 to 2003) and included 4908 death records. The findings of this study indicate that infants born at less than 30 weeks of gestation are more likely (odds ratios ranging from 1.99 to 6.03) to have multiple causes recorded when the underlying cause is congenital anomalies, respiratory conditions or infections, in comparison with infants whose gestational age is at least 37 weeks. The underlying cause of death underestimated mortality for a number of cause-specific deaths, including short gestation/LBW, respiratory conditions, infections and external causes. This was particularly evident among infants born preterm. Based on these findings, it is recommended that multiple causes, whenever available, be studied in conjunction with the underlying cause of death data.
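The reported comparison reduces to odds-ratio arithmetic on a two-by-two table; a worked example with hypothetical counts (not the North Carolina data):

```python
# Odds-ratio worked example with made-up counts: rows are gestational-age
# groups, columns count deaths with multiple vs. a single recorded cause.
import numpy as np
from scipy.stats import fisher_exact

#                 multiple  single
table = np.array([[180,      60],    # preterm (< 30 weeks), hypothetical
                  [200,     200]])   # term (>= 37 weeks), hypothetical
odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])
_, p_value = fisher_exact(table)
print(f"OR = {odds_ratio:.2f}, Fisher exact p = {p_value:.3g}")
# OR = 3.00 here: preterm deaths have three times the odds of carrying
# multiple recorded causes, analogous to the 1.99-6.03 range reported.
```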
Abstract: A new approach for analyzing state duration data in brand-choice studies is explored. This approach not only incorporates the correlation among repeated purchases for a subject, it also models the purchase timing and the brand decision jointly. The former is accomplished by applying transition model approaches from longitudinal studies, while the latter is done by conditioning on the brand choice variable. Mixed multinomial logit models and Cox proportional hazards models are then employed to model the marginal densities of the brand choice and the conditional densities of the interpurchase time given the brand choice. We illustrate the approach using a Nielsen household scanner panel data set.
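A rough sketch of the two-part structure on simulated data; the covariates (price, loyalty) are hypothetical, statsmodels' PHReg stands in for a full Cox analysis with censoring, and a plain multinomial logit stands in for the mixed version:

```python
# Two-part sketch: a multinomial logit for brand choice, then a proportional
# hazards model for interpurchase time conditional on the chosen brand.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 1500
price = rng.normal(size=n)
loyalty = rng.normal(size=n)
# Simulate a 3-brand choice and an interpurchase time that depends on it.
util = np.column_stack([np.zeros(n), 0.8 * loyalty - price, 0.4 * loyalty])
brand = np.array([rng.choice(3, p=np.exp(u) / np.exp(u).sum()) for u in util])
time = rng.exponential(1 / np.exp(0.3 * (brand == 1) - 0.2 * price))

# Part 1: marginal model for the brand decision.
X = sm.add_constant(np.column_stack([price, loyalty]))
choice_fit = sm.MNLogit(brand, X).fit(disp=0)

# Part 2: hazard of the next purchase, conditioning on the brand chosen
# (brand enters the hazard model as dummy covariates).
exog = np.column_stack([price, brand == 1, brand == 2]).astype(float)
haz_fit = sm.PHReg(time, exog).fit()
print(choice_fit.params.shape, haz_fit.params)
```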
Abstract: Exploratory data analysis has become more important as large, rich data sets become available, with many explanatory variables representing competing theoretical constructs. The restrictive assumptions of linearity and additivity of effects, as in regression, are no longer necessary for saving degrees of freedom. Where there is a clear criterion (dependent) variable or classification, sequential binary segmentation (tree) programs are being used. We explain why, using the current enhanced version (SEARCH) of the original Automatic Interaction Detector program as an illustration. Even the simple example uncovers an interaction that might well have been missed with the usual multivariate regression. We then suggest some promising uses and provide one simple example.
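A small sketch of sequential binary segmentation using a modern CART implementation (scikit-learn) rather than SEARCH itself; the data are simulated so that a pure interaction drives the outcome:

```python
# Binary segmentation sketch: a shallow regression tree recovers an
# interaction that a main-effects regression would average away.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(4)
n = 2000
x1 = rng.binomial(1, 0.5, n)           # e.g., homeowner yes/no (hypothetical)
x2 = rng.binomial(1, 0.5, n)           # e.g., urban yes/no (hypothetical)
# Outcome depends on x1 only within the x2 = 1 group: a pure interaction.
y = 10 + 5 * x1 * x2 + rng.normal(0, 1, n)

tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=100)
tree.fit(np.column_stack([x1, x2]), y)
print(export_text(tree, feature_names=["x1", "x2"]))
# The tree splits on one indicator and then on the other within a single
# branch, exposing the interaction structure directly in the segmentation.
```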
Abstract: Contraception is not commonly used by Omani women because of socio-cultural traditions, religious beliefs and poor knowledge, but among users, modern contraceptive methods are more popular than traditional methods. A multilevel analysis is conducted to investigate associations between individual- and religion-level characteristics and the type of contraceptive method used, and to obtain a better understanding of the factors associated with the contraceptive method choices of women aged 15-49 years in Oman, using data from the Oman National Reproductive Health Survey. The results confirm that an individual's own characteristics have enduring effects on contraceptive method choice: for a given individual, the choice of method varies with the woman's age, education level and number of living children. We also find considerable differences between the estimates from the single-level and multilevel approaches.
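A sketch of a random-intercept logistic model of the kind used in such multilevel analyses, fit with statsmodels' variational Bayes routine on simulated data; the grouping variable `region` and the covariates are hypothetical stand-ins for the survey's upper level and individual characteristics:

```python
# Random-intercept logistic sketch: modern (1) vs. traditional (0) method,
# with a group-level random intercept capturing between-group variation.
import numpy as np
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

rng = np.random.default_rng(5)
n_groups, n_per = 30, 60
region = np.repeat(np.arange(n_groups), n_per)
u = rng.normal(0, 0.8, n_groups)[region]       # group-level random intercepts
age = rng.uniform(15, 49, n_groups * n_per)
children = rng.poisson(3, n_groups * n_per)
eta = -2 + 0.04 * age + 0.3 * children + u
modern = rng.binomial(1, 1 / (1 + np.exp(-eta)))
df = pd.DataFrame({"modern": modern, "age": age,
                   "children": children, "region": region})

model = BinomialBayesMixedGLM.from_formula(
    "modern ~ age + children", {"region": "0 + C(region)"}, df)
fit = model.fit_vb()                            # variational Bayes fit
print(fit.summary())
# Contrasting these estimates with a single-level logit illustrates the
# kind of single- vs. multilevel differences the abstract refers to.
```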
Abstract: The generalized Poisson regression model has been used to model dispersed count data. It is a good competitor to the negative binomial regression model when the count data are over-dispersed. Zero-inflated Poisson and zero-inflated negative binomial regression models have been proposed for situations where the data generating process results in too many zeros. In this paper, we propose a zero-inflated generalized Poisson (ZIGP) regression model to model domestic violence data with too many zeros. Estimation of the model parameters by the method of maximum likelihood is provided. A score test is presented to test whether the number of zeros is too large for the generalized Poisson model to adequately fit the domestic violence data.
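A sketch of ZIGP likelihood maximization with an intercept-only mean; the generalized Poisson pmf is coded directly, and the data are simulated (a zero-inflated Poisson, the ZIGP special case with dispersion delta = 0), not the domestic violence data:

```python
# ZIGP maximum likelihood sketch: mixture of a point mass at zero (weight
# omega) and a generalized Poisson with parameters theta > 0, 0 <= delta < 1.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, expit

def gp_logpmf(y, theta, delta):
    # Generalized Poisson: P(y) = theta*(theta+delta*y)^(y-1)*exp(-(theta+delta*y))/y!
    lam = theta + delta * y
    return np.log(theta) + (y - 1) * np.log(lam) - lam - gammaln(y + 1)

def zigp_negll(params, y):
    theta, delta, omega = np.exp(params[0]), expit(params[1]), expit(params[2])
    ll0 = np.log(omega + (1 - omega) * np.exp(gp_logpmf(0, theta, delta)))
    llpos = np.log(1 - omega) + gp_logpmf(y, theta, delta)
    return -np.where(y == 0, ll0, llpos).sum()

rng = np.random.default_rng(6)
y = np.where(rng.random(500) < 0.3, 0, rng.poisson(2.5, 500))  # ZIP data

res = minimize(zigp_negll, x0=np.array([0.5, -2.0, -1.0]), args=(y,),
               method="Nelder-Mead")
theta, delta, omega = np.exp(res.x[0]), expit(res.x[1]), expit(res.x[2])
print(f"theta={theta:.2f}, delta={delta:.2f}, omega={omega:.2f}")
# (The paper's score test compares observed zeros with those implied by the
# fitted generalized Poisson model; it is omitted in this sketch.)
```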
Abstract: Bayesian hierarchical regression (BHR) is often used in small area estimation (SAE). Because BHR conditions on the samples, neither the survey sampling design nor the survey weights are used when data come from a complex sample survey. This can introduce bias and/or inflate variance. Further, if non-informative priors are used, BHR often requires combining multiple years of data to achieve sample sizes that yield adequate precision; this can result in poor timeliness and can obscure trends. To address bias and variance, we propose a design-assisted model-based approach to SAE that integrates adjusted sample weights. To address timeliness, we use historical data to define informative priors (power priors); this allows estimates to be derived from a single year of data. Using American Community Survey (ACS) data for validation, we applied the proposed method to Behavioral Risk Factor Surveillance System data and estimated the prevalence of disability for all U.S. counties. We show that our method can produce estimates that are both more timely than those arising from widely used alternatives and closer to the ACS direct estimates, particularly for low-data counties. Our method can be generalized to estimate the county-level prevalence of other health-related measurements.
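The power-prior idea can be illustrated in a conjugate normal model, where raising the historical likelihood to a power a0 in [0, 1] simply scales the precision that history contributes; all values below are illustrative, not BRFSS or ACS figures:

```python
# Power-prior sketch in a conjugate normal model with known variance.
import numpy as np

rng = np.random.default_rng(7)
hist = rng.normal(0.15, 0.02, 50)       # historical prevalence data (made up)
curr = rng.normal(0.17, 0.02, 8)        # sparse current-year data (made up)
sigma2 = 0.02 ** 2                       # assumed known sampling variance
a0 = 0.5                                 # discount factor for historical data

# Power prior: the precision contributed by history is scaled by a0, so the
# posterior mean is a precision-weighted blend of historical and new data.
prec_hist = a0 * len(hist) / sigma2
prec_curr = len(curr) / sigma2
post_mean = (prec_hist * hist.mean() + prec_curr * curr.mean()) / (prec_hist + prec_curr)
post_sd = np.sqrt(1 / (prec_hist + prec_curr))
print(f"posterior mean {post_mean:.4f} (sd {post_sd:.4f})")
# With a0 = 0 the prior is ignored (single-year estimate); with a0 = 1 the
# historical data count fully, as if pooled with the current year.
```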