Abstract: A new family of copulas generated by a univariate distribution function is introduced, and relations between this copula and other well-known ones are discussed. As illustrations, the new copula is applied to model the dependence in two real data sets.
Abstract: Principal components analysis (PCA) is a widely used technique in nutritional epidemiology for extracting dietary patterns. To improve the interpretation of the derived patterns, it has been suggested to rotate the axes defined by PCA. This study aimed to evaluate whether rotation influences the repeatability of these patterns. For this reason, PCA was applied to nutrient data of 500 participants (37 ± 15 years, 38% male) who were voluntarily enrolled in the study and asked to complete a semi-quantitative food frequency questionnaire (FFQ), twice within 15 days. The varimax and the quartimax orthogonal rotation methods, as well as the non-orthogonal promax and oblimin methods, were applied. The degree of agreement between the similar patterns extracted by each rotation method was assessed using the Bland and Altman method and Kendall’s tau-b coefficient. Good agreement was observed between the two administrations of the FFQ for the un-rotated components, while low-to-moderate agreement was observed for all rotation types (the quartimax and the oblimin methods led to more repeatable results). To conclude, when rotation is needed to improve food patterns’ interpretation, the quartimax and the oblimin methods seem to produce more robust results.
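The varimax step of this workflow can be sketched briefly. The code below applies Kaiser's varimax criterion to PCA loadings computed from simulated stand-in data (the FFQ nutrient data, the other rotation methods, and the agreement analysis are not reproduced); since varimax is an orthogonal rotation, each variable's communality must be unchanged by it.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a p x k loadings matrix."""
    p, k = loadings.shape
    R = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # Gradient of the varimax criterion (Kaiser's formulation)
        G = loadings.T @ (L ** 3 - (gamma / p) * L @ np.diag((L ** 2).sum(axis=0)))
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt
        if s.sum() < crit * (1 + tol):
            break
        crit = s.sum()
    return loadings @ R

# Simulated stand-in for the nutrient data: two correlated variable blocks
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 6))
X[:, 1] += X[:, 0]
X[:, 3] += X[:, 2]
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt[:2].T * s[:2] / np.sqrt(len(X) - 1)  # PCA loadings, 2 components
rotated = varimax(loadings)
```

Rotation redistributes variance across the components to push each variable's loading toward 0 or its maximum, which is what makes the rotated patterns easier to label.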
Abstract: Simple parametric functional forms, if appropriate, are preferred over more complicated functional forms in clinical prediction models. In this paper, we illustrate our practical approach to obtaining appropriate functional forms for continuous variables in developing a clinical prediction model for risk of Clostridium difficile infection. First, we used a nonparametric regression smoother to establish the reference curve. Then, we used a regression spline function, the restricted cubic spline (RCS), and simple parametric forms to approximate the reference curve. Based on the shape of the reference curve, the model fit information (AIC), and a formal statistical test (the Vuong test), we selected simple parametric forms to replace the more elaborate RCS functions. Finally, we refined the simple parametric forms in the multivariable regression model using the Wald test and the likelihood-ratio test. In addition, we compared the calibration and discrimination of the model with appropriate functional forms against the model with simple linear terms. The calibration χ² (8.4 versus 10) and calibration plot, the area under the ROC curve (0.88 vs 0.84, p < 0.05), and the integrated discrimination improvement (0.0072, p < 0.001) indicated that the model with appropriate forms was better calibrated and had higher discrimination ability.
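The RCS-versus-linear comparison by AIC can be illustrated with a minimal sketch. The basis below is Harrell's restricted cubic spline (linear in the tails, knots at conventional percentiles), fit by ordinary least squares to simulated data with a nonlinear signal; the C. difficile data, the smoother reference curve, and the Vuong test are not reproduced, and the Gaussian AIC is computed only up to an additive constant.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis (Harrell): cubic inside, linear in the tails."""
    t = np.asarray(knots, float)
    k = len(t)
    d = lambda z: np.maximum(z, 0.0) ** 3
    denom = t[-1] - t[-2]
    cols = [x]
    for j in range(k - 2):
        cols.append(d(x - t[j])
                    - d(x - t[-2]) * (t[-1] - t[j]) / denom
                    + d(x - t[-1]) * (t[-2] - t[j]) / denom)
    return np.column_stack(cols)

def ols_aic(X, y):
    """Gaussian AIC (up to an additive constant) for an OLS fit."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = np.sum((y - Xd @ beta) ** 2)
    n, p = len(y), Xd.shape[1]
    return n * np.log(rss / n) + 2 * (p + 1)

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 400)
y = np.sin(x / 2) + 0.3 * rng.standard_normal(400)      # clearly nonlinear signal
knots = np.percentile(x, [5, 27.5, 50, 72.5, 95])        # default 5-knot placement
aic_linear = ols_aic(x[:, None], y)
aic_rcs = ols_aic(rcs_basis(x, knots), y)
```

When the true relation is nonlinear, the RCS fit attains a markedly lower AIC despite its extra parameters, which is the kind of evidence the paper uses before deciding whether a simpler parametric form can stand in for the spline.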
Abstract: Interval estimation for the proportion parameter in one-sample misclassified binary data has attracted much interest in the literature. Recently, an approximate Bayesian approach has been proposed. This approach is simpler to implement and performs better than existing frequentist approaches. However, because a normal approximation to the marginal posterior density was used in this Bayesian approach, some efficiency may be lost. We develop a closed-form fully Bayesian algorithm which draws a posterior sample of the proportion parameter from the exact marginal posterior distribution. We conducted simulations to show that our fully Bayesian algorithm is easier to implement and has better coverage than the approximate Bayesian approach.
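The misclassification model itself is easy to state: if the true proportion is p and the test has sensitivity Se and specificity Sp, an observed positive occurs with probability q = p·Se + (1 − p)(1 − Sp). The sketch below draws from the posterior of p by a simple grid approximation under a flat prior with known Se and Sp; it is an illustration of the model, NOT the paper's closed-form exact-sampling algorithm, and the counts and error rates are hypothetical.

```python
import numpy as np

def posterior_grid_sample(y, n, se, sp, n_draws=5000, grid_size=2001, seed=0):
    """Draw from the posterior of the true proportion p under misclassification.

    Observed positives follow Binomial(n, q) with q = p*se + (1-p)*(1-sp);
    a flat Beta(1, 1) prior is placed on p.  Grid approximation only --
    not the paper's closed-form algorithm.
    """
    rng = np.random.default_rng(seed)
    p = np.linspace(0.0, 1.0, grid_size)
    q = p * se + (1.0 - p) * (1.0 - sp)
    log_post = y * np.log(q) + (n - y) * np.log1p(-q)   # log-likelihood; flat prior
    w = np.exp(log_post - log_post.max())
    w /= w.sum()
    return rng.choice(p, size=n_draws, p=w)

draws = posterior_grid_sample(y=45, n=200, se=0.9, sp=0.95)
lo, hi = np.percentile(draws, [2.5, 97.5])   # equal-tailed credible interval
```

With 45 observed positives out of 200 and these error rates, the posterior concentrates near (45/200 − (1 − sp)) / (se + sp − 1) ≈ 0.21, visibly below the naive observed prevalence of 0.225.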
Abstract: Searching for data structure and decision rules using classification and regression tree (CART) methodology is now well established. An alternative procedure, search partition analysis (SPAN), is less well known. Both provide classifiers based on Boolean structures; in CART these are generated by a hierarchical series of local sub-searches, and in SPAN by a global search. One issue with CART is its perceived instability; another is the awkward nature of the Boolean structures generated by a hierarchical tree. Instability arises because the final tree structure is sensitive to early splits. SPAN, as a global search, seems more likely to render stable partitions. To examine these issues in the context of identifying mothers at risk of giving birth to low birth weight babies, we have taken a very large sample, divided it at random into ten non-overlapping sub-samples, and performed SPAN and CART analyses on each sub-sample. The stability of the SPAN and CART models is described and, in addition, the structure of the Boolean representation of classifiers is examined. It is found that SPAN partitions have more intrinsic stability and are less prone to Boolean structural irregularities.
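The instability the abstract describes comes from the root split, on which the rest of the tree depends. The sketch below repeats CART's first step (an exhaustive Gini-impurity search for the best root split) on ten non-overlapping random subsamples, mirroring the study design on simulated stand-in data; SPAN and the full tree-growing procedure are not implemented, and all variables and effect sizes are hypothetical.

```python
import numpy as np

def best_root_split(X, y, n_thresholds=20):
    """Exhaustive search for the best Gini root split, as CART would make it."""
    def gini(labels):
        if len(labels) == 0:
            return 0.0
        p = labels.mean()
        return 2.0 * p * (1.0 - p)
    n = len(y)
    parent = gini(y)
    best_gain, best_var = -1.0, -1
    for j in range(X.shape[1]):
        for thr in np.quantile(X[:, j], np.linspace(0.05, 0.95, n_thresholds)):
            left, right = y[X[:, j] <= thr], y[X[:, j] > thr]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if gain > best_gain:
                best_gain, best_var = gain, j
    return best_var

rng = np.random.default_rng(2)
n = 5000
X = rng.standard_normal((n, 4))              # stand-ins for maternal risk factors
logit = 1.5 * X[:, 0] + 0.5 * X[:, 1]        # one strong and one weak predictor
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(float)

# Ten non-overlapping subsamples, as in the study design
idx = rng.permutation(n).reshape(10, n // 10)
root_vars = [best_root_split(X[sub], y[sub]) for sub in idx]
```

When one predictor clearly dominates, the root split is reproduced across subsamples; with predictors of comparable strength, the chosen root variable flips between subsamples and the whole downstream tree changes with it, which is the instability the paper contrasts with SPAN's global search.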
Abstract: In this paper, freight transportation is considered. One of the models used for modelling “Origin–Destination” freight flows is the log-regression model obtained by applying a log-transformation to the traditional gravity model. Freight flows between ten provinces of Turkey are analyzed using the generalized maximum entropy estimator of the log-regression model for freight flow. The data set is gathered from the axle load survey performed by the Turkish Directorate of Highways, together with other socioeconomic and demographic variables related to the provinces of interest. Relations between the considered socioeconomic and demographic variables and freight flows are identified and the results are discussed.
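The log-transformation at the heart of this model is worth making explicit: the gravity model T_ij = k · O_i^a · D_j^b / d_ij^c becomes linear in the parameters after taking logs. The sketch below recovers the exponents by ordinary least squares on synthetic flows; the paper instead uses a generalized maximum entropy estimator, and all province sizes, distances, and parameter values here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n_flows = 90                               # e.g. all ordered pairs of 10 provinces
pop_o = rng.uniform(1e5, 5e6, n_flows)     # origin population (hypothetical)
pop_d = rng.uniform(1e5, 5e6, n_flows)     # destination population (hypothetical)
dist = rng.uniform(50, 1500, n_flows)      # road distance in km (hypothetical)

# Gravity model T_ij = k * O_i^a * D_j^b / d_ij^c with multiplicative noise
true = dict(k=0.01, a=0.8, b=0.7, c=1.2)
flow = (true["k"] * pop_o ** true["a"] * pop_d ** true["b"] / dist ** true["c"]
        * np.exp(0.2 * rng.standard_normal(n_flows)))

# Taking logs turns the gravity model into a linear regression
X = np.column_stack([np.ones(n_flows), np.log(pop_o), np.log(pop_d), np.log(dist)])
beta, *_ = np.linalg.lstsq(X, np.log(flow), rcond=None)
```

The fitted coefficients on log origin size, log destination size, and log distance estimate a, b, and −c directly, which is why the log-regression form is convenient for relating freight flows to socioeconomic covariates.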
Abstract: An empirical study is employed to investigate the performance of implied GARCH models in option pricing. The implied GARCH models are established by either the Esscher transform or the extended Girsanov principle. The empirical P-martingale simulation is adopted to compute the option prices efficiently. The empirical results show that: (i) the implied GARCH models obtain accurate standard option prices even when the innovations are conveniently assumed to be normally distributed; (ii) the Esscher transform describes the data better than the extended Girsanov principle; (iii) significant model risk arises when an implied GARCH model with improperly specified innovations is used in exotic option pricing.
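As background for the kind of computation involved, the sketch below prices a European call under a risk-neutral GARCH(1,1) by plain Monte Carlo, using the conditional-mean correction r − h/2 so that discounted prices are martingales. This is a generic Duan-style sketch with hypothetical parameters, not the paper's Esscher-transform or extended-Girsanov construction, and it does not use the empirical P-martingale simulation.

```python
import numpy as np

def garch_call_mc(S0, K, r, T_days, omega, alpha, beta, h0,
                  n_paths=20000, seed=4):
    """European call price under a risk-neutral GARCH(1,1) by Monte Carlo.

    Daily log-return: r - h/2 + sqrt(h) * z with z ~ N(0, 1), so that the
    discounted price process is a martingale; r is the daily risk-free rate.
    """
    rng = np.random.default_rng(seed)
    logS = np.full(n_paths, np.log(S0))
    h = np.full(n_paths, h0)
    for _ in range(T_days):
        z = rng.standard_normal(n_paths)
        logS += r - 0.5 * h + np.sqrt(h) * z
        h = omega + alpha * h * z ** 2 + beta * h   # GARCH(1,1) variance update
    payoff = np.maximum(np.exp(logS) - K, 0.0)
    return np.exp(-r * T_days) * payoff.mean()

price = garch_call_mc(S0=100, K=100, r=0.0, T_days=30,
                      omega=1e-5, alpha=0.05, beta=0.9, h0=2e-4)
```

With h0 equal to the stationary variance ω/(1 − α − β) = 2e-4, the 30-day total volatility is about 7.8%, so the at-the-money price lands near the Black–Scholes value of roughly 3.1; conditional heteroskedasticity shows up mainly in the tails and hence in exotic payoffs, which is where the abstract locates the model risk.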
Abstract: The change point problem has been studied extensively since the 1950s due to its broad applications in many fields, such as finance and biology. As a special case of the multiple change point problem, the epidemic change point problem has received a lot of attention, especially in medical studies. In this paper, a nonparametric method based on the empirical likelihood is proposed to detect epidemic changes in the mean after unknown change points. Under some mild conditions, the asymptotic null distribution of the empirical likelihood ratio test statistic is shown to be an extreme value distribution. The consistency of the test is also proved. Simulations indicate that the test performs comparably to other available tests while imposing fewer constraints on the data distribution. The method is applied to the Stanford heart transplant data and detects the change points successfully.
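An epidemic change means the mean departs from its baseline on some segment [i, j) and then returns. The sketch below scans all candidate segments with a standardized CUSUM-type statistic on simulated data; it is a simple stand-in illustrating the search over two change points, not the paper's empirical likelihood ratio statistic, and the shift size and segment location are hypothetical.

```python
import numpy as np

def epidemic_scan(x, min_len=5):
    """Scan statistic for an epidemic change in the mean.

    Scores every candidate segment [i, j) by the standardized partial-sum
    deviation from the overall mean -- a CUSUM-style stand-in for the
    empirical likelihood ratio statistic of the paper.
    """
    x = np.asarray(x, float)
    n = len(x)
    s = np.concatenate([[0.0], np.cumsum(x - x.mean())])
    sd = x.std(ddof=1)
    best, seg = 0.0, (0, n)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            stat = abs(s[j] - s[i]) / (sd * np.sqrt(j - i))
            if stat > best:
                best, seg = stat, (i, j)
    return best, seg

rng = np.random.default_rng(5)
x = rng.standard_normal(120)
x[40:80] += 2.0                              # epidemic segment with raised mean
stat, (start, end) = epidemic_scan(x)
```

The maximizing segment locates the epidemic interval, and the null distribution of such a maximum over all segments is of extreme value type, which parallels the limiting result proved for the empirical likelihood version.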
Abstract: In this paper, we obtain several estimators of a scale parameter of the Morgenstern type bivariate uniform distribution (MTBUD) based on ranked set sampling observations on the study variable Y, which is correlated with the auxiliary variable X, when (X, Y) follows a MTBUD. Efficiency comparisons among these estimators are also made. Finally, we illustrate the methods using a real data set.
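The setting can be made concrete with a small simulation. The Morgenstern (FGM) family has copula C(u, v) = uv[1 + α(1 − u)(1 − v)], which admits exact conditional sampling; below, sets are ranked on the cheap auxiliary X while Y is measured only on the unit holding the target rank, and the scale θ of Y ~ uniform(0, θ) is estimated from the RSS mean. This is a generic illustration with hypothetical α and θ, not the specific estimators derived in the paper.

```python
import numpy as np

def fgm_uniform_sample(n, alpha, theta=1.0, seed=6):
    """Sample (X, Y) from a Morgenstern (FGM) bivariate distribution with
    X ~ uniform(0, 1), Y ~ uniform(0, theta), dependence parameter alpha."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    a = alpha * (1.0 - 2.0 * u)
    # Exact inverse of the conditional CDF of V given U = u
    v = 2.0 * w / (1.0 + a + np.sqrt((1.0 + a) ** 2 - 4.0 * a * w))
    return u, theta * v

def rss_mean_y(alpha, theta, set_size=4, cycles=250, seed=7):
    """Ranked set sampling: rank each set on the auxiliary X, but measure
    Y only on the unit holding the target rank."""
    rng = np.random.default_rng(seed)
    ys = []
    for _ in range(cycles):
        for r in range(set_size):
            x, y = fgm_uniform_sample(set_size, alpha, theta,
                                      seed=int(rng.integers(1 << 30)))
            ys.append(y[np.argsort(x)[r]])   # Y of the r-th ranked unit by X
    return np.mean(ys)

# E[Y] = theta / 2, so twice the RSS mean estimates the scale theta
est = 2.0 * rss_mean_y(alpha=0.8, theta=3.0)
```

Because ranking uses only X, the method suits exactly the situation in the abstract: Y is expensive to measure but correlated with an easily ranked auxiliary variable.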
Abstract: Existing methods for sample size calculation with right-censored data largely assume that the failure times follow an exponential distribution or the Cox proportional hazards model. Methods under the additive hazards model are scarce. Motivated by a well-known example of right-censored failure time data for which the additive hazards model fits better than the Cox model, we propose a method for power and sample size calculation for a two-group comparison under the additive hazards model. This model allows the investigator to specify the group difference as a hazard difference and to choose increasing, constant, or decreasing baseline hazards. The power computation is based on the Wald test. Extensive simulation studies are performed to demonstrate the performance of the proposed approach. Our simulations also show substantially decreased power if the additive hazards model is misspecified as the Cox proportional hazards model.
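The constant-baseline special case makes the idea easy to simulate: under the additive hazards model λ(t | Z) = λ0(t) + βZ with constant λ0, each group is simply exponential with rates λ0 and λ0 + β, and the hazard difference admits a Wald test based on the exponential MLE. The sketch below estimates power by simulation under hypothetical rates and censoring; it is not the paper's general additive-hazards power formula, which also covers increasing and decreasing baselines.

```python
import numpy as np

def power_hazard_difference(lam0, diff, n_per_group, censor_time,
                            n_sim=2000, seed=8):
    """Simulated power of a Wald test for a hazard difference between two
    groups, in the constant-baseline special case of the additive hazards
    model (each group is then exponential: rates lam0 and lam0 + diff)."""
    rng = np.random.default_rng(seed)
    z_crit = 1.959963984540054      # two-sided 5% normal critical value
    rejections = 0
    for _ in range(n_sim):
        est, var = [], []
        for lam in (lam0, lam0 + diff):
            t = rng.exponential(1.0 / lam, n_per_group)
            obs = np.minimum(t, censor_time)     # administrative censoring
            d = int(np.sum(t <= censor_time))    # number of observed events
            lam_hat = d / obs.sum()              # exponential (censored) MLE
            est.append(lam_hat)
            var.append(lam_hat ** 2 / max(d, 1)) # asymptotic variance
        wald = (est[1] - est[0]) / np.sqrt(var[0] + var[1])
        rejections += abs(wald) > z_crit
    return rejections / n_sim

power = power_hazard_difference(lam0=0.1, diff=0.05,
                                n_per_group=150, censor_time=10)
```

With these hypothetical settings a standard normal approximation predicts power around 0.8; the same simulation scaffold is how one would also probe the power loss the abstract reports when the additive hazards structure is misfit with a Cox model.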