Abstract: We propose a new method of adding two parameters to a continuous distribution that extends the idea first introduced by Lehmann (1953) and studied by Nadarajah and Kotz (2006). This method leads to a new class of exponentiated generalized distributions that can be interpreted as a double construction of Lehmann alternatives. Some special models are discussed. We derive some mathematical properties of this class, including the ordinary moments, generating function, mean deviations and order statistics. Maximum likelihood estimation is investigated and four applications to real data are presented.
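The double construction of Lehmann alternatives mentioned above can be sketched as follows (the notation is standard background, not taken from the abstract): starting from a baseline cdf G(x) and applying the Lehmann type II alternative followed by the type I alternative, with two extra shape parameters a, b > 0, yields the cdf

```latex
F(x) = \left\{\, 1 - \left[\, 1 - G(x) \,\right]^{a} \right\}^{b},
\qquad a > 0,\; b > 0.
```

Setting a = 1 recovers the Lehmann type I (exponentiated) family G(x)^b, and setting b = 1 recovers the type II family 1 - [1 - G(x)]^a.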
Abstract: The aim of this paper is to present the Bonus-Malus System (BMS) of Iran, which is a mandatory scheme based on Insurance Act number 56. We examine the current Iranian BMS using various criteria, such as elasticity and time of convergence to the steady state with respect to the claim frequency, as well as financial balance. We also find the closed form of the stationary distribution of the Iranian BMS, which plays a key role in the study of BMSs. Moreover, we compare the results with the German and Japanese BMSs. Finally, we give some hints that can be used to improve the performance of the current Iranian BMS.
Abstract: Let {(Xi, Yi), i ≥ 1} be a sequence of bivariate random variables from a continuous distribution. If {Rn, n ≥ 1} is the sequence of record values in the sequence of X’s, then the Y which corresponds to the nth record is called the concomitant of the nth record, denoted by R[n]. For the FGM family, we determine the amount of information contained in R[n] and compare it with the amount of information contained in Rn. We also show that the Kullback-Leibler distance among the concomitants of record values is distribution-free. Finally, we provide some numerical results on mutual information and the Pearson correlation coefficient for measuring the amount of dependency between Rn and R[n] in the copula model of the FGM family.
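For reference, the FGM (Farlie-Gumbel-Morgenstern) family invoked above is the standard bivariate family with joint cdf (this statement of the family is background, not taken from the abstract):

```latex
F_{X,Y}(x,y) = F_X(x)\, F_Y(y)\,
\bigl[\, 1 + \alpha\, \bigl(1 - F_X(x)\bigr)\bigl(1 - F_Y(y)\bigr) \bigr],
\qquad -1 \le \alpha \le 1,
```

with copula C(u, v) = uv[1 + α(1 − u)(1 − v)]; the parameter α governs the degree of dependence between the two coordinates.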
Abstract: Clustering is an extremely important task in a wide variety of application domains, especially in management and social science research. In this paper, an iterative clustering procedure based on multivariate outlier detection is proposed, using the well-known Mahalanobis distance. First, the Mahalanobis distance is calculated for the entire sample, and an upper control limit (UCL) is fixed using the T²-statistic. Observations above the UCL are treated as outliers and grouped into an outlier cluster, and the same procedure is repeated for the remaining inliers until the variance-covariance matrix of the variables in the last cluster becomes singular. At each iteration, a multivariate test of means is used to check the discrimination between the outlier clusters and the inliers, and multivariate control charts are used to graphically visualize the iterations and the outlier clustering process. Finally, the multivariate test of means helps to firmly establish cluster discrimination and validity. This paper employs the procedure to cluster 275 customers of a famous two-wheeler in India based on 19 different attributes of the two-wheeler and its company. The results of the proposed technique confirm that there exist 5 and 7 outlier clusters of customers in the entire sample at the 5% and 1% significance levels, respectively.
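The iterative peeling procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the chi-square quantile is used here as a stand-in for the exact T²-based UCL, and the multivariate tests of means and control charts are omitted.

```python
import numpy as np
from scipy import stats

def outlier_clusters(X, alpha=0.05):
    """Iteratively peel off multivariate-outlier clusters via Mahalanobis
    distance. A sketch: the chi-square quantile approximates the T^2 UCL."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    ucl = stats.chi2.ppf(1 - alpha, df=p)   # UCL for squared distances
    clusters, inliers = [], X
    while True:
        if inliers.shape[0] <= p:           # too few points: cov singular
            break
        mu = inliers.mean(axis=0)
        cov = np.cov(inliers, rowvar=False)
        if np.linalg.matrix_rank(cov) < p:  # singular covariance: stop
            break
        inv = np.linalg.inv(cov)
        diff = inliers - mu
        d2 = np.einsum('ij,jk,ik->i', diff, inv, diff)  # squared distances
        mask = d2 > ucl
        if not mask.any():                  # no point above the UCL: done
            break
        clusters.append(inliers[mask])      # next outlier cluster
        inliers = inliers[~mask]            # repeat on the remaining inliers
    return clusters, inliers
```

Each pass recomputes the mean and covariance from the current inliers only, so later passes can reveal outliers that the inflated covariance of the full sample initially masked.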
Abstract: For many years actuaries and demographers have been fitting curves to age-specific mortality data. We use the eight-parameter Heligman-Pollard (HP) empirical law to fit the mortality curve. It consists of three nonlinear curves: child mortality, mid-life mortality and adult mortality. It is now well known that the eight unknown parameters in the HP law are difficult to estimate because numerical algorithms generally do not converge when the model is fitted. We consider a novel idea: fit the three curves (nonlinear splines) separately, and then connect them smoothly at the two knots. To connect the curves smoothly, we express uncertainty about the knots, because these curves do not have turning points. We have important prior information about the location of the knots, and this helps with the estimation convergence problem. Thus, the Bayesian paradigm is particularly attractive. We show the theory, method and application of our approach. We discuss estimation of the curve for English and Welsh mortality data, and we also make comparisons with a recent Bayesian method.
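The three components of the HP law can be sketched as follows, in one common parameterisation of the law (the parameter values in the test are illustrative, not fitted values from the paper):

```python
import numpy as np

def heligman_pollard_qx(x, A, B, C, D, E, F, G, H):
    """Heligman-Pollard law (common parameterisation): the odds of death
    q_x / (1 - q_x) are the sum of a child-mortality term, a mid-life
    (accident hump) term and a Gompertz-type adult term."""
    x = np.asarray(x, dtype=float)
    child = A ** ((x + B) ** C)                            # child mortality
    hump = D * np.exp(-E * (np.log(x) - np.log(F)) ** 2)   # mid-life hump
    adult = G * H ** x                                     # adult mortality
    odds = child + hump + adult
    return odds / (1.0 + odds)                             # odds -> q_x
```

The difficulty noted in the abstract arises because all eight parameters enter one highly nonlinear expression; fitting the three terms separately and joining them at knots decouples the problem.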
Abstract: Shared frailty models are often used to model heterogeneity in survival analysis. The most common shared frailty model is one in which the hazard function is the product of a random factor (the frailty) and a baseline hazard function common to all individuals. Certain assumptions are made about the baseline distribution and the distribution of the frailty; most often, a gamma distribution is assumed for the frailty. To compare the results with the gamma frailty model, we introduce three shared frailty models with the generalized exponential as baseline distribution: the inverse Gaussian shared frailty model, the compound Poisson shared frailty model and the compound negative binomial shared frailty model. We fit these models to the real-life bivariate survival data set of McGilchrist and Aisbett (1991) on kidney infection using the Markov chain Monte Carlo (MCMC) technique. Model comparison is made using Bayesian model selection criteria and a better model is suggested for the data.
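The shared frailty structure described above can be written in standard notation (the notation is background, not taken from the abstract): for individual j in group i, with shared frailty Z_i,

```latex
h_{ij}(t \mid Z_i) = Z_i \, h_0(t),
```

where h_0 is the baseline hazard common to all individuals and the Z_i are i.i.d. positive random variables (gamma, inverse Gaussian, compound Poisson or compound negative binomial). With the generalized exponential baseline used in the paper, the baseline cdf is F_0(t) = (1 - e^{-\lambda t})^{\alpha} for \lambda, \alpha > 0, in the Gupta-Kundu parameterisation.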
Abstract: Watching videos online has become a popular activity for people around the world. To manage revenue from online advertising, an efficient ad server that can match advertisements to targeted users is needed. In general, users’ demographics are provided to an ad server by an inference engine, which infers them using a profile reasoning technique. Rich media streaming through broadband networks has had a significant impact on how online television user profile reasoning can be implemented. Compared to traditional broadcasting services such as satellite and cable, broadcasting through broadband networks enables bidirectional communication between users and content providers. In this paper, a user profile reasoning technique based on a logistic regression model is introduced. The inference model takes into account the genre preferences and viewing time of users in different age/gender groups. Historical viewing data were used to train and build the model. Different input data processing and model building strategies are discussed, and experimental results are provided to show how effective the proposed technique is.
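The kind of inference model described above can be sketched with a multinomial logistic regression on genre-preference and viewing-time features. Everything below is an illustrative stand-in: the synthetic data, the five-genre feature layout and the label rule are assumptions, not the paper's data or groups.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for historical viewing data (assumed, for illustration):
# rows = users; features = genre-preference shares plus average viewing time;
# labels = demographic (age/gender) group index.
rng = np.random.default_rng(42)
n = 400
genre_pref = rng.dirichlet(np.ones(5), size=n)     # shares over 5 genres
view_time = rng.exponential(2.0, size=(n, 1))      # hours per day
X = np.hstack([genre_pref, view_time])
y = genre_pref.argmax(axis=1)                      # toy label rule: dominant genre

# Multinomial logistic regression mapping viewing behaviour to a
# probability distribution over demographic groups.
model = LogisticRegression(max_iter=1000)
model.fit(X, y)
probs = model.predict_proba(X[:1])                 # posterior over groups
```

In a real ad-serving pipeline the predicted group probabilities, rather than a hard label, would typically be passed to the ad server so it can trade off targeting confidence against inventory.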
Abstract: Two methods for clustering data and choosing a mixture model are proposed. First, we derive a new classification algorithm based on the classification likelihood. Then, the likelihood conditional on these clusters is written as the product of the likelihoods of each cluster, and AIC- and BIC-type approximations are applied. The resulting criteria turn out to be the sum of the AIC or BIC relative to each cluster plus an entropy term. The performance of our methods is evaluated by Monte Carlo experiments and on a real data set, showing in particular that the iterative estimation algorithm converges quickly in general, so the computational load is rather low.
Abstract: Trials in which clusters of subjects, rather than individuals, are randomized to compare interventions are commonly called cluster randomized trials (CRTs). For the comparison of binary outcomes in a CRT, although a few formulations for sample size computation have been published, the most commonly used is the one developed by Donner, Birkett, and Buck (Am J Epidemiol, 1981), probably due to its incorporation in the textbook by Fleiss, Levin, and Paik (Wiley, 2003). In this paper, we derive a new χ² approximation formula with a general continuity correction factor (c) and show that, especially for scenarios of small event rates (< 0.01), the new formulation recommends a lower number of clusters than the Donner et al. formulation, thereby providing better efficiency. All known formulations can be shown to be special cases at specific values of the general correction factor (e.g., the Donner formulation is equivalent to the new formulation for c = 1). Statistical simulations comparing the efficacy of the available methods are presented, identifying correction factors that are optimal for rare event rates. A table of sample size recommendations for a variety of rare event rates, along with code in the R language for easy computation of sample size in other settings, is also provided. Sample size calculations for a published CRT (the “Pathways to Health” study, which evaluates an intervention for smoking cessation) are computed for various correction factors to illustrate that, with an optimal choice of the correction factor, the study could have maintained the same power with a 20% smaller sample size.
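For orientation, the classical design-effect calculation that the Donner et al. formulation builds on can be sketched as follows (in Python rather than the paper's R; the new χ² formula and its continuity correction factor c are not reproduced here, since the abstract does not state them):

```python
from math import ceil
from scipy.stats import norm

def clusters_per_arm(p1, p2, m, icc, alpha=0.05, power=0.8):
    """Clusters per arm for a two-arm CRT with a binary outcome, using the
    classical variance-inflation (design-effect) approach; no continuity
    correction is applied in this sketch."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    # sample size per arm under individual randomization
    n_ind = (z_a + z_b) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
    # design effect for clusters of size m with intracluster correlation icc
    deff = 1 + (m - 1) * icc
    return ceil(n_ind * deff / m)   # clusters of size m needed per arm
```

Because the design effect multiplies the individually randomized sample size, even a small intracluster correlation can substantially raise the number of clusters required, which is why the choice of formulation matters for rare events.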