Abstract: The development and application of computational data mining techniques for financial fraud detection and business failure prediction has recently become a popular cross-disciplinary research area, involving financial economists, forensic accountants and computational modellers. Several of the computational techniques popularly used for financial fraud detection and business failure prediction can also be effectively applied to the detection of fraudulent insurance claims, and can therefore be of immense practical value to the insurance industry. We provide a comparative analysis of the prediction performance of a battery of data mining techniques using real-life automotive insurance fraud data. While the data used in our paper is US-based, the computational techniques we have tested can be adapted and applied to detect similar insurance frauds in other countries where an organized automotive insurance industry exists.
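The comparative exercise the abstract describes can be sketched in a few lines. This is not the paper's code or data: it uses a synthetic, class-imbalanced dataset as a stand-in for insurance-claim records, and an arbitrary selection of classifiers scored by cross-validated AUC.

```python
# Minimal sketch of a classifier comparison on imbalanced (fraud-like) data.
# Synthetic data and the model list are illustrative assumptions, not the paper's setup.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

# 10% positive class, mimicking the rarity of fraudulent claims.
X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.9, 0.1], random_state=0)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
    "forest": RandomForestClassifier(random_state=0),
    "naive_bayes": GaussianNB(),
}
# Cross-validated ROC AUC per model; AUC is a sensible metric under class imbalance.
scores = {name: cross_val_score(m, X, y, cv=5, scoring="roc_auc").mean()
          for name, m in models.items()}
```

On real claims data one would additionally handle categorical claim attributes and calibrate decision thresholds to the cost of missed fraud versus false alarms.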
Abstract: Through a series of carefully chosen illustrations from biometry and biomedicine, this note underscores the importance of using appropriate analytical techniques to increase power in statistical modeling and testing. These examples also serve to highlight some of the important recent developments in applied statistics of use to practitioners.
Abstract: Supervised classification of biological samples based on genetic information (e.g., gene expression profiles) is an important problem in biostatistics. In order to find classification rules that are both accurate and interpretable, variable selection is indispensable. This article explores how an assessment of the individual importance of variables (effect size estimation) can be used to perform variable selection. I review recent effect size estimation approaches in the context of linear discriminant analysis (LDA) and propose a new, conceptually simple effect size estimation method that is at the same time computationally efficient. I then show how to use effect sizes to perform variable selection based on the misclassification rate, which is the data-independent expectation of the prediction error. Simulation studies and real data analyses illustrate that the proposed effect size estimation and variable selection methods are competitive. In particular, they lead to both compact and interpretable feature sets. Program files to be used with the statistical software R implementing the variable selection approaches presented in this article are available from my homepage: http://b-klaus.de.
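The general idea of effect-size-based variable selection for LDA can be illustrated as follows. This is a generic sketch, not the article's proposed estimator: it ranks variables by a Cohen's-d-like standardized mean difference on simulated "expression" data and fits LDA on the top-ranked subset.

```python
# Sketch: rank variables by a standardized-mean-difference effect size,
# keep the largest effects, and classify with LDA on that compact subset.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
n, p = 60, 200                       # 60 samples, 200 simulated "genes"
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, p))
X[y == 1, :5] += 1.5                 # only the first 5 genes carry signal

# Per-variable effect size: group-mean difference over pooled within-group SD.
m1, m0 = X[y == 1].mean(0), X[y == 0].mean(0)
s = np.sqrt((X[y == 1].var(0, ddof=1) + X[y == 0].var(0, ddof=1)) / 2)
effect = np.abs(m1 - m0) / s

top = np.argsort(effect)[::-1][:5]   # the 5 variables with the largest effects
lda = LinearDiscriminantAnalysis().fit(X[:, top], y)
acc = lda.score(X[:, top], y)
```

The article's method additionally ties the selection threshold to the misclassification rate rather than to a fixed number of variables; the sketch fixes the subset size for simplicity.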
As the COVID-19 pandemic has strongly disrupted people’s daily work and life, a great amount of scientific research has been conducted to understand the key characteristics of this new epidemic. In this manuscript, we focus on four crucial epidemic metrics of COVID-19, namely the basic reproduction number, the incubation period, the serial interval and the epidemic doubling time. We collect relevant studies based on COVID-19 data in China and conduct a meta-analysis to obtain pooled estimates of the four metrics. From the summary results, we conclude that COVID-19 has stronger transmissibility than SARS, implying that stringent public health strategies are necessary.
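The pooling step of such a meta-analysis is typically an inverse-variance weighted average of the study-level estimates. A minimal fixed-effect sketch, using made-up study estimates of the basic reproduction number (the numbers are illustrative, not from the paper):

```python
# Fixed-effect inverse-variance pooling of study-level estimates.
import numpy as np

# Hypothetical study estimates of R0 with their standard errors.
r0 = np.array([2.2, 2.7, 3.1, 2.5])
se = np.array([0.3, 0.4, 0.5, 0.2])

w = 1.0 / se**2                       # inverse-variance weights
pooled = np.sum(w * r0) / np.sum(w)   # pooled point estimate
pooled_se = np.sqrt(1.0 / np.sum(w))  # SE of the pooled estimate
```

A random-effects model would widen `pooled_se` by adding a between-study variance component to each weight, which is usually preferable when studies differ in setting and method.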
Compound distributions derive their importance from the fact that natural factors have compound effects, as in medical, social and logical experiments. Dubey (1968) introduced the compound Weibull by compounding the Weibull distribution with the gamma distribution. The main aim of this paper is to define a bivariate generalized Burr (compound Weibull) distribution whose marginals have univariate generalized Burr distributions. Several properties of this distribution, such as the marginals, conditional distributions and product moments, are discussed. The maximum likelihood estimates of the unknown parameters of this distribution and their approximate variance-covariance matrix are obtained. Simulations are performed to assess the performance of the MLEs, and a real data analysis is presented for illustrative purposes.
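The Weibull-gamma compounding underlying the (univariate) Burr construction is easy to verify by simulation. In one standard parameterization, if X given theta has Weibull survival exp(-theta x^c) and theta follows a Gamma(k, 1) mixing distribution, the marginal survival is (1 + x^c)^(-k), a Burr XII form; the sketch below checks this empirically (parameter values are arbitrary choices for illustration):

```python
# Simulate the gamma-compounded Weibull and compare its empirical survival
# function to the Burr XII survival (1 + x**c)**(-k).
import numpy as np

rng = np.random.default_rng(0)
k, c, n = 2.0, 1.5, 200_000

theta = rng.gamma(shape=k, scale=1.0, size=n)        # gamma mixing variable
# Given theta, X = (E/theta)**(1/c) with E ~ Exp(1) has survival exp(-theta*x**c).
x = (rng.exponential(size=n) / theta) ** (1.0 / c)

grid = np.array([0.5, 1.0, 2.0])
emp = np.array([(x > g).mean() for g in grid])       # empirical survival
burr = (1.0 + grid**c) ** (-k)                       # Burr XII survival
```

The bivariate generalization in the paper shares a common mixing variable across two Weibull components, which induces dependence while keeping Burr marginals.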
In this paper, we introduce a new lifetime model, called the Generalized Weibull-Burr XII distribution. We discuss some of its mathematical properties, such as the density and hazard rate functions, the quantile function and moments. The maximum likelihood method is used to estimate the model parameters. A simulation study is performed to assess the performance of the maximum likelihood estimators in terms of biases and mean squared errors. Finally, we show that the proposed distribution is very competitive with other classical models by means of an application to a real data set.
Abstract: In this paper, we propose power weighted quantile regression (PWQR), which can effectively reduce the effect of heterogeneity of the conditional densities of the response and improve the efficiency of quantile regression. This article also shows that the weighted proportion of observations whose actual value is less than the PWQR estimate is very close to the corresponding quantile. Finally, this article establishes the relationship between geomagnetic indices and GIC. Motivated by the problems of secure power system operation, we construct a GIC risk value table, which is practical to use and provides important guidance for the secure operation of power systems.
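The quantile-coverage property the abstract refers to can be seen in the simplest (unweighted, intercept-only) case: the minimizer of the pinball (check) loss is the sample quantile, so the fraction of observations below the fit is close to the target level tau. This sketch is a generic illustration of that mechanism, not the PWQR estimator itself:

```python
# Fit a constant by minimizing the pinball loss and check that the fraction
# of observations below the fit is close to the target quantile level tau.
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(size=5000)
tau = 0.25

def pinball(q):
    # Check-function loss of quantile regression: tau*r for r >= 0, (tau-1)*r otherwise.
    r = y - q
    return np.mean(np.where(r >= 0, tau * r, (tau - 1) * r))

grid = np.linspace(y.min(), y.max(), 2001)
q_hat = grid[np.argmin([pinball(q) for q in grid])]
below = (y < q_hat).mean()   # should be close to tau
```

PWQR replaces the uniform weighting in the loss with power weights so that the same coverage property holds approximately while the estimator gains efficiency under heterogeneous conditional densities.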
Abstract: An individual in a finite population is represented by a random variable whose expectation is linearly composed of explanatory variables and a personal effect. This expectation locates her (his) random variable on a scale when s(he) responds to a questionnaire item or physical instrument. This formulation reinterprets design-based sampling, which represents an individual as a constant waiting to be observed. Retaining constant expectations, however, along with fixed realizations of random variables, preserves and strengthens design-based theory through the Horvitz-Thompson (1952) theorem. This interpretation reaffirms the usual design-based regression estimates, whose normality is seen to be free of any assumptions about the distribution of the outcome variable. It also formulates response error in a way that renders a superpopulation, postulated by model-based sampling, unnecessary. The value of distribution-free regression is illustrated with an analysis of American presidential approval.
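The Horvitz-Thompson theorem invoked above rests on a simple estimator: weight each sampled unit's value by the inverse of its inclusion probability to estimate the population total unbiasedly. A small simulation sketch (hypothetical population values, equal inclusion probabilities under Poisson sampling):

```python
# Horvitz-Thompson estimate of a population total: sum of y_i / pi_i over the sample.
import numpy as np

rng = np.random.default_rng(3)
N = 1000
y_pop = rng.gamma(2.0, 10.0, size=N)    # hypothetical fixed population values
pi = np.full(N, 0.1)                    # inclusion probability of each unit
sampled = rng.random(N) < pi            # Poisson sampling: independent inclusions

ht_total = np.sum(y_pop[sampled] / pi[sampled])  # HT estimator of the total
true_total = y_pop.sum()
```

Unbiasedness requires only that every unit's inclusion probability is known and positive; no distributional assumption on the y values is needed, which is the design-based point the abstract makes.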
The statistical modeling of natural disasters is an indispensable tool for extracting information useful for disaster prevention and the reduction of casualties. The Poisson distribution can reveal the characteristics of a natural disaster. However, this distribution is insufficient for modeling the clustering of natural events and the related casualties. The best approach is to use the Neyman type A (NTA) distribution, which allows two or more events to occur in a short time. We obtain some properties of the NTA distribution and suggest that it provides a suitable description for analyzing the distribution of natural disasters and casualties. We support this argument using disaster events, including earthquakes, floods, landslides, forest fires, avalanches, and rock falls, in Turkey between 1900 and 2013. The data strongly support the NTA distribution as the main tool for handling these disaster data. The findings indicate that approximately three earthquakes, fifteen landslides, five floods, six rock falls, six avalanches, and twenty-nine forest fires are expected in a year. The results from this model suggest that the probability of the total number of casualties is highest for earthquakes and lowest for rock falls. This study also finds that the expected number of natural disasters is approximately 64 per year, and the inter-event time between two successive earthquakes is approximately four months. The inter-event time across all natural disasters is approximately six days in Turkey.
Abstract: The Asian financial crisis that struck most of the East Asian countries in 1997 has caught the attention of many researchers in finance and economics. This is due to the realization that during the crisis the affected countries saw their currencies depreciate by more than 50% and their stock markets fall sharply by about 30% to 50%. In this paper, we investigate the relationship among the stock market returns of three Southeast Asian (ASEAN) countries, Malaysia, Singapore and Thailand, using monthly data between 1990 and 2004. We find that the three stock markets are not cointegrated. Therefore, instead of modelling the returns data using linear vector autoregressive (VAR) models, we assume the returns data are regime-dependent and use a two-regime multivariate Markov switching vector autoregressive (MS-VAR) model, with regime shifts in both the mean and the variance, to extract common regime-shift behaviour from the return series. We find that the MS-VAR model with two regimes manages to detect common shifts in all the stock market return series, which shows evidence of comovement among the three return series. Furthermore, the MS-VAR model also captures the timing of the 1997 financial crisis in the three countries satisfactorily.
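The data-generating mechanism behind such a model, a latent Markov chain switching the mean and variance of returns, can be sketched in miniature. This is a univariate simulation with made-up parameters to illustrate the regime-shift idea, not the paper's multivariate MS-VAR estimation:

```python
# Simulate returns whose mean and variance switch with a hidden 2-state Markov chain
# (state 0: calm regime, state 1: crisis regime). All parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(5)
T = 1000
P = np.array([[0.95, 0.05],        # transition probabilities: rows sum to 1;
              [0.10, 0.90]])       # regimes are persistent, as in crisis episodes
mu = np.array([0.01, -0.03])       # regime-dependent mean return
sigma = np.array([0.02, 0.08])     # regime-dependent volatility

s = np.empty(T, dtype=int)
s[0] = 0
for t in range(1, T):
    s[t] = rng.choice(2, p=P[s[t - 1]])       # latent Markov regime chain
r = rng.normal(mu[s], sigma[s])               # returns with shifts in mean and variance
```

Estimation reverses this simulation: given only `r`, an MS-VAR fit recovers the transition matrix and regime-specific parameters and assigns each date a smoothed regime probability, which is how the 1997 crisis period is dated.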