Abstract: In this study, we propose a pattern matching procedure to capture similar price movements of two stocks. First, a longest common subsequence search algorithm is introduced to sieve out the time periods in which the two stocks have the same integrated volatility levels and price rise/drop trends. Next, we transform the price data in the matching time periods found into Bollinger Percent b (%b) data. The low-frequency power spectra of the transformed data are used to extract trends, and Pearson's chi-square test is used to assess the similarity of the price movement patterns in the matching periods. Simulation results show that the proposed procedure can effectively detect the co-movement periods of two price sequences. Finally, we apply the proposed procedure to empirical high-frequency transaction data from the NYSE.
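The Bollinger Percent b (%b) transform used above locates each price within its Bollinger bands, so that two series can be compared on a common scale. As a minimal sketch (the 20-period window and the 2-standard-deviation band width are conventional defaults, not necessarily the paper's settings, and `percent_b` is a hypothetical name):

```python
import numpy as np

def percent_b(prices, window=20, k=2.0):
    """Bollinger %b: position of each price within its Bollinger bands.

    %b = (price - lower band) / (upper band - lower band), where the
    bands are a rolling mean +/- k rolling standard deviations.
    The first window-1 entries are NaN because the rolling statistics
    are undefined there.
    """
    prices = np.asarray(prices, dtype=float)
    out = np.full(prices.shape, np.nan)
    for i in range(window - 1, len(prices)):
        win = prices[i - window + 1 : i + 1]
        mid = win.mean()
        sd = win.std(ddof=0)
        lower, upper = mid - k * sd, mid + k * sd
        # a flat window has zero-width bands; place the price mid-band
        out[i] = (prices[i] - lower) / (upper - lower) if upper > lower else 0.5
    return out
```

Values near 1 indicate a price at the top of its recent band, values near 0 a price at the bottom, which is what makes the transformed series comparable across stocks with different price levels.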
Abstract: We have developed an enhanced spike-and-slab model for variable selection in linear regression models via restricted final prediction error (FPE) criteria, classic examples of which are AIC and BIC. Based on our proposed Bayesian hierarchical model, a Gibbs sampler is developed to sample models. The special structure of the prior enforces a unique mapping between sampling a model and calculating constrained ordinary least squares estimates for that model, which helps to formulate the restricted FPE criteria. Empirical comparisons are made with the lasso, adaptive lasso, and relaxed lasso, followed by a real-life data example.
Abstract: Despite the availability of software for interactive graphics, current survey processing systems make limited use of this modern tool. Interactive graphics offer insights that are difficult to obtain with traditional statistical tools. This paper shows the use of interactive graphics for analysing survey data. Using Labour Force Survey data from Pakistan, we describe how plotting data in different ways and using interactive tools enables analysts to obtain information from the dataset that would normally not be possible using standard statistical methods. It is also shown that interactive graphics can help the analyst improve data quality by identifying erroneous cases.
Abstract: Various statistical models have been proposed to analyze fMRI data. The usual goal is to make inferences about the effects related to an external stimulus. The primary focus of this paper is on those statistical methods that enable one to detect ‘significantly activated’ regions of the brain due to event-related stimuli. Most of these methods share a common property: they require estimation of the hemodynamic response function (HRF) as part of the deterministic component of the statistical model. We propose and investigate a new approach that does not require HRF fits to detect ‘activated’ voxels. We argue that the method not only avoids fitting a specific HRF, but still takes into account that the unknown response is delayed and smeared in time. The method also adapts to differential BOLD responses across brain regions and experimental sessions. The proposed test statistic is the maximum cross-correlation between the kernel-smoothed stimulus sequence and shifted (lagged) values of the observed response. Using our recommended approach, we show through realistic simulations and with real data that we obtain better sensitivity than simple correlation methods using the default settings of SPM2. The simulation experiment incorporates different HRFs empirically determined from real data; the noise models are likewise empirically based, comprising AR(3) fits and fractional Gaussian models estimated from real data. We conclude that our proposed method is more powerful than simple correlation procedures because of its robustness to variation in the HRF.
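The proposed test statistic, the maximum cross-correlation between a kernel-smoothed stimulus sequence and lagged copies of the observed response, can be sketched as follows. The Gaussian kernel bandwidth and the maximum lag searched over are illustrative assumptions rather than the paper's tuned values, and `max_lagged_correlation` is a hypothetical name:

```python
import numpy as np

def max_lagged_correlation(stimulus, bold, max_lag=10, bandwidth=2.0):
    """Maximum Pearson correlation between a kernel-smoothed stimulus
    train and lagged copies of the observed BOLD response.

    bandwidth is the standard deviation (in scans) of the Gaussian
    smoothing kernel; max_lag bounds the hemodynamic delay searched.
    """
    stimulus = np.asarray(stimulus, dtype=float)
    bold = np.asarray(bold, dtype=float)
    # smooth the 0/1 stimulus train with a normalized Gaussian kernel
    half = int(3 * bandwidth)
    t = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (t / bandwidth) ** 2)
    kernel /= kernel.sum()
    smooth = np.convolve(stimulus, kernel, mode="same")
    best = -np.inf
    for lag in range(max_lag + 1):
        # correlate the smoothed stimulus with the response shifted by lag
        x = smooth[: len(smooth) - lag] if lag else smooth
        y = bold[lag:]
        r = np.corrcoef(x, y)[0, 1]
        best = max(best, r)
    return best
```

Because the maximum is taken over lags, a voxel whose response is delayed by an unknown hemodynamic lag can still score highly without an explicit HRF fit, which is the point the abstract makes.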
Abstract: Hyperplane fitting factor rotations perform better than conventional rotations in attaining simple structure for complex configurations. Hyperplane rotations are reviewed and then compared using familiar examples from the literature selected to vary in complexity. Included is a new method for fitting hyperplanes, hypermax, which updates the work of Horst (1941) and Derflinger and Kaiser (1989). Hypercon, a method for confirmatory target rotation, is a natural extension. These performed very well when compared with selected hyperplane and conventional rotations. The concluding sections consider the pros and cons of each method.
Abstract: According to the 2006 Programme for International Student Assessment (PISA), sixteen Organization for Economic Cooperation and Development (OECD) countries had scores significantly higher than those of the US. The top three performers were Finland, Canada, and Japan. While Finland and Japan are vastly different from the US in terms of culture and educational systems, the US and Canada are similar to each other in many respects, so their performance gap was investigated. In this study, data mining was employed to identify factors regarding access to and use of resources, as well as student views on science, for predicting PISA science scores among Grade 10 American and Canadian students. It was found that science enjoyment and frequent use of educational software play important roles in the academic achievement of Canadian students.
Abstract: Missing data is a common problem in statistical analyses. To make use of the information in incompletely observed data, missing values can be imputed so that standard statistical methods can be applied. Variables with missing values are often categorical, and the missing pattern may not be monotone. Currently, commonly used imputation methods for data with a non-monotone missing pattern do not allow direct inclusion of categorical variables: categorical variables are converted to numerical variables before imputation, and for many applications the imputed numerical values must then be converted back to categorical values. However, this conversion introduces bias which can seriously affect subsequent analyses. In this paper, we propose two direct imputation methods for categorical variables with a non-monotone missing pattern: the direct imputation approach incorporated with the expectation-maximization algorithm, and the direct imputation approach incorporated with a new algorithm, the imputation-maximization algorithm. Simulation studies show that both methods perform better than the method using variable conversion. An application to real data is provided to compare the direct imputation method and the method using variable conversion.
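The conversion bias described above can be seen with a toy binary variable: mean-imputing a 0/1 variable treated as numeric and then rounding assigns every missing case the majority category, inflating that category's proportion, whereas imputing directly on the categorical scale preserves it. This is only an illustration of the bias, not the paper's EM-based methods, and both function names are hypothetical:

```python
import numpy as np

def mean_then_round_impute(x):
    """The naive conversion approach: treat a 0/1 categorical variable
    as numeric, mean-impute, then round back to a category. Every
    missing case receives the same (majority) category."""
    x = np.asarray(x, dtype=float)
    filled = x.copy()
    obs = ~np.isnan(x)
    filled[~obs] = round(x[obs].mean())
    return filled

def draw_impute(x, seed=0):
    """A direct categorical imputation sketch: draw each missing value
    from the observed category distribution, preserving proportions."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    filled = x.copy()
    obs = ~np.isnan(x)
    p1 = x[obs].mean()
    filled[~obs] = (rng.random((~obs).sum()) < p1).astype(float)
    return filled
```

With 30% of a Bernoulli(0.7) variable missing at random, the round-after-imputation estimate of P(X = 1) drifts toward 0.79 while the direct draw stays near the true 0.7, which is the kind of distortion the abstract warns about.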
Abstract: In this article, we consider a model of time-varying volatility which generalizes the classical Black-Scholes model to include regime-switching properties. Specifically, the unobservable state variables for stock fluctuations are modeled by a Markov process, and the drift and volatility parameters take different values depending on the state of this hidden Markov process. We provide a closed-form formula for the arbitrage-free price of the European call option when the hidden Markov process has a finite number of states. Two simulation methods, the discrete diffusion method and the Markovian tree method, for computing the European call option price are presented for comparison.
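A minimal Monte Carlo sketch of the regime-switching idea: a two-state hidden Markov chain selects the drift and volatility applied on each step of a log-price diffusion, and the discounted average payoff estimates the European call price. The restriction to two states, the constant discount rate, and the function name are simplifying assumptions for illustration, not the paper's methods:

```python
import numpy as np

def regime_switching_call_mc(s0, strike, T, r, mus, sigmas, P,
                             n_steps=252, n_paths=20000, seed=0):
    """Monte Carlo price of a European call under a two-state
    regime-switching log-price diffusion.

    mus, sigmas : per-state drift and volatility
    P           : 2x2 one-step transition matrix of the hidden chain
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    mus, sigmas, P = np.asarray(mus), np.asarray(sigmas), np.asarray(P)
    state = np.zeros(n_paths, dtype=int)      # all paths start in state 0
    log_s = np.full(n_paths, np.log(s0))
    for _ in range(n_steps):
        mu, sig = mus[state], sigmas[state]
        z = rng.standard_normal(n_paths)
        # Euler step of the log-price under the current regime
        log_s += (mu - 0.5 * sig ** 2) * dt + sig * np.sqrt(dt) * z
        # advance the hidden two-state Markov chain one step
        u = rng.random(n_paths)
        state = np.where(u < P[state, 0], 0, 1)
    payoff = np.maximum(np.exp(log_s) - strike, 0.0)
    return np.exp(-r * T) * payoff.mean()
```

When both states share the same drift and volatility the model collapses to Black-Scholes, which gives a convenient sanity check on the simulator.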
Abstract: In this paper we propose a new bivariate long-term distribution based on the Farlie-Gumbel-Morgenstern copula model. The proposed model allows for the presence of censored data and covariates in the cure parameter. For inferential purposes, a Bayesian approach via Markov chain Monte Carlo (MCMC) is considered. Further, some discussion of model selection criteria is given. In order to examine outlying and influential observations, we develop Bayesian case-deletion influence diagnostics based on the Kullback-Leibler divergence. The newly developed procedures are illustrated on artificial and real HIV data.
Abstract: The development and application of computational data mining techniques in financial fraud detection and business failure prediction has become a popular cross-disciplinary research area in recent times, involving financial economists, forensic accountants, and computational modellers. Some of the computational techniques popularly used in the context of financial fraud detection and business failure prediction can also be effectively applied to the detection of fraudulent insurance claims and can therefore be of immense practical value to the insurance industry. We provide a comparative analysis of the prediction performance of a battery of data mining techniques using real-life automotive insurance fraud data. While the data used in our paper is US-based, the computational techniques we have tested can be adapted and generally applied to detect similar insurance frauds in other countries where an organized automotive insurance industry exists.