Abstract: Principal components analysis (PCA) is a widely used technique in nutritional epidemiology for extracting dietary patterns. To improve the interpretation of the derived patterns, it has been suggested that the axes defined by PCA be rotated. This study aimed to evaluate whether rotation influences the repeatability of these patterns. To this end, PCA was applied to nutrient data from 500 participants (37 ± 15 years, 38% male) who were voluntarily enrolled in the study and asked to complete a semi-quantitative food frequency questionnaire (FFQ) twice within 15 days. The varimax and quartimax orthogonal rotation methods, as well as the non-orthogonal promax and oblimin methods, were applied. The degree of agreement between the similar patterns extracted by each rotation method was assessed using the Bland and Altman method and Kendall’s tau-b coefficient. Good agreement was observed between the two administrations of the FFQ for the un-rotated components, while low-to-moderate agreement was observed for all rotation types (the quartimax and oblimin methods led to more repeatable results). To conclude, when rotation is needed to improve the interpretation of food patterns, the quartimax and oblimin methods seem to produce more robust results.
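The core computation described above (PCA followed by an orthogonal varimax rotation, with agreement between two administrations assessed via Kendall’s tau-b) can be sketched as follows. This is a generic illustration on simulated data: the 12 variables, three retained components, and the mocked second administration are assumptions, not the study’s actual FFQ nutrient data.

```python
import numpy as np
from scipy.stats import kendalltau

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a loading matrix (Kaiser's criterion)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3
                          - (gamma / p) * rotated @ np.diag((rotated ** 2).sum(axis=0))))
        rotation = u @ vt
        if s.sum() < var * (1 + tol):
            break
        var = s.sum()
    return loadings @ rotation

# Simulated stand-in for standardized nutrient variables (12 here, chosen arbitrarily)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))
X -= X.mean(axis=0)

# PCA via SVD; keep the first three components' loadings
_, sing, vt = np.linalg.svd(X, full_matrices=False)
loadings = vt[:3].T * sing[:3] / np.sqrt(X.shape[0] - 1)
rotated = varimax(loadings)

# Repeatability would be judged per participant, e.g. Kendall's tau-b between
# component scores from the two FFQ administrations (second one mocked here)
scores_1 = X @ rotated[:, 0]
scores_2 = scores_1 + rng.normal(scale=0.1, size=500)
tau, _ = kendalltau(scores_1, scores_2)
```

Because the rotation matrix is orthogonal, each variable's communality (row sum of squared loadings) is unchanged; only the distribution of loading mass across components, and hence interpretability, changes.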
Abstract: The University of Michigan’s Consumer Sentiment Index has preoccupied politicians, journalists, and Wall Street for decades (Uchitelle, 2002). This American economic indicator is now co-published with Thomson Reuters in London. The international reach of this index cries out for another look at George Katona’s consumer sentiment construct as a predictor of consumer demand. Regressions from the British Household Panel Survey (BHPS) show that consumer sentiment is ineffectual in predicting micro variation in discretionary spending between consumers, within consumers over time, or between and within consumers overall. Moreover, consumer sentiment bears no relationship whatsoever to national consumer demand over annual BHPS surveys from 1997 to 2008. In contrast, an indicator of economic anxiety accounts for all three types of variation in micro demand, as well as variation in macro demand over time.
Abstract: We explore the possibility of modeling clustered count data using the Poisson inverse Gaussian distribution. We develop a regression model, which relates the number of mastitis cases in a sample of dairy farms in Ontario, Canada, to various farm-level covariates, to illustrate the methodology. Residual plots are constructed to explore the quality of the fit. We compare the results with a negative binomial regression model using maximum likelihood estimation, and with the generalized linear mixed regression model fitted in SAS.
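The Poisson inverse Gaussian pmf has no elementary closed form, but its mixture construction can be sketched by numerically integrating the Poisson pmf against an inverse Gaussian mixing density. This is a generic illustration with hypothetical parameter values, not a fit to the mastitis data; scipy's `invgauss` parameterization is used, where the mixing mean is `mu * scale`.

```python
import numpy as np
from scipy import integrate, stats

def pig_pmf(y, mu, scale):
    """P(Y = y) for Y | lam ~ Poisson(lam), lam ~ inverse Gaussian(mu, scale)."""
    integrand = lambda lam: stats.poisson.pmf(y, lam) * stats.invgauss.pdf(lam, mu, scale=scale)
    value, _ = integrate.quad(integrand, 0, np.inf)
    return value

# Mixing mean mu * scale = 2.0, so E[Y] = 2.0; overdispersion relative to the
# Poisson comes from the variance of the mixing distribution
probs = np.array([pig_pmf(y, mu=1.0, scale=2.0) for y in range(50)])
```

The appeal over the negative binomial is the heavier right tail of the inverse Gaussian mixing density, which can better accommodate a few farms with many cases.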
Abstract: For model selection in mixed effects models, Vaida and Blanchard (2005) demonstrated that the marginal Akaike information criterion is appropriate for questions regarding the population, while the conditional Akaike information criterion is appropriate for questions regarding the particular clusters in the data. This article shows that the marginal Akaike information criterion is asymptotically equivalent to leave-one-cluster-out cross-validation and the conditional Akaike information criterion is asymptotically equivalent to leave-one-observation-out cross-validation.
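The flavour of such AIC/cross-validation equivalences is easy to see in the simpler ordinary-regression setting, where leave-one-out residuals have a closed form through the hat matrix, so both criteria can be computed without refitting. A minimal sketch with simulated data (the design and coefficients are hypothetical, and this is illustration only, not the mixed-model derivation):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, 0.5]) + rng.normal(size=n)

def aic_and_loocv(X, y):
    n, k = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)        # hat matrix
    resid = y - H @ y
    sigma2 = resid @ resid / n                    # Gaussian ML variance estimate
    aic = n * np.log(2 * np.pi * sigma2) + n + 2 * (k + 1)
    loo_resid = resid / (1 - np.diag(H))          # closed-form leave-one-out residuals
    return aic, np.mean(loo_resid ** 2)           # AIC and PRESS / n

aic_full, cv_full = aic_and_loocv(X, y)           # true model
aic_red, cv_red = aic_and_loocv(X[:, :2], y)      # drops a relevant covariate
```

With a moderately large sample, both criteria rank the two candidate models the same way, which is the practical content of the asymptotic equivalence.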
Abstract: Risks for many chronic diseases (coronary heart disease, cancer, mental illness, diabetes, asthma, etc.) are strongly linked both to socioeconomic and ethnic group, and so prevalence varies considerably between areas. Variations in prevalence are important in assessing health care needs and in comparing health care provision (e.g. of surgical intervention rates) to health need. This paper focuses on estimating prevalence of coronary heart disease and uses a Bayesian approach to synthesise information of different types to make indirect prevalence estimates for geographic units where prevalence data are not otherwise available. One source is information on prevalence risk gradients from national health survey data; such data typically provide only regional identifiers (for confidentiality reasons) and so gradients by age, sex, ethnicity, broad region, and socio-economic status may be obtained by regression methods. Often a series of health surveys is available and one may consider pooling strength over surveys by using information on prevalence gradients from earlier surveys (e.g. via a power prior approach). The second source of information is population totals by age, sex, ethnicity, etc. from censuses or intercensal population estimates, to which survey-based prevalence rates are applied. The other potential data source is information on area mortality, since for heart disease and some other major chronic diseases there is a positive correlation over areas between prevalence of disease and mortality from that disease. A case study considers the development of estimates of coronary heart disease prevalence in 354 English areas using (a) data from the Health Surveys for England for 2003 and 1999, (b) population data from the 2001 UK Census, and (c) area mortality data for 2003.
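The pooling-over-surveys idea can be sketched in its simplest conjugate form: for a binomial prevalence count, a power prior downweights the historical survey's successes and failures by a factor a0 in [0, 1] before conjugate Beta updating. The counts below are hypothetical, and the paper's actual synthesis (regression gradients, census denominators, mortality data) is far richer than this sketch.

```python
def power_prior_posterior(y_curr, n_curr, y_hist, n_hist, a0, a=1.0, b=1.0):
    """Beta posterior for a prevalence under a power prior: the historical
    survey contributes a0 * (successes, failures) on top of a Beta(a, b) prior."""
    post_a = a + a0 * y_hist + y_curr
    post_b = b + a0 * (n_hist - y_hist) + (n_curr - y_curr)
    return post_a, post_b

# Hypothetical current (2003-style) and historical (1999-style) survey counts:
# 60/1000 cases now, 90/1000 in the earlier survey
a_half, b_half = power_prior_posterior(60, 1000, 90, 1000, a0=0.5)
a_none, b_none = power_prior_posterior(60, 1000, 90, 1000, a0=0.0)
a_full, b_full = power_prior_posterior(60, 1000, 90, 1000, a0=1.0)
mean_half = a_half / (a_half + b_half)
mean_none = a_none / (a_none + b_none)
mean_full = a_full / (a_full + b_full)
```

As a0 moves from 0 to 1, the posterior mean moves from using the current survey alone toward full pooling with the earlier survey, which is exactly the borrowing-of-strength trade-off.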
Abstract: Mixed effects models are often used for estimating fixed effects and variance components in continuous longitudinal outcomes. An EM-based estimation approach for mixed effects models when the outcomes are truncated was proposed by Hughes (1999). We consider the situation when the longitudinal outcomes are also subject to non-ignorable missingness in addition to truncation. A shared random-effect parameter model is presented in which the missing-data mechanism depends on the random effects used to model the longitudinal outcomes. Data from the Indianapolis-Ibadan dementia project are used to illustrate the proposed approach.
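A quick simulation shows why the shared random-effect construction matters: when the dropout probability depends on the same random intercept that drives the outcome, the observed data are systematically selected and a complete-case mean is biased. All parameter values below are hypothetical, and the EM machinery of the actual approach is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(2)
n_subj, n_visits = 300, 4
b = rng.normal(size=n_subj)                          # shared random intercepts
y = 2.0 + b[:, None] + rng.normal(scale=0.5, size=(n_subj, n_visits))

# Missingness depends on the SAME random effect -> non-ignorable mechanism:
# subjects with higher intercepts are more likely to have missing visits
p_miss = 1.0 / (1.0 + np.exp(-(-1.5 + 1.0 * b)))     # logistic in b
missing = rng.random((n_subj, n_visits)) < p_miss[:, None]
y_obs = np.where(missing, np.nan, y)

naive_mean = np.nanmean(y_obs)                       # biased below the true mean of 2.0
```

Ignoring the mechanism here pulls the complete-case mean below the true marginal mean of 2.0, which is the bias a joint (shared-parameter) likelihood is designed to correct.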
Abstract: We introduce a new family of distributions, namely the inverse truncated discrete Linnik G family of distributions. This family generalizes the inverse Marshall-Olkin family of distributions, the inverse family of distributions generated through the truncated negative binomial distribution, and the inverse family of distributions generated through the truncated discrete Mittag-Leffler distribution. A particular member of the family, the inverse truncated negative binomial Weibull distribution, is studied in detail. The shape properties of the probability density function and hazard rate, model identifiability, moments, median, mean deviation, entropy, distribution of order statistics, stochastic ordering property, mean residual life function, and stress-strength properties of the new generalized inverse Weibull distribution are studied. The unknown parameters of the distribution are estimated using the maximum likelihood, product spacing, and least squares methods. The existence and uniqueness of the maximum likelihood estimates are proved. A simulation is carried out to illustrate the performance of the maximum likelihood estimates of the model parameters. An AR(1) minification model with this distribution as its marginal is developed. The inverse truncated negative binomial Weibull distribution is fitted to a real data set, and it is shown to be more appropriate for modeling than some competing models.
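As a small illustration of the building blocks involved, the sketch below simulates from the plain inverse Weibull baseline (cdf exp(-(scale/x)^shape), via inverse-transform sampling) and recovers its parameters by maximum likelihood. The truncated-negative-binomial generalization studied in the paper is not implemented here, and the parameter values are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def inv_weibull_logpdf(x, shape, scale):
    """Log density of the inverse Weibull with cdf F(x) = exp(-(scale/x)**shape)."""
    return (np.log(shape) + shape * np.log(scale)
            - (shape + 1.0) * np.log(x) - (scale / x) ** shape)

rng = np.random.default_rng(3)
true_shape, true_scale = 2.0, 1.5
u = rng.random(5000)
# Inverse transform: F(x) = u  =>  x = scale * (-log u)**(-1/shape)
x = true_scale * (-np.log(u)) ** (-1.0 / true_shape)

nll = lambda p: (np.inf if p[0] <= 0 or p[1] <= 0
                 else -inv_weibull_logpdf(x, p[0], p[1]).sum())
fit = minimize(nll, x0=[1.0, 1.0], method="Nelder-Mead")
shape_hat, scale_hat = fit.x
```

Guarding the negative log-likelihood against non-positive parameters keeps the derivative-free Nelder-Mead search inside the valid region without needing a reparameterization.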
Abstract: For statistical classification problems where the total sample size is only slightly greater than the feature dimension, regularized statistical discriminant rules may reduce classification error rates. We review ten dispersion-matrix regularization approaches (four for the pooled sample covariance matrix, four for the inverse pooled sample covariance matrix, and two for a diagonal covariance matrix) for use in Anderson’s (1951) linear discriminant function (LDF). We compare these regularized classifiers against the traditional LDF for a variety of parameter configurations, and use the estimated expected error rate (EER) to assess performance. We also apply the regularized LDFs to a well-known real-data example on colon cancer. We found that no regularized classifier uniformly outperformed the others. However, the more contemporary classifiers (e.g., Thomaz and Gillies, 2005; Tong et al., 2012; and Xu et al., 2009) tended to outperform the older ones, and certain simple methods (e.g., Pang et al., 2009; Thomaz and Gillies, 2005; and Tong et al., 2012) performed very well, calling into question the need for involved cross-validation in estimating regularization parameters. Nonetheless, an older regularized classifier proposed by Smidt and McDonald (1976) yielded consistently low misclassification rates across all scenarios, regardless of the shape of the true covariance matrix. Finally, our simulations showed that regularized classifiers relying primarily on asymptotic approximations with respect to the training sample size rarely outperformed the traditional LDF, and are thus not recommended. We discuss our results as they pertain to the effect of high dimension, and offer general guidelines for choosing a regularization method for poorly-posed problems.
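A minimal illustration of the setting: with sample size only slightly above the dimension, the pooled covariance is nearly singular and the plain LDF suffers, while even a naive regularizer that shrinks the pooled covariance toward the identity (a stand-in for the ten reviewed approaches, with an arbitrary shrinkage weight) helps. The dimensions, means, and shrinkage weight below are all hypothetical choices.

```python
import numpy as np

def ldf_predict(X, mean0, mean1, cov_inv):
    """Anderson's LDF with equal priors: label 1 when x scores closer to mean1."""
    w = cov_inv @ (mean0 - mean1)
    c = 0.5 * w @ (mean0 + mean1)
    return (X @ w - c <= 0).astype(int)

rng = np.random.default_rng(4)
p, n_per = 40, 25                                   # total n = 50, barely above p = 40
mu0, mu1 = np.zeros(p), np.full(p, 0.5)
X0 = rng.normal(size=(n_per, p)) + mu0
X1 = rng.normal(size=(n_per, p)) + mu1
S = 0.5 * (np.cov(X0.T) + np.cov(X1.T))             # pooled sample covariance

lam = 0.5                                           # arbitrary shrinkage weight
S_reg = (1 - lam) * S + lam * np.eye(p)             # shrink toward the identity

Xtest = np.vstack([rng.normal(size=(500, p)) + mu0,
                   rng.normal(size=(500, p)) + mu1])
ytest = np.r_[np.zeros(500, dtype=int), np.ones(500, dtype=int)]

err_plain = np.mean(ldf_predict(Xtest, X0.mean(0), X1.mean(0), np.linalg.inv(S)) != ytest)
err_reg = np.mean(ldf_predict(Xtest, X0.mean(0), X1.mean(0), np.linalg.inv(S_reg)) != ytest)
```

Because the true covariance here is the identity, shrinking toward the identity is the best case for the regularizer; the reviewed methods differ mainly in the target and in how the weight is chosen.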
Abstract: Investigation of household electricity usage patterns, and matching the patterns to behaviours, is an important area of research given the centrality of such patterns in addressing the needs of the electricity industry. Additional knowledge of household behaviours will allow more effective targeting of demand-side management (DSM) techniques. This paper addresses whether a reasonable number of meaningful motifs, each representing a regular activity within a domestic household, can be identified solely from household-level electricity meter data. Using UK data collected from several hundred households in Spring 2011, monitored at a frequency of five minutes, a process for finding repeating short patterns (motifs) is defined. Different ways of representing the motifs exist, and a qualitative approach is presented that allows choosing between the options based on the number of regular behaviours detected (neither too few nor too many).
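One simple way to operationalize such a motif search is to discretize the meter series into a small symbol alphabet, slide a fixed window across it, and keep the symbol words that recur often enough. The sketch below does this on a synthetic day of five-minute readings with a repeated spike; the window length, alphabet size, and count threshold are assumptions for illustration, not the paper's tuned choices.

```python
import numpy as np
from collections import Counter

def find_motifs(series, window=6, n_bins=3, min_count=3):
    """Discretize into equal-width bins, slide a window over the symbol
    sequence, and return the words (candidate motifs) seen >= min_count times."""
    edges = np.linspace(series.min(), series.max(), n_bins + 1)[1:-1]
    symbols = np.digitize(series, edges)
    words = Counter(tuple(symbols[i:i + window])
                    for i in range(len(symbols) - window + 1))
    return {w: c for w, c in words.items() if c >= min_count}

# Synthetic day: 288 five-minute readings of flat base load, plus the same
# short "appliance" spike occurring three times
rng = np.random.default_rng(5)
day = 0.01 * rng.normal(size=288)
for start in (60, 150, 240):
    day[start:start + 4] += 2.0
motifs = find_motifs(day)
```

The "neither too few nor too many" criterion mentioned above corresponds here to tuning `window`, `n_bins`, and `min_count` until the number of surviving motifs matches a plausible count of regular household activities.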
Abstract: This paper provides novel research on the pricing ability of hybrid ANNs, based on Hang Seng Index Options spanning the period from November 2005 to October 2011, during which the 2007-2008 financial crisis developed. We study the performance of two hybrid networks, integrated with the Black-Scholes model and the Corrado and Su model respectively. We find that hybrid neural networks trained on financial data drawn from a booming period of a market cannot predict option prices well for a period undergoing a financial crisis (a tumbling period in the market); researchers and practitioners should therefore be cautious when predicting option prices with hybrid ANNs. Our findings likely answer recent puzzles about the counterintuitive performance of NN models for option pricing during financial crises, and suggest that the incompetence of NN models for option pricing likely stems from their having been trained on data from inappropriate periods of market cycles (regimes), and not necessarily from the learning ability or flexibility of NN models.
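For reference, the Black-Scholes component around which the first hybrid network is built is just the closed-form European call price; the hybrid then lets the network learn residual structure on top of it. The network itself and the Corrado-Su extension are not reproduced here, and the inputs below are arbitrary illustrative values.

```python
import numpy as np
from scipy.stats import norm

def black_scholes_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

# At-the-money call, one year to expiry, 5% rate, 20% volatility
price = black_scholes_call(S=100.0, K=100.0, T=1.0, r=0.05, sigma=0.2)
```

In a hybrid design, a quantity like `price` is fed to (or subtracted from) the network target, so the network only has to learn the model's mispricing; the paper's point is that this residual structure differs sharply between boom and crisis regimes.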