Abstract: Information regarding small area prevalence of chronic disease is important for public health strategy and resourcing equity. This paper develops a prevalence model taking account of survey and census data to derive small area prevalence estimates for diabetes. The application involves 32000 small area subdivisions (zip code census tracts) of the US, with the prevalence estimates taking account of information from the US-wide Behavioral Risk Factor Surveillance System (BRFSS) survey on population prevalence differentials by age, gender, ethnic group and education. The effects of such aspects of population composition on prevalence are widely recognized. However, the model also incorporates spatial or contextual influences via spatially structured effects for each US state; such contextual effects are allowed to differ between ethnic groups and other demographic categories using a multivariate spatial prior. A Bayesian estimation approach is used and analysis demonstrates the considerably improved fit of a fully specified compositional-contextual model as compared to simpler ‘standard’ approaches which are typically limited to age and area effects.
Abstract: This note underscores important considerations that should be taken into account when teaching students to check for inadequacies of a given linear, nonlinear or logistic regression model. Key illustrations are provided which highlight the shortcomings of currently used procedures. A brief overview of nonlinear regression models is given in order to lay the foundation for testing for lack of fit in nonlinear models. This paper also introduces a new ‘scaled’ binary logistic regression model to highlight potential problems with the usual logistic model, and implications for choosing a robust optimal experimental design are also discussed. Key words: Lack of fit, logistic regression, nonlinear regression, optimal de
Abstract: In this paper we analyze the weight loss behaviour of Mexican garlic under different storage conditions. Garlic is an important Mexican export product. Quality losses during storage are important to understand because of their cost and sales implications. Weight loss profiles for each experimental condition, represented as functions, are modeled by means of functional linear models, and hypothesis tests are performed to compare treatments. Monte Carlo sampling versions of permutation tests are used to obtain p-values. The functional approach clearly identified storage regimes that significantly decrease the speed of deterioration of the product relative to traditional Mexican agricultural practices.
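The Monte Carlo permutation idea mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the test statistic (squared distance between group mean curves on a common grid), the function names, and the synthetic weight-loss curves are all assumptions made for the example.

```python
import random

def mean_curve(curves):
    # Pointwise mean of curves sampled on a common time grid.
    n = len(curves)
    return [sum(c[t] for c in curves) / n for t in range(len(curves[0]))]

def squared_distance(group_a, group_b):
    # Assumed statistic: sum of squared differences between group mean curves.
    ma, mb = mean_curve(group_a), mean_curve(group_b)
    return sum((x - y) ** 2 for x, y in zip(ma, mb))

def permutation_p_value(group_a, group_b, n_perm=999, seed=0):
    # Monte Carlo permutation test: reshuffle treatment labels n_perm times
    # instead of enumerating every possible relabelling.
    rng = random.Random(seed)
    observed = squared_distance(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if squared_distance(pooled[:n_a], pooled[n_a:]) >= observed:
            exceed += 1
    # Counting the observed labelling itself keeps the p-value above zero.
    return (exceed + 1) / (n_perm + 1)

# Synthetic (hypothetical) weight-loss curves: two storage regimes, six units each.
data_rng = random.Random(1)
grid = range(10)
slow = [[0.5 * t + data_rng.gauss(0, 0.1) for t in grid] for _ in range(6)]
fast = [[1.0 * t + data_rng.gauss(0, 0.1) for t in grid] for _ in range(6)]
p_value = permutation_p_value(slow, fast)
```

With clearly separated regimes such as these, the estimated p-value should be small; in a functional linear model setting the statistic would more plausibly be an F-type functional statistic, which could be swapped in for `squared_distance`.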
Abstract: Historians have shown great interest in the Southern Illinois mine war. One explanation has been that this war was caused by miners who had radical political beliefs. We examine this view by applying four methods of ecological inference to estimate the proportion of coal miners who were socialist voters in this time period. Based on these results (especially considering the assumptions of the methods) we conclude that miners were politically less radical than previously thought.
Abstract: This study investigates whether Support Vector Machines (SVMs) can be used to predict the problem-solving performance of students in a computer-based learning environment. SVM models using RBF, linear, polynomial and sigmoid kernels were developed to estimate the probability that middle school students answer mathematics problems correctly at their first attempt, without using the hints available in the computer-based learning environment, based on their previously observed problem-solving performance. The SVM models gave better predictions than the standard Bayesian Knowledge Tracing (BKT) model, one of the most widely used prediction models in educational data mining research, in terms of Area Under the receiver operating characteristic Curve (AUC). The four SVM models achieved AUC values from 0.73 to 0.77, an improvement of approximately 29% over the standard BKT model, whose AUC was 0.58.
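As a rough check on the figures quoted above, the sketch below computes AUC from its rank-based definition and the relative improvement over the BKT baseline. The function names and the toy labels and scores are assumptions for illustration; only the AUC values 0.58, 0.73 and 0.77 come from the abstract. Reading the 29% figure as the improvement at the midpoint of the reported range is an interpretation, not something the abstract states.

```python
def auc(labels, scores):
    # AUC as the probability that a randomly chosen positive case is scored
    # higher than a randomly chosen negative case (ties count as one half).
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

def relative_improvement(new, baseline):
    # Fractional gain of `new` over `baseline`.
    return (new - baseline) / baseline

# Toy example (hypothetical data): 1 = correct at first attempt, 0 = otherwise.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Relative gains for the AUC values reported in the abstract.
bkt_auc = 0.58
low_gain = relative_improvement(0.73, bkt_auc)
high_gain = relative_improvement(0.77, bkt_auc)
mid_gain = relative_improvement(0.75, bkt_auc)  # midpoint of the reported range
```

Under this reading the gains run from roughly 26% to 33%, with the midpoint giving about 29%.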
Abstract: In longitudinal studies where the same individuals are followed over time, bias caused by unobserved data raises a serious concern, particularly when the data are missing in a non-ignorable manner. One approach to dealing with non-ignorable missing data is a pattern mixture model. In this paper, we combine the pattern mixture model with latent trajectory analysis using the SAS TRAJ procedure, which offers a practical solution to many problems of the same nature. Our model assumes a stochastic process that categorizes a relatively large number of missing-data patterns into several latent groups, each of which has a unique outcome trajectory; this allows patterns with missing values to share information with patterns with more data points. We estimated the longitudinal trajectories of a memory test over 12 years of follow-up, using data from a prospective epidemiological study of dementia. Missing-data patterns were created conditional on survival, and the final marginal response was obtained by excluding those who had died at each time point. The approach presented here is appealing since it can be easily implemented using common software.
Abstract: We introduce a new class of the slash distribution using the folded normal distribution. The proposed model, defined on non-negative measurements, extends the slashed half normal distribution and has higher kurtosis than the ordinary half normal distribution. We study the characterization and properties involving moments and some measures based on moments of this distribution. Finally, we illustrate the proposed model with a simulation study and a real application.
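One way to picture such a model is through the usual stochastic representation of slash-type distributions, in which a base variable is divided by a power of an independent uniform variable. The sketch below assumes the representation Y = |mu + sigma Z| / U^(1/q), with Z standard normal and U uniform on (0, 1]; this form, the parameter names, and the function names are assumptions for illustration and are not taken from the abstract. Setting mu = 0 gives a slashed half normal draw under the same assumption.

```python
import random

def sample_folded_normal(rng, mu, sigma):
    # Folded normal draw: absolute value of a N(mu, sigma^2) variate.
    return abs(rng.gauss(mu, sigma))

def sample_slash_folded_normal(rng, mu, sigma, q):
    # Assumed representation: folded normal divided by U**(1/q).
    # 1 - random() lies in (0, 1], which avoids division by zero.
    u = 1.0 - rng.random()
    return sample_folded_normal(rng, mu, sigma) / u ** (1.0 / q)

rng = random.Random(42)
draws = [sample_slash_folded_normal(rng, mu=1.0, sigma=2.0, q=5.0) for _ in range(1000)]
```

Because the divisor never exceeds one, each draw is at least as large as the underlying folded normal draw, which is the mechanism that produces the heavier right tail; smaller values of `q` make the tail heavier.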