Abstract: In this paper we fitted a predictive model for the average annual rainfall of Bangladesh through a geostatistical approach. From a geostatistical point of view, we studied the spatial dependence pattern of average annual rainfall data (measured in mm) collected from 246 stations in Bangladesh. We employed kriging, a spatial interpolation method, for the rainfall data. The data reveal a linear trend, so we fitted a linear model to remove the trend and used the trend-free data for further calculations. Four theoretical semivariogram models (Exponential, Spherical, Gaussian and Matern) were used to explain the spatial variation in average annual rainfall. These models were chosen according to the pattern of the empirical semivariogram. The prediction performance of Ordinary Kriging with these four fitted models was then compared through k-fold cross-validation, and Ordinary Kriging was found to perform better when the spatial dependency in the average annual rainfall of Bangladesh is modeled through the Gaussian semivariogram model.
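For illustration, three of the four semivariogram models named in the abstract above can be sketched in plain Python. This is a minimal sketch and not the authors' code: the parameter names `nugget`, `psill` (partial sill) and `a` (range parameter) are assumptions, and the formulas follow one common parameterization (some software rescales the range parameter). The Matern model is omitted because it needs a modified Bessel function that the Python standard library does not provide.

```python
import math


def exponential(h, nugget, psill, a):
    """Exponential semivariogram; approaches nugget + psill asymptotically."""
    if h <= 0:
        return 0.0  # by convention the semivariogram is zero at lag zero
    return nugget + psill * (1.0 - math.exp(-h / a))


def spherical(h, nugget, psill, a):
    """Spherical semivariogram; reaches the sill exactly at h = a."""
    if h <= 0:
        return 0.0
    if h >= a:
        return nugget + psill
    r = h / a
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)


def gaussian(h, nugget, psill, a):
    """Gaussian semivariogram; parabolic near the origin."""
    if h <= 0:
        return 0.0
    return nugget + psill * (1.0 - math.exp(-((h / a) ** 2)))


if __name__ == "__main__":
    # Hypothetical parameter values chosen only to show the shapes.
    for h in (0.0, 2.5, 5.0, 10.0, 20.0):
        print(h,
              round(exponential(h, 1.0, 4.0, 10.0), 3),
              round(spherical(h, 1.0, 4.0, 10.0), 3),
              round(gaussian(h, 1.0, 4.0, 10.0), 3))
```

Under this parameterization only the spherical model attains its sill at a finite lag; the exponential and Gaussian models approach it asymptotically, which is one reason the shape of the empirical semivariogram guides the choice among them.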
This article presents a classification of disease severity for patients with cystic fibrosis (CF). CF is a genetic disease that dramatically decreases life expectancy and quality of life. The disease is characterized by polymicrobial infections, which lead to lung remodeling and airway mucus plugging. To quantify the disease severity of CF patients, a set of quantile regression models is fitted, and rank scores and the corresponding normalized ranks are calculated for each patient. Based on the rank scores calculated from the set of quantile regression models, a continuous severity index is computed for each CF patient and can be considered a robust estimate of CF disease severity.
Hierarchical Bayes models have been used in disease mapping to examine small-scale geographic variation. State-level geographic variation for less common causes of mortality has been reported; however, county-level variation is rarely examined. Due to concerns about statistical reliability and confidentiality, county-level mortality rates based on fewer than 20 deaths are suppressed under the statistical reliability criteria of the Division of Vital Statistics, National Center for Health Statistics (NCHS). This precludes an examination of spatio-temporal variation in less common mortality outcomes, such as suicide rates (SRs), at the county level using direct estimates. Existing Bayesian spatio-temporal modeling strategies can be applied via Integrated Nested Laplace Approximation (INLA) in R to a large number of rare causes of mortality to enable examination of spatio-temporal variation on smaller geographic scales such as counties. This method allows examination of spatio-temporal variation across the entire U.S., even where the data are sparse. We used mortality data from 2005-2015 to explore spatio-temporal variation in SRs, as one particular application of the Bayesian spatio-temporal modeling strategy in R-INLA to predict year- and county-specific SRs. Specifically, hierarchical Bayesian spatio-temporal models were implemented in R-INLA with spatially structured and unstructured random effects, correlated time effects, time-varying confounders and space-time interaction terms, borrowing strength across both counties and years to produce smoothed county-level SRs. Model-based estimates of SRs were mapped to explore geographic variation.
Abstract: Lifestyles can be used to explain existing and to anticipate future consumer behavior, both in a geographical and a temporal context. Basing market segmentations on consumer lifestyles enables the development of purposeful advertising strategies and the design of new products that meet future demands. The present paper introduces a new growing self-organizing neural network which identifies lifestyles, or rather consumer types, in survey data largely autonomously. Before applying the algorithm to real marketing data, we demonstrate its general performance and adaptability by means of synthetic 2D data featuring distinct heterogeneity with respect to the arrangement of the individual data points.
As a robust data analysis technique, quantile regression has attracted extensive interest. In this study, the weighted quantile regression (WQR) technique is developed based on the sparsity function. We first consider the linear regression model and show that the relative efficiency of WQR compared with least squares (LS) and composite quantile regression (CQR) is greater than 70% regardless of the error distribution. To make the proposed method practically more useful, we consider two nontrivial extensions. The first concerns a nonparametric model. A local WQR estimate is introduced to explore the nonlinear data structure and is shown to be much more efficient than other estimates under various non-normal error distributions. The second extension concerns a multivariate problem where variable selection is needed along with regularization. We couple the WQR with penalization and show that, under mild conditions, the penalized WQR enjoys the oracle property. The WQR has an intuitive formulation and can be easily implemented. Simulation is conducted to examine its finite-sample performance and compare it against alternatives. An analysis of a mammal dataset is also conducted. Numerical studies are consistent with the theoretical findings and indicate the usefulness of WQR.
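As background for the abstract above, the standard building block of quantile regression is the check (pinball) loss. The sketch below shows only that building block and is not the WQR estimator itself, whose weighting scheme the abstract does not spell out. The function names and the toy data are hypothetical.

```python
def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_loss(c, ys, tau):
    """Sum of check losses when every observation is predicted by the constant c."""
    return sum(check_loss(y - c, tau) for y in ys)


def best_constant(ys, tau):
    """Constant minimizing the total check loss.

    The objective is convex and piecewise linear with kinks at the data
    points, so a minimizer can always be found among the observations.
    """
    return min(sorted(ys), key=lambda c: total_loss(c, ys, tau))


if __name__ == "__main__":
    ys = [1.0, 2.0, 3.0, 4.0, 100.0]  # toy data with one outlier
    print(best_constant(ys, 0.5))     # the sample median, unaffected by the outlier
```

With tau = 0.5 the minimizer is the sample median, which illustrates why quantile-based estimates are robust to outliers in a way that the least squares mean is not.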
Abstract: Watching videos online has become a popular activity for people around the world. To manage revenue from online advertising, an efficient Ad server that can match advertisements to targeted users is needed. In general, users' demographics are provided to an Ad server by an inference engine, which infers them using a profile reasoning technique. Rich media streaming through broadband networks has had a significant impact on how profile reasoning for online television users can be implemented. Compared to traditional broadcasting services such as satellite and cable, broadcasting through broadband networks enables bidirectional communication between users and content providers. In this paper, a user profile reasoning technique based on a logistic regression model is introduced. The inference model takes into account the genre preferences and viewing times of users in different age/gender groups. Historical viewing data were used to train and build the model. Different input data processing and model building strategies are discussed. Experimental results are also provided to show how effective the proposed technique is.
Abstract: Existing methods for sample size calculation with right-censored data largely assume that the failure times follow an exponential distribution or the Cox proportional hazards model. Methods under the additive hazards model are scarce. Motivated by a well-known example of right-censored failure time data that the additive hazards model fits better than the Cox model, we propose a method for power and sample size calculation for a two-group comparison assuming the additive hazards model. This model allows the investigator to specify a group difference in terms of a hazard difference and to choose increasing, constant or decreasing baseline hazards. The power computation is based on the Wald test. Extensive simulation studies are performed to demonstrate the performance of the proposed approach. Our simulations also show substantially decreased power if the additive hazards model is misspecified as the Cox proportional hazards model.
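To make the ingredients of the abstract above concrete, the sketch below shows (i) the survival function under an additive hazards model with a constant baseline hazard, where the hazard is `baseline_hazard + beta * z` for group indicator `z`, and (ii) the generic two-sided power of a Wald test under a normal approximation. It is an illustration under stated assumptions, not the proposed method: the standard error `se` of the estimated hazard difference is taken as a given input (deriving it under censoring is what a full method has to supply), and all function names and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def survival_additive(t, baseline_hazard, beta, z):
    """S(t | z) = exp(-(baseline_hazard + beta * z) * t) for a constant baseline."""
    hazard = baseline_hazard + beta * z
    if hazard < 0:
        raise ValueError("total hazard must be non-negative")
    return math.exp(-hazard * t)


def wald_power(beta, se, alpha=0.05):
    """Two-sided power of a Wald test of H0: beta = 0.

    Assumes the estimator is approximately normal with mean beta and
    standard error se, so the Wald statistic is N(beta / se, 1).
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    shift = beta / se
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical hazard difference of 0.10 with an assumed standard error of 0.035.
    print(round(wald_power(0.10, 0.035), 3))
    print(round(survival_additive(2.0, 0.2, 0.10, 1), 3))
```

A sample size calculation would then search for the smallest number of subjects whose implied standard error makes `wald_power` reach the target power.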
Abstract: Central composite design (CCD) is widely applied in many fields to construct a second-order response surface model with quantitative factors and to help increase the precision of the estimated model. When an experiment also includes qualitative factors, the effects between the quantitative and qualitative factors should be taken into consideration. In the present paper, D-optimal designs are investigated for models where the qualitative factors interact with, respectively, the linear effects, or the linear effects and 2-factor interactions or quadratic effects of the quantitative factors. It is shown that, at each qualitative level, the corresponding D-optimal design consists of the same three portions as a CCD, i.e. the cube design, the axial design and the center points, but with different weights. An example from a chemical study is used to demonstrate how the D-optimal design obtained here may help to design an experiment with both quantitative and qualitative factors more efficiently.
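The three portions of a CCD mentioned in the abstract above (cube, axial and center points) can be generated with a few lines of Python. This is a minimal sketch in coded units for quantitative factors only; it does not compute the D-optimal weights studied in the paper. The function name, the default rotatable choice of `alpha` and the `n_center` argument are assumptions made for illustration.

```python
from itertools import product


def central_composite_design(k, alpha=None, n_center=1):
    """Return the (cube, axial, center) point lists of a CCD for k factors."""
    if alpha is None:
        # Rotatable choice: fourth root of the number of cube points.
        alpha = (2 ** k) ** 0.25

    # Cube portion: full 2^k factorial at the coded levels -1 and +1.
    cube = [tuple(float(v) for v in p) for p in product((-1, 1), repeat=k)]

    # Axial portion: two points at -alpha and +alpha on each factor axis.
    axial = []
    for i in range(k):
        for s in (-alpha, alpha):
            point = [0.0] * k
            point[i] = s
            axial.append(tuple(point))

    # Center portion: replicated runs at the origin.
    center = [tuple([0.0] * k)] * n_center
    return cube, axial, center


if __name__ == "__main__":
    cube, axial, center = central_composite_design(2, n_center=3)
    for name, pts in (("cube", cube), ("axial", axial), ("center", center)):
        print(name, pts)
```

For k factors this yields 2^k cube points, 2k axial points and `n_center` center runs; a weighted design would assign a different share of the total runs to each portion.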