Abstract: PSA measurements are used to assess the risk of prostate cancer. PSA range and PSA kinetics such as PSA velocity have been correlated with increased cancer detection and assist the clinician in deciding when prostate biopsy should be performed. Our aim is to evaluate the use of a novel maximum likelihood estimation prostate-specific antigen (MLE-PSA) model for predicting the probability of prostate cancer using serial PSA measurements combined with PSA velocity, in order to assess whether this reduces the need for prostate biopsy. A total of 1976 Caucasian patients were included. All of these patients had at least 6 serial PSA measurements, and all underwent trans-rectal biopsy with a minimum of 12 cores within the past 10 years. A multivariate logistic regression model was developed using maximum likelihood estimation (MLE) based on the following parameters: age, at least 6 serial PSA measurements, baseline median natural logarithm of the PSA (ln(PSA)) and of the PSA velocity (ln(PSAV)), baseline process capability standard deviation of ln(PSA) and ln(PSAV), significant special causes of variation in ln(PSA) and ln(PSAV) detected using control chart logic, and the volatility of ln(PSAV). We then compared the prostate cancer probability from MLE-PSA with the results of prostate needle biopsy. The MLE-PSA model with a 50% cut-off probability has a sensitivity of 87%, specificity of 85%, positive predictive value (PPV) of 89%, and negative predictive value (NPV) of 82%. By contrast, a single PSA value with a 4 ng/ml threshold has a sensitivity of 59%, specificity of 33%, PPV of 56%, and NPV of 36% in the same population of patients used to generate the MLE-PSA model. Based on serial PSA measurements, the use of the MLE-PSA model significantly (p-value < 0.0001) improves prostate cancer detection and reduces the need for prostate biopsy.
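The four accuracy measures quoted in this abstract all derive from a 2x2 table of model predictions against biopsy results. The following is a minimal Python sketch of that arithmetic, assuming predicted probabilities are dichotomized at a cut-off such as the 50% one mentioned above. The function names and the toy data are hypothetical illustrations, not the study's code or patient data.

```python
def classify(probabilities, cutoff=0.5):
    """Label a case positive when its predicted probability reaches the cut-off."""
    return [p >= cutoff for p in probabilities]


def confusion_counts(predicted, actual):
    """Return (tp, fp, tn, fn) for paired boolean predictions and outcomes."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    return tp, fp, tn, fn


def diagnostic_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, PPV and NPV from the four counts."""
    return {
        "sensitivity": tp / (tp + fn),  # positives found among true cancers
        "specificity": tn / (tn + fp),  # negatives found among non-cancers
        "ppv": tp / (tp + fp),          # cancers among positive predictions
        "npv": tn / (tn + fn),          # non-cancers among negative predictions
    }


if __name__ == "__main__":
    # Toy data (hypothetical): model probabilities and biopsy outcomes.
    probs = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
    biopsy_positive = [True, True, True, False, False, False]
    counts = confusion_counts(classify(probs), biopsy_positive)
    print(diagnostic_metrics(*counts))
```

Note that PPV and NPV depend on the prevalence of cancer in the sample, so values computed in a biopsied cohort need not carry over to a screening population.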
Abstract: A new set of methods is developed to perform cluster analysis of functions, motivated by a data set consisting of hydraulic gradients at several locations distributed across a wetland complex. The methods build on previous work on clustering of functions, such as Tarpey and Kinateder (2003) and Hitchcock et al. (2007), but explore functions generated from an additive model decomposition (Wood, 2006) of the original time series. Our decomposition targets two aspects of the series, using an adaptive smoother for the trend and a circular spline for the diurnal variation in the series. Different measures for comparing locations are discussed, including a method for efficiently clustering time series of different lengths using a functional data approach. The complicated nature of these wetlands is highlighted by the shifting group memberships, which depend on the scale of variation and the year of the study considered.
Abstract: This paper describes and compares three clustering techniques: traditional clustering methods, Kohonen maps and latent class models. The paper also proposes some novel measures of the quality of a clustering. To the best of our knowledge, this is the first contribution in the literature to compare these three techniques in a context where the classes are not known in advance.
Subsampling is used in this paper as a method for learning about the influence of individual data points on inference for the parameters of a fitted logistic regression model. Alternative, alternative regularized, alternative regularized lasso, and alternative regularized ridge estimators are proposed for the parameters of logistic regression models and are compared with the maximum likelihood estimators. The alternative regularized estimators are obtained using a tuning parameter, whereas the alternative estimators are not regularized. The alternative regularized lasso estimators are standard lasso estimators averaged over subsets of groups, and the alternative regularized ridge estimators are likewise standard ridge estimators averaged over subsets of groups, where the number of subsets may be smaller than the number of parameters. The values of the tuning parameters are chosen to make the alternative regularized estimators very close to the maximum likelihood estimators, and the process is illustrated with two real data sets as well as a simulation study. The alternative and alternative regularized estimators always have closed-form expressions in terms of the observations, which the maximum likelihood estimators do not have. When the maximum likelihood estimators do not have closed-form expressions, the alternative regularized estimators thus obtained provide approximate closed-form expressions for them.
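The general idea of averaging ridge-penalized fits over subsets of the data can be sketched as below. This is an illustration under stated assumptions, not the estimator proposed in the paper: the abstract does not give the closed-form expressions, so this sketch fits each subset iteratively by gradient descent, uses a single predictor with an unpenalized intercept, and forms subsets by taking every k-th observation. All function names, the subset scheme, and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def ridge_logistic_fit(xs, ys, lam, steps=2000, lr=0.1):
    """Fit intercept b0 and slope b1 by gradient descent on the negative
    log-likelihood plus the ridge penalty (lam / 2) * b1**2.
    Assumption: only the slope is penalized."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            residual = sigmoid(b0 + b1 * x) - y
            g0 += residual
            g1 += residual * x
        g1 += lam * b1  # gradient of the ridge penalty
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def averaged_ridge(xs, ys, n_subsets, lam):
    """Average the ridge fits obtained on n_subsets interleaved subsets."""
    fits = [
        ridge_logistic_fit(xs[k::n_subsets], ys[k::n_subsets], lam)
        for k in range(n_subsets)
    ]
    b0 = sum(f[0] for f in fits) / len(fits)
    b1 = sum(f[1] for f in fits) / len(fits)
    return b0, b1


if __name__ == "__main__":
    # Toy data (hypothetical): one predictor, binary response.
    xs = [-3, -2.5, -2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2, 2.5, 3]
    ys = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1]
    for lam in (0.1, 5.0):
        print(lam, averaged_ridge(xs, ys, n_subsets=3, lam=lam))
```

In a setting like the one the abstract describes, the tuning parameter (here `lam`) would be chosen so that the averaged estimate lies close to the maximum likelihood estimate; the sketch only shows how the averaged estimate changes with `lam`.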
Abstract: Information regarding small area prevalence of chronic disease is important for public health strategy and resourcing equity. This paper develops a prevalence model that takes account of survey and census data to derive small area prevalence estimates for diabetes. The application involves 32,000 small area subdivisions (ZIP code census tracts) of the US, with the prevalence estimates taking account of information from the US-wide Behavioral Risk Factor Surveillance System (BRFSS) survey on population prevalence differentials by age, gender, ethnic group and education. The effects of such aspects of population composition on prevalence are widely recognized. However, the model also incorporates spatial or contextual influences via spatially structured effects for each US state; such contextual effects are allowed to differ between ethnic groups and other demographic categories using a multivariate spatial prior. A Bayesian estimation approach is used, and the analysis demonstrates the considerably improved fit of a fully specified compositional-contextual model compared with simpler ‘standard’ approaches, which are typically limited to age and area effects.