Abstract: For a binary regression model with observed responses (Ys), specified predictor vectors (Xs), an assumed model parameter vector (β) and a case probability function (Pr(Y = 1|X, β)), we propose a simple screening method to test goodness-of-fit when the number of observations (n) is large and the Xs are continuous variables. Given any threshold τ ∈ [0, 1], we classify each subject with predictor X into Y* = 1 or 0 (a deterministic binary variable, as opposed to the observed random binary variable Y) according to whether the case probability Pr(Y = 1|X, β), calculated under the hypothesized true model, is ≥ or < τ. For each τ, we compare the expected marginal classification error rate (false positives [Y* = 1, Y = 0] or false negatives [Y* = 0, Y = 1]) under the hypothesized true model with the marginal error rate actually observed under this classification rule. The screening profile is created by plotting the τ-specific marginal error rates (expected and observed) against τ ∈ [0, 1]. Inconsistency between the two indicates lack of fit, and consistency indicates good model fit. We note that the variation of the difference between the expected and the observed marginal classification error rates is of constant order O(n^{-1/2}) and free of τ. This smallest-order, homogeneous variation at every τ potentially allows a flexible range of model discrepancies to be detected with high power. A simulation study shows that this profile approach, named CERC (classification-error-rate calibration), is useful for detecting wrong parameter values, an incorrect subset of predictor vector components and link function misspecification. We also provide theoretical results and numerical examples showing that the ROC (receiver operating characteristic) curve is not suitable for testing the goodness-of-fit of binary models.
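The CERC profile can be computed directly from the case probabilities evaluated under the hypothesized model. The following is a minimal sketch, assuming NumPy, a logistic link for the simulated example, and the reading that a "marginal" error rate is the proportion of all n subjects falling in the cell (Y* = 1, Y = 0) or (Y* = 0, Y = 1). The function name `cerc_profile` and the standard-deviation columns are illustrative additions, not taken from the paper.

```python
import numpy as np

def cerc_profile(y, p, taus):
    """Expected vs observed marginal error rates of the rule Y* = 1{p >= tau}.

    y: observed 0/1 responses; p: case probabilities Pr(Y=1|X, beta) computed
    under the hypothesized model; taus: thresholds in [0, 1].
    Columns: tau, exp_fp, obs_fp, sd_fp, exp_fn, obs_fn, sd_fn.
    """
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    n = y.size
    var = p * (1.0 - p)  # Bernoulli variance of each Y_i under the hypothesized model
    rows = []
    for tau in taus:
        pos = p >= tau   # subjects classified as Y* = 1
        neg = ~pos       # subjects classified as Y* = 0
        rows.append((
            tau,
            np.sum(1.0 - p[pos]) / n,        # expected share with Y*=1, Y=0
            np.sum(1.0 - y[pos]) / n,        # observed share with Y*=1, Y=0
            np.sqrt(np.sum(var[pos])) / n,   # SD of (observed - expected), FP side
            np.sum(p[neg]) / n,              # expected share with Y*=0, Y=1
            np.sum(y[neg]) / n,              # observed share with Y*=0, Y=1
            np.sqrt(np.sum(var[neg])) / n,   # SD of (observed - expected), FN side
        ))
    return np.array(rows)

# Simulated example: logistic model, profile evaluated at the true beta.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x)))
y = rng.binomial(1, p_true)
prof = cerc_profile(y, p_true, np.linspace(0.0, 1.0, 11))
```

Under the hypothesized model the observed-minus-expected false-positive difference is (1/n) Σ_{p_i ≥ τ} (p_i − Y_i), whose standard deviation √(Σ p_i(1 − p_i))/n never exceeds 1/(2√n) at any τ; this is one way to see the τ-free O(n^{-1/2}) order. Re-running `cerc_profile` with probabilities from a deliberately wrong β (for example a smaller slope) and plotting columns 1 and 2 (or 4 and 5) against τ gives the kind of expected-versus-observed comparison the profile is built on.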
Abstract: In early-phase clinical trials, the variation (σ) of the response variable may not be known. For tests based on t statistics, several procedures have been proposed that use the information gained from stage I (a pilot study) to adaptively re-estimate the sample size for the overall hypothesis test. We are interested in choosing a reasonable stage-I sample size (m) that leads to an acceptable overall sample size (stage I and later). Conditional on any specified m, this paper replaces σ with its estimate from stage I (based on sample size m) in the conventional sample size formula under the normal distribution assumption to re-estimate the overall sample size. The estimated σ, the re-estimated overall sample size and the collective information (stage I and later) are incorporated into a surrogate normal variable, which is tested against the standard normal distribution. We plot the actual type I and II error rates and the expected sample size against m in order to choose a good universal stage-I sample size (m*) to start
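A minimal sketch of the stage-I re-estimation step follows, using only the Python standard library. It assumes a one-sample (or paired), one-sided formula N = ⌈((z_{1−α} + z_{1−β}) σ̂ / δ)²⌉ with a clinically relevant difference δ chosen by the investigator, and that the overall sample size is never smaller than m. The function names are hypothetical, and the paper's surrogate normal test statistic is not reproduced here; the Monte Carlo helper only estimates the expected overall sample size as a function of m.

```python
import math
import random
from statistics import NormalDist, stdev

def reestimated_total_n(stage1, delta, alpha=0.05, power=0.8):
    """Plug the stage-I estimate of sigma into the normal-theory formula.

    Assumption: one-sample/paired, one-sided test; requires len(stage1) >= 2.
    """
    m = len(stage1)
    sigma_hat = stdev(stage1)                 # sample SD from the pilot data
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha)                # one-sided critical value
    z_b = z.inv_cdf(power)                    # quantile for the target power
    n_formula = math.ceil(((z_a + z_b) * sigma_hat / delta) ** 2)
    return max(m, n_formula)                  # overall size includes stage I

def expected_total_n(m, sigma, delta, reps=2000, seed=1):
    """Monte Carlo estimate of E[N] when stage I has m normal observations."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        stage1 = [rng.gauss(0.0, sigma) for _ in range(m)]
        total += reestimated_total_n(stage1, delta)
    return total / reps

for m in (5, 10, 20, 40):
    print(m, expected_total_n(m, sigma=1.0, delta=0.5))
```

With σ = 1 and δ = 0.5 known in advance, the same formula gives N = 25, which is a useful reference line when plotting the estimated expected sample size against m.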
Abstract: Several statistical approaches have been proposed for circumstances in which a single universal distribution cannot fit the whole domain. This paper studies Bayesian detection of multiple interior epidemic/square waves in an interval domain in which the two ends share an identical statistical distribution. We introduce a simple dimension-matching parameter proposal to implement sampling-based posterior inference for the special case in which each segmented distribution on a circle has the same set of regulating parameters. Molecular biology research reveals that cancer progression may involve DNA copy number alteration in genome regions, and connecting the two biologically inactive chromosome ends yields a circle holding multiple epidemic/square waves. A slight modification of a simple, novel Bayesian change point identification algorithm, random grafting-pruning Markov chain Monte Carlo (RGPMCMC), is proposed by adjusting the original symmetric change point birth/death transition probability with the ratio of change point numbers that differ by one. The performance of the algorithm is studied through simulations connected to DNA copy number alteration detection, pointing to potential application in cancer diagnosis at the genome level.
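To make the circular setting concrete, the sketch below marks interior square waves on a circle of n positions (a wave may wrap across the joined ends), lists the circular change points, and simulates observations. It assumes Gaussian noise, one baseline mean and a single elevated mean shared by all waves (one possible reading of "the same set of regulating parameters"). The function names are hypothetical, and this is a data-generating illustration only, not the RGPMCMC sampler.

```python
import random

def wave_labels(n, waves):
    """Mark positions inside any wave; each wave is a half-open (start, end) pair.

    If end <= start, the wave wraps past position n - 1 back to 0.
    """
    inside = [False] * n
    for start, end in waves:
        stop = end if end > start else end + n   # unroll a wrapping wave
        for i in range(start, stop):
            inside[i % n] = True
    return inside

def circular_change_points(labels):
    """Positions i whose label differs from the previous position on the circle."""
    # For i = 0, labels[i - 1] is labels[-1], the last position: this joins the ends.
    return [i for i in range(len(labels)) if labels[i] != labels[i - 1]]

def simulate(labels, base_mean=0.0, wave_mean=1.0, sd=0.3, seed=0):
    """Gaussian observations: baseline mean outside waves, shared wave mean inside."""
    rng = random.Random(seed)
    return [rng.gauss(wave_mean if flag else base_mean, sd) for flag in labels]

labels = wave_labels(100, [(10, 20), (95, 5)])
print(circular_change_points(labels))   # [5, 10, 20, 95]
data = simulate(labels)
```

Because the two ends are joined, each separated wave contributes exactly two change points, so the wrapping wave (95, 5) is handled the same way as the interior wave (10, 20) and the sequence above has four change points in total.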