Journal of Data Science

A Simple Method for Screening Binary Models with Large Sample Size and Continuous Predictor Variables
Volume 7, Issue 4 (2009), pp. 513–536
Weichung Joe Shih, Junfeng Liu

https://doi.org/10.6339/JDS.2009.07(4).500
Pub. online: 4 August 2022      Type: Research Article      Open Access


Abstract

Abstract: For a binary regression model with observed responses (Y's), specified predictor vectors (X's), an assumed model parameter vector (β) and a case probability function (Pr(Y = 1|X, β)), we propose a simple screening method to test goodness-of-fit when the number of observations (n) is large and the X's are continuous variables. Given any threshold τ ∈ [0, 1], we consider classifying each subject with predictor X into Y* = 1 or 0 (a deterministic binary variable, as opposed to the observed random binary variable Y) according to whether the calculated case probability (Pr(Y = 1|X, β)) under the hypothesized true model is ≥ or < τ. For each τ, we check the difference between the expected marginal classification error rate (false positives [Y* = 1, Y = 0] or false negatives [Y* = 0, Y = 1]) under the hypothesized true model and the observed marginal error rate, which is directly observable under this classification rule. The screening profile is created by plotting the τ-specific marginal error rates (expected and observed) versus τ ∈ [0, 1]. Inconsistency indicates lack of fit and consistency indicates good model fit. We note that the variation of the difference between the expected marginal classification error rate and the observed one is of constant order O(n^{-1/2}) and free of τ. This small, homogeneous variation at each τ potentially detects flexible model discrepancies with high power. A simulation study shows that this profile approach, named CERC (classification-error-rate-calibration), is useful for detecting a wrong parameter value, an incorrect predictor vector component subset, and link function misspecification. We also provide theoretical results as well as numerical examples to show that the ROC (receiver operating characteristic) curve is not suitable for binary model goodness-of-fit testing.
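
A minimal sketch of the screening profile described in the abstract, assuming a logistic link, simulated data, and a plug-in parameter value; the function cerc_profile and all variable names are illustrative, not taken from the paper.

```python
import numpy as np

def cerc_profile(X, y, beta, thresholds):
    """For each threshold tau, return the expected and observed marginal
    false-positive and false-negative rates under the hypothesized model."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))        # case probabilities Pr(Y=1 | X, beta)
    rows = []
    for tau in thresholds:
        y_star = (p >= tau).astype(int)        # deterministic classification Y*
        # Expected marginal rates under the hypothesized model: average the
        # model probabilities of the events [Y*=1, Y=0] and [Y*=0, Y=1].
        exp_fp = np.mean((y_star == 1) * (1.0 - p))
        exp_fn = np.mean((y_star == 0) * p)
        # Observed marginal rates computed from the actual responses Y.
        obs_fp = np.mean((y_star == 1) & (y == 0))
        obs_fn = np.mean((y_star == 0) & (y == 1))
        rows.append((tau, exp_fp, obs_fp, exp_fn, obs_fn))
    return np.array(rows)

# Simulated example: large n, continuous predictors, correctly specified logistic model.
rng = np.random.default_rng(0)
n = 10_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([-0.5, 1.0, -1.5])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

profile = cerc_profile(X, y, beta_true, np.linspace(0.05, 0.95, 19))
# Plotting the expected vs. observed columns against tau gives the screening
# profile: under a correct model the two curves should agree to O(n^{-1/2});
# systematic gaps across tau would suggest lack of fit.
```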


Copyright
No copyright data available.

Metrics (since February 2021)
  • 562 article views
  • 314 PDF downloads

Journal of Data Science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
