A New Variable Selection Approach Inspired by Supersaturated Designs Given a Large-Dimensional Dataset

Parpoula, Christina; Drosou, Krystallenia; Koukouvinos, Christos; Mylona, Kalliopi

doi:10.6339/JDS.2014.12(1).1183

Journal of Data Science

Volume 12, Issue 1 (2014), pp. 35–52

https://doi.org/10.6339/JDS.2014.12(1).1183

Pub. online: 4 August 2022

Type: Research Article

Open Access

Abstract

The problem of variable selection is fundamental to statistical modelling in diverse fields of science. In this paper, we study the problem of selecting important variables in regression problems where both the observations and the labels of a real-world dataset are available. First, we examine the performance of several existing statistical methods in analyzing a large real trauma dataset consisting of 7000 observations and 70 factors, which include demographic, transport, and intrahospital data. The statistical methods employed in this work are the nonconcave penalized likelihood methods (SCAD, LASSO, and Hard), generalized linear logistic regression, and best subset variable selection (with AIC and BIC), used to detect possible risk factors of death. Supersaturated designs (SSDs) are a large class of factorial designs that can be used for screening out the important factors from a large set of potentially active variables. This paper presents a new variable selection approach inspired by supersaturated designs, given a dataset of observations. The merits and effectiveness of this approach for identifying important variables in observational studies are evaluated by considering several two-level supersaturated designs and a variety of statistical models with respect to the combinations of factors and the number of observations. The derived results are encouraging, since the alternative approach using supersaturated designs provided specific information that is logical and consistent with medical experience, and which may also serve as a guideline for trauma management.
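For readers unfamiliar with the three penalties named in the abstract, the following is a minimal sketch of their usual textbook forms in the penalized likelihood literature. It assumes that "Hard" refers to the hard thresholding penalty and that SCAD uses the conventional shape parameter a = 3.7; the abstract states neither, and the function names are illustrative only, not taken from the paper.

```python
# Illustrative penalty functions p_lambda(theta) for a single coefficient.
# Assumption: these are the standard forms from the penalized likelihood
# literature, not necessarily the exact variants used in the paper.

def lasso_penalty(theta, lam):
    """L1 penalty: grows linearly in |theta|."""
    return lam * abs(theta)


def hard_penalty(theta, lam):
    """Hard thresholding penalty: quadratic below lam, constant above."""
    t = abs(theta)
    if t < lam:
        return lam ** 2 - (t - lam) ** 2
    return lam ** 2


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty: L1 near zero, quadratic transition, then constant.

    a = 3.7 is the conventional default (an assumption here); a must exceed 2.
    """
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return -(t ** 2 - 2 * a * lam * t + lam ** 2) / (2 * (a - 1))
    return (a + 1) * lam ** 2 / 2


if __name__ == "__main__":
    # Compare how each penalty treats small and large coefficients.
    for theta in (0.5, 1.0, 2.0, 5.0):
        print(theta,
              lasso_penalty(theta, 1.0),
              hard_penalty(theta, 1.0),
              scad_penalty(theta, 1.0))
```

Unlike the LASSO penalty, the hard thresholding and SCAD penalties level off for large coefficients, so large effects are not shrunk further once they pass the threshold.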

Keywords

Generalized linear model; penalized likelihood; supersaturated design
