Journal of Data Science

Keywords: predictive modeling

Search results 4

Impact of Data Perturbation for Statistical Disclosure Control on the Predictive Performance of Machine Learning Techniques

Thomas Johnson III, Sayed A. Mostafa

https://doi.org/10.6339/25-JDS1186

Pub. online: 23 Apr 2025 Type: Statistical Data Science

Open Access

Journal: Journal of Data Science Volume 23, Issue 2 (2025): Special Issue: the 2024 Symposium on Data Science and Statistics (SDSS), pp. 312–331

Bayesian Multivariate Joint Modeling of Longitudinal, Recurrent, and Competing Risk Terminal Events in Patients with Chronic Kidney Disease

Qi Qian, Danh V. Nguyen, Esra Kurum, et al. (6 authors)

https://doi.org/10.6339/25-JDS1182

Pub. online: 16 Apr 2025 Type: Data Science In Action

Open Access

Journal: Journal of Data Science

Interaction Selection and Prediction Performance in High-Dimensional Data: A Comparative Study of Statistical and Tree-Based Methods

Chinedu J. Nzekwe, Seongtae Kim, Sayed A. Mostafa

https://doi.org/10.6339/24-JDS1127

Pub. online: 22 May 2024 Type: Statistical Data Science

Open Access

Journal: Journal of Data Science Volume 22, Issue 2 (2024): Special Issue: 2023 Symposium on Data Science and Statistics (SDSS): “Inquire, Investigate, Implement, Innovate”, pp. 259–279

Abstract

Predictive modeling often ignores interaction effects among predictors in high-dimensional data because of analytical and computational challenges. Research in interaction selection has been galvanized by methodological and computational advances. In this study, we investigate the performance of two types of predictive algorithms that can perform interaction selection. Specifically, we compare the predictive performance and interaction selection accuracy of penalty-based and tree-based predictive algorithms. The penalty-based algorithms included in our comparative study are the regularization path algorithm under the marginality principle (RAMP), the least absolute shrinkage and selection operator (LASSO), the smoothly clipped absolute deviation (SCAD), and the minimax concave penalty (MCP). The tree-based algorithms considered are random forest (RF) and iterative random forest (iRF). We evaluate the effectiveness of these algorithms under various regression and classification models with varying structures and dimensions. We assess predictive performance using the mean squared error for regression and accuracy, sensitivity, specificity, balanced accuracy, and F1 score for classification. We use interaction coverage to judge each algorithm's efficacy for interaction selection. Our findings reveal that the effectiveness of the selected algorithms varies depending on the number of predictors (data dimension) and the structure of the data-generating model, i.e., linear or nonlinear, hierarchical or non-hierarchical. Each algorithm included in this study was favored in at least one scenario. However, the general pattern allows us to recommend specific algorithms for certain scenarios. Our analysis helps clarify each algorithm's strengths and limitations, offering guidance to researchers and data analysts in choosing an appropriate algorithm for their predictive modeling task based on their data structure.
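
As a rough illustration of the predictive-performance measures named in the abstract above, the following sketch computes the mean squared error for regression and accuracy, sensitivity, specificity, balanced accuracy, and F1 score for binary classification. It uses the standard textbook definitions; the function names, the `positive` label argument, and the convention of returning 0.0 for an undefined ratio are assumptions made for this example and are not taken from the paper. Interaction coverage is not sketched because the abstract does not define it.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def _ratio(numerator, denominator):
    # Assumed convention for this sketch: an undefined ratio is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred, positive=1):
    """Binary classification metrics derived from the confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    sensitivity = _ratio(tp, tp + fn)  # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)  # true negative rate
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    # Toy labels, invented purely for illustration.
    observed = [1, 1, 1, 0, 0, 0, 0, 0]
    predicted = [1, 1, 0, 0, 0, 0, 1, 0]
    for name, value in classification_metrics(observed, predicted).items():
        print(f"{name}: {value:.3f}")
    print("mse:", mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Balanced accuracy averages sensitivity and specificity, so it is less affected by class imbalance than plain accuracy, which may be why the abstract lists both.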

An Empirical Study of Starting Salaries and Employment Trends of Engineering Students in India

Shrihari Vasudevan, Ritwik Chaudhuri, Madhavan Pallan, et al. (4 authors)

https://doi.org/10.6339/JDS.201707_15(3).0010

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 15, Issue 3 (2017), pp. 553–574
