Pub. online: 13 Nov 2025
Type: Statistical Data Science
Open Access
Journal: Journal of Data Science
Volume 24, Issue 1 (2026): Special Issue: Statistical aspects of Trustworthy Machine Learning, pp. 167–186
Abstract
Neuroimaging technology has received considerable attention in recent years. A key problem in imaging data analysis is heterogeneity among individual subjects. In particular, the relationship between imaging biomarkers and clinical outcomes may vary across individuals. Popular existing statistical methodologies, such as functional linear regression and high-dimensional linear regression, can be inadequate because they assume the same regression relationship for all subjects. In this paper, we propose the Subject-Specific Scalar-on-Image Regression (S3IR) model to handle heterogeneous populations. Specifically, we use a binary subject-specific masking image to capture the heterogeneous sparsity among individuals. The proposed S3IR model incorporates the spatial structure of the imaging data and achieves both local smoothness and subject-specific sparsity in the estimated regression coefficients. Furthermore, we design an EM-type adaptive algorithm to estimate the model coefficients. Simulation studies show that the proposed method outperforms several existing methods in handling heterogeneity. Finally, we apply the S3IR model to data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). The results show that our model can effectively identify interpretable and significant disease-related regions and improve prediction of cognitive scores.
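The idea of a binary subject-specific masking image can be illustrated with a small simulation. The sketch below is not the S3IR model or its EM-type algorithm as specified in the paper. It assumes, for illustration only, that each subject's scalar response is a sum over voxels of image value times a shared coefficient times that subject's 0/1 mask, plus Gaussian noise. The function name `simulate_masked_responses`, its parameters, and the block-shaped coefficient image are all hypothetical.

```python
import random


def simulate_masked_responses(n_subjects=50, side=8, keep_prob=0.7,
                              noise_sd=0.1, seed=0):
    """Simulate scalar responses from images with subject-specific masks.

    Assumed (hypothetical) generative form, not taken from the paper:
        y_i = sum_v x_i(v) * m_i(v) * beta(v) + noise
    where m_i(v) is a 0/1 mask that differs across subjects.
    """
    rng = random.Random(seed)
    n_vox = side * side

    # Shared coefficient image: a locally constant block in the top-left
    # quadrant, zero elsewhere (stands in for "local smoothness").
    beta = [
        1.0 if (v // side < side // 2 and v % side < side // 2) else 0.0
        for v in range(n_vox)
    ]

    images, masks, responses = [], [], []
    for _ in range(n_subjects):
        # One flattened image per subject.
        x = [rng.gauss(0.0, 1.0) for _ in range(n_vox)]
        # Binary mask: which voxels are active for this subject.
        m = [1 if rng.random() < keep_prob else 0 for _ in range(n_vox)]
        signal = sum(xv * mv * bv for xv, mv, bv in zip(x, m, beta))
        responses.append(signal + rng.gauss(0.0, noise_sd))
        images.append(x)
        masks.append(m)
    return images, masks, beta, responses


images, masks, beta, y = simulate_masked_responses()
# Subject-specific effective coefficients are the masked shared coefficients.
effective_beta_0 = [m * b for m, b in zip(masks[0], beta)]
```

Under these assumptions, a single coefficient image fitted to all subjects would average over masks that differ between subjects, which is the kind of heterogeneity the abstract says homogeneous regression models cannot capture.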