Journal of Data Science


Explainable Machine Learning for Functional Data
Katherine Goode, J. Derek Tucker, Daniel Ries, et al. (4 authors)

https://doi.org/10.6339/25-JDS1212
Pub. online: 16 December 2025      Type: Statistical Data Science      Open Access

Received: 27 March 2025
Accepted: 8 December 2025
Published: 16 December 2025

Abstract

Black-box machine learning models are recognized as useful tools for prediction, but the algorithmic complexity of some models makes them difficult to interpret. Explainability methods have been proposed to provide insight into these models, but little research has focused on supervised modeling with functional data inputs. We argue that, especially in high-consequence applications, the functional dependence should be modeled explicitly in a black-box analysis so that patterns in the explanations are not obscured or misrepresented. To this end, we propose the Variable importance Explainable Elastic Shape Analysis (VEESA) pipeline for training supervised machine learning models with functional inputs. The pipeline is an analysis process that covers data preprocessing, modeling, and post-hoc explanation. Preprocessing uses elastic functional principal components analysis, which accounts for both vertical and horizontal variability in functional data and, ultimately, yields explanations in the original data space that identify the important functional variability without bias due to correlated variables. We demonstrate the pipeline on two high-consequence applications: explosives classification for national security and inkjet printer identification in forensic science. These applications illustrate the VEESA pipeline's ability to identify the characteristics of the functional data that are useful for prediction. Code for implementing the pipeline is available in the veesa R package, with supplemental Python code.
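The three stages named in the abstract (preprocess, model, explain) can be illustrated on toy data. The block below is a minimal NumPy sketch under stated assumptions, not the veesa package or the authors' implementation. It replaces elastic fPCA with ordinary PCA on curves sampled on a common grid, so there is no alignment step separating horizontal from vertical variability. It also replaces the black-box model with a small logistic regression and computes permutation feature importance on the training data. The toy data and all function names (`fpca`, `fit_logistic`, `predict`, `permutation_importance`) are hypothetical.

```python
import numpy as np

# Hypothetical toy data: one Gaussian peak per curve. The class changes the
# peak height (vertical variability); a random shift adds horizontal variability.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 101)
n = 100
labels = np.repeat([0, 1], n // 2)
heights = np.where(labels == 0, 1.0, 2.0) + rng.normal(0, 0.1, n)
shifts = rng.normal(0, 0.02, n)
curves = heights[:, None] * np.exp(-((t[None, :] - 0.5 - shifts[:, None]) ** 2) / (2 * 0.1**2))
curves += rng.normal(0, 0.05, curves.shape)


def fpca(x, n_comp):
    """Ordinary (non-elastic) PCA of curves sampled on a common grid."""
    mean = x.mean(axis=0)
    centered = x - mean
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    comps = vt[:n_comp]                      # principal directions (functions of t)
    sds = s[:n_comp] / np.sqrt(len(x) - 1)   # standard deviation of each PC score
    scores = centered @ comps.T
    return mean, comps, sds, scores


def fit_logistic(x, y, lr=0.1, steps=2000):
    """Gradient-descent logistic regression, a stand-in for the black-box model."""
    xb = np.column_stack([np.ones(len(x)), x])
    w = np.zeros(xb.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-np.clip(xb @ w, -30, 30)))
        w -= lr * xb.T @ (p - y) / len(y)
    return w


def predict(w, x):
    """Class predictions (0/1) from the fitted logistic weights."""
    return (np.column_stack([np.ones(len(x)), x]) @ w > 0).astype(int)


def permutation_importance(w, x, y, n_reps=20, seed=0):
    """Mean drop in accuracy when one PC score column is shuffled."""
    perm_rng = np.random.default_rng(seed)
    base = np.mean(predict(w, x) == y)
    imp = np.zeros(x.shape[1])
    for j in range(x.shape[1]):
        for _ in range(n_reps):
            xp = x.copy()
            xp[:, j] = perm_rng.permutation(xp[:, j])
            imp[j] += base - np.mean(predict(w, xp) == y)
    return imp / n_reps


# 1. Preprocess: standardized PC scores become the model inputs.
mean, comps, sds, scores = fpca(curves, n_comp=3)
z = scores / sds
# 2. Model.
w = fit_logistic(z, labels)
acc = np.mean(predict(w, z) == labels)
# 3. Explain: rank the PCs, then map the top PC back to the original curve space.
imp = permutation_importance(w, z, labels)
top = int(np.argmax(imp))
curve_hi = mean + 2 * sds[top] * comps[top]   # mean curve + 2 SD along the top PC
curve_lo = mean - 2 * sds[top] * comps[top]   # mean curve - 2 SD along the top PC
print(f"train accuracy={acc:.2f}, importance={np.round(imp, 2)}, top PC={top + 1}")
```

Because the sample PC scores are uncorrelated by construction, shuffling one column is less likely to create unrealistic input combinations than shuffling correlated raw measurements, which is the concern about correlated variables raised in the abstract. Plotting `curve_lo` and `curve_hi` against `t` shows what the most important PC looks like in the original data space. An elastic version would first align the curves and analyze amplitude and phase separately, for example with the fdasrvf (R) or fdasrsf (Python) packages. This sketch does not attempt that step.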

Supplementary material

The supplementary materials include additional analyses of the shifted peaks, H-CT, and inkjet printer data; implementation details; R and Python code; and the shifted peaks and inkjet datasets. For proprietary reasons, the H-CT dataset cannot be shared.

Copyright
2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
elastic shape analysis, explainability, functional principal components, interpretability, variable importance

Funding
Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

Journal of Data Science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
