Explainable Machine Learning for Functional Data
Pub. online: 16 December 2025
Type: Statistical Data Science
Open Access
Received
27 March 2025
Accepted
8 December 2025
Published
16 December 2025
Abstract
Black-box machine learning models are recognized as useful tools for prediction, but the algorithmic complexity of some models makes them difficult to interpret. Explainability methods have been proposed to provide insight into these models, but little research has focused on supervised modeling with functional data inputs. We argue that, especially in high-consequence applications, it is important to explicitly model the functional dependence in a black-box analysis so that explanations do not obscure or misrepresent patterns in the data. As such, we propose the Variable importance Explainable Elastic Shape Analysis (VEESA) pipeline for training supervised machine learning models with functional inputs. The pipeline is an analysis process that includes data preprocessing, modeling, and post-hoc explanations. The preprocessing uses elastic functional principal components analysis, which accounts for vertical and horizontal variability in functional data and, ultimately, allows for explanations in the original data space that identify the important functional variability without bias due to correlated variables. Here, we demonstrate the pipeline on two high-consequence applications: explosives classification for national security and inkjet printer identification in forensic science. The applications exhibit the VEESA pipeline's ability to provide an understanding of the characteristics of the functional data that are useful for prediction. Code for implementing the pipeline is available in the veesa R package (and in supplemental Python code).
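The three stages named in the abstract (preprocessing, modeling, post-hoc explanation) can be illustrated with a minimal sketch. It is not the veesa package or its API. It makes several simplifying assumptions: ordinary PCA stands in for elastic functional PCA (so horizontal variability is not modeled separately), a nearest-centroid classifier stands in for the black-box model, the curves are simulated, and importance is computed on the training data. All variable and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated functional data (assumption): one curve per row on a common grid.
t = np.linspace(0, 1, 50)
n = 200
y = rng.integers(0, 2, size=n)
# Class 1 curves have a taller peak; every curve gets small pointwise noise.
amp = 1.0 + 0.8 * y + 0.1 * rng.standard_normal(n)
peak = np.exp(-((t - 0.5) ** 2) / 0.02)
X = amp[:, None] * peak + 0.05 * rng.standard_normal((n, t.size))

# Stage 1, preprocessing: ordinary PCA via SVD of the centered curves
# (a stand-in for elastic fPCA, which would also align the curves).
mean_curve = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean_curve, full_matrices=False)
k = 3
scores = (X - mean_curve) @ Vt[:k].T  # uncorrelated PC scores used as features

# Stage 2, modeling: nearest-centroid classifier on the PC scores
# (a stand-in for any black-box supervised model).
centroids = np.stack([scores[y == c].mean(axis=0) for c in (0, 1)])

def predict(Z):
    dist = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def accuracy(Z):
    return float((predict(Z) == y).mean())

# Stage 3, explanation: permutation feature importance, measured as the
# mean drop in accuracy when one PC's scores are shuffled across curves.
base = accuracy(scores)
n_rep = 20
importance = np.zeros(k)
for j in range(k):
    drops = []
    for _ in range(n_rep):
        Zp = scores.copy()
        Zp[:, j] = rng.permutation(Zp[:, j])
        drops.append(base - accuracy(Zp))
    importance[j] = np.mean(drops)

# Map the most important PC back to the original data space:
# the mean curve shifted by +/- 2 standard deviations along that PC.
top = int(importance.argmax())
sd = scores[:, top].std()
curve_low = mean_curve - 2 * sd * Vt[top]
curve_high = mean_curve + 2 * sd * Vt[top]
```

Plotting `curve_low` and `curve_high` against `t` would show which kind of functional variability the model relies on. Because PC scores are uncorrelated, permuting one of them does not create the unrealistic feature combinations that arise when correlated raw grid points are permuted directly.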
Supplementary material
The supplementary materials include additional analyses of the shifted peaks, H-CT, and inkjet printer data; implementation details; R and Python code; and the shifted peaks and inkjet datasets. For proprietary reasons, the H-CT dataset cannot be shared.