JDS

Journal of Data Science

ISSN: 1683-8602, 1680-743X

School of Statistics, Renmin University of China

JDS1093

10.6339/23-JDS1093

Statistical Data Science

Neural Generalized Ordinary Differential Equations with Layer-Varying Parameters

Duo Yu 1, Hongyu Miao 2, Hulin Wu 3,∗

1 Department of Population Health, The University of Texas at Austin, United States

2 College of Nursing, Florida State University, United States

3 Department of Biostatistics and Data Science, The University of Texas Health Science Center at Houston, United States

∗Corresponding author. Email: hulin.wu@uth.tmc.edu.

Journal of Data Science, Volume 22, Issue 1, pp. 10–24 (2024)

Published online: 23 February 2023

Supplementary Material

Programming code to reproduce our results and figures can be found at https://github.com/Duo-Yu/Neural-GODE. In the Supplementary Material, we list the code directories and corresponding results.

Received 15 December 2022; accepted 19 February 2023.

2024 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.

Open access article under the CC BY license.

Deep residual networks (ResNets) have shown state-of-the-art performance in various real-world applications. Recently, the ResNet model was reparameterized and interpreted as the solution to a continuous ordinary differential equation, the Neural-ODE model. In this study, we propose a neural generalized ordinary differential equation (Neural-GODE) model with layer-varying parameters that extends the Neural-ODE to approximate discrete ResNets. Specifically, we use nonparametric B-spline functions to parameterize the Neural-GODE so that the trade-off between model complexity and computational efficiency can be easily balanced. We demonstrate that ResNets and Neural-ODE models are special cases of the proposed Neural-GODE model. On two benchmark datasets, MNIST and CIFAR-10, we show that the layer-varying Neural-GODE is more flexible and general than the standard Neural-ODE. Furthermore, the Neural-GODE enjoys computational and memory benefits while performing comparably to ResNets in prediction accuracy.

Keywords: B-splines; deep residual networks; neural ordinary differential equations
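
The B-spline parameterization mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (their code is in the repository linked under Supplementary Material); the function names, the cubic clamped knot vector, the tanh right-hand side, and the forward Euler solver are all assumptions chosen for illustration. The idea shown is that the parameters at depth t are a B-spline combination of a small set of coefficient vectors, so the number of basis functions controls model complexity.

```python
import numpy as np


def bspline_basis(t, knots, degree):
    """Evaluate all B-spline basis functions at scalar t (Cox-de Boor recursion)."""
    # Degree-0 basis: indicator of each half-open knot interval.
    b = np.array([1.0 if knots[i] <= t < knots[i + 1] else 0.0
                  for i in range(len(knots) - 1)])
    # Close the last non-empty interval so that t == knots[-1] is covered.
    if t == knots[-1]:
        last = np.nonzero(np.diff(knots) > 0)[0][-1]
        b[last] = 1.0
    for d in range(1, degree + 1):
        nxt = np.zeros(len(knots) - d - 1)
        for i in range(len(nxt)):
            left = knots[i + d] - knots[i]
            right = knots[i + d + 1] - knots[i + 1]
            if left > 0:
                nxt[i] += (t - knots[i]) / left * b[i]
            if right > 0:
                nxt[i] += (knots[i + d + 1] - t) / right * b[i + 1]
        b = nxt
    return b  # length len(knots) - degree - 1


def layer_varying_params(t, coeffs, knots, degree):
    """Hypothetical helper: theta(t) = sum_k B_k(t) * coeffs[k]."""
    return bspline_basis(t, knots, degree) @ coeffs


def forward_euler(h0, coeffs, knots, degree, n_steps):
    """Integrate dh/dt = tanh(W(t) h) on [0, 1] with depth-varying W(t)."""
    h = np.asarray(h0, dtype=float)
    dim = h.shape[0]
    dt = 1.0 / n_steps
    for k in range(n_steps):
        W = layer_varying_params(k * dt, coeffs, knots, degree).reshape(dim, dim)
        h = h + dt * np.tanh(W @ h)  # one residual-style update
    return h


# Assumed setup: cubic splines with one interior knot, giving 5 basis functions.
degree = 3
knots = np.array([0, 0, 0, 0, 0.5, 1, 1, 1, 1], dtype=float)
n_basis = len(knots) - degree - 1
dim = 2
rng = np.random.default_rng(0)
coeffs = rng.normal(size=(n_basis, dim * dim))  # trainable in a real model
h_out = forward_euler(np.ones(dim), coeffs, knots, degree, n_steps=20)
```

Because the basis functions sum to one at every t in [0, 1], setting all rows of `coeffs` equal makes the parameters constant in depth, which in this sketch corresponds to a Neural-ODE with shared parameters.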

This work was supported in part by NIH grants R01 AI087135 and P03AI161943 (HW), a grant from the Cancer Prevention and Research Institute of Texas (PR170668) (HW), and grants NSF/ECCS 2133106 and NSF/DMS 1620957 (HM).
