Journal of Data Science

Neural Generalized Ordinary Differential Equations with Layer-Varying Parameters
Volume 22, Issue 1 (2024), pp. 10–24
Duo Yu   Hongyu Miao   Hulin Wu  

https://doi.org/10.6339/23-JDS1093
Pub. online: 23 February 2023    Type: Statistical Data Science    Open Access

Received: 15 December 2022
Accepted: 19 February 2023
Published: 23 February 2023

Abstract

Deep residual networks (ResNets) have shown state-of-the-art performance in various real-world applications. Recently, ResNets were reinterpreted as solutions to a continuous ordinary differential equation, the Neural-ODE model. In this study, we propose a neural generalized ordinary differential equation (Neural-GODE) model with layer-varying parameters, which extends the Neural-ODE to approximate discrete ResNets. Specifically, we use nonparametric B-spline functions to parameterize the Neural-GODE, so that the trade-off between model complexity and computational efficiency can be easily balanced. We show that ResNets and Neural-ODE models are special cases of the proposed Neural-GODE model. Based on two benchmark datasets, MNIST and CIFAR-10, we demonstrate that the layer-varying Neural-GODE is more flexible and general than the standard Neural-ODE. Furthermore, the Neural-GODE enjoys computational and memory advantages while performing comparably to ResNets in prediction accuracy.
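The layer-varying mechanism described above can be sketched in a few lines: the weight matrix of the ODE dynamics is expanded in a B-spline basis, W(t) = Σ_k B_k(t) W_k, so a small set of control matrices governs a continuum of "layers" t ∈ [0, 1]. The sketch below is an illustrative toy under assumed names and hyperparameters (dense tanh dynamics, forward-Euler solver), not the paper's implementation, which uses convolutional layers and trained spline coefficients.

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(0)
d, K, degree = 4, 6, 3                           # state dim, basis size, cubic splines

# Clamped knot vector on [0, 1]: len(knots) = K + degree + 1.
knots = np.concatenate([np.zeros(degree),
                        np.linspace(0.0, 1.0, K - degree + 1),
                        np.ones(degree)])
coeffs = rng.normal(scale=0.1, size=(K, d, d))   # one control matrix W_k per basis
W_of_t = BSpline(knots, coeffs, degree)          # W(t) evaluates to a (d, d) matrix

def neural_gode_forward(h0, n_steps=20):
    """Forward-Euler solve of dh/dt = tanh(W(t) h) over t in [0, 1]."""
    h, dt = np.asarray(h0, dtype=float), 1.0 / n_steps
    for i in range(n_steps):
        h = h + dt * np.tanh(W_of_t(i * dt) @ h)
    return h

h1 = neural_gode_forward(np.ones(d))
```

Because a clamped B-spline basis sums to one on [0, 1], setting every W_k to the same matrix recovers the constant-parameter Neural-ODE, while a degree-0 basis with one interval per solver step mimics a discrete ResNet, consistent with the claim that both are special cases of the Neural-GODE.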

Supplementary material

Programming code to reproduce our results and figures can be found at https://github.com/Duo-Yu/Neural-GODE. In the Supplementary Material, we list the code directories and corresponding results.



Copyright
2024 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
B-splines; deep residual networks; neural ordinary differential equations

Funding
This work was supported in part by NIH grants R01 AI087135 and P03AI161943 (HW), a grant from the Cancer Prevention and Research Institute of Texas (PR170668) (HW), NSF grant ECCS 2133106 (HM), and NSF grant DMS 1620957 (HM).
