Neural Generalized Ordinary Differential Equations with Layer-Varying Parameters
Volume 22, Issue 1 (2024), pp. 10–24
Pub. online: 23 February 2023
Type: Statistical Data Science
Open Access
Received: 15 December 2022
Accepted: 19 February 2023
Published: 23 February 2023
Abstract
Deep residual networks (ResNets) have shown state-of-the-art performance in various real-world applications. Recently, the ResNets model was reparameterized and interpreted as the solution to a continuous ordinary differential equation, known as the Neural-ODE model. In this study, we propose a neural generalized ordinary differential equation (Neural-GODE) model with layer-varying parameters, which further extends the Neural-ODE to approximate the discrete ResNets. Specifically, we use nonparametric B-spline functions to parameterize the Neural-GODE so that the trade-off between model complexity and computational efficiency can be balanced easily. We demonstrate that ResNets and Neural-ODE models are special cases of the proposed Neural-GODE model. Based on two benchmark datasets, MNIST and CIFAR-10, we show that the layer-varying Neural-GODE is more flexible and general than the standard Neural-ODE. Furthermore, the Neural-GODE enjoys computational and memory benefits while performing comparably to ResNets in prediction accuracy.
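To make the B-spline parameterization concrete, the following is a minimal numerical sketch (not the authors' implementation; all function names are illustrative): the layer-varying parameters are written as θ(t) = Σ_k B_k(t) c_k, where the B_k are B-spline basis functions evaluated by the Cox-de Boor recursion and the c_k are learnable coefficient vectors. When all coefficients c_k are equal, the partition-of-unity property of B-splines makes θ(t) constant in depth, recovering the standard Neural-ODE as a special case.

```python
import numpy as np

def bspline_basis(t, knots, degree, i):
    """Cox-de Boor recursion: value of the i-th B-spline basis of
    the given degree at depth t, for the supplied knot vector."""
    if degree == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left_den = knots[i + degree] - knots[i]
    right_den = knots[i + degree + 1] - knots[i + 1]
    left = ((t - knots[i]) / left_den * bspline_basis(t, knots, degree - 1, i)
            if left_den > 0 else 0.0)
    right = ((knots[i + degree + 1] - t) / right_den
             * bspline_basis(t, knots, degree - 1, i + 1)
             if right_den > 0 else 0.0)
    return left + right

def layer_varying_weights(t, coef, knots, degree):
    """theta(t) = sum_k B_k(t) * c_k, the spline-parameterized
    network weights at depth t; coef has shape (n_basis, n_params)."""
    n_basis = len(knots) - degree - 1
    basis = np.array([bspline_basis(t, knots, degree, k) for k in range(n_basis)])
    return basis @ coef
```

In an ODE solver, `layer_varying_weights(t, coef, knots, degree)` would be called inside the dynamics function at each evaluation time t, so the number of basis functions (set by the knot vector) controls how quickly model complexity grows with depth, independent of the solver's step count.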
Supplementary material
Programming code to reproduce our results and figures can be found at https://github.com/Duo-Yu/Neural-GODE. In the Supplementary Material, we list the code directories and corresponding results.
References
Abdeltawab H, Shehata M, Shalaby A, Khalifa F, Mahmoud A, El-Ghar MA, et al. (2019). A novel CNN-based CAD system for early assessment of transplanted kidney dysfunction. Scientific Reports, 9(1): 1–11. https://doi.org/10.1038/s41598-018-37186-2
Bahdanau D, Cho K, Bengio Y (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint https://arxiv.org/abs/1409.0473.
Chang B, Chen M, Haber E, Chi EH (2019). AntisymmetricRNN: A dynamical system view on recurrent neural networks. arXiv preprint https://arxiv.org/abs/1902.09689.
Chen J, Wu H (2008). Efficient local estimation for time-varying coefficients in deterministic dynamic models with applications to HIV-1 dynamics. Journal of the American Statistical Association, 103(481): 369–384. https://doi.org/10.1198/016214507000001382
Chen RT, Rubanova Y, Bettencourt J, Duvenaud DK (2018). Neural ordinary differential equations. Advances in Neural Information Processing Systems, 31.
Chen RTQ (2018). torchdiffeq. https://github.com/rtqichen/torchdiffeq.
Cranmer M, Greydanus S, Hoyer S, Battaglia P, Spergel D, Ho S (2020). Lagrangian neural networks. arXiv preprint https://arxiv.org/abs/2003.04630.
Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, et al. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639): 115–118. https://doi.org/10.1038/nature21056
Günther S, Pazner W, Qi D (2021). Spline parameterization of neural network controls for deep learning. arXiv preprint https://arxiv.org/abs/2103.00301.
Haber E, Ruthotto L (2017). Stable architectures for deep neural networks. Inverse Problems, 34(1): 014004. https://doi.org/10.1088/1361-6420/aa9a90
Hornik K, Stinchcombe M, White H (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5): 359–366. https://doi.org/10.1016/0893-6080(89)90020-8
Kim K, Kim S, Lee YH, Lee SH, Lee HS, Kim S (2018). Performance of the deep convolutional neural network based magnetic resonance image scoring algorithm for differentiating between tuberculous and pyogenic spondylitis. Scientific Reports, 8(1): 1–10. https://doi.org/10.1038/s41598-018-35713-9
LaSalle JP (1968). Stability theory for ordinary differential equations. Journal of Differential Equations, 4(1): 57–65. https://doi.org/10.1016/0022-0396(68)90048-X
LeCun Y, Bottou L, Bengio Y, Haffner P (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11): 2278–2324. https://doi.org/10.1109/5.726791
Li Q, Chen L, Tai C, et al. (2017). Maximum principle based algorithms for deep learning. arXiv preprint https://arxiv.org/abs/1710.09513.
Li Q, Lin T, Shen Z (2019). Deep learning via dynamical systems: An approximation perspective. arXiv preprint https://arxiv.org/abs/1912.10382.
Liang H, Miao H, Wu H (2010). Estimation of constant and time-varying dynamic parameters of HIV infection in a nonlinear differential equation model. The Annals of Applied Statistics, 4(1): 460. https://doi.org/10.1214/09-AOAS290
Miao H, Wu H, Xue H (2014). Generalized ordinary differential equation models. Journal of the American Statistical Association, 109(508): 1672–1682. https://doi.org/10.1080/01621459.2014.957287
Noda K, Yamaguchi Y, Nakadai K, Okuno HG, Ogata T (2015). Audio-visual speech recognition using deep learning. Applied Intelligence, 42(4): 722–737. https://doi.org/10.1007/s10489-014-0629-7
Perperoglou A, Sauerbrei W, Abrahamowicz M, Schmid M (2019). A review of spline function procedures in R. BMC Medical Research Methodology, 19(1): 1–16. https://doi.org/10.1186/s12874-018-0650-3
Queiruga AF, Erichson NB, Taylor D, Mahoney MW (2020). Continuous-in-depth neural networks. arXiv preprint https://arxiv.org/abs/2008.02389.
Rusch TK, Mishra S (2020). Coupled oscillatory recurrent neural network (coRNN): An accurate and (gradient) stable architecture for learning long time dependencies. arXiv preprint https://arxiv.org/abs/2010.00951.
Ruthotto L, Haber E (2020). Deep neural networks motivated by partial differential equations. Journal of Mathematical Imaging and Vision, 62(3): 352–364. https://doi.org/10.1007/s10851-019-00903-1
Sigaki HY, Lenzi EK, Zola RS, Perc M, Ribeiro HV (2020). Learning physical properties of liquid crystals with deep convolutional neural networks. Scientific Reports, 10(1): 1–10. https://doi.org/10.1038/s41598-019-56847-4
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587): 484–489. https://doi.org/10.1038/nature16961
Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, et al. (2017). Mastering the game of Go without human knowledge. Nature, 550(7676): 354–359. https://doi.org/10.1038/nature24270
Tang S, Tang B, Wang A, Xiao Y (2015). Holling II predator–prey impulsive semi-dynamic model with complex Poincaré map. Nonlinear Dynamics, 81(3): 1575–1596. https://doi.org/10.1007/s11071-015-2092-3
Xue H, Miao H, Wu H (2010). Sieve estimation of constant and time-varying coefficients in nonlinear ordinary differential equation models by considering both numerical error and measurement error. Annals of Statistics, 38(4): 2351. https://doi.org/10.1214/09-AOS784
Young T, Hazarika D, Poria S, Cambria E (2018). Recent trends in deep learning based natural language processing. IEEE Computational Intelligence Magazine, 13(3): 55–75. https://doi.org/10.1109/MCI.2018.2840738
Yu D, Lin Q, Chiu AP, He D (2017). Effects of reactive social distancing on the 1918 influenza pandemic. PloS One, 12(7): e0180545. https://doi.org/10.1371/journal.pone.0180545
Yu D, Zhu G, Wang X, Zhang C, Soltanalizadeh B, Wang X, et al. (2021). Assessing effects of reopening policies on COVID-19 pandemic in Texas with a data-driven transmission model. Infectious Disease Modelling, 6: 461–473. https://doi.org/10.1016/j.idm.2021.02.001
Zhang K, Sun M, Han TX, Yuan X, Guo L, Liu T (2017). Residual networks of residual networks: Multilevel residual networks. IEEE Transactions on Circuits and Systems for Video Technology, 28(6): 1303–1314. https://doi.org/10.1109/TCSVT.2017.2654543