Journal of Data Science

Neural Generalized Ordinary Differential Equations with Layer-Varying Parameters
Volume 22, Issue 1 (2024), pp. 10–24
Duo Yu   Hongyu Miao   Hulin Wu  

https://doi.org/10.6339/23-JDS1093
Pub. online: 23 February 2023    Type: Statistical Data Science    Open Access

Received: 15 December 2022
Accepted: 19 February 2023
Published: 23 February 2023

Abstract

Deep residual networks (ResNets) have shown state-of-the-art performance in various real-world applications. Recently, ResNets were reinterpreted as solutions to a continuous ordinary differential equation, the Neural-ODE model. In this study, we propose a neural generalized ordinary differential equation (Neural-GODE) model with layer-varying parameters, which extends the Neural-ODE to approximate discrete ResNets. Specifically, we use nonparametric B-spline functions to parameterize the Neural-GODE, so that the trade-off between model complexity and computational efficiency can be easily balanced. We show that ResNets and Neural-ODE models are special cases of the proposed Neural-GODE model. Based on two benchmark datasets, MNIST and CIFAR-10, we demonstrate that the layer-varying Neural-GODE is more flexible and general than the standard Neural-ODE. Furthermore, the Neural-GODE enjoys computational and memory advantages while performing comparably to ResNets in prediction accuracy.
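The layer-varying mechanism described above can be sketched in a few lines: the weight matrix of the ODE dynamics is expanded in a B-spline basis, W(t) = Σ_k B_k(t) W_k, so a small set of control matrices governs a continuum of "layers" t ∈ [0, 1]. The sketch below is an illustrative toy under assumed names and hyperparameters (dense tanh dynamics, forward-Euler solver), not the paper's implementation, which uses convolutional layers and trained spline coefficients.

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(0)
d, K, degree = 4, 6, 3                           # state dim, basis size, cubic splines

# Clamped knot vector on [0, 1]: len(knots) = K + degree + 1.
knots = np.concatenate([np.zeros(degree),
                        np.linspace(0.0, 1.0, K - degree + 1),
                        np.ones(degree)])
coeffs = rng.normal(scale=0.1, size=(K, d, d))   # one control matrix W_k per basis
W_of_t = BSpline(knots, coeffs, degree)          # W(t) evaluates to a (d, d) matrix

def neural_gode_forward(h0, n_steps=20):
    """Forward-Euler solve of dh/dt = tanh(W(t) h) over t in [0, 1]."""
    h, dt = np.asarray(h0, dtype=float), 1.0 / n_steps
    for i in range(n_steps):
        h = h + dt * np.tanh(W_of_t(i * dt) @ h)
    return h

h1 = neural_gode_forward(np.ones(d))
```

Because a clamped B-spline basis sums to one on [0, 1], setting every W_k to the same matrix recovers the constant-parameter Neural-ODE, while a degree-0 basis with one interval per solver step mimics a discrete ResNet, consistent with the claim that both are special cases of the Neural-GODE.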

Supplementary material

Programming code to reproduce our results and figures can be found at https://github.com/Duo-Yu/Neural-GODE. In the Supplementary Material, we list the code directories and corresponding results.



Copyright
2024 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
B-splines; deep residual networks; neural ordinary differential equations

Funding
This work was supported in part by NIH grants R01 AI087135 and P03AI161943 (HW), a grant from the Cancer Prevention and Research Institute of Texas (PR170668) (HW), NSF grant ECCS 2133106 (HM), and NSF grant DMS 1620957 (HM).
