Accelerating Fixed-Point Algorithms in Statistics and Data Science: A State-of-Art Review
Volume 21, Issue 1 (2023), pp. 1–26
Pub. online: 19 July 2022
Type: Data Science Reviews
Open Access
Received: 7 March 2022
Accepted: 25 May 2022
Published: 19 July 2022
Abstract
Fixed-point algorithms are popular in statistics and data science due to their simplicity, guaranteed convergence, and applicability to high-dimensional problems. Well-known examples include the expectation-maximization (EM) algorithm, majorization-minimization (MM), and gradient-based algorithms such as gradient descent (GD) and proximal gradient descent. A characteristic weakness of these algorithms is their slow convergence. We discuss several state-of-the-art techniques for accelerating their convergence, and we demonstrate and evaluate these techniques in terms of their efficiency and robustness in six distinct applications. Among the acceleration schemes, SQUAREM shows robust acceleration with a mean 18-fold speedup; the DAAREM and restarted-Nesterov schemes also demonstrate consistently impressive accelerations. Thus, it is possible to accelerate the original fixed-point algorithm by using one of the SQUAREM, DAAREM, or restarted-Nesterov schemes. We describe implementation details and software packages to facilitate the application of the acceleration schemes, and we discuss strategies for selecting a particular scheme for a given problem.
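As a minimal illustration of the kind of acceleration evaluated in this review (a sketch only, not the paper's benchmark code), the CRAN package SQUAREM provides both a plain fixed-point driver, fpiter(), and the SQUAREM accelerator, squarem(). Below, both are applied to the toy contraction map x -> cos(x), and the number of fixed-point evaluations is compared; the map, starting value, and tolerance are illustrative choices, not taken from the paper.

library(SQUAREM)

fixptfn <- function(x) cos(x)   # toy fixed-point map; its fixed point is near 0.739

p0 <- 1                         # illustrative starting value

# Plain fixed-point iteration, for reference
plain <- fpiter(par = p0, fixptfn = fixptfn, control = list(tol = 1e-10))

# SQUAREM acceleration of the same map
accel <- squarem(par = p0, fixptfn = fixptfn, control = list(tol = 1e-10))

# Compare the number of fixed-point evaluations each scheme needed
c(plain = plain$fpevals, squarem = accel$fpevals)

The same pattern applies to the realistic fixed-point maps discussed in the paper: the user supplies the mapping (e.g., one EM or MM update) as fixptfn, and the acceleration scheme drives it to convergence.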
Supplementary Material
1. We provide an R package, AccelBenchmark, available on GitHub, that can be used to reproduce the experiments in this paper. Please see the vignette and the demo.R file in the vignette folder for reference.
2. We provide an additional PDF file that includes (a) some basic properties of fixed-point iterations; (b) visualizations of the convergence for all the experiments; and (c) additional analysis of the Sinkhorn scaling problem (a toy version of this scaling map is sketched below).
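For concreteness, the supplementary Sinkhorn analysis concerns the classical Sinkhorn scaling fixed-point map, which alternately normalizes the rows and columns of a positive matrix until it is (approximately) doubly stochastic. The following base-R sketch of that map is for illustration only and is not the code used in the benchmarks; the matrix and tolerance are arbitrary.

# One Sinkhorn scaling step: normalize rows, then columns
sinkhorn_map <- function(A) {
  A <- A / rowSums(A)                 # each row now sums to 1
  A <- sweep(A, 2, colSums(A), "/")   # each column now sums to 1
  A
}

set.seed(1)
A <- matrix(runif(25, 0.1, 1), 5, 5)  # a random positive 5 x 5 matrix

# Plain fixed-point iteration of the Sinkhorn map
for (k in 1:500) {
  A_new <- sinkhorn_map(A)
  if (max(abs(A_new - A)) < 1e-10) break
  A <- A_new
}
rowSums(A); colSums(A)                # both approximately rep(1, 5)

In the experiments, slowly converging maps of this kind are handed to the acceleration schemes in place of the plain loop above.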
References
Altschuler J, Weed J, Rigollet P (2017). Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration. arXiv preprint: https://arxiv.org/abs/1705.09634.
Geist M, Scherrer B (2018). Anderson acceleration for reinforcement learning. arXiv preprint: https://arxiv.org/abs/1809.09501.
Mena G, Belanger D, Linderman S, Snoek J (2018). Learning latent permutations with Gumbel-Sinkhorn networks. arXiv preprint: https://arxiv.org/abs/1802.08665.
Tseng P (2008). On accelerated proximal gradient methods for convex-concave optimization. Unpublished manuscript. http://www.math.washington.edu/~tseng/papers/apgm.pdf.
Varadhan R, Roland C (2004). Squared extrapolation methods (SQUAREM): A new class of simple and efficient numerical schemes for accelerating the convergence of the EM algorithm. Department of Biostatistics Working Paper 63, Johns Hopkins University. https://biostats.bepress.com/jhubiostat/paper63/.