A Review on Optimal Subsampling Methods for Massive Datasets

Subsampling is an effective way to tackle big data problems, and many subsampling approaches have been proposed for different models, such as leverage sampling for linear regression models and local case control sampling for logistic regression models. In this article, we focus on optimal subsampling methods, which draw samples according to optimal subsampling probabilities formulated by minimizing some function of the asymptotic distribution of the subsample estimator. Optimal subsampling methods have been developed for logistic regression models, softmax regression models, generalized linear models, quantile regression models, and quasi-likelihood estimation. Real data examples are provided to show how optimal subsampling methods are applied.


Introduction
As we step into the big data era, more and more attention is focused on how to deal with data of enormous size and complex structure under limited computational resources. In the field of statistics, various techniques have been developed to analyze massive datasets, such as the divide-and-conquer method (Lin and Xie, 2011), online updating for streaming data (Schifano et al., 2016), stochastic gradient descent (Toulis et al., 2017), random projection (Drineas et al., 2011; Mahoney, 2011) and subsampling (Drineas et al., 2006; Wang et al., 2018, 2019). A subsampling method draws a subdata set from the full dataset and estimates the parameters of interest from the chosen subdata. The fundamental concern of the subsampling method is how to select the subdata: the more informative the chosen observations, the better the approximation performance we can expect. Hence, uniform subsampling is not preferred, because every observation is treated equally no matter how much information it carries. For linear regression, leverage sampling has been discussed by Drineas et al. (2006) and Mahoney (2011), and it was named algorithmic leveraging in Ma and Sun (2015). The subsamples obtained by this method are drawn from the full dataset with replacement based on the normalized leverage scores or their variants. The asymptotic normality and asymptotic unbiasedness of the leverage sampling estimator were studied in Ma et al. (2020). Leveraged volume sampling was proposed by Derezinski et al. (2018) for linear regression; it yields an unbiased coefficient estimator and has the same tail bounds as leverage sampling. Besides these probabilistic methods for linear models, a deterministic method named information-based optimal subdata selection (IBOSS) was proposed by Wang et al. (2019), aiming at finding a subdata set that maximizes the information matrix under the D-optimality criterion. This method is also applicable in the divide-and-conquer setting, as discussed in Wang (2019a).
The IBOSS approach was extended to logistic regression in Cheng et al. (2020). Local case control sampling for logistic regression was proposed by Fithian and Hastie (2014); it draws samples by Poisson subsampling and determines whether an observation enters the sample using information from both the response and the covariates. By extending the idea of local case control sampling, a local uncertainty sampling algorithm was introduced by Han et al. (2020) for softmax regression. Pronzato and Wang (2021) proposed an algorithm for streaming data in which the subdata are selected sequentially based on estimated quantiles.
The optimal subsampling method is a probabilistic approach in which subsamples are drawn according to optimal subsampling probabilities derived by minimizing the asymptotic variance-covariance matrix of a random-sampling-based estimator under a certain optimality criterion. The optimal subsampling method for logistic regression was introduced by Wang et al. (2018), which formulates the optimal subsampling probabilities by minimizing the asymptotic mean squared error (MSE) of the subsample estimator. Since the expressions of the optimal subsampling probabilities involve the maximum likelihood estimator (MLE) of the full data, the authors proposed a two-stage adaptive algorithm that uses a pilot sample estimator to substitute for the full data MLE. This method was named the optimal subsampling method motivated from the A-optimality criterion (OSMAC), and was improved in Wang (2019b) by adopting unweighted target functions for subsamples and Poisson subsampling. In addition to logistic regression, the OSMAC has been extended to softmax regression (Yao and Wang, 2018), generalized linear models (Ai et al., 2019), quantile regression (Wang and Ma, 2020) and quasi-likelihood estimation (Yu et al., 2020). This article introduces the optimal subsampling method and illustrates its practical implementation in R (R Core Team, 2020) with the following real data examples.
- Income dataset (Dua and Graff, 2017). This dataset was extracted from the 1994 Census database and aims at predicting whether a person's annual income exceeds 50000 based on various personal attributes such as age, education level, gender and financial situation.
- Bike sharing dataset (Fanaee-T and Gama, 2014). A bike sharing system monitors the bike rental situation hourly, recording hourly weather information and working day information. This dataset is used to model the hourly number of bike rentals under different conditions.
- Physicochemical properties of protein tertiary structure dataset (Dua and Graff, 2017). This dataset was extracted from the Critical Assessment of protein Structure Prediction and provides information on protein structure. We model the association between the size of the residue and the other available protein attributes.
The rest of the paper is organized as follows. Section 2 talks about the adaptive optimal subsampling method for logistic regression and softmax regression. Section 3 presents more efficient algorithms for logistic regression by introducing unweighted estimator and Poisson subsampling into the adaptive optimal subsampling method. Section 4 discusses the adaptive optimal subsampling method for generalized linear models. Section 5 shows the application of optimal subsampling for quantile regression. A brief summary is presented in Section 6.

Optimal Subsampling Methods under the A-optimality Criterion
Suppose that {x_i, y_i}_{i=1}^N are N independent and identically distributed observations, where x_i ∈ R^d, i = 1, 2, ..., N, are covariates, and y_i, i = 1, 2, ..., N, are responses. For a logistic regression, y_i ∈ {0, 1} is a binary variable. Given x_i, the response y_i satisfies

P(y_i = 1 | x_i) = p(x_i, β) = exp(x_i^T β) / {1 + exp(x_i^T β)},

where β ∈ R^d is the unknown regression coefficient, and can be estimated by the MLE β̂_MLE, which is the maximizer of

ℓ(β) = Σ_{i=1}^N [y_i x_i^T β − log{1 + exp(x_i^T β)}].

This optimization problem can be solved by the Newton-Raphson method in O(ηNd^2) time, where η is the number of iterations for the Newton-Raphson method to converge. To reduce the computational burden when N is large, an optimal subsampling method named OSMAC, targeting the full data MLE β̂_MLE, was proposed in Wang et al. (2018). To begin with, we introduce in Algorithm 1 the general subsample estimator obtained from a subsample drawn from the full dataset with arbitrary subsampling probabilities {π_i}_{i=1}^N.

Algorithm 1 General Subsampling Algorithm.
Subsampling with replacement:
- Assign subsampling probabilities {π_i}_{i=1}^N to each observation.
- Draw n data points with replacement based on {π_i}_{i=1}^N, and denote the subsample as {x*_i, y*_i, π*_i}_{i=1}^n.
Estimation: Obtain the subsample estimator β̂_sub by maximizing the weighted target function

ℓ*(β) = (1/n) Σ_{i=1}^n (1/π*_i) [y*_i x*_i^T β − log{1 + exp(x*_i^T β)}].   (1)

It has been proved that β̂_sub is consistent to β̂_MLE and that the approximation error β̂_sub − β̂_MLE is asymptotically normal conditional on the full data. The underlying idea of the OSMAC is to find the optimal subsampling probabilities that minimize the asymptotic variance-covariance matrix of β̂_sub − β̂_MLE, denoted as V_N. To compare matrices, the A-optimality criterion is adopted, which minimizes the trace of this asymptotic variance-covariance matrix. The optimal subsampling probabilities under the A-optimality criterion are

π_i^optA = |y_i − p(x_i, β̂_MLE)| ‖M_X^{-1} x_i‖ / Σ_{j=1}^N |y_j − p(x_j, β̂_MLE)| ‖M_X^{-1} x_j‖,   i = 1, ..., N,   (2)

where M_X = N^{-1} Σ_{i=1}^N p(x_i, β̂_MLE){1 − p(x_i, β̂_MLE)} x_i x_i^T. To reduce the computational burden, L-optimality is also considered, which minimizes the trace of the asymptotic variance-covariance matrix of M_L^T(β̂_sub − β̂_MLE). Thus, the L-optimal subsampling probabilities minimize tr(M_L^T V_N M_L) and have expressions

π_i^optL = |y_i − p(x_i, β̂_MLE)| ‖x_i‖ / Σ_{j=1}^N |y_j − p(x_j, β̂_MLE)| ‖x_j‖,   i = 1, ..., N.   (3)

Both the A- and L-optimal subsampling probabilities depend on the responses and covariates, and contain β̂_MLE, which is the very quantity we are approximating. To solve this problem, a pilot sample estimator is used to substitute β̂_MLE in (2) and (3). The pilot sample can be drawn from the full dataset by uniform subsampling or case control subsampling, with subsampling probabilities denoted π_i^0. The OSMAC is summarized in Algorithm 2.

Algorithm 2 Adaptive optimal subsampling algorithm for logistic regression.
Pilot sampling:
- Run Algorithm 1 with subsample size n_0 and subsampling probabilities π_i^0. Obtain the pilot subsample estimator β̂_sub,0.
- Store the pilot subsample and the corresponding subsampling probabilities {x*0_i, y*0_i, π*0_i}_{i=1}^{n_0}.
Second step sampling:
- Calculate the approximate optimal subsampling probabilities {π̂_i}_{i=1}^N under the selected optimality criterion by plugging β̂_sub,0 into (2) or (3).
- Run Algorithm 1 with subsample size n_1 and subsampling probabilities {π̂_i}_{i=1}^N.
- Record the second step subsample and the corresponding subsampling probabilities {x*1_i, y*1_i, π*1_i}_{i=1}^{n_1}.
Estimation: Combine the pilot sample and the second step sample, and denote the combined sample as {x*_i, y*_i, π*_i}_{i=1}^{n_0+n_1}. Obtain the final estimator β̂_OS by maximizing the weighted target function (1) over the combined sample.
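The two-step procedure above can be sketched in a few lines. The following Python simulation is illustrative only (the data, sample sizes, and function names are invented, not part of the original R implementation); it uses the L-optimal probabilities, which avoid the matrix inverse in (2), and an inverse-probability-weighted Newton-Raphson fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated full data (sizes are illustrative).
N, d = 20_000, 3
X = rng.normal(size=(N, d))
beta_true = np.array([0.5, -0.5, 0.3])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

def weighted_logistic_mle(X, y, w, iters=25):
    """Newton-Raphson for the weighted logistic log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))
        H = (X * (w * p * (1 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(H, grad)
    return beta

# Step 1: pilot subsample with uniform probabilities.
n0, n1 = 500, 1000
idx0 = rng.choice(N, size=n0, replace=True)
beta_pilot = weighted_logistic_mle(X[idx0], y[idx0], np.ones(n0))

# Step 2: approximate L-optimal probabilities, pi_i proportional to |y_i - p_i| * ||x_i||.
p_hat = 1 / (1 + np.exp(-X @ beta_pilot))
scores = np.abs(y - p_hat) * np.linalg.norm(X, axis=1)
pi = scores / scores.sum()
idx1 = rng.choice(N, size=n1, replace=True, p=pi)

# Inverse-probability-weighted estimator on the second step sample.
beta_os = weighted_logistic_mle(X[idx1], y[idx1], 1 / pi[idx1])
```

Using the full estimator (2) instead only changes the score computation: each ‖x_i‖ is replaced by ‖M_X^{-1} x_i‖, at an extra O(Nd^2) cost.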
Theorem 6 in Wang et al. (2018) establishes the asymptotic normality of β̂_OS conditional on the full data and the pilot sample estimator. The convergence rate is of the order n_1^{-1/2}, which does not depend on the full data size. This means that even if the full data size increases, the information contained in the subsample may not change. In addition, Algorithm 2 is an adaptive algorithm in that the approximately optimal subsampling probabilities rely on the pilot sample estimator; an inaccurate pilot sample estimator may therefore affect the accuracy of the final estimator. Algorithm 2 greatly reduces the computational cost compared with the full data computation, but it still needs to process every observation in the full dataset when calculating the approximately optimal subsampling probabilities, so the computational time remains of the order N.
The variance-covariance matrix of β̂_OS can be estimated by the sandwich form

Ṽ_OS = M̃_X^{-1} Ṽ_c M̃_X^{-1},   (6)

where M̃_X and Ṽ_c are plug-in estimates of M_X and of the variance of the weighted score, both computed from the combined subsample. Note that Algorithm 2 is built under the assumption that the regression model is correctly specified. In addition, when implementing Algorithm 2 in practice, the second stage sample size should always be much larger than the pilot sample size. This is a theoretical assumption ensuring the asymptotic normality of β̂_OS, and it makes the second stage sample much more influential in the subsample target function. These two remarks apply to all optimal subsampling methods in this article.

Optimal Subsampling Method for Softmax Regression
The OSMAC was extended to softmax regression, which is also called multinomial logistic regression, in Yao and Wang (2018). Suppose that the categorical response of the softmax regression takes K + 1 distinct values, say y_i ∈ {0, 1, ..., K}. The softmax regression has the following form:

P(y_i = k | x_i) = exp(x_i^T β_k) / Σ_{l=0}^K exp(x_i^T β_l),   k = 0, 1, ..., K,   (7)

where β_k is the unknown coefficient for category k. Let β_0 = 0 for identifiability. The unknown parameter for the whole model is denoted as β = (β_1^T, β_2^T, ..., β_K^T)^T, and (7) becomes

p_k(x_i, β) = exp(x_i^T β_k) / {1 + Σ_{l=1}^K exp(x_i^T β_l)}.

Under this model, the log-likelihood function for the observed dataset is

ℓ(β) = Σ_{i=1}^N [Σ_{k=1}^K I(y_i = k) x_i^T β_k − log{1 + Σ_{l=1}^K exp(x_i^T β_l)}].

Maximizing this log-likelihood function, we can obtain the full data MLE β̂_MLE through the Newton-Raphson method. By deriving the variance-covariance matrix of a general subsample estimator for softmax regression, the optimal subsampling probabilities are

π_i^optA = ‖M_X^{-1}{s_i(β̂_MLE) ⊗ x_i}‖ / Σ_{j=1}^N ‖M_X^{-1}{s_j(β̂_MLE) ⊗ x_j}‖

under the A-optimality criterion, and

π_i^optL = ‖s_i(β̂_MLE)‖ ‖x_i‖ / Σ_{j=1}^N ‖s_j(β̂_MLE)‖ ‖x_j‖

under the L-optimality criterion, where s_i(β) = {I(y_i = 1) − p_1(x_i, β), ..., I(y_i = K) − p_K(x_i, β)}^T and M_X is the information matrix of the full data log-likelihood. With the strategy of using the pilot sample estimator to replace β̂_MLE when calculating the optimal subsampling probabilities, we have the adaptive optimal subsampling algorithm for softmax regression.
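The L-optimal probabilities above only need the residual vector s_i(β) and ‖x_i‖, so they cost O(NKd) to compute. A minimal Python sketch, with made-up toy data and a fixed (assumed) pilot coefficient matrix:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data for a softmax model with K + 1 = 3 categories (illustrative sizes).
N, d, K = 1000, 2, 2
X = rng.normal(size=(N, d))
B = rng.normal(size=(d, K))          # beta_1, ..., beta_K as columns; beta_0 = 0
y = rng.integers(0, K + 1, size=N)   # labels in {0, 1, ..., K}

# Category probabilities p_k(x, beta), k = 1..K, with the baseline beta_0 = 0.
expXB = np.exp(X @ B)                            # N x K
P = expXB / (1 + expXB.sum(axis=1, keepdims=True))

# s_i(beta) collects the residuals I(y_i = k) - p_k(x_i, beta), k = 1..K.
S = (y[:, None] == np.arange(1, K + 1)[None, :]).astype(float) - P

# L-optimal probabilities are proportional to ||s_i|| * ||x_i||.
scores = np.linalg.norm(S, axis=1) * np.linalg.norm(X, axis=1)
pi = scores / scores.sum()
```

The A-optimal version additionally multiplies each residual-covariate Kronecker product by M_X^{-1}, which requires assembling and inverting a Kd × Kd matrix.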

Income Dataset
The behavior of Algorithm 2 is illustrated by the income dataset (Dua and Graff, 2017), which contains 48842 observations in total. The response is an indicator variable showing whether a person's income exceeds 50K, and around 24% of participants have income over 50K. We use 5 continuous covariates to build the logistic model: age, final weight (fnlwgt), education (edu), capital loss (loss) and working hours per week (hours). The original dataset was partitioned into a training dataset and a test dataset. We combined these two datasets, selected the variables involved in the logistic model, and named the newly generated data adult1.
Applying glm function in stats package (R Core Team, 2020) to adult1, we can obtain the coefficient estimator for the covariates using the following chunk of code.

adult.glm <- glm(V15 ~ V1 + V3 + V5 + V12 + V13, data = adult1, family = "binomial")
summary(adult.glm)
In the following, we apply Algorithm 2 to adult1 through the function AdpOptSubLog, in which the subsample estimator is calculated by the svyglm function from the survey package (Lumley, 2020) with the weights option.
X <- cbind(1, as.matrix(adult1[, -dim(adult1)[2]]))
y <- adult1$V15
set.seed(123)
AdpOptSubLog(X, y, r0 = 500, r = 1000, optmethod = "A", data = adult1,
             covariate = "V1 + V3 + V5 + V12 + V13")

In the function AdpOptSubLog, X is the covariate matrix, y is the response variable in numerical format, r0 is the pilot sample size, r is the second step sample size, and optmethod indicates the optimality criterion, which can be "A" or "L". The output gives the coefficient estimators and estimated standard errors, along with the z statistics and p values used to test whether the MLE for the corresponding covariate equals 0 or not. For an arbitrary β_j, the z statistic is calculated by z_j = β̂_OS,j / SE_j, where the estimated standard error SE_j is the square root of the j-th diagonal element of Ṽ_OS in (6).
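The z statistic and its two-sided p value can be reproduced from any coefficient estimate and standard error; the numbers below are purely illustrative.

```python
import math

# z statistic and two-sided p value for one coefficient (illustrative numbers).
beta_hat = 0.03   # estimated coefficient for covariate j
se = 0.01         # square root of the j-th diagonal element of the variance estimate
z = beta_hat / se
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))   # 2 * P(Z > |z|)
```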

More Efficient Optimal Subsampling for Logistic Regression
In this section, we introduce two approaches proposed by Wang (2019b) to improve the OSMAC, where the first one is to use unweighted subsample estimators and the other one is to adopt Poisson subsampling.

More Efficient Unweighted Estimator
In Algorithm 2, β̂_OS is obtained by maximizing a weighted target function because the expression of the optimal subsampling probabilities involves y_i. From (1), we can see that data points with higher subsampling probabilities contribute relatively less to the weighted target function. Note, however, that the higher the subsampling probability of a data point, the more information that observation carries. Thus, the weighted target function cannot utilize the information in a sample as efficiently as an unweighted target function. Given a subsample {x*_i, y*_i}_{i=1}^n, the general unweighted subsample estimator β̂_uw^sub proposed by Wang (2019b) is obtained by maximizing

ℓ_uw*(β) = Σ_{i=1}^n [y*_i x*_i^T β − log{1 + exp(x*_i^T β)}].

The β̂_uw^sub is biased, so a bias correction procedure is needed. Algorithm 3 summarizes how to implement the unweighted estimator in the optimal subsampling method and how to correct the bias.
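One simple special case makes the bias correction concrete: when the inclusion probabilities are proportional to |y_i − p(x_i, β̂_pilot)| (the local case control form), the factor depending only on x cancels from the sampled-data odds, and the unweighted fit estimates β − β̂_pilot, so adding β̂_pilot back corrects the bias. The Python sketch below simulates this case; the data and sizes are illustrative, and Algorithm 3's general correction may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated full data (sizes are illustrative).
N, d = 50_000, 3
X = rng.normal(size=(N, d))
beta_true = np.array([1.0, -1.0, 0.5])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

def logistic_mle(X, y, iters=25):
    """Unweighted logistic MLE by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        H = (X * (p * (1 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    return beta

# Pilot estimate from a uniform subsample.
idx0 = rng.choice(N, size=2000, replace=False)
beta_pilot = logistic_mle(X[idx0], y[idx0])

# Poisson subsampling with inclusion probabilities proportional to |y - p(x, beta_pilot)|.
p0 = 1 / (1 + np.exp(-X @ beta_pilot))
resid = np.abs(y - p0)
pi = np.minimum(5000 * resid / resid.sum(), 1.0)   # expected sample size 5000
keep = rng.random(N) < pi

# Unweighted fit on the subsample; adding beta_pilot back corrects the bias.
beta_uw = logistic_mle(X[keep], y[keep])
beta_corrected = beta_uw + beta_pilot
```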

Poisson Subsampling
Besides subsampling with replacement, Poisson subsampling was considered in Wang (2019b). For Poisson subsampling, each observation is assigned a subsampling probability, and we decide whether to include a data point in the sample by conducting a Bernoulli trial with the assigned subsampling probability as the success rate. The observations in a subsample drawn by Poisson subsampling are independent of each other unconditionally on the full data. That means we can calculate the subsampling probability for the i-th observation and decide whether to include it in the subsample based only on the information in the i-th data point. In contrast, for subsampling with replacement we have to draw a large set of sample indexes from N indexes with pre-specified subsampling probabilities; for an N so large that it exceeds the memory limit of the computer, subsampling with replacement cannot be applied. Another advantage of Poisson subsampling is that no replicated observations exist in the subsample. Furthermore, the sample size is a random variable under Poisson subsampling, and we use the expected sample size to control it. The procedure of general Poisson subsampling is described in Algorithm 4.
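The one-pass nature of Poisson subsampling can be sketched directly; the helper name and the uniform probability rule below are illustrative, not from the paper's implementation.

```python
import random

random.seed(42)

def poisson_subsample(stream, prob_fn):
    """One pass over the data: keep each record independently with its own probability."""
    sample = []
    for record in stream:
        pi = prob_fn(record)
        if random.random() < pi:            # Bernoulli trial with success rate pi
            sample.append((record, pi))     # store the probability for later weighting
    return sample

# With probabilities averaging n / N, the realized size is random with expectation n.
N, n = 10_000, 100
sample = poisson_subsample(range(N), lambda record: n / N)
```

Note that the realized sample size fluctuates around the expected size n, which is exactly why the text distinguishes expected from realized sizes.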
To keep all these features of Poisson subsampling, given a pilot sample with corresponding subsampling probabilities {x*0_i, y*0_i, π*0_i}_{i=1}^{n*_0} and the pilot coefficient estimator β̂_uw^ps,0, the approximated optimal subsampling probabilities under the A-optimality and L-optimality criteria take the same form as (2) and (3) with β̂_MLE replaced by the pilot estimator, scaled by the expected sample size and truncated at one so that each is a valid inclusion probability.

Estimation for pilot sampling:
- Obtain the unweighted estimator β̂_uw^sub,0 by maximizing the unweighted target function over the pilot sample.
- Correct the bias to obtain the pilot sample estimator β̃_uw^sub,0.

Income Dataset
Algorithm 3 is realized by function AdpOptUWLog. The following code applies the function AdpOptUWLog to the income dataset. The standard errors are calculated from (10).
Algorithm 5 Efficient adaptive optimal subsampling algorithm using Poisson subsampling.
Pilot sampling:
- Run Algorithm 4 with expected sample size n_0 and subsampling probabilities π_i^0.
- Obtain a pilot sample of realized size n*_0, say {x*0_i, y*0_i, π*0_i}_{i=1}^{n*_0}.

Second step sampling:
Second step sampling:
- Calculate the approximate optimal subsampling probabilities {π̂_i}_{i=1}^N by plugging the pilot estimator into the selected criterion.
- Run Algorithm 4 with expected sample size n_1 and these probabilities, and record the second step sample {x*1_i, y*1_i, π*1_i}_{i=1}^{n*_1}, where n*_1 is the realized sample size.
Estimation for second step sampling:
- Obtain β̂_uw^ps,1 for the second step sample by maximizing the unweighted target function.
- The second step estimator is obtained by correcting the bias, say β̃_uw^ps,1 = β̂_uw^ps,1 + β̃_uw^ps,0.
Combination: The final estimator β̃_uw^ps is obtained by

β̃_uw^ps = {ℓ̈*0_ps,uw(β̃_uw^ps,0) + ℓ̈*1_ps,uw(β̃_uw^ps,1)}^{-1} {ℓ̈*0_ps,uw(β̃_uw^ps,0) β̃_uw^ps,0 + ℓ̈*1_ps,uw(β̃_uw^ps,1) β̃_uw^ps,1},

where ℓ̈*0_ps,uw and ℓ̈*1_ps,uw denote the second derivatives of the unweighted target functions for the pilot and second step samples. The variance-covariance matrix of β̃_uw^ps can be estimated by

Ṽ_uw^ps = {ℓ̈*0_ps,uw(β̃_uw^ps,0) + ℓ̈*1_ps,uw(β̃_uw^ps,1)}^{-1}.

Because the sample size for Poisson sampling is random, we record the realized sample size in both stages. In AdpOptPosLog, r0 and r, which are the expected pilot sample size and the expected second stage sample size, respectively, are set to 500 and 1000. In this example, the realized pilot sample size is 493 and the realized second stage sample size is 1107. The optmethod can be "A", "L" or "LCC", where "LCC" represents the local case control sampling introduced in Fithian and Hastie (2014), for which the second step estimator is used as the final estimator. When optmethod = "LCC" is selected, r is not meaningful since the subsampling probabilities at the second stage become |y_i − p(x_i, β̃_uw^ps,0)|. The expected sample size is then determined by the discrepancy between the observed responses and the estimated probabilities, and is of the same order as N. Table 1 compares the statistical efficiency and computing efficiency of the proposed algorithms with uniform subsampling and local case control sampling for the income dataset. The statistical efficiency is measured by the MSE, calculated as

MSE = S^{-1} Σ_{i=1}^S ‖β̂*_i − β̂_MLE‖^2,

where S is the number of replications and β̂*_i is the final estimator of the targeted algorithm in the i-th replication. All computations are processed on a MacBook Pro with a 2.5 GHz Intel Core i7 processor and 16 GB memory.
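The empirical MSE used in the comparison is a plain average of squared distances to the full data estimator. A minimal sketch, with made-up replicate numbers:

```python
# Empirical MSE over S replications, as used for the Table 1 comparison
# (the replicate estimates below are made-up illustrative numbers).
def empirical_mse(estimates, beta_full):
    """MSE = S^{-1} * sum_{i=1}^S ||beta_hat_i - beta_full||^2."""
    S = len(estimates)
    return sum(
        sum((b - f) ** 2 for b, f in zip(est, beta_full)) for est in estimates
    ) / S

beta_full = [0.5, -0.5]
estimates = [[0.6, -0.4], [0.4, -0.6]]    # S = 2 hypothetical replications
mse = empirical_mse(estimates, beta_full)
```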
Table 1 shows that uniform subsampling takes the least time, since only one sampling step is involved and no subsampling probabilities need to be computed. All the proposed algorithms outperform uniform subsampling in estimation efficiency. Among these approaches, local case control sampling is the most efficient at approximating β̂_MLE because it draws far more second step samples than the others; as a consequence, it also carries a heavier computational burden than the proposed algorithms. Directly calculating β̂_MLE from the full data is, unsurprisingly, the most time-consuming method. Among the three proposed algorithms, Algorithm 5 has the best statistical efficiency and Algorithm 2 the worst, indicating that using the unweighted estimator and Poisson subsampling helps improve estimation accuracy. In addition, the algorithms under L-optimality are less efficient in coefficient estimation but more efficient in computing time than the algorithms under A-optimality.

Optimal Subsampling Method for Generalized Linear Models
Consider a generalized linear model with density

f(y_i | x_i, β) = h(y_i) exp[y_i g(x_i^T β) − c{g(x_i^T β)}],   (14)

where h(·), g(·) and c(·) are known functions. The β̂_MLE can be obtained by maximizing the log-likelihood

ℓ(β) = Σ_{i=1}^N [y_i g(x_i^T β) − c{g(x_i^T β)} + log h(y_i)]

through the Newton-Raphson method, which can be achieved in O(ηNd^2) time, where η is the number of iterations for the Newton-Raphson method to converge. Assign subsampling probabilities to each observation, draw n observations with replacement, and denote them as {x*_i, y*_i, π*_i}_{i=1}^n. The subsample estimator β̂_glm^sub is obtained by maximizing the weighted target function

ℓ*(β) = (1/n) Σ_{i=1}^n (1/π*_i) [y*_i g(x*_i^T β) − c{g(x*_i^T β)}].   (15)

By minimizing the asymptotic MSE of β̂_glm^sub, the optimal subsampling probabilities under the A-optimality criterion are

π_glm,i^optA(β̂_MLE) = |y_i − ċ{g(x_i^T β̂_MLE)}| ‖M_N^{-1} ġ(x_i^T β̂_MLE) x_i‖ / Σ_{j=1}^N |y_j − ċ{g(x_j^T β̂_MLE)}| ‖M_N^{-1} ġ(x_j^T β̂_MLE) x_j‖,

where ċ(·) and ġ(·) are the first-order derivatives of c(·) and g(·), and

M_N = N^{-1} Σ_{i=1}^N [c̈{g(x_i^T β̂_MLE)} ġ(x_i^T β̂_MLE)^2 − {y_i − ċ(g(x_i^T β̂_MLE))} g̈(x_i^T β̂_MLE)] x_i x_i^T,

with c̈(·) and g̈(·) being the second-order derivatives of c(·) and g(·). The optimal subsampling probabilities under the L-optimality criterion are

π_glm,i^optL(β̂_MLE) = |y_i − ċ{g(x_i^T β̂_MLE)}| ‖ġ(x_i^T β̂_MLE) x_i‖ / Σ_{j=1}^N |y_j − ċ{g(x_j^T β̂_MLE)}| ‖ġ(x_j^T β̂_MLE) x_j‖.

We need O(Nd^2) time to compute π_glm,i^optA(β̂_MLE) and O(Nd) time to compute π_glm,i^optL(β̂_MLE). From (15), we can see that the weighted target function is easily inflated by extremely small subsampling probabilities. To address this, the authors in Ai et al. (2019) used a threshold δ to bound the value of |y_i − ċ{g(x_i^T β)}| from below. In this way, given a pilot sample estimator β̂_glm,0 and a pre-specified threshold δ, the approximated optimal subsampling probabilities are

π̂_glm,i^optA(β̂_glm,0) = max[|y_i − ċ{g(x_i^T β̂_glm,0)}|, δ] ‖M_N^{-1} ġ(x_i^T β̂_glm,0) x_i‖ / Σ_{j=1}^N max[|y_j − ċ{g(x_j^T β̂_glm,0)}|, δ] ‖M_N^{-1} ġ(x_j^T β̂_glm,0) x_j‖   (18)

under the A-optimality criterion and

π̂_glm,i^optL(β̂_glm,0) = max[|y_i − ċ{g(x_i^T β̂_glm,0)}|, δ] ‖ġ(x_i^T β̂_glm,0) x_i‖ / Σ_{j=1}^N max[|y_j − ċ{g(x_j^T β̂_glm,0)}|, δ] ‖ġ(x_j^T β̂_glm,0) x_j‖   (19)

under the L-optimality criterion. The adaptive optimal subsampling algorithm for generalized linear models is summarized in Algorithm 6. It has been proved in Ai et al. (2019) that the resultant estimator of Algorithm 6 is asymptotically normal and the rate of convergence is of the order n_1^{-1/2} under some mild assumptions.
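The thresholded L-optimal probabilities (19) are cheap to compute in a single vectorized pass. The following Python sketch instantiates them for the canonical Poisson case (c = exp, g the identity); the data, pilot estimate, and 5% quantile choice for δ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Generic L-optimal scores for a GLM: max(|y - c_dot{g(x'b)}|, delta) * ||g_dot(x'b) x||.
# Here c = exp and g = identity (the Poisson case); all names are illustrative.
c_dot = np.exp

def g_dot(t):
    return np.ones_like(t)   # derivative of the identity link transform

N, d = 5000, 2
X = rng.normal(size=(N, d))
beta_pilot = np.array([0.4, -0.2])        # assumed pilot estimate
t = X @ beta_pilot
y = rng.poisson(np.exp(t))

resid = np.abs(y - c_dot(t))
delta = np.quantile(resid, 0.05)          # threshold at the 5% quantile of residuals
scores = np.maximum(resid, delta) * np.abs(g_dot(t)) * np.linalg.norm(X, axis=1)
pi_L = scores / scores.sum()              # L-optimal probabilities, O(Nd) to compute
```

Swapping in the A-optimal version (18) only changes the norm term to ‖M_N^{-1} ġ(x_i^T β̂_glm,0) x_i‖, which requires one d × d matrix inverse.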

Poisson Regression
Poisson regression is widely used for modeling count data and is one of the generalized linear models. Under (14), Poisson regression has h(y_i) = 1/(y_i!), g(x_i^T β) = x_i^T β and c(·) = exp(·), and is of the form

P(y_i = y | x_i) = exp(y x_i^T β) exp{−exp(x_i^T β)} / y!.   (21)

Given a prior estimator β̂_glm,0, the approximated optimal subsampling probabilities in (18) and (19) become

π̂_glm,i^optA(β̂_glm,0) = max{|y_i − exp(x_i^T β̂_glm,0)|, δ} ‖M_N^{-1} x_i‖ / Σ_{j=1}^N max{|y_j − exp(x_j^T β̂_glm,0)|, δ} ‖M_N^{-1} x_j‖   (22)

and

π̂_glm,i^optL(β̂_glm,0) = max{|y_i − exp(x_i^T β̂_glm,0)|, δ} ‖x_i‖ / Σ_{j=1}^N max{|y_j − exp(x_j^T β̂_glm,0)|, δ} ‖x_j‖,   (23)

where M_N = N^{-1} Σ_{i=1}^N exp(x_i^T β̂_glm,0) x_i x_i^T. Plugging (21), (22) and (23) into Algorithm 6 yields the adaptive optimal subsampling algorithm for Poisson regression.

Bike Sharing Dataset
The bike sharing dataset, which models the number of bikes rented hourly under different conditions, is used to demonstrate the effectiveness of Algorithm 6 for Poisson regression. This dataset contains 17379 observations, and 4 covariates are included in the model: a binary variable "workingday" indicating whether a given day is a working day, and 3 continuous variables, "temp" (temperature), "hum" (humidity) and "windspeed" (wind speed). The organized dataset is named hour1, and the coefficient estimator for hour1 is computed by glm with a Poisson-family specification. The following code shows how to obtain the MLE for the full dataset.

Second step sampling:
- Calculate the approximate optimal subsampling probabilities {π̂_glm,i^optA(β̂_glm,0)}_{i=1}^N or {π̂_glm,i^optL(β̂_glm,0)}_{i=1}^N based on (18) or (19).
- Draw n_1 samples with replacement based on those approximate optimal subsampling probabilities.
- Record the second step subsample and the corresponding subsampling probabilities {x*1_i, y*1_i, π*1_i}_{i=1}^{n_1}.
Estimation: Combine the pilot sample and the second stage sample, denote the combined sample as {x*_i, y*_i, π*_i}_{i=1}^{n_0+n_1}, and obtain β̂_glm by maximizing the weighted target function over it. Estimate the variance-covariance matrix of β̂_glm by the sandwich estimator in (20).

hour <- read.csv("Code/hour.csv")
hour1 <- subset(hour, select = c("workingday", "temp", "hum", "windspeed", "cnt", NULL))
hour.glm <- glm(cnt ~ ., data = hour1, family = "quasipoisson")
summary(hour.glm)

We choose quasipoisson for the family option in the glm function to deal with the overdispersion in the bike sharing dataset. The small p values show that every covariate is significant in the model at the 5% significance level. As we can see, the expected count of rented bikes on working days is greater than that on non-working days. An increase in temperature or wind speed has a positive influence on the number of rented bikes, while an increase in humidity has a negative effect.

Next, we apply the function AdpOptSubPoi, which implements Algorithm 6, to the bike sharing dataset. The result is obtained by setting the pilot sample size to 200 and the second stage sample size to 500 under the A-optimality criterion. The option dalta.quant = 0.05 indicates that δ is chosen as the 5% quantile of |y_i − exp(x_i^T β̂_glm,0)|. The weighted subsample estimator is obtained by glm with the weights option, and the standard errors are estimated using (20).
To demonstrate the effectiveness of the proposed algorithm, we compare the MSE and running time of different methods. Table 2 shows that Algorithm 6 is better than uniform subsampling in estimation accuracy, and is computationally more efficient compared with the full data computation.

Optimal Subsampling Method for Quantile Regression
The adaptive optimal subsampling algorithm for quantile regression was discussed in Wang and Ma (2020). Quantile regression estimates a specified quantile of the response variable conditional on the covariates, and has the form q_τ(y_i | x_i) = x_i^T β, where τ indicates that the τ-th quantile of y_i given x_i is modeled. The full data estimator can be computed in O(N^{5/2} d^3) time by the interior point method (Portnoy et al., 1997). Draw a subsample of size n based on the probability distribution {π_i}_{i=1}^N, and record the sampled data with their subsampling probabilities as {x*_i, y*_i, π*_i}_{i=1}^n. The subsample estimator β̂_qr^sub is obtained by minimizing the weighted target function

ℓ*(β) = (1/n) Σ_{i=1}^n (1/π*_i) ρ_τ(y*_i − x*_i^T β),

where ρ_τ(u) = u{τ − I(u < 0)} is the check loss function. The optimal subsampling probabilities under A-optimality are

π_qr,i^optA = |τ − I(y_i − x_i^T β ≤ 0)| ‖M^{-1} x_i‖ / Σ_{j=1}^N |τ − I(y_j − x_j^T β ≤ 0)| ‖M^{-1} x_j‖,

where M = N^{-1} Σ_{i=1}^N f(0, x_i) x_i x_i^T and f(0, x_i) is the conditional density of the error y_i − x_i^T β at zero given x_i. The difficulty of estimating f(0, x_i) makes the A-optimal subsampling probabilities hard to compute. Thus, for quantile regression, the L-optimal subsampling probabilities are more favorable:

π_qr,i^optL(β) = |τ − I(y_i − x_i^T β ≤ 0)| ‖x_i‖ / Σ_{j=1}^N |τ − I(y_j − x_j^T β ≤ 0)| ‖x_j‖.

The time complexity for computing π_qr,i^optL(β) is O(Nd). Based on the L-optimal subsampling probabilities, the authors proposed an iteratively adaptive optimal algorithm, stated in Algorithm 7, which obtains the coefficient estimator and its estimated variance from R repeated second stage samples. It has been shown in Wang and Ma (2020) that the rate of convergence of the final estimator is (n_1 R)^{-1/2}. We illustrate Algorithm 7 with the protein tertiary structure dataset; from the full data fit we find that, at the 5% significance level, the seventh covariate (Euclidean distance) is not significant in the model while all the others are significant.
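The L-optimal scores for quantile regression need only an indicator and a norm per observation, which is what makes them O(Nd). A minimal Python sketch with simulated data and an assumed pilot coefficient:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy data; the pilot coefficient beta_pilot is assumed available (illustrative values).
N, d, tau = 5000, 2, 0.75
X = rng.normal(size=(N, d))
beta_pilot = np.array([1.0, 0.5])
y = X @ beta_pilot + rng.standard_normal(N)

# L-optimal scores |tau - I(y - x'beta <= 0)| * ||x||, computed in one O(Nd) pass.
indicator = (y - X @ beta_pilot <= 0).astype(float)
scores = np.abs(tau - indicator) * np.linalg.norm(X, axis=1)
pi = scores / scores.sum()
```

Each score takes one of two values in the residual factor (τ or 1 − τ), so observations on the sparse side of the fitted quantile are automatically upweighted.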
Algorithm 7 is realized by the function QuanSub, in which the options r0 and r are the pilot sample size and the second stage sample size, working as n_0 and n_1 in Algorithm 7, and RR is the same as R in Algorithm 7. The option tau = 0.75 indicates that we are modeling the 75-th quantile of the size of the residue given the covariates. The optmethod can be "L" or "uniform", corresponding to optimal subsampling under the L-optimality criterion and uniform subsampling, respectively.

X <- cbind(1, as.matrix(casp[, -1]))
y <- casp$RMSD
QuanSub(X, y, r0 = 200, r = 1000, RR = 10, tau = 0.75, optmethod = "L")

##          2.866351e-02  2.840549e-03  10.090833  6.065278e-24
## beta8   -1.329846e-01  3.951277e-02  -3.365611  7.637438e-04

The standard errors are obtained from (26), and the z statistics and p values test whether the true value of the corresponding parameter equals 0 or not, where the z statistics are obtained by dividing the coefficient estimators by their standard errors. All p values are small, demonstrating that every parameter is significant at a relatively low significance level.
We also compare the performance of Algorithm 7 with uniform subsampling. Table 3 indicates that, compared with uniform subsampling, Algorithm 7 is more efficient in estimation accuracy. Even though Algorithm 7 takes more computing time than uniform subsampling, it is still computationally more efficient than the full data calculation.

Summary
In this paper, we demonstrated the effectiveness of optimal subsampling methods in reducing the computational burden for massive datasets, and illustrated their application to logistic regression, generalized linear models and quantile regression with real data examples. The coefficient estimators obtained by optimal subsampling methods maintain desirable statistical properties, such as consistency and asymptotic normality, making it possible to perform statistical inference, including hypothesis tests and confidence intervals, based on the subsample.
This review focuses on the application of optimal subsampling methods, and the discussion centers on presenting optimal subsampling probabilities and practical algorithms; theoretical properties of the resultant coefficient estimators are not discussed in detail. In practical applications, problems more complex than those discussed here can occur, and further effort is needed to develop suitable sampling approaches. Subsampling for big data is a promising way to trade off estimation efficiency against computational efficiency. The area is still quite new, and much work remains. We hope this review can serve as a starting point for practitioners who wish to use optimal subsampling methods.

Supplementary Material
The R functions mentioned in the paper for the optimal subsampling algorithms and all datasets can be found on the Journal of Data Science website.