Is the group structure important in grouped functional time series?

We study the importance of group structure in grouped functional time series. Due to the non-uniqueness of group structure, we investigate different disaggregation structures in grouped functional time series. We address a practical question on whether or not the group structure can affect forecast accuracy. Using a dynamic multivariate functional time series method, we consider joint modeling and forecasting multiple series. Illustrated by Japanese sub-national age-specific mortality rates from 1975 to 2016, we investigate one- to 15-step-ahead point and interval forecast accuracies for the two group structures.


Introduction
Because of the rapid advancements in medical technology and improvements in health services, many countries worldwide have been observing a continuous reduction of human mortality rates for decades. Developed OECD nations, such as Japan, consider the remarkable increase in population life expectancy as one of the greatest achievements of the last century (OECD, 2020).
The growing longevity risks in many countries put pressure on government policymakers and planners to develop sustainable pension, health, and aged care systems (see, e.g., Coulmas, 2007).
Traditional studies of human mortality focus on population data at the national level. In recent years, academics and government stakeholders have shown increasing interest in regional mortality improvements after realizing that sub-national forecasts of age-specific mortality rates help inform social and economic policies within local regions. Thus, any improvement in the forecast accuracy of mortality rates will be beneficial in determining the allocation of current and future resources at the national and sub-national levels.
Numerous mortality modeling methods have been proposed since the first publication of the Gompertz law in 1825 (see Booth and Tickle, 2008, for a comprehensive literature review on mortality modeling and forecasting). Only a few approaches could simultaneously forecast multiple regional mortality rates within a country (see, e.g., Jarner and Kryger, 2011;Cairns et al., 2011;Dowd et al., 2011;Russolillo et al., 2011;Hatzopoulos and Haberman, 2013;Shang, 2016;Villegas et al., 2017;Gao et al., 2019;Perla et al., 2021). The main hypothesis for multipopulation mortality models is that mortality rate differences for any two populations having similar socio-economic status and close connections with each other do not diverge indefinitely over time. This paper applies a dynamic multivariate functional time series method to produce accurate sub-national age-specific mortality forecasts.
Applying functional time series forecasting methods to sub-national mortality disaggregated by attributes such as sex and geographical regions independently does not ensure coherence in the forecasts. The forecasts of all sub-populations will not add up to the forecasts obtained by applying the technique to the national data. Hence, in practice, it is crucial to consider forecast reconciliation approaches in forecasting sub-national mortality rates (e.g., see Stone et al., 1942;Sefton and Weale, 2009).
We apply a dynamic multivariate functional time series method to simultaneously model a group of sub-national mortality rates and make forecasts. In particular, we aim to answer a practical issue: does the group structure matter in terms of forecast accuracy for a grouped functional time series? We first arrange the national and sub-national Japanese male and female age-specific mortality rates from 1975 to 2016 into two group structures. Then, we apply dynamic univariate and multivariate functional time series methods and compare their forecasting performance on some holdout data for the two group structures considered.
The rest of this paper is structured as follows. In Section 2, we describe the motivating dataset, which is the Japanese national and sub-national age-specific mortality rates. In Section 3, we describe a dynamic multivariate functional time series method for producing point and interval forecasts. Reconciled forecasts produced by the dynamic univariate and multivariate functional time series forecasting methods are evaluated and compared in Section 4. Conclusions are presented in Section 5, along with some reflections on ways to extend the techniques presented in this study.

Japanese age-specific mortality rates
The Japanese Mortality Database regularly recently publishes age-specific mortality rates over the period 1975-2016 at the national level and various subnational levels (Japanese Mortality Database, 2017). We consider ages from 0 to 99 in single years of age and include the last age group, "100+", as an aggregation of ages at and beyond 100. Figure 1 presents rainbow plots of the age-specific mortality rates of females and males in the studied period. Rainbow plots are designed to display the distant past in red curves and the more recent years in purple curves (Hyndman and Shang, 2010). Typical "J shape" mortality patterns of developed countries can be observed in both panels in Figure 1, which indicates a rapid decrease of mortality occurs for the newborns, with slower reduction continuing into the adolescence age. There is an "accident hump" in mortality for young adults around age 20, followed by a short "plateau" until the early 30s, and then a steady increase in middle-to-old age groups. Figure 1 reveals an overall reduction in the trend of total mortality rates over the considered period 1975-2016. However, the rate of decline appears to vary across different age groups.

Nonparametric smoothing
We implement a penalized regression spline smoothing method with a monotonic constraint to smooth out excess noise at higher ages (see also Hyndman and Ullah, 2007;Shang, 2016).
For a particular population j, let y j t (x i ) be the log central mortality rates observed at the beginning of each year for year t = 1, 2, . . . , n, where x i represents the center of each age or age group for i = 1, . . . , p. The value of j depends on the research question. For example, when modeling age-specific mortality rates of females and males in a particular prefecture, j takes two values: male or female. We assume there is an underlying continuous and smooth function f j t (x) that is observed at discrete data points with errors of the form where ε j t,i is an independent and identically distributed standard normal random variable for each age in year t, and δ j t (x i ) measures the variability in mortality at each age in year t for the j th population. Jointly, δ j t (x i )ε j t,i represents the smoothing error. Let m j t (x i ) = exp{y j t (x i )} be the observed central mortality rates for age x i in year t and define E j t (x i ) to be the total exposure to risks of j th population of age x i at 1 st January of year t. Then m j t (x i ) approximately follows a binomial distribution with estimated variance Since we are modeling the mortality rates at logarithm scale, we apply a Taylor approximation to y j t (x) = ln(m j t (x i )) to get Following Hyndman and Ullah (2007), we then define weights w j t (x i ) = 1/ δ j t 2 (x i ) and use weighted penalized regression splines to obtain the smoothed functions f j t (x) (see also Wood, 2003;Currie et al., 2004).
As a byproduct of the smoothing, we can potentially fill in any missing observations. In  To better understand the mortality patterns of Japan, we consider mortality time series featuring various geographical factors (i.e., country, region, and prefecture) and sex factors (i.e., female and male) in this study. As highlighted by different colors in Figure  Thus, disaggregate the national and sub-national age-specific mortality rates by geographical location and sex leading to in total 168 series (i.e., 47 × 2 = 94 Sex × Prefecture series, 16 Sex × Region series, 2 Sex series at the national level, and further 1 + 8 + 47 = 56 total series at Japan, Region and Prefecture levels, respectively). Because of the non-uniqueness of the group structure, the middle-level series can be disaggregated differently. This, in turn, affects the form of a group structure.

Figure 3:
Eight regions (represented in different colors) and 47 prefectures of Japan. In the remaining sections, we mainly use notations R 1 , · · · , R 8 , and P 1 , · · · , P 47 numbered geographically from north to south when referring to regions and prefectures, respectively.
We split the Japanese data into geographical structures coupled with a sex grouping variable by considering the difference in mortality between females and males. A three-level geographical structure is used to disaggregate the national and sub-national total mortality rates, as shown in Figure 4(a). We apply a dynamic multivariate functional time series method to eight region-level series and 47 prefecture-level series, respectively, to obtain joint forecasts at each level. Specifically, for each region, we consider all prefecture-level total series that belong to this region; all eight region-level total series are jointly modeled for the national total. These forecasts of multiple series at various levels are then used to perform forecast reconciliation according to the specified group structure.
When forecasting mortality rates of females and males, two disaggregation structures focusing on either geographical factors or sex are used. The first grouping structure (Hierarchy 1) in Figure 4(b) initially disaggregates the top-level series by geographical location. It then further splits sub-national totals into sex-specific series at the region and prefecture levels. Jointly mod- Total series eling the two sex-specific series at each node of the hierarchy incorporates the close connection of females and males living in each region or prefecture. The second grouping structure (Hierarchy 2) in Figure 4(c) first splits the total series by sex and then further disaggregates the totals for females and males according to geographical factors. Implementing a dynamic multivariate functional time series method under this structure means jointly modeling the series of one particular sex (female or male at a time) either for each region or for the national sex-specific totals. This sub-national mortality forecasting considers the covariance between spatially closely correlated sub-populations and respects the well-known fact that human females and males have different mortality patterns.

Methodology
The mortality patterns of sub-populations in Japan are expected to be similar because of their closely related biological backgrounds and similar lifestyles. Still, they are not expected to be identical nationwide across all age groups. Traditional age-specific mortality forecasting methods tend to result in divergent forecasts for sub-populations when independence is assumed. It is desirable to model sub-national mortality rates simultaneously while considering the potential heterogeneity between these. In this paper, we apply a dynamic multivariate functional time series method and demonstrate its advantage over its univariate counterpart using the Japanese sub-national data in Section 4.

Dynamic multivariate functional time series method
The dynamic univariate functional time series method considers temporal dependence of data by estimating its long-run covariance function. Let { f (l) (x)} l=1,...,ω be a set of random functions (in total we have ω ∈ Z + functions with ω ≥ 2) that comprise each observation in Hilbert space, consisting of ω number of functions. These multivariate functions may be defined on the same domain I (see, e.g., Jacques and Preda, 2014;Chiou et al., 2014) or may be defined at different domains I 1 , . . . , I ω (see, e.g., Happ and Greven, 2018). Given our application has the same domain, we consider the former situation.
We define the mean function For x, s ∈ I, auto-covariance function at lag is defined as The long-run covariance is defined as where f (j) (s) and f (l) (x) ∈ L 2 (I). The long-run covariance function C(x, s) C(x, s) consists of the variance as well as the temporal cross-covariance at various positive and negative lags of a functional time series, and is a wall-defined element of L 2 (I) under mild dependence and moment conditions. Through right integration, C(x, s) defines a Hilbert-Schmidt integral operator on L 2 (I) given by Via Mercer's lemma, there exists an orthonormal sequence (φ k ) of continuous functions in L 2 (I) and a non-increasing sequence λ k of positive number, such that By the separability of Hilbert spaces, the Karhunen-Loève expansion of a stochastic process ing the vector of the basis expansion coefficients, and .
The dynamic functional principal component analysis is an effective way of extracting a large amount of long-run covariance. It summarizes the main features of an infinite-dimensional object by its finite elements by minimizing the mean integrated squared error of the reconstruction error function over the whole functional data set. The dynamic functional principal component analysis has been applied in modeling human age-specific mortality (Shang, 2019) and foreign exchange rates Shang and Kearney (2021). This work is the first that extends the technique to model multivariate functional time series to the best of our knowledge. The dynamic multivariate functional time series method could better capture the temporal dependency of age-specific mortality rates compared to the commonly used Lee-Carter model or static functional time series method (see, e.g., Hyndman and Ullah, 2007).

Long-run covariance estimation
The long-run covariance function introduced in (1) in practice can be estimated from a finite where v is the bandwidth parameter, and (3) is a continuous and symmetric function with bounded support of order q defined on [−m, m] for some m > 0 satisfying and there exists an ω such that As with the kernel sandwich estimator in (3), the crucial part is the estimation of bandwidth parameter v. We estimated v through a data-driven approach using the plug-in algorithm of Rice and Shang (2017, Section 2). The plug-in algorithm selects the optimal bandwidth v that minimizes the asymptotic mean-squared normed error between the estimated and actual long-run variance functions.

Selecting the number of retained functional principal components
There are several methods for selecting the number of functional principal components K: (1) the fraction of the variance explained by the first few functional principal components (Chiou, 2012); (2) pseudo-versions of the Akaike information criterion (AIC) and Bayesian information criterion (Yao et al., 2005); (3) predictive cross-validation leaving one or more curves out (Rice and Silverman, 1991); and (4) bootstrap methods (Hall and Vial, 2006). Here, the number of components is determined as the minimum that reaches 90% of total variance explained by the leading components (the first method mentioned), such that where 1 { λ k >0} is to exclude possible zero eigenvalues, and 1{·} represents the binary indicator function. Although the total variance explained level could be arbitrary, 0.9 is a common threshold when using the total variance explained in principal component analysis to select the number of components to retain (see also Kolkiewicz et al., 2021).

Forecasting multivariate functional time series
In the dynamic multivariate functional time series method, we first stack all series into a vector of functions and obtain a set of dynamic functional principal components with associated scores according to the method described in Section 3.1. Conditioning on the estimated mean function and estimated dynamic functional principal components, these forecast scores are multiplied by the estimated principal components to form forecast curves. A univariate time series forecasting method, such as the ARIMA model, is considered to forecast dynamic functional principal component scores. Through differencing, the ARIMA model can handle non-stationarity present in the data. For details of forecasting dynamic principal component scores, consult Hyndman andShang (2009), Shang (2019) and Shang and Kearney (2021).
Denote the h-step-ahead prediction of dynamic principal component scores β K (x) are the estimated functional principal component functions.

Construction of pointwise prediction intervals
As a supplement to point forecast evaluation, prediction intervals are constructed and evaluated to gauge the uncertainty associated with point forecasts. Depending on statistical theory and data on error distributions, prediction intervals explicitly estimate the probability that future realizations lie within a given range. In implementing the proposed mortality forecasting method, the errors accumulated in the estimated principal component scores and model residuals in (2) are the primary sources of uncertainty. As emphasized by Chatfield (1993), it is crucial to provide interval forecasts as well as point forecasts to be able to (1) assess future uncertainty level; (2) enable different strategies to be planned for the range of possible outcomes; (3) compare forecasts from different methods; and (4) explore different scenarios based on various underlying assumptions.
3) Based on these in-sample forecast errors, we can sample with replacement to obtain a series of bootstrapped forecast errors, from which we obtain lower and upper bounds, denoted by γ (l) lb (x) and γ (l) ub (x), respectively. We then seek a tuning parameter π α such that α × 100% of the residuals satisfy Then, the h-step-ahead pointwise prediction intervals are given as where i symbolizes the discretized data points.
Note that Aue et al. (2015) calculate the standard deviation of (l) M (x) , which leads to a parametric approach to constructing prediction intervals. Instead, we consider the nonparametric bootstrap approach of Shang (2018), which allows us to reconcile bootstrapped forecasts among different functional time series in a hierarchy.

Reconciliation of grouped forecasts
The disaggregated data can first be organized into groups for forecasting sub-national mortality rates before applying the dynamic multivariate functional time series method. Hence, an appropriate grouping structure is crucial for accurate forecasts. A natural strategy is arranging the sub-national functional time series into geographical structures, as in Figure 4. State-of-theart forecast reconciliation methods are applied to ensure that the point and interval forecasts at the various disaggregation levels add up to the forecasts of the national mortality.
For all aforementioned hierarchies, we denote a particular disaggregated series using the notation G * S, meaning the geographical area G and the sex S. For instance, R 1 * F denotes females in Region 1, P 1 * T denotes females and males in Prefecture 1 and Japan * M denotes males in Japan. Let E G * S,t (x) denote the exposure to risk for series G * S in year t and age x, and let D G * S,t (x) be the number of deaths for series G * S in year t and age x. Then, age-specific mortality rate is given by R G * S,t (x) = D G * S,t (x)/E G * S,t (x). Dropping the age argument (x) allows us to express the national and sub-national mortality series in a matrix multiplication as where R t is a vector containing all series at all levels of disaggregation, b t is a vector of the most disaggregated series, and S t shows how the two are related. The group structures illustrated in Figure 4 satisfy this relationship. In the two group structures, the main difference is about how the (R R1×F,t , R R1×M,t ,. . . , R R8×F,t , R R8×M,t ) and (R P1×F,t , R P1×M,t , . . . , R P47×F,t , R P47×M,t ) are estimated. In Figure 4(b), we estimate the female and male series together in a geographical location. In Figure 4(c), we estimate multiple series at various geographical locations for females and males, respectively.
We present a brief review of three methods for forecast reconciliation based on this summing matrix. The three forecast reconciliation methods are bottom-up, optimal combination method with the ordinary least squares (OLS) estimator, and optimal combination method with generalized least squares (GLS) estimator. The bottom-up method has the agreeable feature that it is intuitive and straightforward and always results in forecasts that satisfy the same group structure as the original data (e.g., see Dangerfield and Morris, 1992;Zellner and Tobias, 2000).

Data analysis results
We apply the dynamic univariate and multivariate functional time series forecasting methods to Japanese age-specific mortality rates to obtain base forecasts for the group structures in Figure 4.
Then, we conduct reconciliation via the bottom-up and the optimal combination methods. To assess model and parameter stabilities over time, we consider an expanding window analysis of considered time series models (see Zivot and Wang, 2006, Chapter 9 for details). We used the observed mortality curves in 1975-2001 Japanese data to produce one-to 15-step-ahead point and interval forecasts. Through an expanding window approach, we re-estimate the parameters of considered models using the first 28 years of observations from 1975 to 2002 and estimate one-to 14-step-ahead forecasts using re-estimated models. The process is iterated with the sample size increased by one year each time until reaching the end of the data period in 2016.
This process produces 15 one-step-ahead forecasts, 14 two-step-ahead forecasts, and so on, up to one 15-step-ahead forecast. We evaluate the point and interval forecast accuracies and report our study results in this section.

Point forecast evaluation and comparison
To evaluate the point forecast accuracy, we compute the root mean squared forecast error (RMSFE), which measures how close the forecasts are compared to the actual values. For each series, the RMSFE is given by where f n+ς (x i ) represents the actual holdout sample for the i th age and ς th curve of the fore- Averaging all series at each disaggregation level of Hierarchy 1, we show RMSFEs (×100) for the forecasts before and after reconciliations in Figure 5. In the figure, we denote the independent forecasts as Base. We denote the reconciled forecasts obtained by the bottom-up method and the optimal combination method as BU and OP, respectively (see, e.g., Hyndman et al., 2011;Shang, 2017, for more details). This figure compares forecasts obtained by the dynamic multivariate functional time series method with forecasts produced by its univariate counterpart method. The dynamic multivariate functional time series method is expected to give more accurate point forecasts at disaggregation levels where multiple mortality series have highly consistent patterns. The correlations among multiple series can be captured and used in producing forecasts. This advantage of our method is shown in "Region + Sex series" and "Prefecture + Sex series" panels of Figure 5. However, due to its construction, the dynamic multivariate functional time series method implicitly assumes the same number of retained functional principal components for all series, consequently resulting in diminishing forecast accuracy for a particular series. This disadvantage of our method is revealed when only the Japan * F and Japan * M series are considered, as shown in the "Sex series" panel of Figure 5. At the national level, where only one mortality time series exists, the dynamic univariate functional time series method, a special case of dynamic multivariate functional time series method, is implemented based on an estimated long-run covariance function. Hence, both methods have the same forecasts for the total Japanese series.
The advantage of dynamic multivariate functional time series method is that it could extract common features among strongly related sub-populations. Another prominent character of the dynamic multivariate functional time series method shown in Figure 5 is the method's robustness to outliers in the training sample. At least three prefectures in Japan that were severely affected by earthquakes and tsunamis have inflated mortality curves. One reason for the dynamic multivariate functional time series forecasts being immune to outliers is that the computation of long-run covariance function over a period mitigates the influence of abrupt and temporal movements of observations. The autocovariance function, a part of the long-run covariance function, can smooth out the observational noise. Another reason for the robust forecast is that collectively, the consideration of related sub-populations enables the information pooling and thus improves the extraction of human mortality features. By contrast, many large values exist in the dynamic univariate functional time series forecasts even after reconciliation. Applying the dynamic multivariate functional time series method to various disaggregation structures leads to different RMSFEs (×100), as shown in Figure 6. The geographical hierarchies described in Figure 4 are blended into Hierarchy 1 and Hierarchy 2 in the reconciliation of the total series at the country, region, and prefecture levels. Hierarchy 2 produces the most accurate point forecasts, given the outliers in the data. Collectively modeling the curves that represent all females (males) in the prefectures can better ease the influence of the few outlying curves than jointly modeling the series for females and males in each prefecture. This finding is because natural disasters are likely to affect females and males in an area equally. Comparing RMSFEs across all sub-national levels, we recommend grouping age-specific mortality rates of the same sex for each region and prefecture, applying the dynamic multivariate functional time series method to obtain base forecasts, and then reconciling the forecasts using the bottom-up method.

Interval forecast evaluation and comparison
In addition to point forecasts, we also evaluate the pointwise interval forecast accuracy using the interval score of Gneiting and Raftery (2007) (see also Gneiting and Katzfuss, 2014). For each year in the forecasting period, the h-step-ahead prediction intervals are calculated at the 100(1 − α)% nominal coverage probability. We consider the common case of a symmetric 100(1 − α)% prediction interval whose lower and upper bounds are predictive quantiles at α/2 and 1 − α/2, denoted by f lb n+ς (x i ) and f ub n+ς (x i ), respectively. As defined by Gneiting and Raftery (2007), a scoring rule for the pointwise interval forecast at time point x i is where α denotes the level of significance; typically α = 0.2. It can be easily seen that the optimal interval score is achieved when f n+ς (x i ) lies between f lb n+ς (x i ) and f ub n+ς (x i ), with the distance between the upper bound and the lower bound being minimal.
The scoring rule of Gneiting and Raftery (2007) is considered in this study since it combines the coverage rate and halfwidth of the prediction interval. We aim to construct prediction intervals where the empirical coverage rate is close to the nominal one, with a minimum width of the prediction interval. When the empirical coverage rate is less than the nominal one, the constructed prediction interval is narrower than the oracle. This indicates that we are under-estimating the forecast uncertainty. Conversely, if the empirical coverage rate is more than the nominal one, our prediction interval is wider than the oracle. This indicates we are over-estimating the forecast uncertainty, and our prediction intervals are not that informative.
Averaging over different points in a curve and different years in the forecasting period, the mean interval score is defined as where P α f lb n+ς (x i ), f ub n+ς (x i ); f n+ς (x i ) denotes the interval score at the ς th curve of the forecasting period. For 15 different forecast horizons, we use a mean statistic given by to evaluate the pointwise interval forecast accuracy. The second and third panels in Figure 7 depict interval forecast accuracies for the total series (i.e., the aggregate mortality of females and males) at region and prefecture levels, respectively. The DMFTS method collectively models all the considered total series to estimate the long-run covariance of the entire population of Japan before producing forecasts. Significant heterogeneity exists in the total series at the Prefecture or Region levels since earthquakes and tsunamis influenced mortality at several prefectures during the considered period. Hence, natural disaster-related deaths as outliers tend to inflate the estimation variance of every series used in the DMFTS modeling process. In contrast, the base method considers one total series, either for a region or a prefecture, at a time. Only those prefectures or regions with outlying death counts would see additional estimation variance. Moreover, the dynamic modeling method may not accurately recover the long-run covariance of mortality series of all Japanese residents based on a finite sample. Due to these reasons, the DMFTS method sometimes performs inferiorly to the independent method. Applying any of the considered reconciliation methods improves interval forecast accuracies of the total series at region and prefecture levels, as shown in Figure 7. The forecast accuracy improvement is because Japan's population hierarchy helps stabilize the abnormal forecasts inflated by excess uncertainty in estimation.   on common features of female (male) mortality, leading to inaccurate pointwise prediction intervals for outlying series due to losing specific information for the sub-population. Given that interval scores will be inflated by every holdout value outside the calculated prediction interval, outlier curves cause a greater penalty to Hierarchy 2 than to Hierarchy 1. Hence, jointly modeling the mortality series for females (males) of each prefecture and region is better for improving pointwise interval forecast accuracy.

Conclusion
When applied to the functional time series formed by disaggregated series, such as sub-national age-specific mortality rates, the dynamic multivariate functional time series method can be combined with reconciliation methods to obtain improved point and interval forecasts.
The dynamic multivariate functional time series method is applied to data on Japanese age-specific mortality rates from 1975 to 2016. The dynamic multivariate functional time series method produces the most accurate point and interval forecasts of Japanese mortality rates combined with the bottom-up reconciliation method. Different group structures involving geographical factors and sex are considered and compared in forecasting and reconciliation. For point forecasting, we recommend that, first, the national mortality data are disaggregated by sex, and further, each series is split according to a geographical area. Because of the bias-variance trade-off, reversing the disaggregation order, that is, first, according to the geographical area, and then, by sex, yields more accurate pointwise interval forecasts, evaluated by the mean interval scores.
There are some ways in which this paper can be further extended. Here, we briefly mention three: 1) We have considered two factors, namely sex and geographical locations, to disaggregate age-specific mortality rates. It is possible to enrich the information contained in the group structure by differentiating these rates according to other factors, such as the cause of death (Murray and Lopez, 1997;Gaille and Sherris, 2015), social and economic aspects (Bassuk et al., 2002;Singh et al., 2013) and native-born or immigrant status (Singh and Siahpush, 2001). If appropriate data are available, we may attempt to extend the proposed grouped multivariate functional time series forecasting method to cause-specific mortality rates and occupation-specific mortality rates.
2) We expect the mortality functions at the prefecture-level to be homogeneous so that the empirical long-run covariance function can be accurately estimated. When there are major natural disasters such as earthquakes and tsunamis in some prefectures, the observed mortality functions may have different shapes. Such functions of strange shapes can be viewed as outliers. Functional depth-based outlier detection methods of Hyndman and Shang (2010), and the recently developed novel multivariate functional outlier detection methods of Dai and Genton (2018) and Dai et al. (2020) can be adopted to check if a particular curve is an outlying observation. Among other methods, the robust functional principal component analysis method and the robust regularized singular value decomposition method of Shang et al. (2019) can be applied to mitigate influences of functional outliers in modeling and forecasting age-specific morality rates (see also Zhang et al., 2013;Bali et al., 2011).
3) We evaluate and compare the pointwise interval forecast accuracy. To our best knowledge, uniform prediction intervals are still rarely used in the context of functional time series models, owing to the difficulty in computing an appropriate semi-metric distance. A future extension can consider the uniform prediction intervals of grouped functional time series and study the interval forecast accuracy in function space.

Supplementary Material
For reproducibility, all R codes are collated in a Github repository at https://github.com/ hanshang/GMFTS. This repository contains the following files: • README: Describe the functionality of each R file.
• R files with indices 1 to 29: Functions used to compute numerical results presented in the paper.
• Example: Contains a quick example of forecasting prefecture-level age-specific mortality rates using the DMFTS method.