Spatio-Temporal Forecasting Approach for Health Indicators

Progress towards government health targets for health areas may be assessed by short term extrapolation of recent trends. Often the observed longitudinal series for a set of health areas is relatively short and a parsimonious model is needed that is adapted to varying observed trajectories between areas. A forecasting model should also include spatial dependence between areas both in representing stable cross-sectional differences and in terms of changing incidence. A fully Bayesian spatio-temporal forecasting model is developed incorporating flexible but parsimonious time dependence while allowing spatial dependencies. An application involves conception rates to women aged under 18 in the 32 boroughs of London.


Introduction
Government health improvement targets to reduce adverse events or health inequity increasingly involve forecasts to assess how far much progress towards the target has been achieved.When the targets are set for health areas, this requires a forecast for a panel data situation, with observations formed by a collection of incidence rates by areas i = 1, . . ., N and times t = 1, . . ., T , and with forecasts required for times t = T + 1, . . ., T + R.This paper sets out a spatio-temporal forecasting model suited to such analysis that allows both for enduring spatial patterning in the health outcome and for spatial clustering of growth or decline in incidence.
Enduring health contrasts reflect features such as the impact of social structure and health behaviours on health outcomes.Such contrasts tend to be relatively stable in established urban areas, only subject to slow change (e.g.due possibly to selective migration over a number of decades), though more pronounced change may occur in fast growing urban areas (e.g. if there is substantial building of new homes).By contrast, trends towards improvement (or deterioration) in health outcomes may well not be positively correlated with existing differentials, and may reflect a wide range of factors including success or otherwise of health sector interventions, though may still show spatial patterning.This is true for the application considered here which involves short term forecasts of conception rates to women under 18 in the 32 London boroughs, using observations over 1992-2003; the percent change in this rate between 1992-95 and 2000-2003 has a correlation of −0.14 with the average rate over the entire 12 year period.
Considerable work has been done in developing spatio-temporal models for descriptive analysis (e.g.Waller et al, 1997) but the temporal mechanisms assumed are not necessarily suited to short to medium term extrapolation (between one to ten years ahead).Growth curve models in biostatistics and psychometrics also assume relatively simple time effects (e.g.linear growth) and are not necessarily suited to extrapolation.Bernardinelli et al (1995) present a model for spatial disease outcomes through time that allows for spatially correlated linear growth, but this change pattern may be a simplification in many applications.
Some have argued for an integration of growth curve principles and autoregressive models, improving forecasting potential, since autoregressive dependence on lagged observations provides a basis for short term forecasting that reflects the recent history of the series (Bollen and Curran, 2004).There is also scope for modelling nonlinear evolution within a growth path context, e.g. by introducing fractional polynomial effects in time (Royston and Altman, 1994).Fractional polynomials allow a flexible polynomial shape without necessarily requiring high order polynomial terms in time.The model developed here combines autoregressive and growth curve principles within a spatio-temporal forecasting framework; the model allows for spatial patterning both in the health outcome and in changing incidence of the outcome, with area change profiles modelled using fractional polynomials.
The application involves a relatively short panel series of conception rates to women aged under 18 in 32 London boroughs.Event totals and denominators (women aged 15-17) are available back to 1992 and the series ends in 2003.Here the last year's observations are held out to provide a cross-validation benchmark for models based on the 11 years 1992-2002.Teenage conceptions in London in 2003 total 6500 in relation to a risk population (over all 32 areas combined) of 127 thousands.The borough rates in 2003 vary from 22.4 per 1000 to 105 per 1000 (see Figure 1) with the highest rates in socially deprived areas in east and south east inner London (cf.Diamond et al, 1999), while lower rates occur in generally more affluent suburban boroughs.Specification of the model variants is based on fully Bayesian principles, involving prior densities on parameters and their updating via the likelihood to produce estimated posterior densities.Estimation involves repeated sampling via Markov Chain Monte Carlo procedures, and uses the WINBUGS package1

Modelling Framework
For event totals Y it and populations at risk N it , binomial or Poisson sampling is frequently assumed in conjunction with a logit and log links respectively.Thus Y it ∼Bin(N it , r it ) where r it are the event probabilities, or Y it ∼Po(N it µ it ) where µ it are relative risks.The framework suggested here allows for time specific spatial morbidity effects s it that are expected to change relatively slowly.However, forecasting is facilitated by lagged effects of previous outcomes and by including possibly nonlinear growth effects.Here binomial sampling is assumed, and lagged effects are based on moment estimates of rates p it = Y it /N it .
For modelling spatial effects s it , one possible prior structure is provided by the intrinsic conditionally autoregressive (ICAR) Normal prior of Besag et al (1991), with weights w ijt relating areas i and j in year t, where τ t is the conditional variance of the spatial effects.Several other forms of spatial prior have been suggested, including priors that mix heterogeneity with spatial correlation (Leroux et al, 1999).The s it can be envisaged as residuals with spatial structure so that s it and s jt have positive spatial dependence.Such spatial effects are often used in cross-sectional disease mapping applications (Pascutto et al, 2000) to represent spatially correlated and unmeasured influences on disease risk.Here they can be taken to represent time varying spatially correlated influences on the risk of teenage conception.It is not necessary to assume Normality regarding the s it and other options are Student t and double exponential densities (Lawson and Clark, 2002), especially if outliers are thought likely to distort an otherwise relatively smooth spatial interdependence.For identification, the restriction i s it = 0 within each period is needed; this is achieved by centering the effects at each MCMC iteration.The weights w ijt define the form of spatial contiguity or connectiveness2 and are typically assumed to be non-stochastic.They make take account of geographic or even social distances between areas, but are commonly defined in terms of adjacency (i.e.w ijt = 1 if areas i and j are adjacent and zero otherwise).The regression model is here confined to a changing intercept α t to model the overall change in incidence, though other known predictors for areas and/or times might be relevant to explaining conception trends (e.g.youth employment prospects).Then a model for r it including lags at 1, . . ., L in p it has the form For t exceeding T +L, the lagged p i,t−L are replaced by lagged r i,t−L .Stationarity is not imposed a priori on the autoregressive (AR) coefficients.It is likely that autoregressive effects differ by area, and a generalization assumes random effects AR coefficients, either with no spatial structure or spatially structured, so where the β il for lag l could have ICAR form separately, though a multivariate version of the CAR model (an MCAR prior) can also be used to model the β il jointly (Gamerman et al, 2003).
Following Bollen and Curran (2004), albeit concerned with non-spatial applications, a more inclusive approach to modelling longitudinal change includes growth curve as well as autoregressive influences.While several forms of time series model have been proposed for nonlinear time effects, such as functional autoregressive models (Huang and Shen, 2004), these are difficult to extend to the case where the nonlinear effects differ by area.By contrast, the fractional polynomial structure allows flexible modelling of nonlinear effects and also permits the coefficients to vary by area.A fractional polynomial (FP) relation between r it and time t can be added to the above model via where q = (q 1 , q 2 , . . ., q M ), (q 1 < q 2 < • • • < q M ), is a vector of powers in t, including possibly q j = 0 leading to log(t).The summation involves polynomial terms H j (t), j = 1, . . ., M defined as To reflect the spatial panel data application, the coefficients ξ ij can be made area specific, with spatially structured (possibly multivariate) random effects priors being one option for (ξ i1, ξ i2 , ..ξ iM ).
In practice models with large L and/or M may not lead to pronounced improvements in forecasting accuracy, and one may adopt an incremental modelling strategy starting with a simple model.For example, a baseline model oriented to forecasting might be defined by {L = 1, M = 1, q = (1)} and varying area lag and FP effects, namely and one may generalize this model by adding (area specific) lags and/or further fractional polynomial terms.The model in (2.2) would also tend to accurately describe observed cross-sectional patterns (within the estimation period) due to the s it parameters.However, reduced parameterization of the descriptive side of the model may be possible without adversely affecting short-term forecasts.This is illustrated by the case study.

Case Study
The above framework is now applied to the spatial panel for London boroughs over 1992-2003 (source: Department of Health for England), with the first ten years of data providing the estimation period (see the web site of Journal of Data Science for the data, including area neighbour structure).For teen conception totals Y it and female populations at risk N it , binomial sampling is assumed in conjunction with a logit link.The goal is parsimonious modelling that produces a good forecast to (known) values in the cross validation year, 2003.Comparisons of model fit within the estimation period use the deviance information criterion (DIC) of Spiegelhalter et al (2002), where the DIC is the posterior average of the deviance plus a measure of complexity d e .This measure can also be regarded as an estimate of the effective dimension of the model.It is estimated as the difference between the average deviance and the deviance Dev( θ) at the posterior mean of the full parameter set θ.To assess out of sample fit to known rates in the period T + 1 (i.e. the year 2003, so R = 1), predictive error sum of squares and average absolute deviation measures are used, where ri,T +1 denotes the posterior mean forecast.Then 100E 2 expresses the average percent absolute relative error in forecasts to 2003.Five models are considered, with the rationale in model development being to introduce enhancements to improve forecast performance while avoiding excess parameterization.A baseline model (model 1) resembles descriptive models used in disease mapping studies; this will tend to produce a good within sample fit but not necessarily a good forecast fit.Model 1 allows for spatially correlated effects s it specific to time and also for changing overall incidence.The spatial effects follow the ICAR model of Besag et al (1991), with weights w ijt .This model allows for changes in the degree of spatial correlation in a health outcome over time (Waller et al, 1997); specifically, the changing variance τ t of the spatial effects is modelled using a first order random walk in log(τ t ).Changing overall incidence is modelled via a changing intercept α t , also assumed to follow a first order random walk prior, with identifiability achieved by centering the s it .The weights w ijt are assumed constant over time and to equal 1 when boroughs are adjacent and zero otherwise.Hence model 1 is where L i denotes the neighbourhood of borough i (the set of k i boroughs adjacent to it) and Sit is the average of the spatial effects in that neighbourhood in year t.Diffuse Normal N (0, 1000) priors (i.e.variance of 1000) are assumed on κ 1 and α 1 , while 1/φ κ and 1/φ α follow Ga(1, 1) gamma priors.
A two chain run of 5000 iterations with disparate initial values shows convergence after 1000 iterations according to Gelman-Rubin criteria (Gelman et al, 1995).There is some evidence that overall incidence is increasing early in the observation span (the early and mid 1990s), but that this is tailing off by the 2000s.Hence the forecast (posterior mean) for α t for 2003 is for very similar to the mean for 2002.By contrast, the precisions of the spatial effects are increasing and the mean of ϕ T +1 exceeds the mean of ϕ T .
This model is relatively heavily parameterized (d e = 488 compared to NT = 352 observations) while not facilitating forecasts tailored to the growth path in each area.The fit within the estimation period (1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999)(2000)(2001)(2002) is good as judged by criteria 3.1 and 3.2 evaluated for 2002 (Table 1).However, the forecasts for 2003 show lack of fit, with average relative absolute error of 41% (100 times E 2 ).Hence a second model (model 2) introduces area specific autoregression on the preceding years rate r i,t−1 and also allows a borough specific linear growth effect; this is the case M = L = 1 with q = (1), as mentioned above.To reduce parameterization the time specific spatial effect is replaced by a spatial factor model (Wang and Wall, 2003;Congdon, 2002), so that s it is replaced by λ t s i where λ t =exp(η t ) are positive coefficients(loadings), with λ 1 = 1, and η t following an RW1 prior to allow for pooling strength over time.The terminology 'spatial factor' describes the use of a shared error with coefficients linking the error through times.The autoregressive effect and the linear growth effect are both assumed spatially correlated.The resulting model 2 has the form ) and 1/φ α ∼ Ga(1, 1).For t exceeding T + L, the lagged p i,t−L are replaced by lagged r i,t−L .
A two chain run of 5000 iterations (convergent from 1000) shows that this model leads to a some loss of fit (to 2002 data) within the estimation period.However, the loss is slight in view of the much reduced parameterization as compared to model 1.The DIC actually falls due to the reduced parameter count (Table 1) though the deviance at the posterior parameter mean is higher.Despite the reduced parameter count, the forecast accuracy for 2003 (when forecasts can be compared to known conception rates) improves considerably compared to model 1.The average percent relative forecast error falls to 9%.
Model development could be in extending the fractional polynomial, the AR components or both.To assess the gain through developing the fractional polynomial component of the model, the fractional polynomial components is expanded to φ 3 (t; ξ i , q), where q = (−1, 0, 1) and and 1/φ α ∼ Ga(1, 1).The increased parameterization may be contained by pooling strength under a multivariate version of the CAR Normal prior (Gamerman et al, 2003) for ξ i =(ξ i1 , ξ i2 , ξ i3 ).Thus the pairwise difference prior with precision matrix Ψ ξ is As above a contiguity assumption is made for w ij , and a Wishart prior with identity scale matrix and 3 degrees of freedom is adopted on Ψ ξ ; so Ψ ξ ∼ W (3, I).
This model produces a gain in fit within the estimation period at the expense of an increase in complexity d e from 78.9 to 96.5.There is a gain in forecasting performance according to criterion (3.1). Figure 2 shows the forecast rates for 2003 under model 3, and Table 2 shows the posterior summary of the forecasts.
This model produces a slight loss in fit to the estimation sample as compared to model 3, and also no gain in forecast accuracy as compared to model 3. Which model among models 1 to 4 is preferred depends on the relative weights attached to insample fit and out-of-sample forecast accuracy; relative forecasting performance also depends on which of (3.1) or (3.2) is used.Since model 3 has the best in-sample fit (among models 2 to 4), and best forecasting fit under (3.1), it seems to show the best overall performance.If one were primarily concerned with description then model 1 is preferred.
Model 3 has a simple fractional polynomial form.However, extending the fractional polynomial did not improve forecast or in-sample performance.Thus model 5 replicates model 3 except in expanding the FP component to φ 5 (t; ξ i , q), where q = (−2, 1, 0, 1, 2) and ξ i = (ξ i1 , ξ i2 , ξ i3 , ξ i4 , ξ i5 ) following an MCAR form.The extra complexity of this model leads to a worse DIC and some deterioration in forecast accuracy too (Table 1).

Conclusion and Possible Methodological Extensions
Many spatio-temporal models for disease and mortality outcomes have focussed on description and not on extrapolation to the future.As model 1 in the case study application illustrated, a descriptive model providing a close fit to an existing dataset may not produce a good forecast.By contrast, models allowing for short term extrapolation of growth trends and/or autoregression on recent rates may improve on descriptive models in terms of out-of-sample accuracy.
For health and social indicators relatively short observed series over sets of constant boundary areas are often common.Short panel series for area units may be due to boundary changes, changes in data collection or indicator definition, or intermittent collection (as in the case health status data from Censuses).Hence there may be only a small gain in adopting highly complex models with high order lags or complex polynomial forms.However, the case study demonstrated that improved fit did result from adopting a spatially varying fractional polynomial of a relatively simple form.
The above analysis has been for a single outcome, but there may be gains in pooling strength over two or more outcomes.For example, area conception rates for women aged under 16, aged 17-18 and aged 19-20 tend to be positively correlated.For outcomes k assumed binomial with rates r ikt , one might envisage models that develop on (2.1), such as logit(r ikt ) = α tk + λ tk s ik + β ik1 p i,k,t−1 + • • • + β ikL p ik,t−L + φ M k (t; ξ ik , q k ) where the s ik model stable morbidity differences, the λ tk are positive time and output specific coefficient loadings, the β ikl model area specific autoregressive own lags in outcome k, and the fractional polynomial allows area and outcome specific coefficients and also possibly q k vectors different by outcome.Pooling over outcomes in the autoregressive coefficients might be based on a common factor approach, for example expressing β ikl as κ k β il .More complex models might involve cross-lags (of r ikt on p im,t−l for m = k).

Table 2 :
Actual 2003 rates and model 3 prediction summary FPM= forecast posterior mean Observation level forecasting performance in this and other models is dominated by overestimation of the conception rate in the central London borough of Kensington and Chelsea in 2003; this has an actual rate of 27.6 per 1000 in 2003 compared to a posterior mean forecast of 46 per 1000 under model 3.The number of conceptions here fell sharply to 57 in 2003, compared to totals of 80,80 and 81 in the years 2000, 2001 and 2002 respectively.Excluding this borough reduces criterion (3.2) 9.5% to 7.7%.However, some discrepancies are related to shifts in the denominator population estimates.For example, the denominator population in Westminster (women aged 15-17) shifted upwards by 12% between 2002 and 2003 due to an upward revision of populations, reflecting concerns over 2001 census enumeration errors 3 .The official rate per 1000 (Department of Health, England) for 2003, with the revised population denominator, is 41.5 compared to a posterior mean forecast of 46.8 per 1000.Model 4 considers instead considers an extension of the lag effects in model 2, namely adding spatially varying AR2 coefficients.A bivariate CAR form is assumed for the two autoregressive coefficients.Thus model 4 is defined by logit a