Application of a Mixed Eﬀects Model for Biosurveilliance of Regional Rail Systems

: Although United States government planners and others outside government had recognized the potential risk of attacks by terrorists, the events of September 11, 2001, vividly revealed the nation’s vulnerabilities to terrorism. Similarly, the 2004 terrorist attacks in Madrid illustrated vulnerabilities to terrorism extend beyond the United States. Those attacks were obvious destructive acts with a primary purpose of massive causalities. Let us consider a bioterrorist attack which is conducted subtly through the release of a Chemical/Biological agent. If such an attack occurs through release of a speciﬁc biological agent, an awareness of the potential threat of this agent in terms of the number of infections and deaths that could occur in a community is of paramount importance in preparing the public health community to respond to this attack. An increase in biosurveillance and novel approaches to biosurveillance are needed. This paper illustrates the use of mixed eﬀects model for biosurveillance based on commuter size for regional rail lines. With mixed eﬀects model we can estimate for any station on a given rail system the expected daily number of commuters and establish an acceptability criterion around this expected size. If the actual commuter size is signiﬁcantly smaller than the estimate, then this could be an indicator of a possible attack. We illustrate this method through an example based on the 2001 daily totals for the Port Authority Transportation Company (PATCO) rail system, which serves residents of southern New Jersey and Philadelphia region in the United States. In addition, we discuss ways to put this application in a real time setting for continuous biosurveillance.


Introduction
The primary goal of biosurveillance is to minimize the effect of a bioterrorist attack. While most communities have disaster response systems, a major bioterrorism attack will produce a catastrophe primarily affecting the community health care system (Flowers et al., 2002). Bioterrorism response requires precise distribution of available personnel and resources. This distribution of personnel and resources must be based on the expected severity of the attack for a given community. Poor distribution of personnel and resources could result in an increase in infections and mortality; therefore, reliable prediction of the expected severity is crucial.
A key location for a bioterrorist attack is a large transportation hub such as Philadelphia's 8-th Street and Market Street station. This station is popular not only for the large number of businesses in the neighboring vicinity but also for the large number of tourist attractions. Comparable stations are easily identified for any major city in the United States such as Penn Station in New York and the Metro Center in Washington, D.C. There are various types of biological pathogens that potential enemies may employ, which could be released in a large crowd, released in a food or water supply, or released through ventilation systems of a targeted hub. One concern is the incubation time before infection is recognized. For example, if the agent was anthrax, infection is not immediately apparent. A person may be infected but not symptomatic. Upon presentation, infection may appear as cutaneous anthrax, inhalational anthrax, or gastrointestinal anthrax. Depending on how the anthrax infection appears, determines the severity of the infection as well as the urgency of immediate medical response. Death due to non-treated inhalational and gastrointestinal anthrax is much more rapid than cutaneous anthrax (Bartlett et al., 2002).
The majority of the people in the transportation hub are using one of the available commuter services such as the regional rail lines available at Philadelphia's 8-th Street and Market Street station to go to and from work. During the work week, the same commuters will consistently use the same transportation service. If anthrax was released in the ventilation system, then upon presentation of the infection as inhalational anthrax, many commuters may be too sick to go to work; therefore, the number of commuters, which we'll refer to as passenger loads, on any specific rail line would be lower than expected. Many infected individuals will not attribute the infection to bioterrorism. They are more likely to attribute the symptoms to a cold or flu. While we do expect some variability from day to day, if the number of commuters was substantially lower than the expected value, indicating a large number of people potentially experiencing infection symptoms, would raise some level of suspicion.
As discussed by Evans et al. (2002), one approach to biosurveillance is the use of mixed effects models, treating census tracts as the repeated observations over time for a given cluster where the cluster is the unit of analysis. Using historical data, the model estimates month, day of the week, and holiday fixed effects, in ad-dition to random effects for each census tract. These random effects will account for the within correlation for a given unit as it is measured over time. Evans et al. (2002) described using the estimated random effects to get different estimates for each tract on each day, which may be compared to observed counts. In response to the need for increased biosurveillance, we consider monitoring the passenger loads for regional rail systems through this mixed effects model approach. This mixed effects model approach will allows us to forecast the expected number of commuters on a regional rail system. This forecast will be adjusted for day of the week differences as well as stations on the rail system. If the forecast is not within an established criterion, then it provides a reason for concern, which will require further investigation of possible causes for the decreased passenger load.
The goal of this paper is to illustrate the use of mixed effects models as a method of biosurveillance. The mixed effects model will be used to forecast the expected passenger load on a given regional rail system. The mixed effects model allows us to properly address various sources of variability. We illustrate this method on available data from the Port Authority Transportation Company (PATCO) rail system, which serves New Jersey residents who commute to Philadelphia and Philadelphians who commute to New Jersey. The PATCO line is illustrated in Figure 1. We have data for each station in New Jersey and Philadelphia for 2001. The data consists of daily total counts of the number of passengers entering each station. Mirroring the terminology of Evans et al. (2002), the PATCO data consists of 13 census tracts, one for each rail station on the PATCO line, consisting of 365 data points for each tract, corresponding to the daily total per station for the 2001 calendar year. We will estimate the expected number of passengers for any station and particular day through the mixed effects model.
The remainder of the paper is organized as follows. In the next section, we review the mixed effects model. In Section 3, we discuss our Biosurveillance application to the PATCO regional rail system. Discussions relating ways to implement these methods to a real time setting is presented in Section 4. Some concluding remarks are made in Section 5. The appendix contains the SAS code for the discussed example.

Mixed Effects Model Applied to the PATCO Data
With the PATCO data, we expect some similarity in the daily totals for a given station over time. To get a better understanding of the dynamics of these station tracts over time, we consider daily totals over time for each station.  As illustrated in Figure 2, there is a rise and fall for the daily totals that appears cyclic with a weekly periodicity. This pattern holds over the entire year, as well as for the other stations. This cyclic behavior makes sense due to expected lower number of passengers during the weekends and the beginning of the work week (Monday) and the end of the work week (Friday). Usually one expects more passengers during the middle of the week. Passengers on average use the same station primarily due to proximity to their residence or parking availability. In addition, days in which passengers use the rail system are usually consistent over time. For example, some passengers may work from home on Mondays; therefore, they will consistently use the rail system only four of the available seven days per week, and in addition, they use the rail system on the same four days. Focusing on the Lindenwold station's tract in Figure 2, we see the Monday daily total from 1/1/2001 -2/26/2001 range from 1806-5239. Tuesday daily totals from 1/2/2001 -2/27/2001 range from 4613-5682. Wednesday daily totals from 1/3/2001 -2/28/2001 range from 5157-5842. The same period of time was investigated for the Philadelphia 8-th Street and Market Street stations with daily totals ranging from 3083-5434 for Mondays, 4885-5748 for Tuesdays, and 5096-7067 for Wednesdays. These summary statistics illustrate that for a given station there is variability within the various Monday daily totals as well as for the other days; therefore, a within day classification per station source of variability. Similarly, within a station, there is variability across the days of the week. Finally there is variability across stations. This dissection of the variability in passenger load is allocated to three sources: Error variance (Variability across the replicates for a given day per station), variance across day of the week classification per station, and variability between stations. These sources of variability are illustrated further in Figure 3 based on simple summary statistics from the available data.  Figure 3 is the standard deviation in these means for the Lindenwold station over the 6 classifications for the 7 days of the week. The last bar is the standard deviation for the mean estimates per station over the 365 data points. From these summary statistics, we see variability within a station for a given day of the week, which show heterogeneity per day, variability across the Day of the Week within a station, and variability across stations. Thus, we identified a nesting structure to the PATCO data. Now we need a statistical model which accommodates this structure.
Data that have a nested or hierarchical structure are common in a wide variety of disciplines, and similar methods for analyzing such data are found in these disciplines under different guises. The analyses fall under the heading of random coefficient models (RCM), hierarchical linear models (HLM), and multilevel linear models (MLM) (Littell et al., 1996;Goldstein, 1987;Bryk and Raudenbusch, 1996). Laird and Ware (1982) and Strenio, Weisberg, and Bryk (1983), working independently, proposed essentially identical approaches to the analysis of hierarchical data, both using the EM algorithm. Despite the prevalence of hierarchical data structures, classical analysis ignored such structures for many years, partly due to the underdevelopment of statistical models as well as statistical packages to model such data (Plewis, 1997;Singer, 1998;Suzuki and Sheu, 1998). With software such as PROC MIXED in SAS, HLM, MLWin, and SPSS Mixed Models, implementation of mixed effects models in a variety of fields has become more common. Recently, we've seen mixed effects models used in the transportation field (Deaton and Winebrake, 2000). The mixed effects model offers investigators a model to accommodate the hierarchy of the data. The main advantage of this approach is the variability can be partitioned at each level through the inclusion of random effects (Raudenbush, 1993). Ignoring the hierarchical structure by fitting a fixed effects model would result in an underestimation of the variance of the outcome, which results in underestimation of standard errors and "smaller" levels of statistical significance.
For the PATCO data, the hierarchy of the data consists of three levels: • Reported observation for each day of the week and station combination • Day of the week classification within station • Station.
Days of the week are classified into six classifications: Monday -Friday, and Weekend/Holiday. Holidays will consist of major national and religious holidays. The three level mixed effects model is: where Y ijk is the i-th observation for the j-th day classification of the k-th station, i(j(k)) accounts for the random variance within a day classification, j(k) accounts for the random day classification variance nested within station, and k accounts for the random station variance.
The assumptions for the random elements are: The variance of Y ijk is: In this approach, each station's passenger load is characterized by a set of station and day classification random effects. These estimates are termed Best Linear Unbiased Prediction (BLUP) estimates. BLUP estimates are linear in the sense that they are linear functions of the data; unbiased in the sense that the average value of the estimates is equal to the average value of the quantity being estimated; best in the sense that they have the minimum mean squared error within the class of linear unbiased estimators; and prediction estimates to distinguish them from estimation of the random effects (Robinson, 1991). The station by day classification estimates will be used as the "expected values of passenger loads" for a given station on a specific day classification. BLUP Estimates are derived as follows: where Y is the vector of passenger loads. The dimensions of Y is 4745 × 1, which corresponds to the 365 daily totals for each of the 13 stations. The vector β is also 4745 × 1 with each entry equal to the grand average. The mixed model produces an overall grand average of 2841.38 passengers. The matrix V is the covariance matrix of the outcome and has dimensions 4745 × 4745. The diagonal entries for V is the variance of the outcome for any time and station combination, which is equal to the right-hand side of equation 3 above. The matrixĜ is block diagonal with 13 blocks corresponding to the 13 stations, where G 11 = diag(σ 2 S , σ 2 DS , . . . , σ 2 DS ) is the form of the block. The dimensions of G 11 is 7 × 7; therefore the dimensions ofĜ is 91×91. The matrix Z is the design matrix for the random effects which indicates the random station and random day classification nested in station. The dimensions of Z is 4795 × 91. The number of rows for Z corresponds to the daily observations for the 13 stations. The first 13 columns of Z corresponds to respective station for the observed value with column designation based on alphabetical ordering of the stations. The last 78 columns corresponds to the 6 day classifications ordered Monday through Friday and Weekend/Holidays for each of the 13 station, with the station grouping again ordered alphabetically by station name. The superscript T indicates the transpose of Z andV −1 is the inverse of the estimate of V. Hence the dimension ofû is 91 × 1. These estimates indicate deviations for each station from the grand average and day classification within station from the grand average.
To establish a threshold for concern, we focus on an effect size for the reduction of the grand average in equation (2.1). Effect sizes are unitless measures, which are usually applied to measure the magnitude of a treatment effect (Rosnow and Rosenthal, 1996). These effect sizes are standardized values centered with respect to a specified mean and standard deviation. The particular effect size used here was defined by Cohen (1988) and is commonly referred to as Cohen's d. The formula is: where M 1 is current passenger load, M 0 is the BLUP estimate of the passenger load. The standard deviation, s, is estimated by equation (2.3) evaluated at the estimates for each variance component. For the PATCO data we have: variability from station to station is 3,801,074; variability across days within a station is 12,987; and, the within station and day classification with station variability is 47,880. Each estimate is inserted in equation (2.3) to derive an estimate for the variability. The standard deviation estimate is acquired by taking the square-root of the variance estimate. The main benefit of an effect size is the unitless property which allows for comparison to other studies regardless of outcome, design, or analysis. In our application, the effect size provides a unitless measure for the magnitude of change from the best linear unbiased predicted estimate. We establish an acceptable reduction of less than 35% reduction in average estimated number of passengers for a station on a given day. If an observed passenger load reduces by more than 35%, then we have rise for concern. This 35% reduction threshold is based on an examination of a 35% reduction in the grand average, which corresponds to an effect size of 0.5 as illustrated below: Cohen defined an effect size of 0.5 as a "medium" effect size, which corresponds to the expected number of passengers is at the 69-th percentile of the observed number of passengers. While the effect size estimate per station and day within station based on the 35% threshold deviates around this 0.50 estimate for the grand average, using the grand average as a reference point for setting this acceptability threshold seems reasonable.

Biosurveillance Application
We split the 2001 PATCO data in two halves. We use the first six months to estimate expected daily totals for each station on the PATCO line. The mixed effects model of equation (2.1) is used. Refer to the appendix to see the SAS PROC MIXED syntax. For simplicity, the mixed effects model is implemented on only the work week data (Monday through Friday). This reduces one level of complexity for the limited available data. Indices for equation (2.1) in this reduced sample are: Y ijk , i = 1, 2, . . . , 26 corresponding to the 26 weeks for the first half of the year, j = 1, 2, . . . , 5 corresponding to the day of the week classifications for the 5 workdays, and k = 1, 2, . . . , 13 corresponding to the 13 rail stations on the PATCO line. The dimensions of Y is 1820 × 1, which corresponds to the 140 daily totals (26 weeks with 5 days per week) for each of the 13 stations. The vector β is also 1820 × 1 with each entry equal to the grand average. The matrix V has dimensions 1820 × 1820. The blocks ofĜ , G 11 = diag(σ 2 S , σ 2 DS , . . . , σ 2 DS ) have dimension 6× 6; therefore the dimensions ofĜ is 78× 78. The design matrix Z for random effects has dimensions 1820 × 78. The dimension ofû is 78 × 1.
Variance estimates for the three sources of variability were reported above. Corresponding standard deviations are 1949 for station-to-station, 113 for the across time within a station, and 218 for within time and station. The standard deviation estimates for station to station variability and the within station and day classification are consistent with the summary statistic estimates illustrated in Figure 3. The time within station source of variability is smaller here due to the exclusion of the weekend/holiday data points, which usually have a large deviation from the average due to much less commuters as compared to the workweek.
We produce BLUP estimates for the number of total passengers per station and day classification pair. This is implemented in SAS PROC MIXED through ESTIMATE statements as illustrated in the appendix. We use the second half of the data to determine if any daily totals fall below the threshold values (35% reduction in the BLUP estimates) and determine if there is some reason for the reduced passenger load. Second half U.S. national and major religious holidays are identified and classified with a day classification of Weekend/Holiday. These days include July 4, Labor Day, Thanksgiving, Christmas Eve, Christmas, and New Year's Eve; therefore, these values are excluded from this example. This split sample approach was used by Stern and Lightfoot (1999) in their investigation of salmonella infection for the National Enteric Pathogen Surveillance Scheme data, although their statistical approach differed from the methods discussed here. BLUP estimates are generated by equation (2.4) and passenger loads per station and day classification are produced, and illustrated in Table 1.  Focusing our attention on Philadelphia's 8-th Street and Market street station, there are two times when the observed passenger counts had more than a 35% reduction from the BLUP estimate for the given day classification. Figure  4 illustrates these two occurrences.
Both of these "small" passenger load estimates would raise some level of concern. With this data being historical data, we can find causes for the large declination of passengers for both dates. The September 11, 2001 occurrence corresponds to the terrorist attacks on New York and Washington, D.C. The Mayor of Philadelphia urged many workers to stay home; therefore, reduced number of passengers is expected. Similarly, December 26, 2001 is the day after Christmas; therefore, while this is still a "workday", many companies were closed and many workers did not return to work.
In both these scenarios, we were able to justify the large reduction. What if we can not? Are passengers starting to develop symptoms due to exposure to a biological pathogen? While the answer to this question is not known, the reduced passenger load could provide some evidence of peculiar behavior.

Real-Time Setting
The PATCO regional rail line discussed in this example has integrated magnetic/ smart card fare system. These smart cards contain customer identification numbers for the daily customers, as well as personal identification information such as name, destination, and residence. Guest cards are required for infrequent customers or one-time commuters. Unfortunately, these guest cards will not contain personal identification information. Daily Customers are no longer generic market segments. PATCO has the ability to know exactly who they are, when they typically use the PATCO facilities, and what is happening at the current moment (Brannon, 2003). Entrances and exits are logged instantaneously in PATCO database by swiping of the smart cards. So if a particular daily customer is missing, we know exactly where the customer entered or exited the PATCO line last, as well as personal information such as residence.
SAS/IntrNet Application Dispatcher is a computer service that allows users to pass parameter selections from a Web page to the appropriated SAS program that executes on a server, sending the results back to the user's Web browser. Because the program is executed on the server, end users only need a Web browser. There is no need for SAS on their workstations (Timbers, 2003). The Web browser can be automated to refresh itself periodically.
So we have a database being updated instantaneously, a web-based way to access the data, and a SAS program to perform the analysis and report the results. Recall, for the example discussed in section 3, we were limited to only daily totals for the past year. With this database, we are no longer limited to daily totals.
For each scheduled train per day, we can monitor if the passenger load is substantially small. By accessing the database, using the scheduled trains passenger load for a fixed period of time (i.e. past year), a SAS program using PROC MIXED will fit the 3-level HLM model specified in Section 2. BLUP estimate will be acquired adjusting for the hierarchy of the data. Comparison of the BLUP estimates versus the actual observed passengers will be done. Report will be produced illustrating the estimated passenger load and actual passenger load. With large amounts of data, additional covariates can be included such as adjusting for seasonal trends. Thus, combining the current database with SAS/IntrNet Application Dispatcher, reports will be generated upon a specific scheduled train's conclusion of its inbound or outbound commute. Any evidence of concern is immediately known.

Conclusion
We have presented an application of mixed effects model for estimation of passenger loads on the PATCO regional rail line for any day or station. This mixed effects model has three benefits. First, it accommodates the hierarchical structure of the data, which has repeated observations per day classification nested in station. Second, it provides best estimates of the expected number of passengers per station for any day through the derivation of BLUP estimates. Third, there are a vast number of statistical packages to implement this model. We were able to establish a rule which would indicate a potential concern if the observed passenger load fell substantially below the expected count. A more than 35% decrease was our established rule, which resulted in two days where the passenger load fell below the criterion for the Philadelphia 8-th Street and Market Street station.
Two alternate analytical methods are Cross Sectional Time Series (CSTS) and Generalized Estimating Equations (GEE). CSTS approach was used by Williamson and Hudson (1999) in their time-series analysis of public health surveillance reports. The CSTS approach usually focuses on time-series observations on each of several cross-sectional units, where the cross-sectional units are independent. The CSTS is a 2-level approach with time-series nested within station. For a given station and day of the week classification, we expect the number of passengers to be correlated. An advantage of the mixed effects approach is it allows for this third level of clustering and therefore, models this correlation directly. While for the CSTS model, accounting for this might be more challenging. Under certain specifications of the mixed effects model, the model can fit time-series type models (Verbeke and Molenberghs, 2000). So the mixed effect model can assess if the three-level model is a substantial improvement compared to the two-level CSTS model. This process was done by Gmel et al. (2001) in their assessment of Alcohol Consumption for European men. Their conclusions were "For pooled cross-sectional time series data, control of the potential impact of historical time is of the utmost importance. Hierarchical linear models constitutes a superior alternative to analyze such complex data". The second alternative is GEE. The GEE approach of Liang and Zeger (1986) is extremely popular for repeated measures designs, namely because of their consistent estimates of parameters in the model. The GEE approach is commonly called a population-averaged (PA) approach. In the PA approach, one is not concerned with random deviations per subject. For the PATCO data we're interested in the random deviations per day for a given station. We need this individual deviation which is sometimes refereed to as a subject-specific effect, which are used to produce the BLUP estimates. Hence, one limitation of the GEE approach is there are no BLUP estimates, and these BLUP estimates are needed to establish the threshold cutoff for each day of the week per station. These limitations of both the CSTS and GEE approach indicate that the mixed effects approach is as good as and possibly better than these alternate approaches.
A third approach would be to fit a fixed effects model, which ignores the within cluster per station variability, Fitting this model to the first quarter PATCO data, produces a variance estimate of 218.81. Compared to the solution of equation (2.3), 1965.18. In addition, the passenger load estimates from the Fixed effects model for the Philadelphia 8-th Street and Marker Street station for Monday through Friday are, respectively, 5020.1, 5183.5, 5417.2, and 5246.9. These estimates do not match Table 1. Therefore, we have differences in variance estimates and the estimated average levels per station and day combination. We recognize the fixed effects model is not capturing all sources of variability properly; therefore, we have more confidence in the means and variances estimated by the more complex HLM model. Two limitations of this application focus on limitation of the BLUP estimates. Distribution theory associated with BLUP estimates is not nearly as wellunderstood as it is with conventional estimable functions (Littell et al., 1996). The variance of the BLUP estimates may experience shrinkage, since the observed data are shrunk towards the overall average since the prior means of the random effects is zero (Verbeke and Molenberghs, 2000). An additional limitation is the acceptability of the derived threshold rule. While the rule is based in a statistical framework, collaboration between statisticians, medical community, government, and regional rail managers is needed to derive a more justified rule.
The statistical community has made strides to become integral members of the war on Terrorism and is eager to help U.S. government respond to recent terrorist attacks and assist in planning for potential attacks (Banks, 2002). There are a vast number of diverse applications of statistical models for the war on terrorism. Recently, we've seen discrete event simulation models used to develop the public health infrastructure for bioterrorism response (Hupert et al., 2002). Space-time permutation scan statistics have been implemented for spatial disease surveillance (Kulldorf et al., 2003). Time series analysis has been used to track over-thecounter medication sales as an early statistical detection of anthrax outbreaks (Goldenberg et al., 2002).
One statistical model is not going to provide us the answers to all questions and concerns. The model we discuss, as well as the papers discussed above focus on one area of biosurveillance. For the PATCO example, the reduced passenger load may not be due to terrorist activities, as we saw in our illustration. For the most part, the reduced passenger load will not be due to terrorism. Our model serves as warning signs or one piece of the puzzle. Linking multiple warning signs could provide the necessary evidence of bioterrorist activities.
Consider a situation where one station on the PATCO rail system on a random day fell below the established criterion. This on its own is only one piece of evidence. In addition, suppose there is an increase in the number of over the counter medication sales in the communities surrounding the train station, as well as an increase in patients seen by primary care physicians or patients entering emergency rooms. These three pieces of evidence, could provide necessary information to determine that a biological agent was used on the passengers at the specific train station. This would allow immediate notification to other passengers who have not become symptomatic or who have ignored the symptoms.
While our focus has been on regional rail lines, similar models can be used for other transportation areas such as buses, trolleys, and bridge/tunnel entries. With our limited data, we could only adjust for a small number of factors. With more data, other factors such as seasonal trends, vacation periods, academic years, holiday shutdown, etc. could be covaried. Further advances on the available data and modeling could only improve the importance and resolution of this piece of the puzzle. With immediate access to current data, this model could be implemented on a real time setting. Breaking the rail station down from daily totals, to expected totals for each scheduled train during the daily service would require substantial amount of data but is a feasible extension.

Appendix: SAS Code for Mixed Effect Model
proc mixed covtest ; c1ass station time; model nums1= / solution ddfm=satterth notest; random station time(station) /solution ; estimate 'ASH -Fri' intercept 1 | station 1 0 0 0 0 0 0 0 0 0 0 0 0 time(station) 1 0 0 0 0 /cl alpha=0.01 ; quit; The SAS code will produce three variance components to account for the three levels of the hierarchical structure through the RANDOM Statement. The COVTEST option will perform a naive hypothesis test of whether the variance component is 0 or not. This test is naive because a Z-score is produced for the test. The actual distribution of the variance components follows a mixture of Chi-squares because the value 0 is a boundary value for the possible outcomes. Refer to Verbeke and Molenberghs (2000) for a thorough discussion on this topic. The ESTIMATE statement will produce the BLUP estimate for the designated station on the specified day. The CL alpha=0.01 produces 99% confidence bounds for the estimate. The vertical line in the ESTIMATE statement separates the Fixed and Random effects. Similar statements were written for each station-day combination.