Measurement Errors and Imperfect Detection Rates on the Transect Line in Independent Observer Line Transect Surveys

This paper proposes a parametric method for estimating animal abundance using data from independent observer line transect surveys. The method allows for measurement errors in distance and size and for detection rates below 100% on the transect line. Simulation studies based on data from southern bluefin tuna surveys and a minke whale survey show that 1) the proposed estimates agree well with the true values, 2) the effect of small measurement errors in distance can still be large if the size measurements are biased, and 3) incorrectly assuming 100% detection on the transect line greatly underestimates animal abundance.


Introduction
An important problem in the management of fishery resources is to obtain a reliable estimate of fish abundance. Line transect (LT) surveys have been widely employed for this purpose. Measurement errors in LT surveys arise naturally in two ways: rounding errors in observers' estimates and natural limitations in observers' ability to estimate covariates accurately. Chen and Cowling (2001) therefore proposed estimators with corrections for measurement errors. Their estimators are, however, based on another critical assumption: all animals on the transect line are detected. Unfortunately, this assumption is often violated, especially in a marine environment, because animals such as whales or tunas might be underwater when the observer passes over them. Independent observer line transect (IOLT) surveys, using two observer teams instead of one, have therefore been recommended and studied by many researchers. Borchers, Zucchini, and Fewster (1998) developed a general likelihood framework for IOLT surveys. Borchers et al. (1998) provided the Horvitz-Thompson estimator of animal abundance for IOLT surveys. Chen and Lloyd (2000) proposed a non-parametric estimator of animal abundance for IOLT surveys. These estimators were, however, not corrected for measurement errors. More recently, Hwang and Huang (2003) provided a regression method for estimation in capture-recapture surveys in which measurement errors were allowed. This work can, however, be applied to IOLT surveys only when the measurement errors follow normal distributions and the detection abilities of the observer teams are identical. The aim of this paper is to provide estimators of animal abundance with corrections for measurement errors in IOLT surveys when the distributions of the measurement errors are arbitrary and the detection abilities of the two observer teams are allowed to differ.
A school is a relatively tight gathering of animals observed in a localized region. Many animal populations, such as minke whales and southern bluefin tunas (SBT), aggregate naturally into schools. In 1995, the Norwegian government and the International Whaling Commission undertook an IOLT survey for minke whales in the northeastern Atlantic. The participating vessel traversed a distance and had two observer teams on two separate platforms to search for animals. A third team (the coordinator team) recorded the position of each detected signal (of the presence of animals) and then calculated the perpendicular distance from each detected school to the transect line using the positions of one or more detected signals. Measurement calibration was not carried out in this survey to gain information on measurement errors. This might be due to the lack of estimation methods with corrections for measurement errors for IOLT surveys. Skaug and Schweder (1999) used hazard models to analyze the data (without correction for measurement errors). To make corrections for measurement errors more feasible, we suggest that observer teams measure the covariates of their own detections independently rather than having the coordinator team measure the covariates for them. A simulation study for such an IOLT survey based on the minke whale data was conducted, and the results are reported in the simulation study section.
Another survey that motivates this paper is the SBT survey. Since 1991, CSIRO in Australia and the National Research Institute of Far Seas Fisheries in Japan have jointly conducted annual surveys for juvenile SBT. From 1994 to 1998, the following LT survey with one observer team was adopted to measure SBT abundance. A plane flies along a transect line and experienced spotters in the plane search for schools of SBT by eye. The perpendicular distance from the detected school to the transect line is first calculated and then the spotter estimates the total mass of the school by eye. Measurement errors cannot be avoided in this way because of the natural limitations of the human eye. In addition, spotters can only detect surfacing animals and thus the detection rate on the transect line cannot be assumed to be 100%. To assess the effects of measurement errors and imperfect detection on the transect line, IOLT surveys were simulated based on the data of the SBT surveys from 1994 to 1998, and the results are reported in the simulation study section.
In this paper, the covariate of each school is considered to be bivariate: the perpendicular distance to a school and the weight of a school. Two main indices of SBT abundance are the number of schools in the surveyed region (school population size, $N$) and the biomass of SBT per square nautical mile (biomass density, $D$). To estimate the two indices with corrections for measurement errors, a parametric method is adopted. To reduce the influence of misspecified models, moment estimators are chosen over maximum likelihood estimators (MLE) because moment estimators do not require the distributions of the errors but only their first few moments. In addition, a non-parametric method is incorporated to estimate the indices in order to further reduce the sensitivity to model selection. The proposed estimators not only correct for measurement errors but also allow for less than 100% detection rates on the transect line. The same method can also be applied to univariate covariate cases such as the minke whale survey, as shown in the simulation study section.
To evaluate the performance of the proposed estimators, simulation studies were conducted based on the minke whale survey and the SBT surveys. The estimators were found to be quite accurate on average, with small relative root mean squared errors (RMSE), where the RMSE is the ratio of the square root of the mean squared error to the parameter being estimated. The simulation results indicate that incorrectly assuming 100% detection rates on the transect line greatly underestimates animal abundance.
The structure of this paper is as follows. The parametric models used for the measurement errors are introduced in Section 2 and moment estimators for the model parameters are derived in Section 3. The estimators for the detection probability on the transect line, biomass density, and school population size are developed in Section 4. In addition, the simulation studies based on the data from the minke whale survey and the SBT surveys are reported in Section 5. These data might be available from the organizations conducting these surveys. Finally, Section 6 summarizes the paper and discusses possible extensions.

The Parametric Models
The conditional probability of detection given that a school is present for sampling is called the detection probability. It may depend on covariates associated with the school or with conditions at the time of the survey. Here only the perpendicular distance from the transect line to a school, $X$, and the size of that school, $S$, are considered as covariates. In our model, the distance $X$ can be negative, namely when the detected school is on the left of the vessel or plane. The detection probability decreases with the magnitude of the distance from the transect line and increases with the school size.
Let the index $ij$ refer to the $j$-th school detected by team $i$, $i = 1, 2$. Let $Y_{ij}$ be the measurement of the distance $X_{ij}$ and $T_{ij}$ the measurement of the size $S_{ij}$. In addition, let $g_i$ denote the detection probability for team $i$ and $f$ the density function of $(X, S)$ before taking into account detection or non-detection. Then the probability that a school is detected by team $i$ is $\pi_i = \iint g_i f \, dx \, ds$ and the probability that a school is detected by both teams is $\pi_{12} = \iint g_1 g_2 f \, dx \, ds$. The detection probability at $x = 0$ (the transect line) for team $i$ is assumed to be constant and is denoted by $c_i$. Thus $0 < c_i \le 1$.
All pairs of distances and sizes of schools detected by team $i$, i.e. $(X_{ij}, S_{ij})$, are assumed to be independently and identically distributed. In addition, the detections by team 1 are assumed to be independent of the detections by team 2. For measurement errors, an additive model is used for errors in $X$ and a multiplicative model for errors in $S$:

$$Y_{ij} = X_{ij} + \epsilon_{xij}, \qquad T_{ij} = S_{ij}\,\epsilon_{sij}, \tag{1}$$

where $\epsilon_{xij}$ and $\epsilon_{sij}$ are the measurement errors in $X_{ij}$ and $S_{ij}$, respectively. Note that $\epsilon_{sij}$ is positive. We also assume independence between $(X_{ij}, S_{ij})$ and $\epsilon_{xij}$, between $(X_{ij}, S_{ij})$ and $\epsilon_{sij}$, and between $\epsilon_{xij}$ and $\epsilon_{sij}$. In addition, the distribution of $\epsilon_{xij}$ is assumed to be symmetric about 0.

As the transect line is the center line of the survey region of width $2w$, the density function of $X$ is assumed to be uniform on $(-w, w)$:

$$f_X(x) = \frac{1}{2w}, \qquad -w < x < w,$$

where $w$ is a large value. Here the size, $S$, is considered as a continuous random variable and is often the weight of the school. The density function of $S$ is assumed to be gamma with parameters $(v_1, v_2)$,

$$f_S(s) = \frac{s^{v_1 - 1} e^{-s/v_2}}{\Gamma(v_1)\, v_2^{v_1}}, \qquad s > 0,$$

and independent of $X$. Hence the joint density of $(X, S)$ is

$$f(x, s) = \frac{1}{2w} \cdot \frac{s^{v_1 - 1} e^{-s/v_2}}{\Gamma(v_1)\, v_2^{v_1}}, \qquad -w < x < w, \; s > 0. \tag{2}$$

Let $l$ be the total length of transect line searched. Then the biomass density $D$, which is $N E(S)/(2lw)$ by definition, is equal to

$$D = \frac{N v_1 v_2}{2lw}.$$

The detection probability for team $i$ adopted here is generalized from that of Drummer and McDonald (1987), where $\lambda_i > 0$ is a scale parameter, $p_i \ge 0$ is a shape parameter, and $\beta_i \ge 0$ describes the relationship between the weight $S$ and the detection ability. Hence, the density function of $(X, S)$ for a school detected by team $i$ can be derived as:

$$f_i(x, s) = \frac{g_i(x, s)\, f(x, s)}{\pi_i}.$$

It can be shown that the weight of a school detected by team $i$ has approximately a gamma distribution with parameters $v_1 + \beta_i$ and $v_2$ as $w$ approaches $\infty$.
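As a quick numerical check of the model in this section, the following minimal sketch evaluates the biomass density $D = N E(S)/(2lw)$ and applies the additive and multiplicative error models to a single school. It assumes the gamma weight is parameterized by shape $v_1$ and scale $v_2$, so that $E(S) = v_1 v_2$; this is consistent with the values $N = 1200$, $l = 2000$, $w = 20$, gamma parameters (0.6, 100), and $D = 0.9$ used in the SBT simulation study. The function names and the normal and gamma error distributions are illustrative assumptions, not part of the model definition.

```python
import random


def biomass_density(n_schools, v1, v2, l, w):
    """D = N * E(S) / (2 l w), with E(S) = v1 * v2 (shape-scale gamma, assumed)."""
    return n_schools * v1 * v2 / (2.0 * l * w)


def observe(x, s, sd_x, vs1, vs2, rng):
    """Apply the error models: Y = X + eps_x (additive), T = S * eps_s (multiplicative).

    The normal and gamma error distributions are assumptions made for
    illustration; the model only requires eps_x symmetric about 0 and eps_s > 0.
    """
    eps_x = rng.gauss(0.0, sd_x)
    eps_s = rng.gammavariate(vs1, vs2)  # shape vs1, scale vs2
    return x + eps_x, s * eps_s


# Values taken from the SBT simulation study.
D = biomass_density(1200, 0.6, 100.0, 2000.0, 20.0)

rng = random.Random(0)
y, t = observe(x=-3.5, s=60.0, sd_x=0.5 ** 0.5, vs1=4.0, vs2=0.4, rng=rng)
```

Here `D` reproduces the biomass density of 0.9 quoted for the simulation study, which supports the shape-scale reading of the gamma parameters.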

Moment Estimators for Parameters
The moment estimators for all parameters in the models, except the $c_i$'s, are derived in this section. The criterion for selecting moments is to use moments of as low an order as possible and an equal number of equations from each team. To simplify the derivation, the index $j$, denoting the $j$-th detection, is omitted from all notation. For example, $X_i$ denotes the perpendicular distance of a detection by team $i$. Assume that the first two non-trivial moments of $\epsilon_{xi}$ and $\epsilon_{si}$, $i = 1, 2$, are known and denote them by $u_{si1} = E(\epsilon_{si})$, $u_{si2} = E(\epsilon_{si}^2)$, $u_{xi2} = E(\epsilon_{xi}^2)$, and $u_{xi4} = E(\epsilon_{xi}^4)$, for $i = 1, 2$. By models (1) and (2), an appropriate set of moments is

and these moments can be expressed as:

Let $n_i$ be the number of schools detected by team $i$ and for convenience denote the sample moments as follows:

When $w$ is large, it can be shown that for team $i$

where $k$, $h$ are non-negative integers and $k$ is even. In an appropriate set of moments, replace the moments of $Y_i$ and $T_i$ by their sample moments and approximate the moments of $X_i$ and $S_i$ using equation (4). Then the following moment equations for estimating $(v_1, v_2, \lambda_1, \lambda_2, p_1, p_2, \beta_1, \beta_2)$ are obtained:

Solving these equations yields the following moment estimators:

In addition, for $i = 1, 2$, the estimator for $p_i$, $\hat p_i$, is found to be the solution of equation

The solution for $p_i$ is unique since $T(p)$ is strictly decreasing (Chen, 1998). Finally, the estimator for

Moment estimators of all parameters except the $c_i$'s are now found.

Estimating Detection Rates and Biomass
For estimating the school population size $N$, a non-parametric approach proposed by Chen and Lloyd (2000) is used here. The estimate of $N$ depends on the level of heterogeneity in the detection process, measured by $\alpha = \pi_{12}/(\pi_1 \pi_2)$:

$$\hat N(\alpha) = \frac{\alpha\, n_1 n_2}{n_{11}},$$

where $n_{11}$ is the number of schools detected by both teams. Note that $\alpha = 1$ represents "homogeneity" and under homogeneity, the estimator becomes the well-known Petersen estimator. Based on the model in Section 2, the $\alpha$ value can be expressed explicitly as:

$$\alpha = \frac{\iint g_1 g_2 f \, dx \, ds}{\iint g_1 f \, dx \, ds \, \iint g_2 f \, dx \, ds}.$$

Its estimator, $\hat\alpha$, can be obtained by substituting the parameter estimates into the right-hand side and then calculating the integrals numerically. The school population size $N$ and biomass density $D$ can therefore be estimated by

$$\hat N = \hat N(\hat\alpha), \qquad \hat D = \frac{\hat N \hat v_1 \hat v_2}{2lw}.$$

The method of moments fails to estimate the $c_i$'s because they do not appear in the moment equations. We therefore use the relationship between $c_i$ and $N$ to estimate $c_i$. For LT surveys with a single observer team, Chen and Cowling (2001) provided an estimator of $N$ with corrections for measurement errors using the same models (1), (2), detection probability (3) with $c_i = 1$, and the method of moments. Their estimator of $N$ based on the data collected by team $i$ therefore estimates $c_i N$ and is denoted as $\hat N_i$. It can be expressed explicitly as:

The estimator of $D$ based on the data from team $i$ under perfect detection is denoted as $\hat D_i$. Note that the parameter estimators on the right-hand side are our estimators in Section 3, not Chen and Cowling's estimators. The value of $c_i$ can therefore be estimated by $\hat c_i = \hat N_i / \hat N$.
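The relationships in this section can be sketched numerically. Since $\pi_{12} = \alpha \pi_1 \pi_2$, the expected counts satisfy $N = \alpha\, E(n_1) E(n_2) / E(n_{11})$, which suggests the plug-in form $\alpha n_1 n_2 / n_{11}$ used below; it reduces to the Petersen estimator at $\alpha = 1$. The sketch computes $\alpha$ by a midpoint rule for a distance-only detection function with $X$ uniform on $(-w, w)$. The detection function `g_example` (an exponential power form) and all function names are hypothetical choices for illustration, not the form used in the paper.

```python
import math


def n_hat(alpha, n1, n2, n11):
    """Plug-in estimate alpha * n1 * n2 / n11 (Petersen estimator when alpha = 1)."""
    if n11 <= 0:
        raise ValueError("need at least one school detected by both teams")
    return alpha * n1 * n2 / n11


def alpha_1d(g1, g2, w, steps=20000):
    """alpha = pi_12 / (pi_1 * pi_2) for X uniform on (-w, w), by the midpoint rule."""
    h = 2.0 * w / steps
    i1 = i2 = i12 = 0.0
    for k in range(steps):
        x = -w + (k + 0.5) * h
        a, b = g1(x), g2(x)
        i1 += a
        i2 += b
        i12 += a * b
    f = 1.0 / (2.0 * w)  # uniform density of X
    pi1, pi2, pi12 = i1 * h * f, i2 * h * f, i12 * h * f
    return pi12 / (pi1 * pi2)


def c_hat(n_i_hat, n_total_hat):
    """Detection rate on the transect line for team i, estimated as N_i-hat / N-hat."""
    return n_i_hat / n_total_hat


def g_example(c, lam, p):
    """HYPOTHETICAL detection function c * exp(-(|x| / lam)**p), for illustration only."""
    return lambda x: c * math.exp(-((abs(x) / lam) ** p))


alpha = alpha_1d(g_example(0.9, 3.8, 3.0), g_example(0.7, 3.8, 3.0), w=20.0)
N_est = n_hat(alpha, n1=100, n2=80, n11=40)
```

With this multiplicative form of `g_example`, the constants `c` cancel in the ratio, so `alpha` does not depend on the detection rates on the transect line.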
The moment estimators in Section 3 are all asymptotically normally distributed and asymptotically unbiased for the corresponding parameters because they are all smooth functions of means of independent and identically distributed observations (Serfling, 1980). Although the estimator $\hat N(\alpha)$ is not asymptotically unbiased (Chen and Lloyd, 2000), both the RB (relative bias) and the RMSE (relative root mean squared error) of $\hat N(\alpha)$ approach zero as $N$ gets larger. Their orders are (shown in Appendix 1):

As the variances of the estimators are difficult to derive theoretically, the bootstrap method may be used to evaluate these variances and confidence intervals.
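A minimal sketch of a nonparametric bootstrap for this purpose follows. It assumes that the resampling unit is the detected school and that each record carries the information an estimator needs; the text does not specify the resampling scheme, so this is one plausible choice rather than the procedure used here. The Petersen estimator and the toy records are placeholders for the full estimation pipeline.

```python
import random
import statistics


def bootstrap(records, estimator, n_boot=1000, seed=0):
    """Resample records with replacement; return the bootstrap standard error
    and an approximate 95% percentile interval of the estimator."""
    rng = random.Random(seed)
    n = len(records)
    reps = []
    for _ in range(n_boot):
        resample = [records[rng.randrange(n)] for _ in range(n)]
        reps.append(estimator(resample))
    reps.sort()
    se = statistics.stdev(reps)
    lower = reps[int(0.025 * (n_boot - 1))]
    upper = reps[int(0.975 * (n_boot - 1))]
    return se, (lower, upper)


def petersen(records):
    """Petersen estimate n1 * n2 / n11 from (seen_by_team1, seen_by_team2) flags."""
    n1 = sum(1 for a, b in records if a)
    n2 = sum(1 for a, b in records if b)
    n11 = sum(1 for a, b in records if a and b)
    return n1 * n2 / n11


# Toy data (illustrative only): 60 schools seen by both teams,
# 40 by team 1 only, 30 by team 2 only.
records = [(True, True)] * 60 + [(True, False)] * 40 + [(False, True)] * 30
se, (lower, upper) = bootstrap(records, petersen, n_boot=500, seed=1)
```

In practice `estimator` would recompute the moment estimates, $\hat\alpha$, and $\hat N$ or $\hat D$ on each resample.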

Simulation Studies
In this section, simulation studies are reported to assess the effects of measurement errors and of incorrectly assuming 100% detection rates on the transect line. They are based on the data collected in the SBT surveys and in the minke whale survey.

The SBT surveys
The parameter values were selected to be in the range of estimates from the data collected in the (single team) SBT line transect surveys conducted between 1994 and 1998 (see Chen and Cowling, 2001 for the estimates). Because these surveys used only one observer team, all parameters except $c_i$ in the models (1), (2) and the detection probability (3) were set to be the same for the two observer teams. Because the set of moments used to derive moment estimators depends on which team is labeled as team 1, three sets of $c_1$ and $c_2$ were selected to study the effect of labeling: $c_1 = 0.9$, $c_2 = 0.7$ ($c_1 > c_2$); $c_1 = 0.8$, $c_2 = 0.8$ ($c_1 = c_2$); $c_1 = 0.7$, $c_2 = 0.9$ ($c_1 < c_2$).
The value of $l$ was selected to be 2000 (meter), the value of $w$ to be 20, the value of $D$ to be 0.9, and the value of $N$ to be 1200. The (perpendicular) distances from the 1200 schools to the plane were generated from a uniform distribution on $(-20, 20)$ and the weights of the 1200 schools were generated from a gamma distribution with parameters (0.6, 100). The detection probability (3) with $\beta_i = 0.2$, $p_i = 3.0$, $\lambda_i = 3.8$ was used to determine whether or not a school was detected by a team. In addition, for each detection, an $N(0, u_{x2})$ measurement error was added to the distance $X$ to give the observed distance $Y$, and the weight $S$ was multiplied by a Gamma($v_{s1}$, $v_{s2}$) measurement error to give the observed weight $T$.
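The data-generating steps described above can be sketched as follows. The detection function `detect_prob` is an assumed exponential power form, $c \exp\{-(|x|/(\lambda s^{\beta}))^{p}\}$, chosen only because it has a scale $\lambda$, a shape $p$, a weight exponent $\beta$, and value $c$ at $x = 0$; the exact form of detection probability (3) is not shown in the text here, so this choice is hypothetical. Function names and the default seed are likewise illustrative.

```python
import math
import random


def detect_prob(x, s, c, lam, p, beta):
    """HYPOTHETICAL detection probability c * exp(-(|x| / (lam * s**beta))**p)."""
    scale = lam * s ** beta
    if scale <= 0.0:
        return 0.0
    return c * math.exp(-((abs(x) / scale) ** p))


def simulate_survey(n_schools=1200, w=20.0, v1=0.6, v2=100.0, c=(0.9, 0.7),
                    lam=3.8, p=3.0, beta=0.2, var_x=0.5, vs1=4.0, vs2=0.4, seed=0):
    """Simulate one two-team survey; return each team's (Y, T) records and n11."""
    rng = random.Random(seed)
    teams = ([], [])
    n11 = 0
    for _ in range(n_schools):
        x = rng.uniform(-w, w)            # true perpendicular distance
        s = rng.gammavariate(v1, v2)      # true weight: shape v1, scale v2
        # The two teams detect independently given (x, s).
        seen = [rng.random() < detect_prob(x, s, c[i], lam, p, beta) for i in range(2)]
        if seen[0] and seen[1]:
            n11 += 1
        for i in range(2):
            if seen[i]:
                # Each team measures its own detections with independent errors.
                y = x + rng.gauss(0.0, math.sqrt(var_x))
                t = s * rng.gammavariate(vs1, vs2)
                teams[i].append((y, t))
    return teams[0], teams[1], n11


team1, team2, n11 = simulate_survey(seed=1)
```

Repeating `simulate_survey` with different seeds and applying the estimators to each replicate gives the kind of Monte Carlo summary reported in the tables.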
Table 1: Simulation results for unbiased weight measurements; D = 0.9.

In 1998, experiments were conducted to assess measurement errors in the SBT surveys. In these experiments two planes independently estimated the location and weight of each detected school. From these data, the measurement errors in $X$ were fitted well by a normal distribution and its variance ($u_{x2}$) was estimated to be 0.50. In addition, the first ($u_{s1}$) and second ($u_{s2}$) moments of the measurement error in $S$ were estimated to be 1.58 and 3.09, respectively. We therefore assume that the measurement errors of both teams were independent and had the same distributions: the distance errors followed $N(0, u_{x2})$ and the weight errors followed Gamma($v_{s1}$, $v_{s2}$). Because $u_{s1} = v_{s1} v_{s2}$ and $u_{s2} = v_{s1} v_{s2}^2 (1 + v_{s1})$, the parameters $v_{s1}$ and $v_{s2}$ were selected to be 4 and 0.4, respectively, which give $u_{s1} = 1.6$ and $u_{s2} = 3.2$, close to the estimated values 1.58 and 3.09. In order to assess the interaction effect of measurement errors in $X$ and in $S$, a larger variance of the distance error, $u_{x2} = 1.0$, and a set of parameters for an unbiased weight error, $v_{s1} = 2$, $v_{s2} = 0.5$, were also studied. The simulation results are based on 500 simulations for each case and are reported in Tables 1 to 4, in which the subscript $u$ denotes the estimates with corrections for imperfect detection rates on the transect line but without corrections for measurement errors.

For estimating biomass density, Tables 1 and 2 indicate that incorrectly assuming 100% detection on the transect line will underestimate the biomass density $D$ even with the corrections for measurement errors ($\hat D_1$, $\hat D_2$ in the tables). Recall that $\hat D_i$ is obtained by assuming perfect detection by team $i$ on the transect line. The values of $\hat D_1$, $\hat D_2$ in Tables 1 and 2 also reveal that the level of underestimation depends on how close the true detection rate on the transect line, $c_i$, is to 100%. It is also observed in Tables 1 and 2 that the RMSE of $\hat D_i$ for the team with the larger $c_i$ value is smaller than that of the other team. This is expected since the team with the larger $c_i$ can detect any school with a higher probability and thus the number of schools this team detects is larger on average.

When imperfect detection rates on the transect line are allowed, if the mean weight measurement is the exact weight (i.e. unbiased) then the correction for measurement errors is not critical within the studied range of parameters. As shown in Table 1, $\hat D_u$ is slightly less accurate than $\hat D$ on average and their RMSEs are comparable. However, if the weight measurement is biased, the correction for measurement errors is significant: $\hat D$ is 50% more accurate than $\hat D_u$ on average (in terms of RB) and the RMSE of $\hat D$ is less than 20% of that of $\hat D_u$, as seen in Table 2. For estimating school population size, the results in comparing estimators are similar to those for estimating biomass density, as seen in Tables 3 and 4. The proposed estimators can precisely estimate $c_i$, as shown in the last two rows in Tables 1 and 2: RB and RMSE are both very small for teams 1 and 2 in all cases.
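The moment relations for the gamma weight error can be checked with a short sketch. It uses the shape-scale parameterization implied by $u_{s1} = v_{s1} v_{s2}$; the function names are illustrative. Inverting the estimated moments 1.58 and 3.09 gives a shape of about 4.2 and a scale of about 0.38, which is consistent with the rounded values 4 and 0.4 used in the simulations.

```python
def gamma_moments(shape, scale):
    """First and second raw moments of a gamma(shape, scale) variable."""
    u1 = shape * scale
    u2 = shape * scale ** 2 * (1.0 + shape)
    return u1, u2


def gamma_from_moments(u1, u2):
    """Invert the moment relations: variance = u2 - u1**2 = shape * scale**2."""
    var = u2 - u1 ** 2
    if var <= 0:
        raise ValueError("second moment must exceed the squared first moment")
    scale = var / u1
    shape = u1 / scale
    return shape, scale


biased = gamma_moments(4.0, 0.4)      # weight error used for the biased case
unbiased = gamma_moments(2.0, 0.5)    # mean 1, so weight measurements are unbiased
shape_est, scale_est = gamma_from_moments(1.58, 3.09)
```

An error distribution with first moment 1 leaves the expected weight unchanged, which is why the parameters (2, 0.5) represent the unbiased case.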
The choice of which team is labeled as team 1 appears to have little effect on estimation, as shown in Tables 1 to 4. The performances of $\hat D$, $\hat N$, $\hat D_u$, and $\hat N_u$ for $c_1 > c_2$ are similar to those for $c_1 < c_2$. Another interesting point observed in these tables is that the size of the measurement errors in $X$ also appears to have little effect on the performances of $\hat D$, $\hat N$, $\hat D_u$, and $\hat N_u$. The values of these estimators for $\sigma_x^2 = 0.5$ are similar to those for $\sigma_x^2 = 1.0$. To further check this point, two extreme cases, $\sigma_x^2 = 0.1$ and $\sigma_x^2 = 3.0$, were simulated (but are not shown here). The simulation results indicate that the corrected estimators $\hat D$ and $\hat N$ performed almost the same, while the uncorrected ones $\hat D_u$ and $\hat N_u$ performed slightly worse, as the size of the measurement errors in $X$ changed from tiny ($\sigma_x^2 = 0.1$) to huge ($\sigma_x^2 = 3.0$). Overall, incorrectly assuming 100% detection rates on the transect line will seriously underestimate the animal abundance. The correction for measurement errors is not significant if the weight measurements are unbiased. This correction is, however, important even when the measurement errors in $X$ are small, if the weight measurements are biased.
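The two performance measures used in these comparisons can be computed from simulation replicates as sketched below. The relative RMSE follows the definition given in the introduction (root mean squared error divided by the true parameter); the relative bias is taken here to be the mean error divided by the true parameter, which is an assumed, standard definition. The function names and the toy replicates are illustrative.

```python
import math


def relative_bias(estimates, true_value):
    """(mean of estimates - true value) / true value."""
    return (sum(estimates) / len(estimates) - true_value) / true_value


def relative_rmse(estimates, true_value):
    """sqrt(mean squared error) / true value."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse) / true_value


# Toy replicates of an estimator of D = 0.9 (illustrative values only).
replicates = [0.8, 1.0]
rb = relative_bias(replicates, 0.9)
rrmse = relative_rmse(replicates, 0.9)
```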

The minke whale survey
To estimate the abundance of minke whale in the northeastern Atlantic, the Norwegian government and the International Whaling Commission undertook an IOLT survey in 1995 (Schweder et al., 1997).Each participating vessel had two independent observer platforms A and B (teams 1 and 2 in our notation).A total of 772 individual minke whales were detected by the two platforms.The perpendicular distance X to the detected school was recorded by the third team using the positions of the detected signals.The half width of the effective surveyed region, w, was estimated to be 2,100 m (Skaug and Schweder, 1999).
Although the estimators presented in Sections 3 and 4 are for the bivariate covariate $(X, S)$ case, the same estimation method can easily be applied to the univariate covariate $X$ case as follows. The model (1) is used to model the errors in $X$ and the function (3) with $\beta_i = 0$ is used to describe the detection probability. Because the odd moments of $X$ are zero, the second and fourth moments of $X$ are used to derive the moment estimators. Then the moment estimator of $p_i$, $\hat p_i$, is the solution of equation

In addition, the moment estimator of

Similar to Section 4, the estimators $\hat\alpha$, $\hat N$, $\hat N_i$ and $\hat c_i$ are:

If there were no measurement errors in $X$, then $u_{xi2} = u_{xi4} = 0$ for $i = 1, 2$. Assuming no measurement errors, the parameter estimates for platform A (team 1) were found to be $(\hat c_1, \hat p_1, \hat\lambda_1) = (0.44, 1.82, 7.93)$ and for platform B (team 2) to be $(\hat c_2, \hat p_2, \hat\lambda_2) = (0.42, 1.82, 6.80)$. For illustration purposes, the unit of $X$ was set to 100 m. The values of $X$ were plotted as histograms by platform, together with their estimated densities, as shown in Figure 1. It can be seen from these histograms that the density of detected $X$ was lower in the interval [0, 2) than in the interval [2, 4). This violates the property, derived from the detection probability in equation (3), that the density of detected $X$ is monotone decreasing in $|x|$ and thus peaks on the transect line. This violation explains why the estimated densities for the two platforms did not fit the data very well around $x = 0$, as shown in Figure 1. Yet the chi-square goodness-of-fit test gives a p-value of 0.63 for platform A and a p-value of 0.32 for platform B, suggesting acceptable overall fits for both platforms. In addition, the detection probability on the transect line for platform A was estimated to be 0.44, slightly higher than that for platform B, 0.42. This implies that platform A might be slightly, but not significantly, more efficient in detection than platform B when animals were around the transect line.

To demonstrate the effect of measurement errors, measurement errors for both platforms were simulated and added to the $X$ values in the data. Measurement errors from the two platforms are assumed to follow the same distribution, normal with mean 0 and variance $\sigma^2$. In addition, because the side of the detected animal was not recorded, a sign (+ for the right-hand side and - for the left-hand side) is randomly assigned to each $X$ in the data. The simulated observation of $X$ is therefore either $X$ + error or $-X$ + error. Two cases, $\sigma^2 = 0.5$ and $\sigma^2 = 1$, were simulated. The corrected and uncorrected estimates based on the simulated data are reported in Table 5. The estimate $\hat N$ with corrections was slightly larger than that without corrections in both cases, as seen in Table 5. When the error was not large ($\sigma^2 = 0.5$), the difference between $\hat N$ with and without corrections (3383 − 3354 = 29) was minor. When the error was larger ($\sigma^2 = 1.0$), the difference was also larger (3389 − 3330 = 59). Yet these differences do not appear to be significant. In contrast, the effect of incorrectly assuming 100% detection rates on the transect line is quite significant. As can be seen in Table 5, $\hat N_i$, $i = 1, 2$, were both less than half of $\hat N$ regardless of corrections for measurement errors. This indicates that incorrectly assuming 100% detection rates on the transect line leads to a dramatic underestimate of $N$, even with corrections for measurement errors.
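The univariate moment estimation described in this subsection can be illustrated under an explicit assumption about the detection function. Suppose, hypothetically, that the detection probability is proportional to $\exp\{-(|x|/\lambda)^p\}$ and that $w$ is large. Then the detected distances have $E(X^2) = \lambda^2 \Gamma(3/p)/\Gamma(1/p)$ and $E(X^4) = \lambda^4 \Gamma(5/p)/\Gamma(1/p)$, so $p$ solves $T(p) = E(X^4)/E(X^2)^2$ with $T(p) = \Gamma(1/p)\Gamma(5/p)/\Gamma(3/p)^2$, which is strictly decreasing. The correction for additive, independent, symmetric errors uses $E(Y^2) = E(X^2) + u_{x2}$ and $E(Y^4) = E(X^4) + 6 E(X^2) u_{x2} + u_{x4}$. The functional form and all function names below are assumptions for illustration; the equations used in the paper are not shown in the text here.

```python
import math


def kurtosis_ratio(p):
    """T(p) = Gamma(1/p) * Gamma(5/p) / Gamma(3/p)**2, computed on the log scale."""
    return math.exp(math.lgamma(1.0 / p) + math.lgamma(5.0 / p) - 2.0 * math.lgamma(3.0 / p))


def solve_p(target, lo=0.2, hi=50.0, tol=1e-10):
    """Solve T(p) = target by bisection, using that T is strictly decreasing."""
    if not kurtosis_ratio(hi) < target < kurtosis_ratio(lo):
        raise ValueError("target outside the attainable range of T(p)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kurtosis_ratio(mid) > target:
            lo = mid  # T(mid) too large, so the root lies at a larger p
        else:
            hi = mid
    return 0.5 * (lo + hi)


def corrected_moments(m2_y, m4_y, u2, u4):
    """Recover E(X^2), E(X^4) from moments of Y = X + eps (eps independent, symmetric)."""
    m2_x = m2_y - u2
    m4_x = m4_y - 6.0 * m2_x * u2 - u4
    return m2_x, m4_x


def estimate_p_lambda(m2_y, m4_y, u2=0.0, u4=0.0):
    """Moment estimates of (p, lambda) under the assumed exponential power form."""
    m2_x, m4_x = corrected_moments(m2_y, m4_y, u2, u4)
    p = solve_p(m4_x / m2_x ** 2)
    lam = math.sqrt(m2_x * math.exp(math.lgamma(1.0 / p) - math.lgamma(3.0 / p)))
    return p, lam


# Check with p = 2, lambda = 3: E(X^2) = 4.5, E(X^4) = 60.75; normal errors with
# variance 0.5 have u2 = 0.5 and u4 = 3 * 0.5**2 = 0.75, so E(Y^2) = 5, E(Y^4) = 75.
p_est, lam_est = estimate_p_lambda(5.0, 75.0, u2=0.5, u4=0.75)
```

Setting `u2 = u4 = 0` corresponds to the uncorrected estimates, which treat the observed distances as exact.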

Conclusion and Discussion
In this paper, a parametric method is provided for analyzing data from IOLT surveys in which measurement errors are expected and the detection probability on the transect line might not be 1.0. The proposed method assumes that the covariate measurements from the two observer teams are independent. Therefore, the method can be applied only to IOLT surveys in which the two observer teams measure covariates of their own detections independently.
We restrict our discussion to the case of a bivariate covariate, the distance and weight of a school. In addition, two indices of animal abundance, school population size and biomass density, are studied in this paper. To reduce the sensitivity to model misspecification, the method of moments is selected to estimate parameters and a non-parametric method is incorporated to estimate the two indices. As found in the simulation study, the proposed estimators can estimate animal abundance quite well. This study also suggests that if the weight measurement could be biased, the correction for measurement errors is essential even when the measurement errors in distance are small. In addition, the detection probabilities on the transect line should not be assumed to be 1.0, as is done in many papers, unless these probabilities are really very close to 1.0. Incorrectly assuming these probabilities to be 1.0 will greatly underestimate the animal abundance.
For some animal species, such as minke whales, the mean number of animals per unit area, rather than the biomass, might be of interest. The size of a school is then not the weight but the number of animals in that school. The method used in this paper can easily be adapted to abundance estimation for such animals if an appropriate discrete model for measurement errors in the school size is provided.
In this paper, we relax the assumption of 100% detection probability on the transect line, but we assume that the detection probability on the transect line is a constant, independent of school size or other covariates. When this assumption is not reasonable, allowing a variable detection probability on the transect line might be necessary to estimate the animal abundance. In addition, the peak of the detection probability is always assumed to be on the transect line, but this assumption is often violated. For example, the data from the 1995 minke whale survey (see Figure 1) suggest that the peak of the detection probability was not on the transect line. More research on relaxing this assumption might be necessary to further improve the estimation of animal abundance.

Figure 1: Histograms and the estimated densities of X detected in the minke whale survey. The left panel is from platform A and the right panel is from platform B.

Table 3: Simulation results for unbiased weight measurements; N = 1200.

Table 5: Estimates with and without corrections for measurement errors in X for the minke whale survey.