Comparisons of Gene Expression Indexes for Oligonucleotide Arrays

: High density oligonucleotide arrays have become a standard research tool to monitor the expression of thousands of genes simultaneously. Aﬀymetrix GeneChip arrays are the most popular. They use short oligonucleotides to probe for genes in an RNA sample. However, important challenges remain in estimating expression level from raw hybridization intensities on the array. In this paper, we deal with the problem of estimating gene expression based on a statistical model. The present method is like Li and Wong model (2001a), but assumes more generality. More precisely, we show how the model introduced by Li and Wong can be generalized to provide new measure of gene expression. Moreover, we provide a comparison between these two models.


Introduction
High density oligonucleotide expression arrays are now widely used in many area of biomedical research for measurements of gene expression. In the Affymetrix system, an array contains several thousands of genes and ESTs. To probe genes, oligonucleotides of length 25 bp are used. Typically, a mRNA molecule of interest (usually related to a gene) is represented by a probe set. Every probe set consists of 10-20 probe pairs. Every probe pair is composed of a perfect match P M, a section of the mRNA molecule of interest and a mismatch MM, which is identical to the perfect match probe except for the base in the middle (13th) position. After RNA samples are prepared, labeled and hybridized with arrays, these are scanned and images are produced and processed to obtain an intensity value for each probe. These intensities, P M ij and MM ij , represent the amount of hybridization for arrays i = 1, ...I and probe pairs j = 1, ..., J for any given probe set. There has been considerable discussion over the appropriate algorithm for constructing single expression estimates based on multiple-probe hybridization data. At present, there are several analytical methods to measure such intensities. However, we will only discuss the Affymetrix Microarray Suite MAS4. 0 andMAS5.0 (1999 and2001) and the method of Li and Wong LW (2001a). The MAS 4.0 uses an average over probe pairs P M ij −MM ij , j = 1, ...J for each array i = 1, ...I. This average difference (AD) is motivated by underlying statistical model: P M ij − MM ij = θ i + ij , j = 1...J. The expression index on array i is represented with the θ i . AD is an appropriate estimate of θ i if the error term ij has equal variance for j = 1, ..., J. However, the equal variance assumption does not hold for GeneChip probe level data, since probes with larger mean intensities have larger variances, see Irizarry et al. (2003c). The latest version of this software MAS5.0 computes the anti-log of a robust average of The basic disadvantage for this method is that there is no learning about probe characteristics, based on the performance of each probe across chips. To account for probe affinity effect, LW method suggests that P M ij − MM ij = θ i φ j + ij , i = 1, ...I, j = 1, ...J, ∼ = N (0, σ 2 ). The probe affinity effect is represented by φ j . The main object of this paper is to generalize this model by considering separate models for P M and MM and making general assumptions on the errors. This paper is organized as follows: The next section deals with a general model based on Li and Wong's model. We make general assumptions on the empirical variance and correlation of and between P M and MM, and estimate the parameters using maximum likelihood. Based on our analysis, we will show that our model gives an unbiased estimate of the expression index with low variance. Section 3 is concerned by a special case using P M only with inconstant variance. In addition, we compare how well these methods perform using the spike-in experiment H GU95A described in more details in the same section.

The full model: A simple case
Following Li and Wong, the P M and MM intensities are modeled as: where I denotes the number of samples and J denotes the number of probe pairs in a probe set. θ is the expression index, ν is a non-specific cross-hybridization term, α is the rate of increase of MM intensity and φ is the additional rate of increase of the P M intensity. Although this model was introduced by Li and Wong, they have only treated the reduced case which we will call RLW : Lemon et al.(2002) use the above equations, but assume that the P M and MM values are independent so their model describes the marginal distributions. Recently, Taib (2004) introduced a model in which it is assumed that the errors are correlated but with common variance and a constant correlation across samples. In general, these assumptions do not fit the observations as we will see later.
We propose then to augment the recent model to permit to the empirically observed correlation between P M and MM and the variances of P M and MM to change across the arrays as is shown in Figures 1-3. More precisely, we assume that the errors terms follow a bivariate normal distribution according to where σ 2 i is the variance and ρ i is the correlation coefficient. In the following this model will be called FLW1.

The estimates
Given data (P M ij , MM ij ) we can estimate the parameters of our model using the maximum likelihood. It is known that the likelihood function of the bivariate normal distribution can be expressed as: The corresponding log likelihood function is To get the estimates of the parameters we take the partial derivatives with respect to the corresponding parameters and we set the resulting expression equal to zero. Hence, we obtain: The last two equations can be written as: 2 ) These formulas have to be understood as steps in an iterative procedure that will lead to final estimates. In this case we will not be concerned by solving these equations. However, they are useful when it comes to deriving various properties. If we assume the other parameters to be known, It will be easy to see thatθ i is an unbiased estimate of θ i since E θ i = θ i . For the variance, we get:

Comparisons between FLW1 and RLW
In this section, we will give a brief description of the reduced Li and Wong model and make a comparison between the estimates obtained in each model in terms of accuracy (bias) and precision (variance) . For the RLW model, we recall that: The estimated expression indexθ i can be obtained using the maximum likelihood or the least squares. Henceθ The variance of the estimate, based on the assumptions of RLW model is But, based on the FLW1 assumptions, on can easily show that and it is easy to see that (2.3) ≤ (2.4). Given the Li and Wong Model, one could choose a suitable model based on the distribution of the errors. Another important point for the selection of the convenient estimate is the unbiasedness and low variance. Since we have shown that the correspondingθ i for our model is an unbiased estimate with low variance, and according to the comparison above, we see that the full model should be a good choice.

The full model: A general case
In this section section, we propose to augment the last model to take into account the difference of the empirically observed variances between P M and MM as is shown in Figure 4. We will then assume that the error terms in 2.1 and 2.2 are distributed according to where σ 2 1,i and σ 2 2,i are the variances and ρ i is the corresponding correlation coefficient. From now on, we will call this model the FLW2 model.
In this case, the likelihood function has the form The same computations as above lead to the maximum likelihood estimates of the parameters: Given the other parameters, it is thus easy to see that the estimateθ i of the expression index is unbiased. For the variance we get On the other hand the variance ofθ i based on the RLW is and it is not easy to compare these variances. For example when a i ≥ 0 we have (2.5) ≤ (2.6). In general, we use data from the spike-in studies HGU95A and HGU133 to make this comparison (see

The model based on PM only
It has been observed that some MM probes may respond poorly to the changes in the expression level of the target gene as discussed in Li and Wong (2001b). This phenomenon raised questions on the efficiency of using MM probes, and led some investigators to calculate fold changes using only P M probes. To investigate the relative performance of P M-only using RLW and FLW, we modified the FLW model to estimate gene expression levels using only P M probes, and compared it to RLW. The modified FLW model becomes N (0, σ 2 i ) The same procedure as above gives: To evaluate how this model performs, we use a spike-in study HGU95A designed by Affymetrix.

Data
HGU95AGeneChip is a subset of the data used to develop and validate the MAS5.0 algorithm. Human cRNA fragments matching 16 probe-sets on the HGU95A GeneChip were added to the hybridization mixture of the arrays at concentrations ranging from 0 to 1024 picoMolar. The same hybridization mixture, obtained from a common tissue source, was used for all arrays. The cRNAs were spiked-in at a different concentration on each array (apart from replicates) arranged in a cyclic Latin square design with each concentration appearing once in each row and column. Within each experiment, only the spike-in concentrations are varied, background is the same for all arrays. Fold change calculations are always made within experiment to ensure that only spiked-in genes will be differentially expressed. For more details see(http://www.affymetrix. com/analysis/downloadcenter2.affx).

Numerical results
This section is concerned by evaluating how the FLW based on P M-only performs. Actually we present a numerical comparison between FLW and RLW using the spike-in study HGU95A GeneChip. we computed our estimates using the R environment see Ihaka and Gentleman (1996), which can be freely obtained from (http://cran.r-project.org) and the methods for Affymetrix Oligonucleotide Arrays R package described in Irrizary et al. (2003a), which is freely available as part of the Bioconductor project http://www.bioconductor.org. We then use a benchmark for Affymetrix GeneChip expression measures developed by Cope et al. (2003) which aims to evaluate and compare summaries of Affymetrix probe level data. We submitted our data to the corresponding webtool which is available at (http://affycomp.biostat.jhsph.edu).
The results obtained are summarized in the table below (see Tables 1-2). We got results for RLW from (http://affycomp.biostat.jhsph.edu/AFFY2/rafajhu.edu/030519.1451/completeassessment.pdf) and results corresponding to FLW are given in the Affycompwebtool report. The score components for Table NR1  6. IQR: Interquartile range of log ratios among genes not differentially expressed.
8. Obs (low)int-fc slope: Slope obtained from regressing observed log-foldchanges against nominal log-fold-changes for genes with nominal concentrations less than or equal to 2.
9. F C = 2, AUC (F P < 100): Area under the ROC curve up to 100 false positives when comparing arrays with nominal fold changes of 2.
10. F C = 2, AFP, call if f c > 2: Average false positives if we use fold-change> 2 as a cut-off when comparing arrays where nominal fold-changes are 2.
11. F C = 2, ATP, call if f c > 2: Average true positives if we use fold-change > 2 as a cut-off when comparing arrays where nominal fold-changes are 2. and for Table 2: 1. Median SD: Median SD across replicates.
2. null log-fc IQR: Inter-quartile range of the log-fold-changes from genes that should not change.
3. null log-fc 99.9%: 99.9% percentile of the log-fold-changes if from the genes that should not change.
4. Signal detect R2: R-squared obtained from regressing expression values on nominal concentrations in the spike-in data.
5. Signal detect slope: Slope obtained from regressing expression values on nominal concentrations in the spike-in data.
6. low.slope: Slope from regression of observed log concentration versus nominal log concentration for genes with low intensities.
7. med.slope: As above but for genes with medium intensities.
8. high.slope: As above but for genes with high intensities.
10. Obs-(low)int-fc slope: Slope obtained from regressing observed log-foldchanges against nominal log-fold-changes for genes with nominal concentrations less than or equal to 2.
11. low AUC: Area under the ROC curve (up to 100 false positives) for genes with low intensity standardized so that optimum is 1.
12. med AUC: As above but for genes with medium intensities.
13. high AUC: As above but for genes with high intensities.
14. weighted avg AUC: A weighted average of the previous 3 ROC curves with weights related to amount of data in each class (low,medium,high).
For more details we refer to Irizarry et al. ( 2003c).

Conclusions
We have presented a comparison between the reduced and full form of Li and Wong models using either the full bivariate or P M-only models. To understand the difference in the performance of calls generated by these two models, we used both theoretical and numerical criteria. To make a decision as a choice of a model, one can make comparison in terms of accuracy(unbiased or low bias) and precision (low variance). We have shown that FLW1 has a less variance than RLW. Furthermore, using the Spikein study, it seems clear that FLW2 has considerably less variance than RLW. We also see that the P M-only model provides important improvements in various aspects compared to the same model based on RLW.