To date, many gene set analysis (GSA) approaches have been developed for identifying differentially expressed gene sets using microarray data. However, these methods are not directly
applicable to RNA-Seq data due to intrinsic differences between the two data structures. Most GSA methods test the differential expression of gene sets under the critical assumption that the members of
each gene set are sampled independently. This assumption implies that the genes within
a gene set do not share a common biological function. The aim of this paper is twofold. First, we
propose a powerful yet simple extension to GSA methods based on the de-correlation (DECO)
algorithm that properly removes the correlation bias in the expression of each gene set. We then
study the performance of our proposed method compared with other GSA methods through a
real RNA-Seq dataset and simulation studies under various scenarios combined with four commonly used normalization methods. Second, we discuss the effect of the complex correlation
structure of gene sets on four normalization methods. As a result, we found that our proposed
method outperforms the others in terms of Type I error rate and empirical power. A comparative study on a public dataset showed that gene sets identified by our proposed method have better
concordance with biologically confirmed pathways than those identified by other methods.