<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">JDS</journal-id>
      <journal-title-group>
        <journal-title>Journal of Data Science</journal-title>
      </journal-title-group>
      <issn pub-type="epub">1680-743X</issn>
      <issn pub-type="ppub">1680-743X</issn>
      <publisher>
        <publisher-name>SOSRUC</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="publisher-id">OCT3</article-id>
      <article-id pub-id-type="doi">10.6339/JDS.202010_18(4).0003</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Research Article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Gene Set Enrichment Analysis in RNA-Seq Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Tsai</surname>
            <given-names>Chen-An</given-names>
          </name>
          <xref ref-type="aff" rid="j_JDS_aff_000"/>
        </contrib>
        <aff id="j_JDS_aff_000">Department of Agronomy, Division of Biometry, National Taiwan University, Taipei, Taiwan</aff>
        <contrib contrib-type="author">
          <name>
            <surname>Li</surname>
            <given-names>Pei-Hsun</given-names>
          </name>
          <xref ref-type="aff" rid="j_JDS_aff_001"/>
        </contrib>
        <aff id="j_JDS_aff_001">Department of Agronomy, Division of Biometry, National Taiwan University, Taipei, Taiwan</aff>
      </contrib-group>
      <volume>18</volume>
      <issue>4</issue>
      <fpage>632</fpage>
      <lpage>648</lpage>
      <permissions>
        <ali:free_to_read xmlns:ali="http://www.niso.org/schemas/ali/1.0/"/>
      </permissions>
      <abstract>
        <p>To date, many gene set analysis (GSA) approaches have been developed for identifying differentially expressed gene sets using microarray data. However, these methods are not directly applicable to RNA-Seq data due to intrinsic differences between the two data structures. When testing the differential expression of gene sets, most GSA methods rely on the critical assumption that the members of each gene set are sampled independently. This implies that the genes within a gene set do not share a common biological function. The aim of this paper is twofold. First, we propose a powerful yet simple extension to GSA methods based on the de-correlation (DECO) algorithm that properly removes the correlation bias in the expression of each gene set. We then compare the performance of our proposed method with that of other GSA methods on a real RNA-Seq dataset and in simulation studies under various scenarios, in combination with four commonly used normalization methods. Second, we discuss the effect of the complex correlation structure of gene sets on the four normalization methods. We found that our proposed method outperforms the others in terms of Type I error rate and empirical power. A comparative study on a public dataset showed that gene sets identified by our proposed method have better concordance with biologically confirmed pathways than those identified by other methods.</p>
      </abstract>
      <kwd-group>
        <label>Keywords</label>
        <kwd>correlation bias</kwd>
        <kwd>DECO</kwd>
        <kwd>gene set analysis</kwd>
        <kwd>RNA-Seq</kwd>
      </kwd-group>
    </article-meta>
  </front>
</article>
