<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">JDS</journal-id>
      <journal-title-group>
        <journal-title>Journal of Data Science</journal-title>
      </journal-title-group>
      <issn pub-type="epub">1680-743X</issn>
      <issn pub-type="ppub">1680-743X</issn>
      <publisher>
        <publisher-name>SOSRUC</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="publisher-id">100307</article-id>
      <article-id pub-id-type="doi">10.6339/JDS.201207_10(3).0007</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Research Article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Direct and Unbiased Multiple Imputation Methods for Missing Values of Categorical Variables</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Xiao</surname>
            <given-names>Yuanhui</given-names>
          </name>
          <xref ref-type="aff" rid="j_JDS_aff_000"/>
        </contrib>
        <aff id="j_JDS_aff_000">Georgia State University</aff>
        <contrib contrib-type="author">
          <name>
            <surname>Song</surname>
            <given-names>Ruiguang</given-names>
          </name>
          <xref ref-type="aff" rid="j_JDS_aff_001"/>
        </contrib>
        <aff id="j_JDS_aff_001">Centers for Disease Control and Prevention</aff>
        <contrib contrib-type="author">
          <name>
            <surname>Chen</surname>
            <given-names>Mi</given-names>
          </name>
          <xref ref-type="aff" rid="j_JDS_aff_002"/>
        </contrib>
        <aff id="j_JDS_aff_002">Centers for Disease Control and Prevention</aff>
      </contrib-group>
      <volume>10</volume>
      <issue>3</issue>
      <fpage>465</fpage>
      <lpage>481</lpage>
      <permissions>
        <ali:free_to_read xmlns:ali="http://www.niso.org/schemas/ali/1.0/"/>
      </permissions>
      <abstract>
        <p>Missing data is a common problem in statistical analyses. To make use of information in data with incomplete observation, missing values can be imputed so that standard statistical methods can be used to analyze the data. Variables with missing values are often categorical and the missing pattern may not be monotone. Currently, commonly used imputation methods for data with a non-monotone missing pattern do not allow direct inclusion of categorical variables. Categorical variables are converted to numerical variables before imputation. For many applications, the imputed numerical values for those categorical variables must then be converted back to categorical values. However, this conversion introduces bias which can seriously affect subsequent analyses. In this paper, we propose two direct imputation methods for categorical variables with a non-monotone missing pattern: the direct imputation approach incorporated with the expectation maximization algorithm and the direct imputation approach incorporated with a new algorithm: the imputation-maximization algorithm. Simulation studies show that both methods perform better than the method using variable conversion. An application to real data is provided to compare the direct imputation method and the method using variable conversion.</p>
      </abstract>
      <kwd-group>
        <label>Keywords</label>
        <kwd>bias</kwd>
        <kwd>categorical variable</kwd>
        <kwd>HIV</kwd>
      </kwd-group>
    </article-meta>
  </front>
</article>
