<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">JDS</journal-id>
      <journal-title-group>
        <journal-title>Journal of Data Science</journal-title>
      </journal-title-group>
      <issn pub-type="epub">1680-743X</issn>
      <issn pub-type="ppub">1680-743X</issn>
      <publisher>
        <publisher-name>SOSRUC</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="publisher-id">1709</article-id>
      <article-id pub-id-type="doi">10.6339/JDS.201901_17(1).0009</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Research Article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Machine Learning Algorithms To Predict The Childhood Anemia In Bangladesh</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Khan</surname>
            <given-names>Jahidur Rahman</given-names>
          </name>
          <xref ref-type="aff" rid="j_JDS_aff_000"/>
        </contrib>
        <aff id="j_JDS_aff_000">Centre for Research and Action in Public Health (CeRAPH), Health Research Institute (HRI), Faculty of Health, University of Canberra, Canberra, Australia; Biomedical Research Foundation (BRF), Dhaka, Bangladesh</aff>
        <contrib contrib-type="author">
          <name>
            <surname>Chowdhury</surname>
            <given-names>Srizan</given-names>
          </name>
          <xref ref-type="aff" rid="j_JDS_aff_001"/>
        </contrib>
        <aff id="j_JDS_aff_001">Institute of Statistical Research and Training (ISRT), University of Dhaka, Dhaka, Bangladesh</aff>
        <contrib contrib-type="author">
          <name>
            <surname>Islam</surname>
            <given-names>Humayera</given-names>
          </name>
          <xref ref-type="aff" rid="j_JDS_aff_002"/>
        </contrib>
        <aff id="j_JDS_aff_002">Institute of Statistical Research and Training (ISRT), University of Dhaka, Dhaka, Bangladesh; Health Management and Informatics Institute, University of Missouri-Columbia, Missouri, USA</aff>
        <contrib contrib-type="author">
          <name>
            <surname>Raheem</surname>
            <given-names>Enayetur</given-names>
          </name>
          <xref ref-type="aff" rid="j_JDS_aff_003"/>
        </contrib>
        <aff id="j_JDS_aff_003">Biomedical Research Foundation (BRF), Dhaka, Bangladesh</aff>
      </contrib-group>
      <volume>17</volume>
      <issue>1</issue>
      <fpage>195</fpage>
      <lpage>218</lpage>
      <permissions>
        <ali:free_to_read xmlns:ali="http://www.niso.org/schemas/ali/1.0/"/>
      </permissions>
      <abstract>
        <p>Anemia, especially among children, is a serious public health problem in Bangladesh. Beyond understanding the factors associated with anemia, it is of interest to estimate the likelihood of anemia given those factors. Predicting disease status is key to community and health service policy making and to forecasting for resource planning. We considered machine learning (ML) algorithms to predict anemia status among children under five years of age, using common risk factors as features. Data were extracted from the 2011 Bangladesh Demographic and Health Survey (BDHS), a nationally representative cross-sectional survey. The study sample comprised 2013 children for whom data on all selected variables were available. We used several ML algorithms, including linear discriminant analysis (LDA), classification and regression trees (CART), k-nearest neighbors (k-NN), support vector machines (SVM), random forest (RF) and logistic regression (LR), to predict childhood anemia status. The algorithms were systematically evaluated in terms of accuracy, sensitivity, specificity, and area under the curve (AUC). The RF algorithm achieved the best classification accuracy of 68.53%, with a sensitivity of 70.73%, specificity of 66.41% and AUC of 0.6857. By comparison, the classical LR algorithm reached a classification accuracy of 62.75%, with a sensitivity of 63.41%, specificity of 62.11% and AUC of 0.6276. Among all algorithms considered, k-NN had the lowest accuracy. We conclude that ML methods can be considered in addition to classical regression techniques when prediction of anemia is the primary focus.</p>
      </abstract>
      <kwd-group>
        <label>Keywords</label>
        <kwd>Anemia prediction</kwd>
        <kwd>children</kwd>
        <kwd>machine learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
</article>
