<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">JDS</journal-id>
      <journal-title-group>
        <journal-title>Journal of Data Science</journal-title>
      </journal-title-group>
      <issn pub-type="epub">1680-743X</issn>
      <issn pub-type="ppub">1680-743X</issn>
      <publisher>
        <publisher-name>SOSRUC</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="publisher-id">110409</article-id>
      <article-id pub-id-type="doi">10.6339/JDS.2013.11(4).1186</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Research Article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Dealing with Failures of Assumptions in Analyses of Medical Care Quality Indicators with Large Databases Using Clustering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Pietz</surname>
            <given-names>Kenneth</given-names>
          </name>
          <xref ref-type="aff" rid="j_JDS_aff_000"/>
        </contrib>
        <aff id="j_JDS_aff_000">Michael E. DeBakey VA Medical Center and Baylor College of Medicine</aff>
        <contrib contrib-type="author">
          <name>
            <surname>Petersen</surname>
            <given-names>Laura A.</given-names>
          </name>
          <xref ref-type="aff" rid="j_JDS_aff_001"/>
        </contrib>
        <aff id="j_JDS_aff_001">Michael E. DeBakey VA Medical Center and Baylor College of Medicine</aff>
        <contrib contrib-type="author">
          <name>
            <surname>Woodard</surname>
            <given-names>LeChauncy D.</given-names>
          </name>
          <xref ref-type="aff" rid="j_JDS_aff_002"/>
        </contrib>
        <aff id="j_JDS_aff_002">Michael E. DeBakey VA Medical Center and Baylor College of Medicine</aff>
      </contrib-group>
      <volume>11</volume>
      <issue>4</issue>
      <fpage>835</fpage>
      <lpage>849</lpage>
      <permissions>
        <ali:free_to_read xmlns:ali="http://www.niso.org/schemas/ali/1.0/"/>
      </permissions>
      <abstract>
        <p>The application of linear mixed models or generalized linear mixed models to large databases in which the level 2 units (hospitals) have a wide variety of characteristics is a problem frequently encountered in studies of medical quality. Accurate estimation of model parameters and standard errors requires accounting for the grouping of outcomes within hospitals. Including the hospitals as random effects in the model is a common method of doing so. However, in a large, diverse population, the required assumptions are not satisfied, which can lead to inconsistent and biased parameter estimates. One solution is to use cluster analysis, with clustering variables distinct from the model covariates, to group the hospitals into smaller, more homogeneous groups. The analysis can then be carried out within these groups. We illustrate this approach with a study of hemoglobin A1c control among diabetic patients in a national database of United States Department of Veterans’ Affairs (VA) hospitals.</p>
      </abstract>
      <kwd-group>
        <label>Keywords</label>
        <kwd>Cluster analysis</kwd>
        <kwd>logistic regression</kwd>
        <kwd>random effects</kwd>
      </kwd-group>
    </article-meta>
  </front>
</article>
