Pub. online: 4 Feb 2026 | Type: Statistical Data Science | Open Access
Journal: Journal of Data Science
Volume 24, Issue 1 (2026): Special Issue: Statistical aspects of Trustworthy Machine Learning, pp. 203–217
Abstract
As the use of Artificial Intelligence (AI), especially Generative AI, becomes ubiquitous, we examine the performance of these methods, focusing specifically on fairness as an element of trustworthiness. We use Statistical Parity Difference and Equalized Odds Difference to measure fairness mathematically. To study systematically how factors such as bias, access to protected categories, and type of intervention affect fairness and accuracy, we performed a simulation designed as a multi-factor experiment. Our results indicate that accuracy and fairness (in terms of statistical parity and equalized odds) tend to move in opposite directions. This raises the question of whether methods can be developed that consider accuracy and fairness simultaneously.
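For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of one common convention for computing them with binary predictions and a binary protected attribute. The function names, the toy data, and the choice to report Equalized Odds Difference as the larger of the true-positive-rate and false-positive-rate gaps are assumptions for illustration; the paper's exact definitions may differ.

```python
import numpy as np

def statistical_parity_difference(y_pred, group):
    """P(y_hat = 1 | group = 1) - P(y_hat = 1 | group = 0)."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return y_pred[group == 1].mean() - y_pred[group == 0].mean()

def equalized_odds_difference(y_true, y_pred, group):
    """Larger of the absolute between-group gaps in TPR and FPR (assumed convention)."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    gaps = []
    for label in (1, 0):  # label 1 yields the TPR gap, label 0 the FPR gap
        rate_1 = y_pred[(group == 1) & (y_true == label)].mean()
        rate_0 = y_pred[(group == 0) & (y_true == label)].mean()
        gaps.append(abs(rate_1 - rate_0))
    return max(gaps)

# Hypothetical toy data: eight individuals, four in each group.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group  = [1, 1, 1, 1, 0, 0, 0, 0]

spd = statistical_parity_difference(y_pred, group)
eod = equalized_odds_difference(y_true, y_pred, group)
print(spd, eod)
```

A value of zero for either metric indicates parity between the two groups under that criterion; larger absolute values indicate a larger disparity.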
Pub. online: 30 Aug 2022 | Type: Data Science In Action | Open Access
Journal: Journal of Data Science
Volume 21, Issue 3 (2023): Special Issue: Advances in Network Data Science, pp. 578–598
Abstract
Social network analysis has created a productive framework for the analysis of the histories of patient-physician interactions and physician collaboration. Notable is the construction of networks based on the data of “referral paths” – sequences of patient-specific temporally linked physician visits – in this case, culled from a large set of Medicare claims data in the United States. Network constructions depend on a range of choices regarding the underlying data. In this paper we introduce the use of a five-factor experiment that produces 80 distinct projections of the bipartite patient-physician mixing matrix to a unipartite physician network derived from the referral path data, which is further analyzed at the level of the 2,219 hospitals in the final analytic sample. We summarize the networks of physicians within a given hospital using a range of directed and undirected network features (quantities that summarize structural properties of the network such as its size, density, and reciprocity). The different projections and their underlying factors are evaluated in terms of the heterogeneity of the network features across the hospitals. We also evaluate the projections relative to their ability to improve the predictive accuracy of a model estimating a hospital’s adoption of implantable cardiac defibrillators, a novel cardiac intervention. Because it optimizes the knowledge learned about the overall and interactive effects of the factors, we anticipate that the factorial design setting for network analysis may be useful more generally as a methodological advance in network analysis.
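The projection step described in the abstract can be illustrated with a small sketch. This is not the authors' referral-path construction, which is directed and temporally ordered; it shows only the simplest undirected case, where two physicians are linked by the number of patients they share. The toy matrix, the function names, and the `min_shared` threshold (standing in for one possible design factor) are all hypothetical.

```python
import numpy as np

# Hypothetical incidence matrix: rows are patients, columns are physicians;
# an entry of 1 means the patient visited that physician.
B = np.array([
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 1],
])

def project_to_physicians(incidence, min_shared=1):
    """Unipartite projection: edge weight = number of shared patients."""
    shared = incidence.T @ incidence      # physician-by-physician co-visit counts
    np.fill_diagonal(shared, 0)           # drop self-loops
    return np.where(shared >= min_shared, shared, 0)

def density(adjacency):
    """Fraction of ordered physician pairs that are connected."""
    n = adjacency.shape[0]
    return np.count_nonzero(adjacency) / (n * (n - 1))

W_all = project_to_physicians(B)                   # keep every shared-patient tie
W_strict = project_to_physicians(B, min_shared=2)  # require at least two shared patients
print(W_all)
print(density(W_all), density(W_strict))
```

Varying a threshold such as `min_shared` changes network features like density, which is the kind of sensitivity a factorial design over projection choices is meant to expose.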