Traditional and GenAI Text Analysis of COVID-19 Pandemic Trends in Hospital Community Benefits IRS Documentation
Volume 22, Issue 3 (2024): Special issue: The Government Advances in Statistical Programming (GASP) 2023 conference, pp. 393–408
Pub. online: 23 July 2024
Type: Data Science In Action
Open Access
Received
1 December 2023
Accepted
14 June 2024
Published
23 July 2024
Abstract
The coronavirus disease 2019 (COVID-19) pandemic presented unique challenges to the U.S. healthcare system, particularly for nonprofit U.S. hospitals that are obligated to provide community benefits in exchange for federal tax exemptions. We sought to examine how hospitals initiated, modified, or disbanded community benefits programming in response to the COVID-19 pandemic. We used the free-response text in Part IV of Internal Revenue Service (IRS) Form 990 Schedule H (F990H) to assess health equity and disparities. We combined traditional key term frequency and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) clustering approaches with a novel Generative Pre-trained Transformer (GPT) 3.5 summarization approach. Our research reveals shifts in community benefits programming. We observed an increase in COVID-related terms starting in the 2019 tax year, indicating a pivot in community focus and efforts toward pandemic-related activities such as telehealth services and COVID-19 testing and prevention. The clustering analysis identified themes related to COVID-19 and community benefits. Generative Artificial Intelligence (GenAI) summarization with GPT3.5 contextualized these changes, revealing examples of healthcare system adaptations and program cancellations. However, GPT3.5 also encountered some accuracy and validation challenges. This multifaceted text analysis underscores the adaptability of hospitals in maintaining community health support during crises and suggests the potential of advanced AI tools in evaluating large-scale qualitative data for policy and public health research.
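As a rough illustration of the key term frequency step described in the abstract, the sketch below counts, per tax year, how many narratives mention each key term at least once. It is not the authors' code: the term list, the function name `term_frequency_by_year`, and the sample narratives are all hypothetical, and the actual analysis is in the supplementary material.

```python
import re
from collections import Counter

# Hypothetical key terms; the paper's actual term list is not reproduced here.
KEY_TERMS = ["covid", "telehealth", "testing"]


def term_frequency_by_year(records, terms=KEY_TERMS):
    """Count, per tax year, the narratives that mention each term at least once.

    `records` is an iterable of (tax_year, free_text) pairs. A term matches
    when any lowercase alphanumeric token starts with it, so "COVID-19"
    matches the term "covid".
    """
    counts = {}
    for year, text in records:
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        year_counts = counts.setdefault(year, Counter())
        for term in terms:
            if any(tok.startswith(term) for tok in tokens):
                year_counts[term] += 1
    return counts


# Made-up narratives for illustration only; not real F990H text.
records = [
    (2018, "Community health fairs and free screenings were offered."),
    (2019, "COVID-19 testing sites opened; telehealth visits expanded."),
    (2020, "Health fairs were cancelled due to COVID. Telehealth continued."),
]

freq = term_frequency_by_year(records)
for year in sorted(freq):
    print(year, dict(freq[year]))
```

Counting narratives rather than raw token occurrences keeps one long filing from dominating a year's total; whether the original study counted this way is an assumption.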
Supplementary material
The zipped supplementary material file includes code and output for this analysis.