Journal of Data Science


Leveraging Artificial Intelligence and Automation for Enhancing School Improvement Efforts
Graham Chickering, Christina Jones, Naomi Blaushild

https://doi.org/10.6339/26-JDS1216
Pub. online: 29 January 2026
Type: Data Science In Action
Open Access

Received: 13 August 2025
Accepted: 5 January 2026
Published: 29 January 2026

Abstract

Advances in AI and automation are reshaping qualitative research workflows, making processes more efficient, accurate, consistent, and scalable. This paper presents innovations developed for the Illinois Needs Assessment project, a statewide initiative led by the Illinois State Board of Education and the American Institutes for Research to conduct comprehensive needs assessments for schools that need intensive or comprehensive support. To address the scale and tight timeline requirements of the project, the team designed three interconnected pipelines that work together to produce a finalized report. The first, an Audio Pipeline, uses Whisper and generative AI to automate transcription, text-based speaker role attribution, thematic coding, and insight generation from focus groups and interviews. The second, a Report Generation Pipeline, integrates Airtable automations with AWS infrastructure to produce customized school reports that merge AI-generated findings with survey data, school performance metrics, and contextual comparisons. The third, the Needs Assessment Summary Report, automates the assembly of all quantitative and qualitative inputs into a polished, customizable deliverable that combines efficiency with expert review. Together, these pipelines replace ad hoc manual workflows with reproducible, consistent systems that enhance data quality, reduce error, and broaden access for non-technical users. The integrated design demonstrates how automation and generative AI can reduce manual burdens, shorten delivery timelines, and support timely, data-informed, and human-centered decision-making in education.

Supplementary material

The supplementary material includes a GitHub repository with two subfolders, ‘aiPipeline-SDSS2025’ and ‘autoreportsPipeline-SDSS2025’, corresponding to the Audio Pipeline and to the Report Generation Pipeline and Automated Report described in the manuscript. While the original implementations relied on secure cloud infrastructure, the materials provide insight into system architecture, key processing steps, and expected outputs. Included are mock data, configuration examples, prompts, crosswalks, and selected code, enabling users to review and execute sample scripts to understand each pipeline stage. The supplementary materials also include an additional R Markdown (RMD) file that demonstrates report generation using synthetic school-level data, illustrating how qualitative and quantitative inputs are combined within the automated reporting workflow. Due to reliance on internal systems and proprietary authentication, some components (e.g., secure dataset access, organizational credentials, private APIs) are non-functional outside production. Code exposing security-sensitive logic or deployment details has been removed, but the materials still convey the overall design and practical implementation. Additional documentation in each folder guides navigation of outputs. See https://github.com/gchickering21/SDSS2025_materials for files and documentation.



Copyright
2026 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
automation, data pipelines, educational research, generative AI, qualitative analysis

Funding
This work was funded by the Illinois State Board of Education and supported by AIR.

Journal of data science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
