AI for Science: Opportunities, Challenges, and Future Directions

Fu, Valerie

doi:10.6339/25-JDS1214

Journal of Data Science

Volume 24, Issue 1 (2026): Special Issue: Statistical aspects of Trustworthy Machine Learning, pp. 106–124

Pub. online: 2 January 2026

Type: Data Science Reviews

Open Access

Received: 15 August 2025

Accepted: 12 December 2025

Published: 2 January 2026

Abstract

Artificial intelligence (AI) has recently emerged as a transformative force in scientific discovery, with the capacity to accelerate knowledge synthesis, automate experimentation, and enhance interdisciplinary collaboration. As research challenges, ranging from climate change to rare disease treatments, grow increasingly complex, the rapid evolution of AI calls for a comprehensive examination of its current and future roles. Despite recent breakthroughs, the field remains fragmented because it lacks a unified framework for understanding AI’s progression in science and, in particular, its implications for data science. To address this gap, this review analyzes AI for science and introduces a novel three-phase framework to conceptualize AI’s evolving role: Keplerian (data-driven pattern recognition), Edisonian (autonomous experimentation), and Einsteinian (foundational innovation). We also discuss the ethical, environmental, and data privacy challenges that accompany AI’s integration into science, emphasizing the need for sustainable and responsible development. The review outlines how AI may transform the scientific method and aims to help researchers harness AI’s potential to drive scientific innovation.

© 2026 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.

Open access article under the CC BY license.

Keywords

Einsteinian phase AGI; human computer collaboration; knowledge dissemination; machine learning; scientific discovery; transdisciplinary research
