AI for Science: Opportunities, Challenges, and Future Directions
Pub. online: 2 January 2026
Type: Data Science Reviews
Open Access
Received: 15 August 2025
Accepted: 12 December 2025
Published: 2 January 2026
Abstract
Artificial intelligence (AI) has recently emerged as a transformative force in scientific discovery, capable of accelerating knowledge synthesis, automating experimentation, and enhancing interdisciplinary collaboration. As research challenges, ranging from climate change to rare disease treatments, grow increasingly complex, the rapid evolution of AI calls for a comprehensive examination of its current and future roles. Despite recent breakthroughs, the field remains fragmented because it lacks a unified framework for understanding AI’s progression in science and its implications for data science in particular. To address this gap, this review analyzes AI for science and introduces a novel three-phase framework to conceptualize AI’s evolving role: Keplerian (data-driven pattern recognition), Edisonian (autonomous experimentation), and Einsteinian (foundational innovation). We also discuss the ethical, environmental, and data privacy challenges that accompany AI’s integration into science, emphasizing the need for sustainable and responsible development. The review outlines how AI may transform the scientific method and aims to help researchers harness AI’s potential to drive scientific innovation.