Measuring Public Open-Source Software in the Federal Government: An Analysis of Code.gov
Volume 22, Issue 3 (2024): Special issue: The Government Advances in Statistical Programming (GASP) 2023 conference, pp. 356–375
Pub. online: 29 July 2024
Type: Data Science In Action
Open Access
Received
9 December 2023
Accepted
5 July 2024
Published
29 July 2024
Abstract
This paper presents an in-depth analysis of patterns and trends in open-source software (OSS) contributions by U.S. federal government agencies. OSS is a distinct category of computer software: its source code is publicly accessible, and its license grants users the right to modify and redistribute it for any purpose. Prompted by the Federal Source Code Policy (USCIO, 2016), Code.gov was established as a platform for sharing custom-developed software across federal agencies. This study draws on Code.gov, which catalogs OSS projects developed and shared by government agencies, and enriches these data with detailed development and contributor information from GitHub. Using the cost estimation methodology proposed in Korkmaz et al. (2024), which is consistent with the U.S. national accounting framework for software investment, the study produces annual estimates of federal agencies' investment in OSS for 2009–2021. The findings indicate substantial federal investment in OSS, estimated at around $407 million in 2021. Beyond shedding light on the government's role in fostering OSS development, the study offers a framework for assessing the scope and value of OSS initiatives within the public sector.
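The cost estimation step can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes a lines-of-code approach built on the basic COCOMO effort equation associated with Boehm (1984), using the organic-mode coefficients (effort = 2.4 × KLOC^1.05 person-months), applied to each repository separately and priced at a labor rate. The coefficients, the hypothetical `Repo` record, and the monthly labor cost are illustrative assumptions; the paper's actual inputs and parameters follow Korkmaz et al. (2024) and may differ.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumed basic COCOMO organic-mode coefficients (Boehm):
# effort in person-months = A * (thousands of lines of code) ** B.
# Illustrative only; these are not the paper's calibrated parameters.
A = 2.4
B = 1.05


@dataclass
class Repo:
    """Hypothetical record: one repository and the lines of code it gained in a year."""
    name: str
    lines_of_code: int


def effort_person_months(lines_of_code: int) -> float:
    """Estimated development effort for a body of code, in person-months."""
    kloc = lines_of_code / 1000.0
    return A * kloc ** B


def annual_cost(repos: Iterable[Repo], monthly_labor_cost: float) -> float:
    """Estimate effort per repository, sum it, and price it at a labor rate.

    `monthly_labor_cost` is a hypothetical fully loaded cost per person-month.
    """
    total_effort = sum(effort_person_months(r.lines_of_code) for r in repos)
    return total_effort * monthly_labor_cost


if __name__ == "__main__":
    example = [Repo("example-a", 10_000), Repo("example-b", 2_000)]
    print(f"{annual_cost(example, monthly_labor_cost=15_000.0):,.0f}")
```

Because the exponent B exceeds 1, estimating repositories one at a time and summing yields less effort than pooling all of their code into a single project, so the level of aggregation chosen matters for the final estimate.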
Supplementary material
The data and code needed to reproduce the results in this paper are available on the Journal of Data Science website. Links to the raw data and code are also provided on the project's website at https://oss.quarto.pub/website/analyses.html
References
Bastian M, Heymann S, Jacomy M (2009). Gephi: An open source software for exploring and manipulating networks. In: International AAAI Conference on Weblogs and Social Media (ICWSM). http://www.aaai.org/ocs/index.php/ICWSM/09/paper/view/154.
Biden JR (2020). Executive order on promoting the use of trustworthy artificial intelligence in the federal government. https://www.federalregister.gov/documents/2020/12/08/2020-27065/promoting-the-use-of-trustworthy-artificial-intelligence-in-the-federal-government.
Biden JR (2023). Executive order on the safe, secure, and trustworthy development and use of artificial intelligence. https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/.
Boehm BW (1984). Software engineering economics. IEEE Transactions on Software Engineering, SE-10(1): 4–21. https://doi.org/10.1109/TSE.1984.5010193
Börner K, Sanyal S, Vespignani A (2007). Network science. Annual Review of Information Science and Technology, 41(1): 537–607. https://doi.org/10.1002/aris.2007.1440410119
Code.gov (n.d.). Agency compliance dashboard. https://code.gov/agency-compliance/compliance/dashboard.
Congress, US (2018). Foundations for Evidence-Based Policymaking Act of 2018. Public Law 115-435. https://www.congress.gov/bill/115th-congress/house-bill/4174.
Corrado C, Haskel J, Jona-Lasinio C (2015). Public intangibles: The public sector and economic growth in the SNA. In: Economics Program Working Paper Series. The Conference Board. Available at https://www.conference-board.org/pdf_free/workingpapers/EPWP1501.pdf.
Damanpour F (1991). Organizational innovation: A meta-analysis of effects of determinants and moderators. Academy of Management Journal, 34(3): 555–590. https://doi.org/10.2307/256406
DOE CODE (n.d.). Software Policy of DOE. https://www.osti.gov/doecode/policy.
Garfield E, Pudovkin A, Istomin V (2002). Algorithmic citation-linked historiography—mapping the literature of science. Proceedings of the American Society for Information Science and Technology, 39(1): 14–24. https://doi.org/10.1002/meet.1450390102
Gault F (2018). Defining and measuring innovation in all sectors of the economy. Research Policy, 47(3): 617–622. https://doi.org/10.1016/j.respol.2018.01.007
GitHub (2023). The State of the Octoverse. https://octoverse.github.com.
GSA (2019). GSA Open Software Policy. https://open.gsa.gov/oss-policy/.
Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, et al. (2020). Array programming with NumPy. Nature, 585(7825): 357–362. https://doi.org/10.1038/s41586-020-2649-2
Hoffa F (2017). The top contributors to GitHub (2017). https://hoffa.medium.com/the-top-contributors-to-github-2017-be98ab854e87.
Howison J, Bullard J (2016). Software in the scientific literature: Problems with seeing, finding, and using software mentioned in the biology literature. Journal of the Association for Information Science and Technology, 67(9): 2137–2155. https://doi.org/10.1002/asi.23538
Howison J, Deelman E, McLennan MJ, Ferreira da Silva R, Herbsleb JD (2015). Understanding the scientific software ecosystem and its impact: Current and future measures. Research Evaluation, 24(4): 454–470. https://doi.org/10.1093/reseval/rvv014
Impact Story (2012). https://impactstory.org.
Keller SA, Korkmaz G, Robbins CA, Shipp SS (2018). Opportunities to observe and measure intangible inputs to innovation: Definitions, operationalization, and examples. Proceedings of the National Academy of Sciences, 115(50): 12638–12645. https://doi.org/10.1073/pnas.1800467115
Keralis JM, Albertorio-Díaz J, Hoppe T (2023). Dark citations to federal resources and their contribution to the public health literature. Frontiers in Research Metrics and Analytics, 8: 1235208. https://doi.org/10.3389/frma.2023.1235208
Korkmaz G, Kelling C, Robbins CA, Keller SA (2020). Modeling the impact of Python and R packages using dependency and contributor networks. Social Network Analysis and Mining, 10: 1–12. https://doi.org/10.1007/s13278-019-0612-8
Korkmaz G, Santiago Calderón JB, Kramer BL, Guci L, Robbins CA (2024). From GitHub to GDP: A framework for measuring open source software innovation. Research Policy, 53(3): 104954. https://doi.org/10.1016/j.respol.2024.104954
Kramer BL (2021a). diverstidy: A tidy package for detection and standardization of geographic, population, and diversity-related terminology in unstructured text data. https://github.com/brandonleekramer/diverstidy.
Kramer BL (2021b). tidyorgs: A tidy package that standardizes text data for organizational analysis. https://github.com/brandonleekramer/tidyorgs.
Martin BR (2016). Twenty challenges for innovation studies. Science and Public Policy, 43(3): 432–450. https://doi.org/10.1093/scipol/scv077
Nakamura LI, Samuels J, Soloveichik RH (2017). Measuring the ‘free’ digital economy within the GDP and productivity accounts. https://www.bea.gov/research/papers/2017/measuring-free-digital-economy-within-gdp-and-productivity-accounts.
NASA (n.d.). NASA Open Software Policy. https://code.nasa.gov/#/guide.
OSI (1998). The open source definition. https://opensource.org/osd.
pandas (2020). pandas-dev/pandas: Pandas. https://doi.org/10.5281/zenodo.3509134.
Piwowar H, Priem J (2016). Depsy: Valuing the software that powers science. https://github.com/Impactstory/depsy-research/blob/master/introducing_depsy.md.
Rehn C, Gornitzki C, Larsson A, Wadskog D (2014). Bibliometric handbook for Karolinska Institutet. Huddinge: Karolinska Institutet. https://kib.ki.se/sites/default/files/bibliometric_handbook_2014.pdf.
Robbins C, Korkmaz G, Calderon JBS, Kelling C, Shipp S, Keller S (2018a). Open source software as intangible capital: Measuring the cost and impact of free digital tools. In: International Monetary Fund (IMF) 6th Statistical Forum: Measuring Economic Welfare in the Digital Age: What and How? International Monetary Fund (IMF). https://www.imf.org/en/News/Seminars/Conferences/2018/04/06/6th-statistics-forum.
Robbins C, Korkmaz G, Calderon JBS, Kelling C, Shipp S, Keller S (2018b). The scope and impact of open source software: A framework for analysis and preliminary cost estimates. In: International Association for Research in Income and Wealth (IARIW) 35th General Conference: The Digital Economy-Conceptual and Measurement Issues. The International Association for Research in Income and Wealth (IARIW). http://old.iariw.org/copenhagen/robbins.pdf.
Robbins CA, Korkmaz G, Guci L, Santiago Calderón JB, Kramer B (2021). A first look at open-source software investment in the United States and in other countries, 2009–2019. In: International Association for Research in Income and Wealth (IARIW) ESCoE Conference. IARIW. https://iariw.org/wp-content/uploads/2021/11/robbins-paper.pdf.
Science-Metrix (2018). Bibliometrics and Patent Indicators for the Science and Engineering Indicators 2018. Technical Documentation. http://www.science-metrix.com/en/methodology-report.
Scott T, Rung AE (2016). Federal Source Code Policy: Achieving efficiency, transparency, and innovation through reusable and open source software. Memorandum M-16-21, Office of Management and Budget, Executive Office of the President. https://www.whitehouse.gov/wp-content/uploads/legacy_drupal_files/omb/memoranda/2016/m_16_21.pdf.
Singh Chawla D (2016). The unsung heroes of scientific software. Nature News, 529(7584): 115. https://doi.org/10.1038/529115a
US Bureau of Economic Analysis (2023a). Government Gross Investment: Federal: National Defense: Gross Investment: Intellectual Property Products: Software. Retrieved from ALFRED, Federal Reserve Bank of St. Louis. Y053RC1A027NBEA.
US Bureau of Economic Analysis (2023b). Government Gross Investment: Federal: Nondefense: Gross Investment: Intellectual Property Products: Software. Retrieved from ALFRED, Federal Reserve Bank of St. Louis. Y068RC1A027NBEA.
US Bureau of Economic Analysis (2023c). Government Gross Investment: State and Local: Gross Investment: Intellectual Property Products: Software. Retrieved from ALFRED, Federal Reserve Bank of St. Louis. Y072RC1A027NBEA.
US Bureau of Economic Analysis (2023d). Gross Government Investment. Retrieved from ALFRED, Federal Reserve Bank of St. Louis. A782RC1A027NBEA.
US Bureau of Economic Analysis (2023e). Private Fixed Investment in Intellectual Property Products: Software: Custom. Retrieved from ALFRED, Federal Reserve Bank of St. Louis. Y004RC1A027NBEA.
US Bureau of Economic Analysis (2023f). Private Fixed Investment in Intellectual Property Products: Software: Own account. Retrieved from ALFRED, Federal Reserve Bank of St. Louis. Y005RC1A027NBEA.
US Bureau of Economic Analysis (2023g). Private Fixed Investment in Intellectual Property Products: Software: Prepackaged. Retrieved from ALFRED, Federal Reserve Bank of St. Louis. Y003RC1A027NBEA.
US Bureau of Economic Analysis (2023h). Private Fixed Investment: Nonresidential: Intellectual Property Products: Software. Retrieved from ALFRED, Federal Reserve Bank of St. Louis. B985RC1A027NBEA.
USCIO (2016). Federal Source Code Policy. https://www.whitehouse.gov/wp-content/uploads/legacy_drupal_files/omb/memoranda/2016/m_16_21.pdf.