Journal of Data Science

Measuring Public Open-Source Software in the Federal Government: An Analysis of Code.gov
Volume 22, Issue 3 (2024): Special issue: The Government Advances in Statistical Programming (GASP) 2023 conference, pp. 356–375
Rahul Shrivastava, Gizem Korkmaz

https://doi.org/10.6339/24-JDS1148
Pub. online: 29 July 2024; Type: Data Science In Action; Open Access

Received: 9 December 2023
Accepted: 5 July 2024
Published: 29 July 2024

Abstract

This paper presents an in-depth analysis of patterns and trends in open-source software (OSS) contributions by U.S. federal government agencies. OSS is a unique category of computer software notable for its publicly accessible source code and the rights it provides for modification and distribution for any purpose. Prompted by the Federal Source Code Policy (USCIO, 2016), Code.gov was established as a platform to facilitate the sharing of custom-developed software across federal government agencies. This study uses data from Code.gov, which catalogs OSS projects developed and shared by government agencies, and enhances these data with detailed development and contributor information from GitHub. By adopting a cost estimation methodology consistent with the U.S. national accounting framework for software investment proposed in Korkmaz et al. (2024), this research provides annual estimates of investment in OSS by government agencies for the 2009–2021 period. The findings indicate substantial investment by the federal government in OSS, with the 2021 investment estimated at around $407 million. This study not only sheds light on the government’s role in fostering OSS development but also offers a framework for assessing the scope and value of OSS initiatives within the public sector.
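The abstract describes the cost side only in words. As a rough illustration of how a lines-of-code cost model can turn repository activity into annual dollar figures, the sketch below applies the basic COCOMO effort equation (the COCOMO family is discussed in Boehm, 1984; organic-mode constants A = 2.4, B = 1.05) to per-repository, per-year counts of lines added, then prices the resulting person-months with a monthly labor cost. The model choice, constants, input shape, and cost figures are all assumptions for illustration, not the calibration or data used by the authors or by Korkmaz et al. (2024).

```python
from collections import defaultdict

# Basic COCOMO, organic mode: effort = A * KLOC**B person-months.
# Assumption: chosen for illustration only; the paper's framework may use a
# different cost model or calibration.
A, B = 2.4, 1.05


def cocomo_effort(lines_of_code: float) -> float:
    """Person-months implied by `lines_of_code` under basic COCOMO."""
    if lines_of_code <= 0:
        return 0.0
    return A * (lines_of_code / 1000.0) ** B


def annual_cost(repo_years, monthly_cost):
    """Sum estimated development cost by calendar year.

    repo_years: iterable of (repo, year, lines_added) tuples; a hypothetical
        input shape, e.g. lines added per repository per year from the
        GitHub histories of Code.gov-listed repositories.
    monthly_cost: dict mapping year -> cost of one person-month
        (wage plus overhead); hypothetical figures, not the paper's.
    """
    totals = defaultdict(float)
    for _repo, year, lines_added in repo_years:
        # Apply the effort model per repository-year, then aggregate in dollars.
        totals[year] += cocomo_effort(lines_added) * monthly_cost[year]
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    repo_years = [
        ("agency/tool-a", 2020, 12_000),
        ("agency/tool-b", 2020, 3_000),
        ("agency/tool-a", 2021, 5_000),
    ]
    monthly_cost = {2020: 15_000.0, 2021: 15_500.0}
    for year, cost in sorted(annual_cost(repo_years, monthly_cost).items()):
        print(year, f"${cost:,.0f}")
```

Because B > 1, applying the model to each repository-year separately yields a smaller total than applying it once to pooled lines of code, so the unit of aggregation is a substantive choice; the sketch makes that choice explicit rather than claiming it matches the paper's.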

Supplementary material

The data and code needed to reproduce the results in this paper can be found at the Journal of Data Science website. The links to the raw data and code can also be found on our project’s website at https://oss.quarto.pub/website/analyses.html

References

 
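Bastian M, Heymann S, Jacomy M (2009). Gephi: An open source software for exploring and manipulating networks. In: Proceedings of the Third International AAAI Conference on Weblogs and Social Media (ICWSM). http://www.aaai.org/ocs/index.php/ICWSM/09/paper/view/154.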
 
Trump DJ (2020). Executive order on promoting the use of trustworthy artificial intelligence in the federal government. https://www.federalregister.gov/documents/2020/12/08/2020-27065/promoting-the-use-of-trustworthy-artificial-intelligence-in-the-federal-government.
 
Biden JR (2023). Executive order on the safe, secure, and trustworthy development and use of artificial intelligence. https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/.
 
Bockstael NE, McConnell KE (1983). Welfare measurement in the household production framework. American Economic Review, 73(4): 806–814.
 
Boehm BW (1984). Software engineering economics. IEEE Transactions on Software Engineering, SE-10(1): 4–21. https://doi.org/10.1109/TSE.1984.5010193
 
Börner K, Sanyal S, Vespignani A (2007). Network science. Annual Review of Information Science and Technology, 41(1): 537–607. https://doi.org/10.1002/aris.2007.1440410119
 
Calderón JBS, Robbins C, Guci L, Korkmaz G, Kramer BL (2022). Measuring the cost of open source software innovation on GitHub. Technical report, U.S. Bureau of Economic Analysis.
 
Code.gov (n.d.). Agency compliance dashboard. https://code.gov/agency-compliance/compliance/dashboard.
 
Congress, US (2018). Foundations for evidence-based policymaking act of 2018. Public Law 115-435. https://www.congress.gov/bill/115th-congress/house-bill/4174.
 
Corrado C, Haskel J, Jona-Lasinio C (2015). Public intangibles: The public sector and economic growth in the SNA. In: Economics Program Working Paper Series. The Conference Board. Available at https://www.conference-board.org/pdf_free/workingpapers/EPWP1501.pdf.
 
Damanpour F (1991). Organizational innovation: A meta-analysis of effects of determinants and moderators. Academy of Management Journal, 34(3): 555–590. https://doi.org/10.2307/256406
 
DOE CODE (n.d.). Software Policy of DOE. https://www.osti.gov/doecode/policy.
 
Garfield E, Pudovkin A, Istomin V (2002). Algorithmic citation-linked historiography—mapping the literature of science. Proceedings of the American Society for Information Science and Technology, 39(1): 14–24. https://doi.org/10.1002/meet.1450390102
 
Gault F (2018). Defining and measuring innovation in all sectors of the economy. Research Policy, 47(3): 617–622. https://doi.org/10.1016/j.respol.2018.01.007
 
GitHub (2023). The State of the Octoverse. https://octoverse.github.com.
 
GSA (2019). GSA Open Software Policy. https://open.gsa.gov/oss-policy/.
 
Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, et al. (2020). Array programming with NumPy. Nature, 585(7825): 357–362. https://doi.org/10.1038/s41586-020-2649-2
 
Hoffa F (2017). The top contributors to GitHub (2017). https://hoffa.medium.com/the-top-contributors-to-github-2017-be98ab854e87.
 
Hoffmann M, Nagle F, Zhou Y (2024). The value of open source software. Harvard Business School Strategy Unit Working Paper No. 24-038.
 
Howison J, Bullard J (2016). Software in the scientific literature: Problems with seeing, finding, and using software mentioned in the biology literature. Journal of the Association for Information Science and Technology, 67(9): 2137–2155. https://doi.org/10.1002/asi.23538
 
Howison J, Deelman E, McLennan MJ, Ferreira da Silva R, Herbsleb JD (2015). Understanding the scientific software ecosystem and its impact: Current and future measures. Research Evaluation, 24(4): 454–470. https://doi.org/10.1093/reseval/rvv014
 
Impact Story (2012). https://impactstory.org.
 
Keller SA, Korkmaz G, Robbins CA, Shipp SS (2018). Opportunities to observe and measure intangible inputs to innovation: Definitions, operationalization, and examples. Proceedings of the National Academy of Sciences, 115(50): 12638–12645. https://doi.org/10.1073/pnas.1800467115
 
Keralis JM, Albertorio-Díaz J, Hoppe T (2023). Dark citations to federal resources and their contribution to the public health literature. Frontiers in Research Metrics and Analytics, 8: 1235208. https://doi.org/10.3389/frma.2023.1235208
 
Korkmaz G, Kelling C, Robbins CA, Keller SA (2018). Modeling the impact of R packages using dependency and contributor networks. In: Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 511–514.
 
Korkmaz G, Kelling C, Robbins CA, Keller SA (2020). Modeling the impact of Python and R packages using dependency and contributor networks. Social Network Analysis and Mining, 10: 1–12. https://doi.org/10.1007/s13278-019-0612-8
 
Korkmaz G, Santiago Calderón JB, Kramer BL, Guci L, Robbins CA (2024). From GitHub to GDP: A framework for measuring open source software innovation. Research Policy, 53(3): 104954. https://doi.org/10.1016/j.respol.2024.104954
 
Kramer BL (2021a). diverstidy: A tidy package for detection and standardization of geographic, population, and diversity-related terminology in unstructured text data. https://github.com/brandonleekramer/diverstidy.
 
Kramer BL (2021b). tidyorgs: A tidy package that standardizes text data for organizational analysis. https://github.com/brandonleekramer/tidyorgs.
 
Martin BR (2016). Twenty challenges for innovation studies. Science and Public Policy, 43(3): 432–450. https://doi.org/10.1093/scipol/scv077
 
Nakamura LI, Samuels J, Soloveichik RH (2017). Measuring the ‘free’ digital economy within the GDP and productivity accounts. https://www.bea.gov/research/papers/2017/measuring-free-digital-economy-within-gdp-and-productivity-accounts.
 
Nakamura LI, Soloveichik RH (2015). Valuing ‘free’ media across countries in GDP. FRB of Philadelphia Working Paper.
 
NASA (n.d.). NASA Open Software Policy. https://code.nasa.gov/#/guide.
 
OSI (1998). The open source definition. https://opensource.org/osd.
 
pandas (2020). pandas-dev/pandas: Pandas. https://doi.org/10.5281/zenodo.3509134.
 
Piwowar H, Priem J (2016). Depsy: Valuing the software that powers science. https://github.com/Impactstory/depsy-research/blob/master/introducing_depsy.md.
 
Rehn C, Gornitzki C, Larsson A, Wadskog D (2014). Bibliometric handbook for Karolinska Institutet. Huddinge: Karolinska Institutet. https://kib.ki.se/sites/default/files/bibliometric_handbook_2014.pdf.
 
Robbins C, Korkmaz G, Calderon JBS, Kelling C, Shipp S, Keller S (2018a). Open source software as intangible capital: Measuring the cost and impact of free digital tools. In: International Monetary Fund (IMF) 6th Statistical Forum: Measuring Economic Welfare in the Digital Age: What and How? International Monetary Fund (IMF). https://www.imf.org/en/News/Seminars/Conferences/2018/04/06/6th-statistics-forum.
 
Robbins C, Korkmaz G, Calderon JBS, Kelling C, Shipp S, Keller S (2018b). The scope and impact of open source software: A framework for analysis and preliminary cost estimates. In: International Association for Research in Income and Wealth (IARIW) 35th General Conference: The Digital Economy: Conceptual and Measurement Issues. The International Association for Research in Income and Wealth (IARIW). http://old.iariw.org/copenhagen/robbins.pdf.
 
Robbins CA, Korkmaz G, Guci L, Santiago Calderón JB, Kramer B (2021). A first look at open-source software investment in the United States and in other countries, 2009–2019. In: International Association for Research in Income and Wealth (IARIW) ESCoE Conference. IARIW. https://iariw.org/wp-content/uploads/2021/11/robbins-paper.pdf.
 
Science-Metrix (2018). Bibliometrics and Patent Indicators for the Science and Engineering Indicators 2018. Technical Documentation. http://www.science-metrix.com/en/methodology-report.
 
Scott T, Rung AE (2016). Federal Source Code Policy: Achieving efficiency, transparency, and innovation through reusable and open source software. Office of Mgmt. & Budget, Exec. Office of the President Memorandum. https://www.whitehouse.gov/wp-content/uploads/legacy_drupal_files/omb/memoranda/2016/m_16_21.pdf.
 
Singh Chawla D (2016). The unsung heroes of scientific software. Nature News, 529(7584): 115. https://doi.org/10.1038/529115a
 
US Bureau of Economic Analysis (2022). NIPA Handbook: Concepts and Methods of the U.S. National Income and Product Accounts.
 
US Bureau of Economic Analysis (2023a). Government Gross Investment: Federal: National Defense: Gross Investment: Intellectual Property Products: Software. Retrieved from ALFRED, Federal Reserve Bank of St. Louis. Y053RC1A027NBEA.
 
US Bureau of Economic Analysis (2023b). Government Gross Investment: Federal: Nondefense: Gross Investment: Intellectual Property Products: Software. Retrieved from ALFRED, Federal Reserve Bank of St. Louis. Y068RC1A027NBEA.
 
US Bureau of Economic Analysis (2023c). Government Gross Investment: State and Local: Gross Investment: Intellectual Property Products: Software. Retrieved from ALFRED, Federal Reserve Bank of St. Louis. Y072RC1A027NBEA.
 
US Bureau of Economic Analysis (2023d). Gross Government Investment. Retrieved from ALFRED, Federal Reserve Bank of St. Louis. A782RC1A027NBEA.
 
US Bureau of Economic Analysis (2023e). Private Fixed Investment in Intellectual Property Products: Software: Custom. Retrieved from ALFRED, Federal Reserve Bank of St. Louis. Y004RC1A027NBEA.
 
US Bureau of Economic Analysis (2023f). Private Fixed Investment in Intellectual Property Products: Software: Own account. Retrieved from ALFRED, Federal Reserve Bank of St. Louis. Y005RC1A027NBEA.
 
US Bureau of Economic Analysis (2023g). Private Fixed Investment in Intellectual Property Products: Software: Prepackaged. Retrieved from ALFRED, Federal Reserve Bank of St. Louis. Y003RC1A027NBEA.
 
US Bureau of Economic Analysis (2023h). Private Fixed Investment: Nonresidential: Intellectual Property Products: Software. Retrieved from ALFRED, Federal Reserve Bank of St. Louis. B985RC1A027NBEA.
 
US Bureau of Economic Analysis (2023i). Private Nonresidential Fixed Investment [PNFIA]. Retrieved from ALFRED, Federal Reserve Bank of St. Louis.
 
US Bureau of Economic Analysis (2023j). Private Residential Fixed Investment [PRFI]. Retrieved from ALFRED, Federal Reserve Bank of St. Louis.
 
US Bureau of Labor Statistics (2021). Occupational Employment Statistics: National industry-specific and by ownership.
 
USCIO (2016). Federal Source Code Policy. https://www.whitehouse.gov/wp-content/uploads/legacy_drupal_files/omb/memoranda/2016/m_16_21.pdf.
 
Von Hippel E (2016). Free Innovation. MIT Press.


Copyright
2024 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
Code.gov; cost measurement; GitHub; open-source software; software investment

Funding
This work was supported by the National Science Foundation (NSF) under Grant Numbers 2306160 and 2224441. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of NSF.


Journal of Data Science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
