Bringing Search to the Economic Census – The NAPCS Classification Tool✩
Volume 22, Issue 3 (2024): Special issue: The Government Advances in Statistical Programming (GASP) 2023 conference, pp. 409–422
Pub. online: 7 August 2024
Type: Data Science In Action
Open Access
✩ Any opinions and conclusions expressed herein are those of the author(s) and do not reflect the views of the U.S. Census Bureau. The Census Bureau has reviewed this data product to ensure appropriate access, use, and disclosure avoidance protection of the confidential source data (Project No. P-7504847), Disclosure Review Board (DRB) approval number: CBDRB-FY23-EWD001-002.
Received
30 November 2023
Accepted
5 July 2024
Published
7 August 2024
Abstract
The North American Product Classification System (NAPCS) was first introduced in the 2017 Economic Census and provides greater detail on the range of products and services offered by businesses than was previously available from an industry code alone. In the 2022 Economic Census, NAPCS consisted of 7,234 codes, and respondents often found that they were unable to identify the correct NAPCS codes for their business, leaving instead written descriptions of their products and services. Over one million such descriptions required review by Census analysts in the 2017 Economic Census. The Smart Instrument NAPCS Classification Tool (SINCT) offers respondents a low-latency search engine that finds appropriate NAPCS codes based on a written description of their products and services. SINCT uses a neural network document embedding model (doc2vec) to embed respondent searches in a numerical space and then identifies NAPCS codes that are close to the search text. This paper shows one way in which machine learning can improve the survey respondent experience and reduce the amount of expensive manual processing that is necessary after data collection. We also show how relatively simple tools can achieve an estimated 72% top-ten accuracy with thousands of possible classes, limited training data, and strict latency requirements.
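The retrieval step the abstract describes, embedding a query and ranking candidate codes by proximity in the embedding space, can be sketched as follows. This is a simplified stand-in, not SINCT's implementation: where SINCT uses a trained doc2vec model, this toy uses bag-of-words count vectors, and the `top_k` function, the example NAPCS codes, and their descriptions are all illustrative.

```python
import math
from collections import Counter

def embed(text, vocab):
    # Toy stand-in for a doc2vec embedding: a bag-of-words count vector
    # over a fixed vocabulary.
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine(u, v):
    # Cosine similarity between two vectors; 0.0 if either is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k(query, code_descriptions, k=10):
    # Rank codes by similarity between the query embedding and each
    # code description's embedding, returning the k best matches.
    vocab = sorted({w for d in code_descriptions.values()
                    for w in d.lower().split()})
    q = embed(query, vocab)
    scored = [(code, cosine(q, embed(desc, vocab)))
              for code, desc in code_descriptions.items()]
    scored.sort(key=lambda cs: cs[1], reverse=True)
    return [code for code, _ in scored[:k]]

# Hypothetical code list for illustration only.
codes = {
    "1001": "fresh bread and bakery products",
    "2002": "automobile repair services",
    "3003": "software publishing services",
}
print(top_k("bread and baked goods", codes, k=2))
```

In the production setting the per-code vectors would be precomputed once from the trained embedding model, so each search costs only one query embedding plus a nearest-neighbor scan, which is what keeps latency low.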
References
Devlin J, Chang M, Lee K, Toutanova K (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (J Burstein, C Doran, T Solorio, eds.), 4171–4186. Association for Computational Linguistics, Minneapolis, MN.
Graves A, Schmidhuber J (2005). Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18: 602–610. https://doi.org/10.1016/j.neunet.2005.06.042
Mikolov T, Chen K, Corrado G, Dean J (2013a). Efficient estimation of word representations in vector space. arXiv preprint: https://arxiv.org/abs/1301.3781.
Moscardi C, Schultz B (2023). Using machine learning to classify products for the commodity flow survey. In: Advances in Business Statistics, Methods and Data Collection (G Snijkers, M Bavdaž, S Bender, J Jones, S MacFeely, J Sakshaug, K Thompson, A van Delden, eds.), 573–591. John Wiley & Sons.
O’Reagan RT (1972). Computer assigned codes from verbal responses. Communications of the ACM, 15: 455–459. https://doi.org/10.1145/361405.361419
Roberson A (2021). Applying machine learning for automatic product categorization. Journal of Official Statistics, 37(2): 395–410. https://doi.org/10.2478/jos-2021-0017
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A, et al. (2017). Attention is all you need. In: Proceedings of the 31st Conference on Neural Information Processing Systems (I Guyon, U Von Luxburg, S Bengio, H Wallach, R Fergus, S Vishwanathan, R Garnett, eds.). Curran Associates, Inc., Long Beach, CA.