Leveraging Survey Metadata for LLM Reasoning via Knowledge Graphs
Pub. online: 21 May 2026
Type: Statistical Data Science
Open Access
Received
9 September 2025
Accepted
9 April 2026
Published
21 May 2026
Abstract
Statistical survey metadata contains essential contextual information that underpins the accurate interpretation, discovery, and reuse of statistical data. However, traditional metadata formats are not optimized for consumption by large language models (LLMs), which increasingly serve as interfaces for data exploration, question answering, and decision support. This work introduces a knowledge graph-based approach to modeling survey metadata using semantic web standards and linked data principles, designed specifically to make metadata machine-understandable and LLM-compatible. The core metadata entities, including surveys, datasets, variables, concepts, populations, and provenance, are modeled as richly interlinked nodes that support reasoning, contextual enrichment, and structured prompting. The graph builds on established semantic web specifications such as the Resource Description Framework (RDF) to promote interoperability and alignment with global standards. We demonstrate how this structure allows LLMs to surface relevant metadata, ground their outputs in authoritative sources, and generate semantically precise responses. This approach enhances transparency, facilitates metadata reuse, and supports the development of artificial intelligence (AI) applications powered by statistical products.
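To make the idea of interlinked metadata nodes and structured prompting concrete, the following is a minimal sketch, not the authors' implementation. It stores metadata as subject-predicate-object triples in the spirit of the RDF data model, collects the triples within a few hops of a dataset node, and renders them as text that could be placed in an LLM prompt. The `ex:` namespace, the entity names, the predicates (`partOf`, `hasVariable`, `measures`), and all function names are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical namespace; every identifier below is illustrative only.
EX = "http://example.org/survey/"

# Each entry is a (subject, predicate, object) triple, mirroring the RDF data model.
TRIPLES = [
    (EX + "household_survey", "rdf:type", EX + "Survey"),
    (EX + "household_survey", "rdfs:label", "Example Household Survey"),
    (EX + "dataset_2023", "rdf:type", EX + "Dataset"),
    (EX + "dataset_2023", EX + "partOf", EX + "household_survey"),
    (EX + "dataset_2023", EX + "hasVariable", EX + "var_income"),
    (EX + "var_income", "rdf:type", EX + "Variable"),
    (EX + "var_income", "rdfs:label", "Median household income"),
    (EX + "var_income", EX + "measures", EX + "concept_income"),
    (EX + "concept_income", "rdf:type", EX + "Concept"),
    (EX + "concept_income", "rdfs:label", "Income"),
]


def build_index(triples):
    """Map each subject to its outgoing (predicate, object) pairs."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
    return index


def neighborhood(index, start, max_hops=2):
    """Collect triples whose subject is within max_hops - 1 links of start."""
    seen, frontier, collected = {start}, [start], []
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for p, o in index.get(node, []):
                collected.append((node, p, o))
                # Only follow objects that are themselves described in the graph.
                if o in index and o not in seen:
                    seen.add(o)
                    next_frontier.append(o)
        frontier = next_frontier
    return collected


def to_prompt_context(triples):
    """Render triples as compact, Turtle-like lines (simplified, not valid Turtle)."""
    def short(term):
        return term.replace(EX, "ex:")
    return "\n".join(f"{short(s)} {short(p)} {short(o)} ." for s, p, o in triples)


if __name__ == "__main__":
    idx = build_index(TRIPLES)
    context = neighborhood(idx, EX + "dataset_2023", max_hops=2)
    print(to_prompt_context(context))
```

Limiting the traversal to a small number of hops keeps the prompt context short while still linking a dataset to its parent survey and its variables. A production system would more likely use a dedicated RDF library and a query language such as SPARQL than hand-rolled traversal.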