Leveraging Survey Metadata for LLM Reasoning via Knowledge Graphs

Belyaeva, Irina; Carino, Christopher; Wang, Liang-Chi

doi:10.6339/26-JDS1230

Journal of Data Science

Pub. online: 21 May 2026
Type: Statistical Data Science

Open Access

Received: 9 September 2025
Accepted: 9 April 2026
Published: 21 May 2026

Abstract

Statistical survey metadata contains essential contextual information that underpins the accurate interpretation, discovery, and reuse of statistical data. However, traditional metadata formats are not optimized for consumption by large language models (LLMs), which increasingly function as interfaces for data exploration, question-answering, and decision support. This work introduces a knowledge graph-based approach to modeling survey metadata using semantic web standards and linked data principles, specifically designed to make metadata machine-understandable and LLM-compatible. The core metadata entities, including surveys, datasets, variables, concepts, populations, and provenance, are modeled as richly interlinked nodes that allow reasoning, contextual enrichment, and structured prompting. The graph builds on established semantic web standards such as the Resource Description Framework (RDF) to promote interoperability and alignment with global standards. We demonstrate how this structure allows LLMs to surface relevant metadata, ground their outputs in authoritative sources, and generate semantically precise responses. This approach enhances transparency, facilitates metadata reuse, and supports the development of artificial intelligence (AI) applications powered by statistical products.
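
As a rough illustration of the idea summarized above, the following sketch stores a few survey metadata facts as subject-predicate-object triples, collects the facts within a fixed number of hops of a variable, and serializes them into a grounded prompt. It is not the authors' implementation: every `ex:` identifier, the predicate names, and the functions `build_index`, `neighborhood`, and `to_prompt` are hypothetical, and plain Python tuples stand in for a real RDF store.

```python
from collections import defaultdict

# Hypothetical metadata triples (subject, predicate, object).
# All "ex:" identifiers and predicates are illustrative, not from the paper.
TRIPLES = [
    ("ex:survey/acs", "rdf:type", "ex:Survey"),
    ("ex:survey/acs", "rdfs:label", "American Community Survey"),
    ("ex:dataset/acs1-2023", "rdf:type", "ex:Dataset"),
    ("ex:dataset/acs1-2023", "ex:partOf", "ex:survey/acs"),
    ("ex:var/total_population", "rdf:type", "ex:Variable"),
    ("ex:var/total_population", "ex:inDataset", "ex:dataset/acs1-2023"),
    ("ex:var/total_population", "ex:measures", "ex:concept/population"),
    ("ex:concept/population", "rdfs:label", "Population"),
]


def build_index(triples):
    """Map each subject to its outgoing (predicate, object) pairs."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
    return index


def neighborhood(index, start, max_hops=2):
    """Collect triples reachable from `start` by following at most
    `max_hops` outgoing edges (breadth-first)."""
    seen, frontier, facts = {start}, [start], []
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for p, o in index.get(node, []):
                facts.append((node, p, o))
                # Only expand objects that are themselves subjects in the graph.
                if o in index and o not in seen:
                    seen.add(o)
                    next_frontier.append(o)
        frontier = next_frontier
    return facts


def to_prompt(facts, question):
    """Serialize the facts as simple triple lines ahead of the question."""
    lines = [f"{s} {p} {o} ." for s, p, o in facts]
    return (
        "Answer using only these metadata facts:\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}"
    )


index = build_index(TRIPLES)
facts = neighborhood(index, "ex:var/total_population", max_hops=2)
prompt = to_prompt(facts, "Which survey does this variable come from?")
```

With `max_hops=2` the prompt reaches the dataset and the concept but not the survey's label, which sits three hops away; choosing the hop limit is a trade-off between context size and coverage.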

Supplementary material

Appendices A-C.

© 2026 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.

Open access article under the CC BY license.

Keywords

large language models; linked data; link prediction; metadata interoperability; retrieval-augmented generation; semantic search; statistical knowledge graphs
