Journal of Data Science (JDS)

ISSN: 1683-8602, 1680-743X

School of Statistics, Renmin University of China

JDS1230

10.6339/26-JDS1230

Statistical Data Science

Leveraging Survey Metadata for LLM Reasoning via Knowledge Graphs

Irina Belyaeva (1, ∗), Christopher Carino (1), Liang-Chi Wang (1)

Irina Belyaeva ORCID: https://orcid.org/0000-0003-2977-1936

1 Research and Methodology Directorate, Center for Enterprise Dissemination, U.S. Census Bureau, Suitland, Maryland 20746, United States

∗Corresponding author. Email: irinabelaeva@gmail.com or irina.belyaeva@census.gov.

2026

2152026

00122

Supplementary Material

Appendices A-C.

992025942026

2026 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.

Open access article under the CC BY license.

Statistical survey metadata contains essential contextual information that underpins the accurate interpretation, discovery, and reuse of statistical data. However, traditional metadata formats are not optimized for consumption by large language models (LLMs), which increasingly serve as interfaces for data exploration, question answering, and decision support. This work introduces a knowledge graph-based approach to modeling survey metadata using semantic web standards and linked data principles, designed specifically to make metadata machine-understandable and LLM-compatible. The core metadata entities, including surveys, datasets, variables, concepts, populations, and provenance, are modeled as richly interlinked nodes that support reasoning, contextual enrichment, and structured prompting. The graph builds on established semantic web specifications such as the Resource Description Framework (RDF) to promote interoperability and alignment with global standards. We demonstrate how this structure allows LLMs to surface relevant metadata, ground their outputs in authoritative sources, and generate semantically precise responses. This approach enhances transparency, facilitates metadata reuse, and supports the development of artificial intelligence (AI) applications powered by statistical products.

Keywords: large language models; linked data; link prediction; metadata interoperability; retrieval-augmented generation; semantic search; statistical knowledge graphs
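The modeling idea summarized in the abstract (metadata entities as interlinked nodes, serialized for structured prompting) can be illustrated with a minimal sketch. It is not the authors' implementation: the `ex:` identifiers, the predicate names, and the helper functions `build_index`, `neighborhood`, and `to_prompt_context` are all hypothetical, and plain Python tuples stand in for a real RDF store.

```python
from collections import defaultdict

# Hypothetical prefixes, identifiers, and predicates, chosen for illustration only.
TRIPLES = [
    ("ex:acs1_2023", "rdf:type", "ex:Dataset"),
    ("ex:acs1_2023", "ex:partOfSurvey", "ex:ACS"),
    ("ex:ACS", "rdf:type", "ex:Survey"),
    ("ex:ACS", "rdfs:label", "American Community Survey"),
    ("ex:acs1_2023", "ex:hasVariable", "ex:var_median_income"),
    ("ex:var_median_income", "rdf:type", "ex:Variable"),
    ("ex:var_median_income", "ex:measuresConcept", "ex:Income"),
    ("ex:var_median_income", "ex:universe", "ex:Households"),
]


def build_index(triples):
    """Index triples by subject so a node's outgoing edges are easy to look up."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def neighborhood(index, start, max_hops=1):
    """Collect triples reachable from `start` within `max_hops` outgoing hops."""
    seen, frontier, collected = {start}, [start], []
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for predicate, obj in index.get(node, []):
                collected.append((node, predicate, obj))
                if obj not in seen:
                    seen.add(obj)
                    next_frontier.append(obj)
        frontier = next_frontier
    return collected


def to_prompt_context(triples):
    """Serialize triples as plain lines that can be prepended to an LLM prompt."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


INDEX = build_index(TRIPLES)
CONTEXT = to_prompt_context(neighborhood(INDEX, "ex:var_median_income", max_hops=1))
print(CONTEXT)
```

Raising `max_hops` widens the retrieved neighborhood, which trades a longer prompt for more surrounding context about the survey and dataset a variable belongs to.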
