Visual Analytics for NASCAR Motorsports

Bastin, Kornelia; Healey, Christopher G.

doi:10.6339/24-JDS1141

Journal of Data Science

Volume 23, Issue 1 (2025), pp. 149–170

Pub. online: 2 July 2024
Type: Data Science In Action

Open Access

Received: 4 February 2024
Accepted: 7 June 2024
Published: 2 July 2024

Abstract

The National Association of Stock Car Auto Racing (NASCAR) is ranked among the top ten most popular sports in the United States. NASCAR events are characterized by on-track racing punctuated by pit stops, since cars must refuel, replace tires, and modify their setup throughout a race. A well-executed pit stop can allow drivers to gain multiple seconds on their opponents. Strategies around when to pit and what to perform during a pit stop are under constant evaluation. One currently unexplored area is the publicly available communication between each driver and their pit crew during the race. Given the many hours of audio, manual analysis of even one driver's communications is prohibitive. We propose a fully automated approach to analyze driver–pit crew communication, developed in collaboration with NASCAR domain experts. Audio communication is converted to text and summarized using cluster-based Latent Dirichlet Allocation (LDA) to provide an overview of a driver's race performance. The transcript is then analyzed to extract important events related to pit stops and driving balance: understeer (pushing) or oversteer (over-rotating). Named entity recognition (NER) and relationship extraction provide context for each event. The race summary, the events, and real-time race data provided by NASCAR are presented together using Sankey visualizations. Statistical analysis and evaluation by our domain expert collaborators confirmed that we can accurately identify important race events and driver interactions and present them in a novel way that provides useful, important, and efficient summaries and event highlights for race preparation and in-race decision-making.
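As a rough illustration of the event-extraction step, the sketch below tags transcribed radio messages with simple keyword rules and merges repeated mentions of the same condition into one event. It is a minimal stand-in, not the authors' method: the published pipeline adds NER and relationship extraction for context, whereas this sketch relies on keyword matching alone. The lexicon in `EVENT_PATTERNS`, the `Utterance` and `Event` records, the `gap` merge window, and the sample transcript are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword lexicon (illustrative only, not the published system's rules).
EVENT_PATTERNS = {
    "pit stop": re.compile(r"\b(pit(ting)?|four tires|two tires|fuel only)\b", re.IGNORECASE),
    "understeer": re.compile(r"\b(tight|push(ing)?|understeer)\b", re.IGNORECASE),
    "oversteer": re.compile(r"\b(loose|over[- ]?rotat\w*|oversteer)\b", re.IGNORECASE),
}


@dataclass
class Utterance:
    lap: int      # lap on which the radio message was sent
    speaker: str  # e.g. "driver" or "crew chief"
    text: str     # speech-to-text output for one transmission


@dataclass
class Event:
    kind: str       # a key of EVENT_PATTERNS
    first_lap: int
    last_lap: int
    mentions: list  # Utterance objects that triggered this event


def extract_events(utterances, gap=2):
    """Tag utterances by keyword and merge same-kind mentions at most `gap` laps apart.

    `gap` is a hypothetical tuning parameter, not a value from the paper.
    """
    events = []
    latest = {}  # event kind -> most recent Event of that kind
    for u in sorted(utterances, key=lambda u: u.lap):
        for kind, pattern in EVENT_PATTERNS.items():
            if not pattern.search(u.text):
                continue
            ev = latest.get(kind)
            if ev is not None and u.lap - ev.last_lap <= gap:
                # Close enough to the previous mention: extend the existing event.
                ev.last_lap = u.lap
                ev.mentions.append(u)
            else:
                # Otherwise start a new event of this kind.
                ev = Event(kind, u.lap, u.lap, [u])
                latest[kind] = ev
                events.append(ev)
    return events


transcript = [
    Utterance(12, "driver", "Car is really tight in the center"),
    Utterance(13, "driver", "Still pushing up off two"),
    Utterance(14, "crew chief", "Pit this time, four tires and fuel"),
    Utterance(75, "driver", "Now it's loose off turn four"),
]
for ev in extract_events(transcript):
    print(ev.kind, ev.first_lap, ev.last_lap, len(ev.mentions))
```

Keyword rules like these are brittle: "push" can also mean driving harder rather than understeer, which is one reason added context, such as the NER and relationship extraction described above, is useful.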

Supplementary material

Python code for (1) processing the transcribed driver–pit crew text and (2) generating a web-based visualization of important events for a user-selected race and one or more drivers has been uploaded to the GitHub repository https://github.com/cghealey/JDS. Instructions on how to run the code are in the README.md file.
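The repository's actual data format is not reproduced here. As a hedged sketch of the visualization side, the function below filters event records to a user-selected race and set of drivers and counts driver-to-event flows as `[from, to, weight]` rows, packaged in a dictionary modeled on the Highcharts `sankey` series type. The record keys (`race`, `driver`, `kind`), the function name `sankey_series`, and the sample values are assumptions for illustration only.

```python
import json
from collections import Counter


def sankey_series(events, race, drivers):
    """Count driver -> event-kind flows for one race and a set of selected drivers.

    `events` is assumed to be an iterable of dicts with the hypothetical keys
    "race", "driver", and "kind"; this is not the repository's actual schema.
    """
    selected = set(drivers)
    counts = Counter(
        (e["driver"], e["kind"])
        for e in events
        if e["race"] == race and e["driver"] in selected
    )
    # One [from, to, weight] row per (driver, event kind) pair, in sorted order.
    rows = [[src, dst, n] for (src, dst), n in sorted(counts.items())]
    # Dictionary layout modeled on a Highcharts "sankey" series with from/to/weight keys.
    return {"type": "sankey", "keys": ["from", "to", "weight"], "data": rows}


events = [
    {"race": "race-01", "driver": "Driver A", "kind": "understeer"},
    {"race": "race-01", "driver": "Driver A", "kind": "understeer"},
    {"race": "race-01", "driver": "Driver B", "kind": "pit stop"},
    {"race": "race-01", "driver": "Driver C", "kind": "oversteer"},
    {"race": "race-02", "driver": "Driver A", "kind": "oversteer"},
]
series = sankey_series(events, "race-01", ["Driver A", "Driver B"])
print(json.dumps(series))
```

A web front end could pass this dictionary as one entry of a chart's series list; the exact options a given chart library expects should be checked against its documentation.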

© 2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.

Open access article under the CC BY license.

Keywords

analytics; natural language processing; speech-to-text; visualization
