
Journal of Data Science


Visual Analytics for NASCAR Motorsports
Volume 23, Issue 1 (2025), pp. 149–170
Kornelia Bastin, Christopher G. Healey

https://doi.org/10.6339/24-JDS1141
Pub. online: 2 July 2024      Type: Data Science In Action      Open Access

Received
4 February 2024
Accepted
7 June 2024
Published
2 July 2024

Abstract

The National Association for Stock Car Auto Racing (NASCAR) is ranked among the top ten most popular sports in the United States. NASCAR events are characterized by on-track racing punctuated by pit stops, since cars must refuel, replace tires, and modify their setup throughout a race. A well-executed pit stop can allow drivers to gain multiple seconds on their opponents. Strategies around when to pit and what to perform during a pit stop are under constant evaluation. One currently unexplored area is the publicly available communication between each driver and their pit crew during the race. Because a race produces many hours of audio, manual analysis of even one driver’s communications is prohibitive. We propose a fully automated approach to analyze driver–pit crew communication. Our work was conducted in collaboration with NASCAR domain experts. Audio communication is converted to text and summarized using cluster-based Latent Dirichlet Allocation to provide an overview of a driver’s race performance. The transcript is then analyzed to extract important events related to pit stops and driving balance: understeer (pushing) or oversteer (over-rotating). Named entity recognition (NER) and relationship extraction provide context for each event. A combination of the race summary, events, and real-time race data provided by NASCAR is presented using Sankey visualizations. Statistical analysis and evaluation by our domain expert collaborators confirmed that we can accurately identify important race events and driver interactions, and that presenting them in this novel way provides useful, important, and efficient summaries and event highlights for race preparation and in-race decision-making.
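
To make the event-extraction step more concrete, the following is a minimal sketch of tagging transcribed utterances as pit stop or driving balance events. It is an illustration only and not the authors' implementation: the abstract describes NER and relationship extraction, whereas this sketch substitutes plain keyword matching. The keyword patterns, the `Event` record, the `extract_events` function, and the sample utterances are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword patterns (assumption): the abstract names only
# "pushing" and "over-rotating"; the real vocabulary is not given there.
EVENT_PATTERNS = {
    "understeer": re.compile(r"\b(?:push(?:ing|es|ed)?|understeer(?:ing)?)\b", re.IGNORECASE),
    "oversteer": re.compile(r"\b(?:over-?rotat(?:e|es|ed|ing)|oversteer(?:ing)?)\b", re.IGNORECASE),
    "pit_stop": re.compile(r"\b(?:pit(?:ting)?|tires|fuel)\b", re.IGNORECASE),
}


@dataclass
class Event:
    lap: int    # lap on which the utterance was spoken
    kind: str   # one of the keys of EVENT_PATTERNS
    text: str   # the transcribed utterance


def extract_events(utterances):
    """Return one Event per (utterance, matching event kind) pair, in input order."""
    events = []
    for lap, text in utterances:
        for kind, pattern in EVENT_PATTERNS.items():
            if pattern.search(text):
                events.append(Event(lap, kind, text))
    return events


# Made-up utterances standing in for speech-to-text output.
sample = [
    (12, "Car is pushing in the center of turn three."),
    (40, "Pit this time, four tires and fuel."),
    (85, "It's over-rotating on exit now."),
    (90, "Good job, keep digging."),
]

events = extract_events(sample)
for event in events:
    print(event.lap, event.kind)
```

Keyword matching is used here only because it is short and self-contained; it cannot recover who or what an event refers to, which is the context the abstract attributes to NER and relationship extraction.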

Supplementary material

Python code for (1) processing the transcribed driver–pit crew text and (2) generating a web-based visualization of important events for a user-selected race and one or more drivers has been uploaded to the GitHub repository https://github.com/cghealey/JDS. Instructions on how to run the code are given in the README.md file.



Copyright
2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
analytics, natural language processing, speech-to-text, visualization

