Improving the Science of Annotation for Natural Language Processing: The Use of the Single-Case Study for Piloting Annotation Projects
Volume 20, Issue 3 (2022): Special Issue: Data Science Meets Social Sciences, pp. 339–357
Pub. online: 8 July 2022
Type: Data Science In Action
Open Access
Received: 11 December 2021
Accepted: 20 June 2022
Published: 8 July 2022
Abstract
Researchers need guidance on how to obtain maximum efficiency and accuracy when annotating training data for text classification applications. Further, given wide variability in the kinds of annotations researchers need to obtain, they would benefit from the ability to conduct low-cost experiments during the design phase of annotation projects. To this end, our study proposes the single-case study design as a feasible and causally valid experimental design for determining the best procedures for a given annotation task. The key strength of the design is its ability to generate causal evidence at the individual level, identifying the impact of competing annotation techniques and interfaces for the specific annotator(s) included in an annotation project. In this paper, we demonstrate the application of the single-case study in an applied experiment and argue that future researchers should incorporate the design into the pilot stage of annotation projects so that, over time, a causally valid body of knowledge regarding the best annotation techniques is built.
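To make the idea concrete, the sketch below illustrates one way a single-case pilot of this kind might be analyzed: a single annotator labels batches of documents while the annotation setup alternates between two conditions, and a randomization test asks whether the observed difference in annotation speed is larger than chance. This is a minimal illustration only, not the authors' procedure; the timing data, the interface names, and the use of a randomization test on median seconds-per-document are all assumptions introduced here.

    # Hypothetical single-case (alternating-treatments) pilot for one annotator.
    # All numbers and condition names below are invented for illustration.
    import random
    import statistics

    # Seconds per document recorded across alternating batches,
    # with the condition order randomized in advance (fabricated data).
    timings = {
        "interface_A": [41.2, 38.7, 44.0, 39.5, 36.8, 40.1],
        "interface_B": [31.4, 29.9, 33.2, 30.7, 28.5, 32.0],
    }

    def observed_effect(a, b):
        """Difference in median annotation time (condition A minus condition B)."""
        return statistics.median(a) - statistics.median(b)

    def randomization_test(a, b, n_permutations=10_000, seed=0):
        """Shuffle condition labels to estimate how often a difference at least
        as large as the observed one would arise by chance for this annotator."""
        rng = random.Random(seed)
        observed = observed_effect(a, b)
        pooled = a + b
        count = 0
        for _ in range(n_permutations):
            rng.shuffle(pooled)
            perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
            if abs(observed_effect(perm_a, perm_b)) >= abs(observed):
                count += 1
        return observed, count / n_permutations

    effect, p_value = randomization_test(timings["interface_A"], timings["interface_B"])
    print(f"Median slowdown under interface_A: {effect:.1f} s/doc (randomization p = {p_value:.3f})")

Because the design randomizes conditions within a single annotator, the resulting inference applies to that annotator specifically, which is the individual-level causal evidence the abstract describes.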
Supplementary material
The Supplementary Material includes all of the scripts and data files necessary to reproduce the results of this paper. We also include the codebook used by our annotators.