Towards High-Quality Open-Ended Data: A Semi-Automated Nonresponse Detection Model
Authors: Zachary Smith, Kristen Cibelli Hibben, Ben Rogers, Paul Scanlon, Travis Hoppe, National Center for Health Statistics
The rise of quick, low-cost self-administered online surveys in recent years has driven a renewed interest in the use of open-ended questions. As standalone survey items, these questions allow respondents to provide additional information without the constraints of closed-ended options (Schonlau & Couper, 2016). In the context of questionnaire design and survey development, they also offer increasing utility when little is known about a topic. When deployed in this latter way—as embedded construct and error follow-up questions to, or “probes” of, target survey questions—open-ended items can assist in illuminating the full breadth of question interpretations and inform the design of closed-ended response options (Behr et al., 2017; Scanlon, 2019). Yet challenges remain in collecting and making sense of open-text responses. Open-ended questions are more burdensome for respondents (Neuert et al., 2021) and tend to be more prone to inadequate, irrelevant, or otherwise non-codable responses (Behr et al., 2012; Lenzner & Neuert, 2017). Given these data quality concerns, researchers often conclude that the labor-intensive, time-consuming process of coding and analyzing open-text data is not worth the effort.
To partially address these challenges and aid question design researchers in quickly identifying high-quality, codable open-text data, the National Center for Health Statistics (NCHS) developed the Semi-Automated Nonresponse Detection for Survey text (SANDS) model, which draws on recent advances in natural language processing. SANDS seeks to identify and categorize responses to open-ended embedded probes as either likely valid or likely one of four types of “soft” nonresponse (Meitinger et al., 2021): gibberish, uncertainty, refusal, and other responses at “high risk” of being non-codable. In contrast to existing rule-based approaches to identifying item nonresponse (Kaczmirek et al., 2017), SANDS leverages the complex semantic understanding made possible by recent transformer models (Devlin et al., 2019; Gao et al., 2021). In contrast to other machine learning approaches (Yeung & Fernandes, 2022), SANDS performs well on short survey responses (unlike “bag-of-words” approaches, which use word frequencies and perform better on lengthy text) and can be directly applied to any set of open-ended embedded probe responses without model retraining or substantial preprocessing of text.
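The intended workflow is that a classifier of this kind labels each probe response so that researchers can triage likely-valid responses for coding and set aside likely nonresponse. The sketch below illustrates that triage step only; the label names, the `triage` helper, and the mock scores are assumptions for illustration, not the released model's actual output format (the model itself is distributed at https://huggingface.co/NCHS/SANDS, e.g. for use with a Hugging Face text-classification pipeline).

```python
# Illustrative triage of open-text probe responses using SANDS-style
# predictions. Label names and scores are made up for this sketch; in
# practice (label, score) pairs would come from running the model.

# Assumed nonresponse label set, mirroring the four "soft" nonresponse
# categories described in the abstract.
NONRESPONSE_LABELS = {"Gibberish", "Uncertainty", "Refusal", "High-risk"}

def triage(responses, predictions):
    """Split responses into likely-valid and likely-nonresponse pools.

    `predictions` is a list of (label, score) pairs aligned with
    `responses`.
    """
    valid, nonresponse = [], []
    for text, (label, score) in zip(responses, predictions):
        record = {"text": text, "label": label, "score": score}
        (nonresponse if label in NONRESPONSE_LABELS else valid).append(record)
    return valid, nonresponse

# Mock inputs: one substantive answer, one gibberish, one uncertain.
responses = ["I worry about side effects.", "asdfgh", "don't know"]
preds = [("Valid", 0.97), ("Gibberish", 0.99), ("Uncertainty", 0.88)]
valid, nonresp = triage(responses, preds)
```

Only the likely-valid pool would then move on to human coding, which is where the labor savings come from.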
SANDS was trained on a subset of human-coded data from NCHS’ Research and Development Survey (RANDS), specifically Rounds 1 and 2 of RANDS During COVID-19, in which questions covered health-related topics including, but not limited to, COVID-19 (Willson et al., 2022). These rounds of RANDS used NORC at the University of Chicago’s probability-based AmeriSpeak® Panel and a non-probability oversample from the Dynata Panel™. Extensive evaluation of open-text responses from a series of embedded probes from Rounds 1, 2, and 3 of RANDS During COVID-19 compared model results against fully-human-coded sources of truth or hand-reviewed random samples and quantified model performance at high levels of sensitivity and specificity (Cibelli Hibben et al., 2022a; 2022b). While SANDS was trained on COVID-19-related data, its evaluation included probes related to general vaccine hesitancy and religiosity. In these evaluations, SANDS performed well on non-COVID-19-related items, especially health-related items. With the release of SANDS, we seek to advance efforts to better understand the data quality of open-text responses and apply the use of these responses in survey research.
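The evaluations above summarize performance as sensitivity and specificity. As a reminder of how those quantities are computed from a 2x2 confusion matrix (treating nonresponse detection as the positive class), a minimal sketch follows; the counts are illustrative only, not SANDS evaluation results.

```python
# Sensitivity and specificity from confusion-matrix counts. The counts
# below are invented for illustration and are not RANDS/SANDS figures.

def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true-positive rate: nonresponse caught
    specificity = tn / (tn + fp)  # true-negative rate: valid responses kept
    return sensitivity, specificity

sens, spec = sensitivity_specificity(tp=90, fn=10, tn=80, fp=20)
# sens = 90/100 = 0.9, spec = 80/100 = 0.8
```

High values on both dimensions matter here: low sensitivity lets nonresponse leak into the coding pool, while low specificity discards usable responses.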
The findings and conclusions in this article are those of the authors and do not necessarily represent the official position of the National Center for Health Statistics, Centers for Disease Control and Prevention.
Behr, D., Meitinger, K., Braun, M., & Kaczmirek, L. (2017). Web probing-implementing probing techniques from cognitive interviewing in web surveys with the goal to assess the validity of survey questions (Version 1.0). (GESIS Survey Guidelines). Mannheim: GESIS – Leibniz Institute for the Social Sciences. https://nbn-resolving.org/urn:nbn:de:0168-ssoar-56166-3.
Behr, D., Kaczmirek, L., Bandilla, W., & Braun, M. (2012). Asking probing questions in web surveys: which factors have an impact on the quality of responses? Social Science Computer Review, 30(4), 487-498.
Cibelli Hibben, K., Smith, Z., Hoppe, T., Ryan, V., Rogers, B., Scanlon, P., & Miller, K. (2022a, October 26). Toward a semi-automated item nonresponse detector model for open-response data. Federal Committee on Statistical Methodology Research & Policy Conference, Washington, DC.
Cibelli Hibben, K., Smith, Z., Hoppe, T., Ryan, V., Rogers, B., Scanlon, P., & Miller, K. (2022b, May 12). Toward a semi-automated item nonresponse detector model for open-response data. American Association for Public Opinion Research 77th Annual Meeting, Chicago, IL.
Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT) (pp. 4171–4186).
Gao, T., Yao, X., & Chen, D. (2021, November). SimCSE: Simple Contrastive Learning of Sentence Embeddings. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 6894-6910).
Lenzner, T., & Neuert, C.E. (2017). Pretesting Survey Questions Via Web Probing–Does it Produce Similar Results to Face-to-Face Cognitive Interviewing? Survey Practice, 10(4), 2768.
Kaczmirek, L., Meitinger, K., & Behr, D. (2017). Higher data quality in web probing with EvalAnswer: A tool for identifying and reducing nonresponse in open-ended questions. GESIS – Leibniz Institute for the Social Sciences.

The SANDS model is available at https://huggingface.co/NCHS/SANDS.